System
Funds Authorization.
Cloud funds authorization at a banking core ISV.
A two-tier authorization service rebuilt to meet cloud throughput and latency targets at orders-of-magnitude lower infrastructure cost, without changing the client-facing API.
Problem
The cloud version of the service was prohibitively expensive to scale.
A retail-banking ISV was building the cloud-native version of its funds authorization service. The existing on-prem implementation followed the standard two-tier pattern: a compute layer hosting the authorization business logic and a database layer storing the customer and balance data that logic needs, with the two separated by a network.
In a cloud deployment, the cost of scaling this architecture to the required throughput and latency was prohibitive, the kind of cost that determines whether the cloud version of the product is commercially viable at all. The ISV engaged Neeve to benchmark an alternative built on Rumi.
Architectural Constraint
You cannot fetch your way out of this.
The bottleneck is the network between compute and data. Each authorization request requires several data fetches, each synchronous, each variable in size, each on a millisecond-or-better budget. The result is a low Transactions-Per-CPU-Core ceiling, and the only way to scale around it is to add cores to both tiers, horizontally.
Faster databases, faster servers, faster networks each move the cost in the right direction, but none of them change the slope of the curve. To meet the cloud cost targets, the data fetch time needed to drop by multiple orders of magnitude.
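The argument above can be made concrete with a back-of-the-envelope model. This is a minimal sketch, assuming a core stays occupied for the whole request, including the time it waits on each synchronous fetch. The function name and every input value (CPU time, fetch count, fetch latencies) are illustrative assumptions, not measurements from the engagement.

```python
def tx_per_core(cpu_ms: float, fetches: int, fetch_ms: float) -> float:
    """Transactions per second one core can sustain if it blocks on every fetch."""
    busy_ms = cpu_ms + fetches * fetch_ms  # time the core is tied up per request
    return 1000.0 / busy_ms


# Assumed workload: 0.5 ms of business logic and 5 data fetches per request.
remote = tx_per_core(cpu_ms=0.5, fetches=5, fetch_ms=2.0)    # fetch over the network
faster = tx_per_core(cpu_ms=0.5, fetches=5, fetch_ms=1.0)    # network twice as fast
local = tx_per_core(cpu_ms=0.5, fetches=5, fetch_ms=0.001)   # lookup in local memory

print(f"networked fetch:  {remote:8.0f} tx/s per core")
print(f"2x faster fetch:  {faster:8.0f} tx/s per core")
print(f"in-memory lookup: {local:8.0f} tx/s per core")
```

Under these assumed inputs, halving the fetch latency less than doubles per-core throughput, while shrinking it by orders of magnitude leaves the business logic as the only remaining cost.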
Rumi Solution
Compute and state in the same memory space.
The new implementation runs the Account Service as a hyperconverged Rumi node. The compute that processes the authorization request and the customer data it needs sit in the same memory space. The data fetch is effectively free.
Existing authorization business logic was ported into Rumi and unit-tested independently of the platform. The API exposed to clients was unchanged, so client systems migrated without code changes. Reliability, including primary/backup consensus and zero-loss replay across failures, comes from the platform rather than the application.
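To illustrate what co-located compute and state, tested independently of the platform, can look like, here is a minimal sketch. It does not use Rumi's API, which the source does not describe. The `Account` type, the `authorize` function, the decline codes, and the rules themselves are hypothetical stand-ins for the ISV's real business logic.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance_cents: int
    frozen: bool = False


def authorize(accounts: dict[str, Account], account_id: str, amount_cents: int) -> str:
    """Approve and apply a debit, or decline, using only in-process state."""
    account = accounts.get(account_id)  # in-memory lookup, no network hop
    if account is None:
        return "DECLINED_UNKNOWN_ACCOUNT"
    if account.frozen:
        return "DECLINED_FROZEN"
    if amount_cents <= 0:
        return "DECLINED_INVALID_AMOUNT"
    if amount_cents > account.balance_cents:
        return "DECLINED_INSUFFICIENT_FUNDS"
    account.balance_cents -= amount_cents
    return "APPROVED"


def test_authorize() -> None:
    # The logic is a plain function over plain data, so it can be tested
    # without any platform, broker, or database running.
    accounts = {"acct-1": Account(balance_cents=10_000)}
    assert authorize(accounts, "acct-1", 2_500) == "APPROVED"
    assert accounts["acct-1"].balance_cents == 7_500
    assert authorize(accounts, "acct-1", 8_000) == "DECLINED_INSUFFICIENT_FUNDS"
    assert authorize(accounts, "missing", 100) == "DECLINED_UNKNOWN_ACCOUNT"


test_authorize()
```

In this sketch the state is an ordinary dictionary. In the architecture described above, replication and recovery of that state would be the platform's responsibility rather than the application's.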
Operational Outcomes
Two orders of magnitude on cost; one on latency.
- 3 weeks: initial port, sufficient for performance testing
- 25 → 1,364: Transactions per CPU Core (55×)
- >300 ms → 5 ms: end-to-end latency (60×)
- $1M → $9k: annual AWS cost at 60,000 req/sec (110×)
- 300 → 4: servers (75×)
- Zero-loss recovery across network, process, machine, and DC failures
Of the 5 ms end-to-end latency, 98–99% is attributable to Kafka in the request path. The Rumi-resident processing itself is sub-millisecond.
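The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above. The implied core counts are a derived estimate that assumes throughput scales linearly with cores, and the non-Kafka remainder is an upper bound on the Rumi-resident processing time, not a separate measurement.

```python
import math

TARGET_RPS = 60_000  # request rate at which the annual cost was quoted

throughput_gain = 1_364 / 25     # about 55x
latency_gain = 300 / 5           # 60x, a lower bound since "before" was >300 ms
cost_gain = 1_000_000 / 9_000    # about 111x
server_gain = 300 / 4            # 75x

# Cores implied by the per-core throughput (assumes linear scaling).
cores_before = math.ceil(TARGET_RPS / 25)
cores_after = math.ceil(TARGET_RPS / 1_364)

# Latency split: 98-99% of the 5 ms end-to-end figure is attributed to Kafka.
kafka_ms_min, kafka_ms_max = 0.98 * 5, 0.99 * 5
rest_ms_min, rest_ms_max = 5 - kafka_ms_max, 5 - kafka_ms_min

print(f"throughput {throughput_gain:.0f}x, latency {latency_gain:.0f}x, "
      f"cost {cost_gain:.0f}x, servers {server_gain:.0f}x")
print(f"implied cores: {cores_before} before, {cores_after} after")
print(f"Kafka: {kafka_ms_min:.2f}-{kafka_ms_max:.2f} ms, "
      f"everything else: {rest_ms_min:.2f}-{rest_ms_max:.2f} ms")
```

The remainder after Kafka is at most about a tenth of a millisecond, which is consistent with the sub-millisecond claim for the Rumi-resident processing.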