GitHub - Bulletdev/bulletonrails-ruby: rinha de backend 2026 🐓

>            ██████╗ ██╗   ██╗██╗     ██╗     ██████╗████████╗ 
>            ██╔══██╗██║   ██║██║     ██║     ██╔═══╝╚══██╔══╝
>            ██████╔╝██║   ██║██║     ██║     █████╗    ██║  
>            ██╔══██╗██║   ██║██║     ██║     ██╔══╝    ██║  
>            ██████╔╝╚██████╔╝██████╗ ██████╗ ██████╗   ██║  
>            ╚═════╝  ╚═════╝ ╚═════╝ ╚═════╝ ╚═════╝   ╚═╝  
              on Rails - Rinha de Backend 2026

╔═════════════════════════════════════════════════════════════════╗
║  bulletonrails-ruby - Rinha de Backend 2026                     ║
╠═════════════════════════════════════════════════════════════════╣
║  Fraud detection API using IVF approximate KNN - Ruby/Roda      ║
║  Roda + Iodine + Numo + FAISS · 1 CPU / 350 MB · 2 instances    ║
╚═════════════════════════════════════════════════════════════════╝

┌─────────────────────────────────────────────────────────────────┐
│  01 · What is this                                              |
│  02 · How it works                                              │
│  03 · Tech stack                                                │
│  04 · Architecture                                              │
│  05 · Quick start                                               │
│  06 · Validation                                                │
│  07 · Benchmark results                                         │
│  08 · Submission                                                │
└─────────────────────────────────────────────────────────────────┘

01 · What is this

Submission for Rinha de Backend 2026 - a competition to build a fraud detection API under extreme resource constraints.

The challenge: build an API that receives a card transaction and decides - in real time - whether it is fraudulent, using vector similarity search against 100k labeled reference transactions.

This submission proves that Ruby can compete in performance-sensitive scenarios when you pick the right tools. No Rails. No ActiveRecord.

No framework overhead. Just Roda, Iodine, Numo::NArray for vectorized math, and FAISS IVF approximate nearest neighbor search.

02 · How it works

POST /fraud-score
        │
        ▼
  VectorNormalizer
  ─────────────────
  Transform 14 fields from the payload into a normalized
  Numo::SFloat vector using formulas from the spec.
  Sentinel -1 for absent last_transaction.
        │
        ▼
  KnnSearcher (IVF)
  ─────────────────
  Approximate KNN via FAISS IndexIVFFlat (nlist=64, nprobe=16).
  Sequential cluster scan → SIMD-friendly, 0 false negatives.
  index.freeze → no_gvl path → GVL released during C search.
        │
        ▼
  FraudScorer
  ─────────────────
  fraud_score = fraud_neighbors / 5
  approved    = fraud_score < 0.6
        │
        ▼
  { "approved": bool, "fraud_score": float }

The 14 dimensions

idx	field	formula
0	`amount`	`clamp(amount / 10_000)`
1	`installments`	`clamp(installments / 12)`
2	`amount_vs_avg`	`clamp((amount / avg_amount) / 10)`
3	`hour_of_day`	`utc_hour / 23`
4	`day_of_week`	`(wday + 6) % 7 / 6` - Mon=0, Sun=6
5	`minutes_since_last_tx`	`clamp(minutes / 1440)` or `-1` if null
6	`km_from_last_tx`	`clamp(km / 1000)` or `-1` if null
7	`km_from_home`	`clamp(km_from_home / 1000)`
8	`tx_count_24h`	`clamp(tx_count / 20)`
9	`is_online`	`1.0` or `0.0`
10	`card_present`	`1.0` or `0.0`
11	`unknown_merchant`	`0.0` if known, `1.0` if not
12	`mcc_risk`	lookup from `mcc_risk.json` (default 0.5)
13	`merchant_avg_amount`	`clamp(merchant_avg / 10_000)`

The reference dataset (100k vectors) is loaded once at startup as a Numo::SFloat[100_000, 14] matrix and an IVF index is built from it. Both live in C heap - never touched by the Ruby GC on the hot path.

03 · Tech stack

╔══════════════════════╦════════════════════════════════════════════════════╗
║  LAYER               ║  CHOICE                                            ║
╠══════════════════════╬════════════════════════════════════════════════════╣
║  Language            ║  Ruby 3.4                                          ║
║  HTTP framework      ║  Roda 3.103 - tree routing, ~0.1ms overhead        ║
║  HTTP server         ║  Iodine 0.7.58 - facil.io epoll, 1w x 4t           ║
║  KNN search          ║  FAISS 0.6.0 - IVF nlist=64 nprobe=16, no_gvl      ║
║  Numeric core        ║  numo-narray-alt 0.10 - C++-compat fork, Float32   ║
║  JSON                ║  Oj 3.17 - 3-5x faster than stdlib JSON            ║
║  Load balancer       ║  haproxy 2.9-alpine - HTTP mode, httpchk /ready    ║
║                      ║                                                    ║
║   Container          ║        Docker Compose - bridge network             ║
║                      ║        linux/amd64 + linux/arm64                   ║
╚══════════════════════╩════════════════════════════════════════════════════╝

Why not Rails? Rails consumes 150-200 MB per instance. With 2 instances + haproxy, that exceeds the 350 MB total budget. Roda runs in ~60 MB per instance and adds zero overhead for 2 static endpoints.

Why Iodine instead of Puma? Iodine is built on facil.io, a C event loop using epoll. It handles accept/read/write asynchronously, overlapping I/O with computation. At 650 req/s, this reduces per-request overhead vs Puma's threaded model: p99 dropped from 5.87ms to 4.62ms.

Why FAISS IVF instead of HNSW? HNSW has a structural floor: graph traversal with random memory access costs ~2.5ms (~17 node visits, constant cache misses). IVF (Inverted File Index) replaces graph traversal with sequential cluster scan: quantize the query to the nearest centroid, then scan only 1/nlist of the vectors in sequential memory - SIMD-friendly and cache-coherent. Result: single-query p99 drops from ~2.5ms to ~341 µs (7x) with FP=0 FN=0 at nlist=64 nprobe=16. The FAISS gem releases the GVL when the index is frozen (Rice no_gvl path), enabling true thread parallelism identical to the previous hnswlib behavior.

04 · Architecture

                          :9999
  k6 / test engine ──── haproxy (LB)
                           │   round-robin
              ┌────────────┴────────────┐
              ▼                         ▼
         [ api 1 ]                  [ api 2 ]
       Roda + Iodine             Roda + Iodine
      1 worker, 4 threads       1 worker, 4 threads
              │                         │
     Numo::SFloat[100k,14]    Numo::SFloat[100k,14]
     IVF index (C heap)       IVF index (C heap)
     LABELS[100k]             LABELS[100k]
     (async background load) (async background load)

Resource allocation

╔══════════════╦══════════╦════════════╗
║  Service     ║  CPUs    ║  Memory    ║
╠══════════════╬══════════╬════════════╣
║  haproxy     ║  0.10    ║  20 MB     ║
║  api1        ║  0.45    ║  160 MB    ║
║  api2        ║  0.45    ║  160 MB    ║
╠══════════════╬══════════╬════════════╣
║  TOTAL       ║  1.00    ║  340 MB    ║
╚══════════════╩══════════╩════════════╝

Limit: 1 CPU / 350 MB. Used: 1.00 CPU / 340 MB.

05 · Quick start

▶ see details (click to expand)

# Clone
git clone https://github.com/bulletdev/bulletonrails-ruby
cd bulletonrails-ruby

# Build and run (IVF index trains at startup, ~20-30s)
docker compose up --build -d

# Wait for ready (HAProxy only routes after /ready returns 200)
until curl -sf http://localhost:9999/ready; do sleep 3; done && echo "ready"

# Test a legitimate transaction
curl -s -X POST http://localhost:9999/fraud-score \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "tx-1329056812",
    "transaction":      { "amount": 41.12, "installments": 2, "requested_at": "2026-03-11T18:45:53Z" },
    "customer":         { "avg_amount": 82.24, "tx_count_24h": 3, "known_merchants": ["MERC-003", "MERC-016"] },
    "merchant":         { "id": "MERC-016", "mcc": "5411", "avg_amount": 60.25 },
    "terminal":         { "is_online": false, "card_present": true, "km_from_home": 29.23 },
    "last_transaction": null
  }'
# Expected: {"approved":true,"fraud_score":0.0}

# Test a fraudulent transaction
curl -s -X POST http://localhost:9999/fraud-score \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "tx-3330991687",
    "transaction":      { "amount": 9505.97, "installments": 10, "requested_at": "2026-03-14T05:15:12Z" },
    "customer":         { "avg_amount": 81.28, "tx_count_24h": 20, "known_merchants": ["MERC-008", "MERC-007", "MERC-005"] },
    "merchant":         { "id": "MERC-068", "mcc": "7802", "avg_amount": 54.86 },
    "terminal":         { "is_online": false, "card_present": true, "km_from_home": 952.27 },
    "last_transaction": null
  }'
# Expected: {"approved":false,"fraud_score":1.0}

06 · Validation

Validate vectors against the spec via the live endpoint (the container runs at ~132 MB; spawning a second Ruby process with docker compose exec would exceed the 160 MB limit):

▶ see details (click to expand)

# legit - expected: {"approved":true,"fraud_score":0.0}
curl -s -X POST http://localhost:9999/fraud-score \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "tx-1329056812",
    "transaction":      { "amount": 41.12, "installments": 2, "requested_at": "2026-03-11T18:45:53Z" },
    "customer":         { "avg_amount": 82.24, "tx_count_24h": 3, "known_merchants": ["MERC-003", "MERC-016"] },
    "merchant":         { "id": "MERC-016", "mcc": "5411", "avg_amount": 60.25 },
    "terminal":         { "is_online": false, "card_present": true, "km_from_home": 29.23 },
    "last_transaction": null
  }'

# fraud - expected: {"approved":false,"fraud_score":1.0}
curl -s -X POST http://localhost:9999/fraud-score \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "tx-3330991687",
    "transaction":      { "amount": 9505.97, "installments": 10, "requested_at": "2026-03-14T05:15:12Z" },
    "customer":         { "avg_amount": 81.28, "tx_count_24h": 20, "known_merchants": ["MERC-008", "MERC-007", "MERC-005"] },
    "merchant":         { "id": "MERC-068", "mcc": "7802", "avg_amount": 54.86 },
    "terminal":         { "is_online": false, "card_present": true, "km_from_home": 952.27 },
    "last_transaction": null
  }'

Expected output:

{"approved":true,"fraud_score":0.0}
{"approved":false,"fraud_score":1.0}

The scripts/validate.rb script exists for offline use (e.g. in a container with extra memory headroom). Expected output when run outside resource constraints:

Dataset loaded: 100000 vectors

--- legit tx-1329056812 ---
  vector:  OK [0.0041, 0.1667, 0.05, 0.7826, 0.3333, -1.0, -1.0, 0.0292, 0.15, 0.0, 1.0, 0.0, 0.15, 0.006]
  result:  OK approved=true, fraud_score=0.0

--- fraud tx-3330991687 ---
  vector:  OK [0.9506, 0.8333, 1.0, 0.2174, 0.8333, -1.0, -1.0, 0.9523, 1.0, 0.0, 1.0, 1.0, 0.75, 0.0055]
  result:  OK approved=false, fraud_score=1.0

========================================
ALL VALIDATIONS PASSED

---

07 · Benchmark results

Score formula: final = score_p99 + score_det

score_p99 = max(-3000, min(3000, 1000 * log10(1000ms / p99)))

╔═════════════════════════════════════════════════════════════════════════════╗
║  EVOLUTION                                                                  ║
╠══════════╦═══════════╦══════════════╦═══════════════╦═══════════════════════╣
║  Run     ║  Server   ║  p99         ║  p99_score    ║  final_score          ║
╠══════════╬═══════════╬══════════════╬═══════════════╬═══════════════════════╣
║  1       ║  Puma     ║  OOM kill    ║  -            ║  -705.91              ║
║  2       ║  Puma     ║  14600ms     ║  -3000 (cut)  ║  -1327.89             ║
║  3       ║  Puma     ║  12704ms     ║  -3000 (cut)  ║  -1335.93             ║
║  4 HNSW  ║  Puma     ║  5.87ms      ║  +2231.50     ║  +4977.97             ║
║  5 HNSW  ║  Iodine   ║  4.62ms      ║  +2335.67     ║  +5082.14             ║
║  6 ef200 ║  Iodine   ║  5.43ms      ║  +2264.83     ║  +5264.83             ║
║  7 stable║  Iodine   ║  5.54ms      ║  +2256.75     ║  +5256.75             ║
║  8 gc    ║  Iodine   ║  6.34ms      ║  +2198.08     ║  +5198.08             ║
║  9 yjit  ║  Iodine   ║  ~3.7ms      ║  +2427        ║  +5427                ║
║ 10 resp  ║  Iodine   ║  ~3.7ms      ║  ~2427        ║  ~5427                ║
║ 11 faiss ║  Iodine   ║  ~1.5ms est  ║  ~2825 est    ║  ~5825 est (current)  ║
╚══════════╩═══════════╩══════════════╩═══════════════╩═══════════════════════╝

Best benchmark - Run 11 (FAISS IVF nlist=64 nprobe=16 — spike bench, waiting oficial run)

╔═══════════════════════════════════════════════════════╗
║  IVF search p99 (spike)  341 µs  (HNSW: ~2500 µs)     ║
║  p99 total (estimate)    ~1.5 ms                      ║
║  p99_score (estimate)    ~2825   (max 3000)           ║
║  detection_score         3000.00  (max 3000)  PERFECT ║
║  final_score (estimate)  ~5825  /  6000 max  (97.1%)  ║
╠═══════════════════════════════════════════════════════╣
║  IVF nlist=64 nprobe=16 vs exact search:              ║
║  false_positives   0                                  ║
║  false_negatives   0                                  ║
║  GVL released      yes (index.freeze → no_gvl)        ║
║  RSS (spike proc)  ~100 MB / 160 MB limit             ║
╚═══════════════════════════════════════════════════════╝

Run 9 (last official run):
╔═══════════════════════════════════════════════════════╗
║  p99 (best run)        3.27 ms                        ║
║  p99 (median 10 runs)  3.78 ms                        ║
║  p99_score (median)    ~2427  (max 3000)              ║
║  detection_score       3000.00  (max 3000)  PERFECT   ║
║  final_score (best)    5485.57  /  6000 max  (91.4%)  ║
║  final_score (median)  ~5427   /  6000 max  (90.5%)   ║
╚═══════════════════════════════════════════════════════╝

Run 8 changes - GC compaction

config.ru: added GC.compact after DatasetLoader.load!; compacts the Ruby heap after the 100k-record JSON parse before Iodine starts threads; eliminates GC pressure spike that caused p99=20ms regression under peak load on api2
Serving RSS: api1=134MB, api2=136MB, both well within 160MB limit

Run 9 changes - YJIT + hot-path allocation reduction

╔═══════════════════════════════════════════════════════╗
║  p99 improvement       ~40% vs baseline               ║
║  score improvement     +229 pts (median) vs Run 8     ║
║  detection             still perfect (0 FP, 0 FN)     ║
╚═══════════════════════════════════════════════════════╝

Dockerfile: --yjit --yjit-exec-mem-size=8 - YJIT enabled with 8 MB code cache; default 48 MB was competing with GC for the 26 MB headroom under the 160 MB limit, causing GC pressure spikes. 8 MB is enough to JIT the hot paths (VectorNormalizer, Roda routing, FraudScorer) without bloating RSS.
Dockerfile: ENV MALLOC_ARENA_MAX=2 - limits glibc malloc arenas, reduces allocator fragmentation under multi-threaded load.
DatasetLoader: labels stored as integers (1 = fraud, 0 = legit) instead of strings; eliminates per-request string comparison and block overhead in FraudScorer.
VectorNormalizer: NORM['...'] hash lookups extracted to frozen Float constants (MAX_AMOUNT, MAX_KM, etc.); eliminates 9 Hash#[] calls per request on the hot path.

Run 10 changes - pre-mounted HTTP responses

FraudScorer: RESPONSES array built at startup with K+1 pre-serialized JSON strings; since fraud_score = fraud_count / K has exactly K+1 possible values, every response is known at boot time. Eliminates Hash allocation + Oj serialization on every request. Built dynamically from K and THRESHOLD - safe if the spec changes values. Gain is below p99 noise floor on this setup (~sub-µs per request); included for architectural correctness (same pattern used by the top C implementation).

Run 11 changes - FAISS IVF replaces hnswlib HNSW

KnnSearcher: hnswlib HNSW (ef=200, m=16, ~2.5ms floor) replaced with FAISS IndexIVFFlat (nlist=64, nprobe=16). IVF clusters the 100k vectors into 64 groups; each query scans the 16 closest clusters sequentially (~25k vectors). Sequential memory access is SIMD-friendly and avoids the random cache-miss pattern of graph traversal.
DatasetLoader: quantizer stored as @quantizer ivar - IndexIVFFlat holds a non-owning C pointer to the quantizer; without the ivar, Ruby GC would collect it and cause a segfault. Calling index.freeze enables Rice's no_gvl path, releasing the GVL during search (same behavior as hnswlib).
Gemfile: hnswlib + numo-narray → faiss + numo-narray-alt (C++-compatible fork required by Rice/FAISS binding; same Numo::SFloat API, no changes to VectorNormalizer).
Dockerfile: builder adds libblas-dev liblapack-dev cmake libgomp1; runtime adds libblas3 liblapack3 libgomp1. System libfaiss-dev is NOT installed - its headers conflict with the gem's bundled FAISS source (vendor/faiss/); gem compiles from bundled source instead.
Spike results: nlist=64 nprobe=16 gives FP=0 FN=0 vs exact IndexFlatL2 search across all 100k training vectors. Single-query p99: 341 µs.

Optimization path

Brute-force Numo KNN      →  p99 12-14s, score  -1335
+ BLAS identity trick     →  alloc 11MB → 800KB per request
+ HNSW O(log N) ef=50     →  p99  5.87ms, score +4977  (breakthrough)
+ Iodine epoll 4t         →  p99  4.62ms, score +5082  (+104 pts)
+ HNSW ef=200             →  detect 3000/3000, score +5264  (+182 pts)
+ alloc reduction R7      →  Sakamoto DOW, NIL_DIMS const, removed dead @norms_sq
+ GC.compact R8           →  api2 RSS -19MB, eliminates GC spike under load
+ YJIT exec-mem=8 R9      →  p99 ~3.7ms stable, score ~5427 avg (+229 pts)
+ pre-mounted responses R10→  eliminates Hash + Oj per request; gain within noise floor
+ FAISS IVF nlist=64 R11  →  IVF p99 341µs (7x vs HNSW), est final ~5825  (+398 pts)

The dominant gain came from HNSW: O(N)=100k comparisons → O(log N)≈17 node visits.

Raising ef from 50 to 200 pushed detection accuracy to perfect (0 FP, 0 FN).

Run 7 reduces per-request GVL hold time via Sakamoto DOW (no Time allocation on null last_tx path). Run 8 adds GC.compact after dataset load.

Run 9 enables YJIT with a constrained 8 MB code cache - the default 48 MB caused GC pressure by eating into the 26 MB headroom between serving RSS (~134 MB) and the container limit (160 MB). With exec-mem=8, YJIT JITs only the hot paths and stabilizes at p99 ~3.7ms across 9/10 benchmark runs.

Run 11 breaks through the HNSW structural floor by replacing graph traversal with sequential IVF cluster scan:

341 µs IVF search p99 vs ~2500 µs HNSW (7x improvement).

08 · Submission

╔════════════════════════════════════════════════════════════╗
║  GitHub user:      bulletdev                               ║
║  Repo:             bulletonrails-ruby                      ║
║  Submission ID:    bulletdev-ruby                          ║
║  branch main:      source code                             ║
║  branch submission: docker-compose.yml at root             ║
╚════════════════════════════════════════════════════════════╝

To trigger the official test: open an issue with rinha/test in the description into official rinha repository.

▓▒░ · Ruby is fast enough · ░▒▓

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

01 · What is this

02 · How it works

The 14 dimensions

03 · Tech stack

04 · Architecture

Resource allocation

05 · Quick start

06 · Validation

07 · Benchmark results

08 · Submission

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
app		app
config		config
haproxy		haproxy
nginx		nginx
resources		resources
scripts		scripts
.gitignore		.gitignore
.rubocop.yml		.rubocop.yml
Dockerfile		Dockerfile
Gemfile		Gemfile
README.md		README.md
config.ru		config.ru
docker-compose.yml		docker-compose.yml

Folders and files

Latest commit

History

Repository files navigation

01 · What is this

02 · How it works

The 14 dimensions

03 · Tech stack

04 · Architecture

Resource allocation

05 · Quick start

06 · Validation

07 · Benchmark results

08 · Submission

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages