Card fraud costs the global financial system roughly $30 billion annually. The most effective defences share a common requirement: speed. A fraud model that takes 800ms to score is too slow to block a card-not-present transaction before authorization completes. Yet the features that predict fraud best—velocity patterns, network distances, behavioural anomalies—require aggregates over historical transaction streams, computed fresh for every new event.
Architecture Overview
The streaming fraud detection stack has four planes: the event plane (inbound transactions via Kafka), the feature plane (real-time feature computation and online store), the inference plane (model serving with sub-50ms SLO), and the feedback plane (confirmed fraud labels flowing back for model retraining).
The dominant latency contributors are feature store reads and model inference. Feature store latency depends mainly on the number of aggregates to fetch (each requiring a Redis HGETALL) and on serialisation overhead. Model inference time depends on model complexity: a gradient-boosted tree ensemble with 200 estimators at depth 6 typically takes 8–12ms, while a neural network with embedding layers can take 25–40ms.
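Because each aggregate costs one HGETALL, issuing the reads one by one multiplies network round trips. A minimal sketch of batching them with a redis-py pipeline follows. The key layout `<prefix>:<pan_token>` and the function name are assumptions made for illustration, not something the architecture above prescribes; `client` is expected to be a redis-py client or any object exposing the same `pipeline()` interface.

```python
from typing import Any, Dict, List


def fetch_feature_hashes(client: Any, pan_token: str,
                         prefixes: List[str]) -> Dict[str, dict]:
    """Read one hash per feature prefix in a single pipelined round trip."""
    # transaction=False: plain batching, no MULTI/EXEC wrapper needed for reads
    pipe = client.pipeline(transaction=False)
    for prefix in prefixes:
        # Assumed key layout: "<prefix>:<pan_token>", e.g. "vel_15m:abc123"
        pipe.hgetall(f"{prefix}:{pan_token}")
    # execute() sends all queued commands and returns results in queue order
    results = pipe.execute()
    return dict(zip(prefixes, results))
```

With a standalone Redis instance this would be called as `fetch_feature_hashes(redis.Redis(), token, ["vel_15m", "vel_1h"])`. On a Redis Cluster the keys for one card may live on different shards, so the saving depends on how the client groups commands per node.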
Velocity Features
The most predictive real-time features are velocity counts and aggregations over rolling time windows. These answer questions like "how many transactions has this card made in the last 15 minutes?" or "what is the ratio of this transaction amount to the 7-day average spend?"
```python
from pyflink.common.time import Time
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import SlidingEventTimeWindows

# VelocityAggregator (an aggregate function) and FeatureStoreSink (a sink
# function) are project-specific classes whose definitions are not shown here.


def compute_velocity_features(txn_stream):
    """Compute sliding-window velocity features per card."""
    # 15-minute window, re-evaluated every minute
    velocity_15m = (
        txn_stream
        .key_by(lambda t: t['pan_token'])
        .window(SlidingEventTimeWindows.of(Time.minutes(15), Time.minutes(1)))
        .aggregate(VelocityAggregator(
            count=True, sum_amount=True, distinct_merchants=True
        ))
    )

    # 1-hour window, re-evaluated every 5 minutes
    velocity_1h = (
        txn_stream
        .key_by(lambda t: t['pan_token'])
        .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(5)))
        .aggregate(VelocityAggregator(
            count=True, sum_amount=True, distinct_countries=True
        ))
    )

    # Write to online feature store (Redis); TTL matches the window length
    velocity_15m.add_sink(FeatureStoreSink(prefix='vel_15m', ttl_seconds=900))
    velocity_1h.add_sink(FeatureStoreSink(prefix='vel_1h', ttl_seconds=3600))

    return velocity_15m, velocity_1h
```
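The `VelocityAggregator` passed to `.aggregate(...)` above is not defined in this article. As a rough illustration of what such an aggregator might accumulate, here is a plain-Python sketch whose four methods mirror the shape of Flink's `AggregateFunction` interface (`create_accumulator`, `add`, `merge`, `get_result`). The class name and the `amount` and `merchant_id` field names are assumptions; a real job would subclass `pyflink.datastream.functions.AggregateFunction` instead.

```python
from typing import Any, Dict


class VelocityAccumulatorSketch:
    """Hypothetical stand-in for the aggregator used in the Flink job."""

    def create_accumulator(self) -> Dict[str, Any]:
        # Fresh state for one (card, window) pair
        return {"count": 0, "sum_amount": 0.0, "merchants": set()}

    def add(self, txn: Dict[str, Any], acc: Dict[str, Any]) -> Dict[str, Any]:
        # Fold a single transaction into the running state
        acc["count"] += 1
        acc["sum_amount"] += float(txn["amount"])   # assumed field name
        acc["merchants"].add(txn["merchant_id"])    # assumed field name
        return acc

    def merge(self, a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
        # Combine two partial accumulators for the same key
        return {
            "count": a["count"] + b["count"],
            "sum_amount": a["sum_amount"] + b["sum_amount"],
            "merchants": a["merchants"] | b["merchants"],
        }

    def get_result(self, acc: Dict[str, Any]) -> Dict[str, Any]:
        # Emit the feature values that would be written to the online store
        return {
            "txn_count": acc["count"],
            "amount_sum": acc["sum_amount"],
            "distinct_merchants": len(acc["merchants"]),
        }
```

Tracking distinct merchants with an exact set keeps the sketch simple, but memory grows with the number of merchants seen per window. An approximate counter may be preferable at high cardinality.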
Feature Engineering for Fraud
| Feature Category | Examples | Window | Typical IV (information value) |
|---|---|---|---|
| Card Velocity | txn_count_15m, amount_sum_1h, distinct_mcc_24h | 15m – 24h | 0.38 – 0.52 |
| Amount Anomaly | amount_vs_7d_avg, amount_percentile_30d | 7d – 30d | 0.28 – 0.41 |
| Geographic | geo_distance_km (last txn), country_change_flag | Last event | 0.31 – 0.44 |
| Merchant | merchant_fraud_rate_7d, new_merchant_flag | 7d | 0.24 – 0.36 |
| Behavioural | hour_of_day_anomaly, weekend_flag, channel_change | Historical baseline | 0.14 – 0.22 |
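The IV column in the table above is read here as information value, the usual measure of how well a binned feature separates fraud from non-fraud: for each bin, take the share of legitimate transactions minus the share of fraudulent ones, multiply by the log of their ratio, and sum over bins. A minimal sketch follows. The function name, the bin counts in the usage note, and the additive smoothing constant `eps` are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def information_value(good_counts: Sequence[int],
                      bad_counts: Sequence[int],
                      eps: float = 0.5) -> float:
    """IV of a binned feature: sum over bins of (g - b) * ln(g / b)."""
    if len(good_counts) != len(bad_counts):
        raise ValueError("good_counts and bad_counts must have equal length")
    # Additive smoothing (assumed choice) so empty bins do not yield log(0)
    goods = [g + eps for g in good_counts]
    bads = [b + eps for b in bad_counts]
    total_good, total_bad = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        g_share = g / total_good   # share of legitimate txns in this bin
        b_share = b / total_bad    # share of fraudulent txns in this bin
        iv += (g_share - b_share) * math.log(g_share / b_share)
    return iv
```

For example, `information_value([900, 100], [20, 80])` scores a feature whose second bin holds 10% of legitimate traffic but 80% of fraud. A feature with identical distributions in both classes scores zero.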
"Velocity at 15 minutes is your most powerful feature. A fraudster who steals a card will typically test it with a small purchase, then quickly scale up. That acceleration shows up immediately in the velocity count."
Precision–Recall Trade-off
Fraud detection is a rare-event classification problem with asymmetric costs. A false positive (blocking a legitimate transaction) costs roughly €2–4 in customer service and lost goodwill. A false negative (missing a fraudulent transaction) costs the full transaction amount plus chargeback penalties. The operating threshold must balance these costs, typically targeting a false positive rate of 0.3–0.8% (that is, 3–8 of every 1,000 legitimate transactions blocked) while catching 80–90% of fraud.
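One way to make this trade-off concrete is to price each error type and pick the threshold that minimises total cost on a labelled validation set. The sketch below assumes a flat false-positive cost of €3 (the midpoint of the €2–4 range above) and a hypothetical €15 chargeback penalty. The penalty figure, the function names, and the candidate thresholds in the usage note are illustrative assumptions, not values from a real system.

```python
from typing import Iterable, Sequence, Tuple


def expected_cost(scores: Sequence[float], is_fraud: Sequence[bool],
                  amounts: Sequence[float], threshold: float,
                  fp_cost: float = 3.0,
                  chargeback_penalty: float = 15.0) -> float:
    """Total cost of blocking every transaction scored >= threshold."""
    cost = 0.0
    for score, fraud, amount in zip(scores, is_fraud, amounts):
        blocked = score >= threshold
        if blocked and not fraud:
            cost += fp_cost                      # legitimate txn declined
        elif not blocked and fraud:
            cost += amount + chargeback_penalty  # fraud let through
    return cost


def best_threshold(scores: Sequence[float], is_fraud: Sequence[bool],
                   amounts: Sequence[float],
                   candidates: Iterable[float]) -> Tuple[float, float]:
    """Return (threshold, cost) for the cheapest candidate threshold."""
    return min(
        ((t, expected_cost(scores, is_fraud, amounts, t)) for t in candidates),
        key=lambda pair: pair[1],
    )
```

A call such as `best_threshold(scores, labels, amounts, [i / 100 for i in range(1, 100)])` sweeps thresholds in steps of 0.01. Because missed fraud is priced per transaction amount, the cheapest threshold shifts lower when the fraud in the data skews toward large tickets.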
Champion / Challenger Deployment
Rather than a hard model swap, production fraud systems use traffic splitting with a shadow challenger: the champion model makes the decision on every transaction, while the challenger scores a 10% slice of traffic in shadow mode, where its scores are logged but never used to block a transaction. After 4 weeks of shadow scoring, the decisions the challenger would have made are compared against confirmed fraud labels. If the challenger outperforms the champion on F-score at the target operating point, it is promoted to champion.
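The two mechanical pieces of this process, choosing which transactions the challenger scores and comparing F-scores afterwards, could be sketched as follows. Hash-based sampling is one common way to get a stable 10% slice and is an assumption here, as are all function names. `beta` defaults to 1 because the text does not say which F-score is used.

```python
import hashlib
from typing import Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def in_challenger_sample(txn_id: str, fraction: float = 0.10) -> bool:
    """Deterministically assign a transaction to the shadow sample."""
    digest = hashlib.sha256(txn_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from confusion counts; returns 0.0 when undefined."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


def should_promote(champion: Counts, challenger: Counts,
                   beta: float = 1.0) -> bool:
    """True if the challenger's F-beta strictly beats the champion's."""
    return f_beta(*challenger, beta=beta) > f_beta(*champion, beta=beta)
```

For the comparison to be fair, both sets of counts should come from the same shadow-sampled transactions, each model evaluated at its own target operating threshold.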
Infrastructure Requirements
A production real-time fraud system at moderate scale (5,000 TPS peak) requires:
| Component | Technology | Sizing (5k TPS) | Key SLO |
|---|---|---|---|
| Event Bus | Apache Kafka | 6-broker cluster, RF=3 | Produce p99 < 5ms |
| Stream Processing | Apache Flink | 12 TaskManagers, 4 slots each | Processing lag < 200ms |
| Online Feature Store | Redis Cluster | 6 shards, 64GB each | GET p99 < 2ms |
| Model Serving | Triton / custom FastAPI | 4 GPU instances (A10) | Inference p99 < 15ms |
| Offline Store | Delta Lake / Spark | 30-node Spark cluster | Daily training < 4h |