
Card fraud costs the global financial system roughly $30 billion annually. The most effective defences share a common requirement: speed. A fraud model that takes 800ms to score is too slow to block a card-not-present transaction before authorization completes. Yet the features that predict fraud best—velocity patterns, network distances, behavioural anomalies—require aggregates over historical transaction streams, computed fresh for every new event.

Architecture Overview

The streaming fraud detection stack has four planes: the event plane (inbound transactions via Kafka), the feature plane (real-time feature computation and online store), the inference plane (model serving with sub-50ms SLO), and the feedback plane (confirmed fraud labels flowing back for model retraining).

Figure: End-to-End Pipeline Latency by Stage. Per-stage latency on the inference path; p50 and p99 measured at 8,000 TPS sustained load. Target SLO: p99 < 95ms total.

The dominant latency contributors are feature store reads and model inference. Feature store latency is primarily a function of the number of aggregations required (each requiring a Redis HGETALL) and serialisation overhead. Model inference time is a function of model complexity—a gradient-boosted tree with 200 estimators at depth 6 typically takes 8–12ms; a neural network with embedding layers can take 25–40ms.
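Because each additional window adds another HGETALL, the usual mitigation is to batch all reads for a transaction into a single Redis pipeline (one round trip instead of one per window) and flatten the results into a single feature dict. A minimal sketch, assuming keys of the form `vel_15m:<pan_token>` (the exact key schema of the feature store is not shown in this article, and the helper names are illustrative):

```python
def feature_keys(pan_token, prefixes):
    """Build the online-store keys for one card across window prefixes."""
    return [f"{prefix}:{pan_token}" for prefix in prefixes]

def flatten_features(pipe_results, prefixes):
    """Flatten a batch of HGETALL results into one feature dict.

    `pipe_results` is the list of hash dicts returned by executing a
    Redis pipeline of HGETALL calls -- one network round trip total.
    Field names are prefixed with their window so they stay distinct.
    """
    features = {}
    for prefix, hash_fields in zip(prefixes, pipe_results):
        for field, value in hash_fields.items():
            features[f"{prefix}_{field}"] = float(value)
    return features
```

With redis-py, `pipe = r.pipeline()`, one `pipe.hgetall(key)` per key, then `pipe.execute()` would produce `pipe_results` in a single round trip.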

Velocity Features

The most predictive real-time features are velocity counts and aggregations over rolling time windows. These answer questions like "how many transactions has this card made in the last 15 minutes?" or "what is the ratio of this transaction amount to the 7-day average spend?"

Python / Flink — velocity feature computation
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import SlidingEventTimeWindows
from pyflink.common.time import Time

def compute_velocity_features(txn_stream):
    """Compute sliding-window velocity features per card.

    VelocityAggregator (an AggregateFunction) and FeatureStoreSink
    (a Redis sink) are application-defined classes, not Flink built-ins.
    """

    velocity_15m = (
        txn_stream
        .key_by(lambda t: t['pan_token'])
        .window(SlidingEventTimeWindows.of(
            Time.minutes(15), Time.minutes(1)
        ))
        .aggregate(VelocityAggregator(
            count=True, sum_amount=True, distinct_merchants=True
        ))
    )

    velocity_1h = (
        txn_stream
        .key_by(lambda t: t['pan_token'])
        .window(SlidingEventTimeWindows.of(
            Time.hours(1), Time.minutes(5)
        ))
        .aggregate(VelocityAggregator(
            count=True, sum_amount=True, distinct_countries=True
        ))
    )

    # Write to online feature store (Redis)
    velocity_15m.add_sink(FeatureStoreSink(prefix='vel_15m', ttl_seconds=900))
    velocity_1h.add_sink(FeatureStoreSink(prefix='vel_1h', ttl_seconds=3600))

    return velocity_15m, velocity_1h
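The `VelocityAggregator` used above is not shown. A framework-free sketch of its semantics follows; in Flink it would implement PyFlink's `AggregateFunction` contract (`create_accumulator` / `add` / `get_result` / `merge`), and the field names here are illustrative:

```python
class VelocityAccumulator:
    """Running state for one (card, window) pane."""
    def __init__(self):
        self.count = 0
        self.sum_amount = 0.0
        self.merchants = set()

class VelocityAggregator:
    """Sketch of the per-window aggregator's semantics."""

    def create_accumulator(self):
        return VelocityAccumulator()

    def add(self, txn, acc):
        # Called once per transaction assigned to the window pane.
        acc.count += 1
        acc.sum_amount += txn["amount"]
        acc.merchants.add(txn["merchant_id"])
        return acc

    def merge(self, a, b):
        # Combine partial accumulators (needed for session/merging windows).
        a.count += b.count
        a.sum_amount += b.sum_amount
        a.merchants |= b.merchants
        return a

    def get_result(self, acc):
        # Emitted downstream when the window fires.
        return {
            "txn_count": acc.count,
            "amount_sum": round(acc.sum_amount, 2),
            "distinct_merchants": len(acc.merchants),
        }
```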

Feature Engineering for Fraud

| Feature Category | Examples | Window | Typical IV |
|---|---|---|---|
| Card Velocity | txn_count_15m, amount_sum_1h, distinct_mcc_24h | 15m – 24h | 0.38 – 0.52 |
| Amount Anomaly | amount_vs_7d_avg, amount_percentile_30d | 7d – 30d | 0.28 – 0.41 |
| Geographic | geo_distance_km (last txn), country_change_flag | Last event | 0.31 – 0.44 |
| Merchant | merchant_fraud_rate_7d, new_merchant_flag | 7d | 0.24 – 0.36 |
| Behavioural | hour_of_day_anomaly, weekend_flag, channel_change | Historical baseline | 0.14 – 0.22 |
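The `geo_distance_km` feature in the table is the great-circle distance between the current transaction's location and the previous one for the same card. A minimal haversine sketch (the function name comes from the table; the implementation is an assumption):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

An implausibly large distance since the last transaction (e.g. Paris to Singapore within an hour) is a strong fraud signal on its own, which is why this feature pairs naturally with `country_change_flag`.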

"Velocity at 15 minutes is your most powerful feature. A fraudster who steals a card will typically test it with a small purchase, then quickly scale up. That acceleration shows up immediately in the velocity count."

Precision–Recall Trade-off

Fraud detection is a rare-event classification problem with asymmetric costs. A false positive (blocking a legitimate transaction) costs €2–4 in customer service, goodwill loss, and chargeback handling. A false negative (missing a fraud) costs the full transaction amount plus chargeback penalties. The operating threshold must balance these costs, typically targeting a false positive rate of 0.3–0.8% (i.e., 3–8 blocked legitimate transactions per 1,000) while catching 80–90% of fraud.
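The asymmetric costs above can be made operational by sweeping candidate thresholds on a labelled sample and picking the one that minimises expected cost. A sketch; the flat €3 false-positive cost and the chargeback penalty are illustrative assumptions, not figures from a real deployment:

```python
def expected_cost(scores, labels, amounts, threshold,
                  fp_cost=3.0, chargeback_penalty=15.0):
    """Total cost of operating at `threshold` on a labelled sample.

    False positive (blocked legitimate txn): flat service/goodwill cost.
    False negative (missed fraud): transaction amount plus a penalty.
    """
    cost = 0.0
    for score, label, amount in zip(scores, labels, amounts):
        blocked = score >= threshold
        if blocked and label == 0:        # false positive
            cost += fp_cost
        elif not blocked and label == 1:  # false negative
            cost += amount + chargeback_penalty
    return cost

def best_threshold(scores, labels, amounts, grid):
    """Threshold from `grid` with the lowest expected cost."""
    return min(grid, key=lambda t: expected_cost(scores, labels, amounts, t))
```

Because false-negative cost scales with transaction amount, the optimal threshold is effectively amount-dependent; some systems go further and apply a lower threshold to high-value transactions.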

Figure: Precision–Recall Curve, Champion vs Challenger Model. Champion: GBM trained on 18 months of data; challenger: same architecture retrained on 24 months. Out-of-time evaluation window: July–September 2023. Fraud prevalence: 0.41%.

Champion / Challenger Deployment

Rather than a hard model swap, production fraud systems use shadow deployment: the champion model makes the blocking decision for every transaction, while the challenger scores a 10% sample of traffic in shadow mode, recording its scores without ever blocking. After 4 weeks of shadow scoring, the challenger's would-be decisions are compared to confirmed fraud labels. If the challenger outperforms on F-score at the target operating point, it is promoted to champion.
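One way to implement the shadow split is a deterministic hash of the transaction ID, so the sample is reproducible and the champion's decision path is never affected. A minimal sketch, with the two models represented as hypothetical callables returning scores:

```python
import hashlib

SHADOW_FRACTION = 0.10

def in_shadow_sample(txn_id, fraction=SHADOW_FRACTION):
    """Deterministically map a transaction ID into [0, 1) and sample it."""
    bucket = int(hashlib.sha256(txn_id.encode()).hexdigest(), 16) % 10_000
    return bucket < fraction * 10_000

def score_transaction(txn, champion, challenger, shadow_log):
    """Champion decides; challenger only logs on the sampled slice."""
    decision = champion(txn)  # this score alone drives block/allow
    if in_shadow_sample(txn["txn_id"]):
        # Shadow score is recorded for later comparison, never acted on.
        shadow_log.append((txn["txn_id"], challenger(txn)))
    return decision
```

In production the challenger call would run asynchronously (or off the critical path entirely) so shadow scoring cannot add latency to the authorization decision.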

Figure: Fraud Rate Trend and Model Deployment Events (Monthly). Card fraud rate (% of transaction value) with key model deployment events, Q4 2021 – Q3 2023; step-downs indicate new model versions improving detection.

Infrastructure Requirements

A production real-time fraud system at moderate scale (5,000 TPS peak) requires:

| Component | Technology | Sizing (5k TPS) | Key SLO |
|---|---|---|---|
| Event Bus | Apache Kafka | 6-broker cluster, RF=3 | Produce p99 < 5ms |
| Stream Processing | Apache Flink | 12 TaskManagers, 4 slots | Processing lag < 200ms |
| Online Feature Store | Redis Cluster | 6 shards, 64 GB each | GET p99 < 2ms |
| Model Serving | Triton / custom FastAPI | 4 GPU instances (A10) | Inference p99 < 15ms |
| Offline Store | Delta Lake / Spark | 30-node Spark cluster | Daily training < 4h |