
Fraud Detection System Architecture 🏗️

Real-time ML agents monitoring 1K to 1M+ transactions/day with PCI-DSS compliance

September 11, 2025
18 min read
🏦 Banking · 🏗️ Architecture · 🤖 ML Agents · 🔒 PCI-DSS · 📊 Real-Time
🎯 This Week's Journey

From transaction stream to fraud verdict in under 100ms.

Monday showed 3 prompts for fraud detection. Tuesday automated them with agents. Wednesday mapped the team workflows. Today: the complete production architecture, covering ML agents, real-time scoring, decision orchestration, and scaling from 1K to 1M+ transactions/day with full PCI-DSS compliance.

📋 Key Assumptions

1. Process 1K-1M transactions/day across cards, ACH, and wire transfers
2. Real-time scoring required (<100 ms p95 latency)
3. PCI-DSS Level 1 compliance for card data
4. SOC 2 Type II audit requirements
5. Multi-region deployment for 99.99% availability
6. Historical data retention: 7 years for compliance

System Requirements

Functional

  • Real-time transaction ingestion from payment gateways
  • ML-based risk scoring with feature engineering
  • Multi-agent decision orchestration (planner, executor, evaluator, guardrail)
  • Rule engine for policy-based checks
  • Case management for fraud analyst review
  • Automated blocking/flagging with override capability
  • Customer notification and dispute handling

Non-Functional (SLOs)

  • Latency p95: 100 ms
  • Latency p99: 250 ms
  • Data freshness: 0.016 min (about 1 s)
  • Availability: 99.99%
  • False positive rate: 0.02 (2%)
  • False negative rate: 0.001 (0.1%)

💰 Cost targets: $0.002 per transaction; $0.15 per 1K ML inferences; $0.023 per GB-month of storage
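A minimal sketch of checking a batch of observed request latencies against the p95 and p99 targets above. The nearest-rank percentile method and the `check_latency_slos` helper are illustrative assumptions, not part of the described system; production monitoring would usually compute percentiles from histograms in the metrics backend.

```python
import math

# Latency SLO targets from the list above, in milliseconds.
SLO_TARGETS_MS = {"p95": 100.0, "p99": 250.0}


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def check_latency_slos(samples: list[float]) -> dict[str, bool]:
    """Hypothetical helper: True for each percentile whose observed value meets its target."""
    return {
        name: percentile_nearest_rank(samples, float(name[1:])) <= target
        for name, target in SLO_TARGETS_MS.items()
    }


if __name__ == "__main__":
    # 100 synthetic request latencies: mostly fast, with a slow tail.
    latencies = [40.0] * 90 + [90.0] * 6 + [300.0] * 4
    print(check_latency_slos(latencies))  # p95 met, p99 missed
```

The synthetic tail shows why both targets matter: four slow requests out of 100 leave p95 comfortably inside budget while breaking p99.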

Agent Layer

Planner (autonomy level L3)

Decomposes transaction analysis into subtasks and selects the appropriate scoring models.

🔧 Tools: feature_store.get_customer_features(), model_registry.select_model(transaction_type), rule_engine.get_applicable_rules()

⚡ Recovery: fall back to rule-based scoring if ML is unavailable; use cached customer features if the feature store is down; default to a conservative risk tier on error.
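The planner's three recovery rules can be sketched as a small plan builder. The routing table, the `ExecutionPlan` record, and `build_plan` are hypothetical; only the model names come from the Model Registry list later in this post, and a real planner would call `model_registry.select_model()` rather than a hard-coded dict.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: transaction type -> preferred scoring models
# (model names taken from the Model Registry list in this article).
MODELS_BY_TYPE = {
    "card": ["xgboost_fraud_v3", "neural_net_fraud_v2"],
    "ach": ["xgboost_fraud_v3"],
    "wire": ["xgboost_fraud_v3", "neural_net_fraud_v2"],
}
RULE_BASED_MODEL = "rule_based_v1"


@dataclass
class ExecutionPlan:
    models: list[str]
    feature_source: str  # "live" feature store or "cache"
    risk_tier: str = "standard"
    notes: list[str] = field(default_factory=list)


def build_plan(transaction_type: str, ml_available: bool, feature_store_up: bool) -> ExecutionPlan:
    """Apply the planner's three recovery rules to pick models and a feature source."""
    feature_source = "live" if feature_store_up else "cache"
    if transaction_type not in MODELS_BY_TYPE:
        # Unexpected input is treated as an error: rule-based scoring, conservative tier.
        return ExecutionPlan([RULE_BASED_MODEL], feature_source, "conservative",
                             [f"unknown transaction type: {transaction_type}"])
    plan = ExecutionPlan(
        models=list(MODELS_BY_TYPE[transaction_type]) if ml_available else [RULE_BASED_MODEL],
        feature_source=feature_source,
    )
    if not ml_available:
        plan.notes.append("ML unavailable: rule-based scoring")
    if not feature_store_up:
        plan.notes.append("feature store down: cached customer features")
    return plan
```

For example, `build_plan("wire", ml_available=False, feature_store_up=False)` yields a rule-based plan on cached features with both degradations recorded in `notes`, which downstream agents and the audit log can surface.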

Executor (autonomy level L2)

Runs risk scoring models, aggregates signals, and computes the final risk score.

🔧 Tools: ml_inference.predict(model_id, features), feature_engineering.compute_velocity(), device_fingerprint.check_reputation(), geo_scoring.analyze_location()

⚡ Recovery: retry with exponential backoff (max 3 attempts); circuit breaker on the inference service; fall back to an ensemble of simpler models.
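A minimal sketch of the executor's recovery chain: up to three attempts with exponential backoff, a circuit breaker in front of the inference service, then a fallback. The class and function names are hypothetical, the failure threshold and reset timeout are assumed values, and the random jitter on the delay is an addition not mentioned in the article.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the inference service while the breaker is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; after `reset_timeout_s`
    it lets a trial call through (half-open) and closes again on success."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("inference service circuit is open")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


def predict_with_recovery(primary: Callable[[], T], fallback: Callable[[], T],
                          breaker: CircuitBreaker, max_attempts: int = 3,
                          base_delay_s: float = 0.01) -> T:
    """Up to `max_attempts` calls with exponential backoff + jitter, then `fallback`."""
    for attempt in range(max_attempts):
        try:
            return breaker.call(primary)
        except CircuitOpenError:
            break  # stop retrying against an open circuit
        except Exception:
            if attempt < max_attempts - 1:
                delay = base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0)
                time.sleep(delay)
    return fallback()
```

In the article's terms, `primary` would wrap `ml_inference.predict(model_id, features)` and `fallback` the simpler-model ensemble; both are plain callables here so the control flow can be tested in isolation.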

Evaluator (autonomy level L3)

Validates risk score quality, checks for model drift, and ensures explainability.

🔧 Tools: drift_detector.check_distribution(), explainer.shap_values(), quality_metrics.compute_auc_roc(), bias_detector.check_fairness()

⚡ Recovery: queue for offline evaluation if real-time evaluation fails; use the historical baseline if the drift detector is unavailable; generate a simplified explanation on SHAP timeout.
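The article names `drift_detector.check_distribution()` without showing its internals; the risks section names PSI as the monitoring metric, so this sketch assumes PSI over binned risk scores. The bin edges, the epsilon for empty bins, and the 0.1 / 0.25 bands (a commonly cited rule of thumb) are assumptions to tune per model.

```python
import math
from bisect import bisect_right


def bin_fractions(values: list[float], edges: list[float]) -> list[float]:
    """Fraction of values in each bin; `edges` are sorted interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected: list[float], actual: list[float],
                               edges: list[float], eps: float = 1e-6) -> float:
    """PSI = sum over bins of (a - e) * ln(a / e); eps avoids log(0) for empty bins."""
    psi = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def drift_status(psi: float) -> str:
    """Rule-of-thumb bands; real alert thresholds should be tuned per model."""
    if psi < 0.1:
        return "stable"
    return "moderate" if psi < 0.25 else "significant"


if __name__ == "__main__":
    edges = [0.2, 0.4, 0.6, 0.8]
    baseline = [i / 100 for i in range(100)]        # training-time score distribution
    today = [min(0.99, s + 0.3) for s in baseline]  # scores shifted upward
    psi = population_stability_index(baseline, today, edges)
    print(round(psi, 3), drift_status(psi))
```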

Guardrail (autonomy level L4)

Enforces PCI-DSS policies, checks compliance rules, and prevents biased decisions.

🔧 Tools: pci_validator.check_card_data(), aml_checker.screen_transaction(), bias_filter.check_protected_attributes(), policy_engine.evaluate_rules()

⚡ Recovery: fail-safe: block the transaction if guardrails fail; require human review on policy uncertainty; audit-log all override attempts.
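The fail-safe rule amounts to failing closed: a guardrail that errors must never be read as "allow". In this sketch each named check is a hypothetical zero-argument callable standing in for tools like `pci_validator.check_card_data()`. Treating an error as "uncertainty" (block plus human review) and a clean violation as a plain block is an assumed interpretation of the recovery text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GuardrailResult:
    allowed: bool
    needs_human_review: bool
    reasons: tuple[str, ...]


def run_guardrails(checks: dict[str, Callable[[], bool]], audit_log: list[str]) -> GuardrailResult:
    """Fail closed: a check that errors blocks the transaction and requests human review."""
    violations, errors = [], []
    for name, check in checks.items():
        try:
            if not check():
                violations.append(name)
        except Exception as exc:  # a broken guardrail must never default to "allow"
            errors.append(f"{name}: {exc!r}")
    if errors:
        audit_log.append(f"guardrail error, transaction blocked: {errors}")
        return GuardrailResult(False, True, tuple(errors))
    if violations:
        audit_log.append(f"policy violation, transaction blocked: {violations}")
        return GuardrailResult(False, False, tuple(violations))
    return GuardrailResult(True, False, ())
```

This is the behavior the failure-mode table later summarizes as "false positive (customer inconvenience), but no compliance risk": the cost of a broken check lands on approval rate, not on compliance.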

Risk Scorer (autonomy level L2)

Domain-specific fraud risk scoring with ensemble models.

🔧 Tools: xgboost_model.predict(), neural_net.infer(), rule_based_scorer.evaluate(), ensemble_aggregator.combine()

⚡ Recovery: use the rule-based fallback if ML is unavailable; degrade to a simpler model on timeout; score conservatively when features are unavailable.
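One plausible shape for `ensemble_aggregator.combine()` is a weighted mean that renormalizes when a model times out. The weights and the mid-range default score are hypothetical, and stacking (a learned meta-model) would be an equally valid design; the article does not say which it uses.

```python
from typing import Optional

# Hypothetical ensemble weights; real weights would be fitted on validation data.
WEIGHTS = {"xgboost": 0.6, "neural_net": 0.3, "rule_based": 0.1}
# Hypothetical mid-range score used when no model answers, meant to route to review.
CONSERVATIVE_SCORE = 0.5


def combine_scores(scores: dict[str, Optional[float]]) -> float:
    """Weighted mean over models that returned a score; None means timed out/unavailable.
    Weights of missing models are dropped and the remaining weights renormalized."""
    available = {m: s for m, s in scores.items() if s is not None and m in WEIGHTS}
    if not available:
        return CONSERVATIVE_SCORE
    total = sum(WEIGHTS[m] for m in available)
    return sum(WEIGHTS[m] * s for m, s in available.items()) / total


if __name__ == "__main__":
    print(combine_scores({"xgboost": 0.91, "neural_net": 0.84, "rule_based": 0.60}))
    print(combine_scores({"xgboost": None, "neural_net": 0.84, "rule_based": 0.60}))
```

Renormalizing keeps the output on the same 0-1 scale when a model drops out, so downstream decision thresholds do not need to know which models answered.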

Decision (autonomy level L4)

Makes the final approve/decline/review decision with human-in-the-loop support.

🔧 Tools: decision_tree.evaluate(), threshold_manager.get_dynamic_threshold(), escalation_router.check_criteria(), message_generator.create_customer_notification()

⚡ Recovery: escalate to a human analyst on low confidence; default to decline on system uncertainty; log all decisions for the audit trail.
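The decision agent's recovery rules map onto an ordered set of checks. The static thresholds below are hypothetical stand-ins for `threshold_manager.get_dynamic_threshold()`, and the 0.8 confidence floor mirrors the cutoff used in the hallucination checks; none of these values is prescribed by the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str  # "approve", "decline", or "review"
    reason: str


def decide(risk_score: float, confidence: float, *, system_healthy: bool = True,
           decline_at: float = 0.85, review_at: float = 0.5, min_confidence: float = 0.8,
           audit_log: Optional[list] = None) -> Decision:
    """Hypothetical static thresholds; checks run in priority order."""
    if not system_healthy:
        d = Decision("decline", "system uncertainty: default to decline")
    elif confidence < min_confidence:
        d = Decision("review", "low confidence: escalate to analyst")
    elif risk_score >= decline_at:
        d = Decision("decline", f"risk {risk_score:.2f} >= {decline_at}")
    elif risk_score >= review_at:
        d = Decision("review", f"risk {risk_score:.2f} in review band")
    else:
        d = Decision("approve", f"risk {risk_score:.2f} < {review_at}")
    if audit_log is not None:
        audit_log.append((risk_score, confidence, d))  # every decision is logged
    return d
```

Checking system health and confidence before the score means a high-risk score from a low-confidence run goes to an analyst instead of triggering an automatic decline.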

ML Layer

Feature Store

Update: Real-time (streaming) + batch (daily aggregates)

  • customer_transaction_velocity_1h
  • customer_transaction_velocity_24h
  • merchant_fraud_rate_7d
  • device_reputation_score
  • geo_distance_from_home
  • account_age_days
  • avg_transaction_amount_30d
  • card_decline_rate_7d
  • time_since_last_transaction_min
  • cross_border_transaction_flag

Model Registry

Strategy: Semantic versioning (major.minor.patch), A/B testing for new versions

  • xgboost_fraud_v3
  • neural_net_fraud_v2
  • rule_based_v1
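For A/B testing new model versions, a common approach (assumed here; the article does not specify one) is deterministic hash bucketing, so each customer consistently sees the same model. `assign_model`, the salt, and the challenger name `xgboost_fraud_v4` are hypothetical.

```python
import hashlib


def assign_model(entity_id: str, champion: str, challenger: str,
                 challenger_share: float = 0.10, salt: str = "fraud-ab-1") -> str:
    """Deterministically route `challenger_share` of entities to the challenger model.
    Hashing (salt, id) keeps each customer on the same arm across requests."""
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return challenger if bucket < challenger_share * 10_000 else champion


if __name__ == "__main__":
    # "xgboost_fraud_v4" is a hypothetical challenger version.
    arms = [assign_model(f"cust_{i}", "xgboost_fraud_v3", "xgboost_fraud_v4") for i in range(10_000)]
    print(arms.count("xgboost_fraud_v4") / len(arms))  # close to 0.10
```

Changing the salt reshuffles customers between arms, which is useful when starting a new experiment so the previous challenger cohort is not reused.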

Observability Stack

Real-time monitoring, tracing & alerting

Pipeline: Sources (apps, services, infra) → Collection (10 metrics) → Processing (aggregate & transform) → Dashboards (4 views) → Alerts (enabled)

Signals: 📊 Metrics (10) · 📝 Logs (structured) · 🔗 Traces (distributed)

Key metrics:
  • transaction_processing_latency_p95_ms
  • fraud_detection_accuracy
  • false_positive_rate
  • false_negative_rate
  • model_inference_latency_ms
  • feature_store_cache_hit_rate

Deployment Variants

🚀 Startup Architecture

Fast to deploy and cost-efficient; handles up to 10K transactions/day

Infrastructure
  • AWS Lambda (serverless)
  • API Gateway
  • DynamoDB (NoSQL)
  • SageMaker (managed ML)
  • S3 (storage)
  • CloudWatch (monitoring)

Characteristics
  • Fast to deploy (<1 week)
  • Low operational overhead
  • Pay-per-use pricing
  • Single region (us-east-1)
  • Handles 1K-10K transactions/day
  • Cost: ~$150/month

Risks & Mitigations

⚠️ Model drift due to evolving fraud patterns (risk: High)

✓ Mitigation: Continuous monitoring with PSI (Population Stability Index), automated retraining pipeline, A/B testing of new models, weekly model performance review

⚠️ High false positive rate degrades customer experience (risk: Medium)

✓ Mitigation: Dynamic threshold tuning, customer feedback loop, whitelisting for trusted customers, manual review queue for borderline cases

⚠️ PCI-DSS compliance violation (card data leak) (risk: Low)

✓ Mitigation: Tokenization, HSM for key storage, annual PCI audit, penetration testing, employee training, data access auditing

⚠️ Latency exceeds the 100 ms SLA (risk: Medium)

✓ Mitigation: Feature caching, model optimization (quantization), circuit breakers, auto-scaling, load testing, CDN for static assets

⚠️ Agent hallucination leads to a false fraud accusation (risk: Low)

✓ Mitigation: Guardrail agent with fact-checking, human review for high-confidence declines, explainability with SHAP, audit trail

⚠️ Vendor lock-in (AWS/GCP) (risk: Medium)

✓ Mitigation: Multi-cloud strategy (Kubernetes for portability), open-source tools (Kafka, MLflow), vendor-agnostic APIs, annual contract review

⚠️ Data privacy violation (GDPR/CCPA) (risk: Low)

✓ Mitigation: Data minimization, anonymization, right-to-deletion workflow, data residency compliance, DPO oversight, regular audits

🧬 Evolution Roadmap

Progressive transformation from MVP to scale

🌱 Phase 1: MVP (0-3 months)
1. Launch serverless fraud detection (Lambda + SageMaker)
2. Handle 1K-10K transactions/day
3. Achieve an AUC-ROC of 0.90+
4. PCI-DSS Level 1 compliance

🌿 Phase 2: Scale (3-6 months)
1. Scale to 100K transactions/day
2. Add Executor, Evaluator, and Guardrail agents
3. Implement a feature store (Tecton)
4. Achieve <100 ms p95 latency
5. Reduce the false positive rate to <2%

🌳 Phase 3: Enterprise (6-12 months)
1. Scale to 1M+ transactions/day
2. Multi-region deployment (US/EU/APAC)
3. 99.99% availability SLA
4. SOC 2 Type II certification
5. Advanced ML (agentic RAG, drift detection)

🚀 Production ready
🏗️ Complete Systems Architecture

9-layer production architecture for fraud detection

1. Presentation (4 components): Fraud Analyst Dashboard, Customer Portal, Mobile App, Admin Console
2. API Gateway (4 components): Load Balancer (ALB), Rate Limiter (Kong/Apigee), Auth Gateway (OAuth2/OIDC), API Versioning
3. Agent Layer (6 components): Planner Agent, Executor Agent, Evaluator Agent, Guardrail Agent, Risk Scorer Agent, Decision Agent
4. ML Layer (5 components): Feature Store (Tecton/Feast), Model Registry (MLflow), Inference Service (SageMaker/Vertex), Evaluation Pipeline, Prompt Store
5. Integration (5 components): Payment Gateway Adapter, Core Banking Connector, KYC/AML Service, Card Network APIs (Visa/MC), Notification Service
6. Data (5 components): Transaction DB (Aurora PostgreSQL), Feature Cache (Redis), Case Management DB (MongoDB), Data Lake (S3/Iceberg), Vector DB (Pinecone)
7. External (5 components): Payment Gateways (Stripe/Adyen), Card Networks (Visa/Mastercard), Bureau Data (Experian/Equifax), Device Fingerprinting (Sift/Forter), Email/SMS Providers
8. Observability (5 components): Metrics (Prometheus/Datadog), Logs (ELK/Splunk), Traces (Jaeger/Tempo), ML Monitoring (Arize/Fiddler), Alerting (PagerDuty)
9. Security (5 components): HSM (Thales/Utimaco), KMS (AWS/GCP), Secrets Manager (Vault), WAF (Cloudflare/Akamai), SIEM (Splunk/QRadar)
🔄 Real-Time Transaction Flow

Sequence for a single card authorization request

Participants: Payment Gateway, API Gateway, Planner Agent, Risk Scorer, Decision Agent, Guardrail Agent, Core Banking, Customer

1. POST /v1/transactions (card auth request)
2. route(transaction_data)
3. score_transaction(features)
4. risk_score=0.87, features={velocity, geo, device}
5. validate_decision(block_transaction)
6. policy_check=PASS, compliance=OK
7. block_transaction(txn_id, reason)
8. Transaction declined (fraud suspected)
9. send_alert(customer, fraud_analyst)

Transaction Processing Flow

From payment gateway to decision in <100ms

1. Payment Gateway (0 ms): sends authorization request → Transaction JSON
2. API Gateway (5 ms): rate limit + auth check → Validated request
3. Planner Agent (3 ms): analyzes transaction type, selects models → Execution plan
4. Feature Store (2 ms): fetches customer features (cached) → Feature vector
5. Risk Scorer Agent (45 ms): runs ensemble models → Fraud probability
6. Executor Agent (15 ms): enriches with device/geo data → Final risk score
7. Evaluator Agent (20 ms): validates score, generates SHAP values → Quality report
8. Guardrail Agent (8 ms): PCI-DSS + AML checks → Compliance flags
9. Decision Agent (5 ms): makes approve/decline decision → Decision + confidence
10. Core Banking (10 ms): processes decision → Auth response
11. Notification Service (async): alerts customer/analyst → SMS/Email
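Summing the step times is a quick sanity check on the <100 ms claim. Run strictly in sequence, the synchronous steps above add up to 113 ms, so meeting the budget presumably relies on some work running in parallel or off the request path. Moving evaluator work (SHAP, drift checks) off the critical path is one hypothetical option, suggested by the evaluator's "queue for offline evaluation" recovery; the article does not state which steps overlap. The step times also look like typical values, and p95 latencies do not add exactly.

```python
# Per-step times (ms) from the processing flow above; the async
# Notification Service step is excluded because it is off the request path.
STEPS_MS = [
    ("Payment Gateway", 0), ("API Gateway", 5), ("Planner Agent", 3),
    ("Feature Store", 2), ("Risk Scorer Agent", 45), ("Executor Agent", 15),
    ("Evaluator Agent", 20), ("Guardrail Agent", 8), ("Decision Agent", 5),
    ("Core Banking", 10),
]
BUDGET_MS = 100


def critical_path_ms(steps: list[tuple[str, int]], off_path: frozenset = frozenset()) -> int:
    """Total time if every step not in `off_path` runs sequentially."""
    return sum(ms for name, ms in steps if name not in off_path)


if __name__ == "__main__":
    print(critical_path_ms(STEPS_MS))  # 113: all steps in sequence
    # Hypothetical: evaluator work moved off the synchronous path.
    print(critical_path_ms(STEPS_MS, frozenset({"Evaluator Agent"})))  # 93
```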
Tier 1: Serverless Monolith (1K-10K transactions/day)
  • Architecture: API Gateway (AWS API Gateway), Lambda functions (Python), SageMaker endpoint (XGBoost), DynamoDB (transactions), S3 (logs)
  • Cost & performance: $150/month, 80-120 ms p95

Tier 2: Queue + Workers (10K-100K transactions/day)
  • Architecture: ALB + ECS Fargate, SQS (transaction queue), worker containers (scoring), Aurora PostgreSQL, Redis (feature cache), S3 + Athena (analytics)
  • Cost & performance: $800/month, 60-100 ms p95

Tier 3: Multi-Agent Orchestration (100K-1M transactions/day) · Recommended
  • Architecture: Kubernetes (EKS/GKE), LangGraph agent framework, Kafka (event streaming), Tecton (feature store), MLflow (model registry), Aurora Multi-AZ, ElastiCache Redis Cluster
  • Cost & performance: $3,500/month, 40-80 ms p95

Tier 4: Enterprise Multi-Region (1M+ transactions/day)
  • Architecture: global load balancer (Cloudflare), multi-region Kubernetes, Kafka Streams (real-time processing), Tecton (distributed feature store), SageMaker Multi-Model Endpoints, Aurora Global Database, Redis Enterprise (active-active), Iceberg Data Lake
  • Cost & performance: $15,000+/month, 20-60 ms p95

Key Integrations

Payment Gateways (Stripe/Adyen)

Protocol: REST API + Webhooks
Gateway sends auth request
System scores transaction
Return approve/decline
Gateway processes

Card Networks (Visa/Mastercard)

Protocol: ISO 8583 + Visa DPS API
Receive card auth request
Check card BIN database
Validate CVV/AVS
Return network response

Core Banking System

Protocol: SOAP/REST + MQ
Query account balance
Check transaction limits
Post authorization hold
Settle transaction

Device Fingerprinting (Sift/Forter)

Protocol: REST API + JavaScript SDK
Collect device signals (IP, browser, OS)
Send to fingerprinting service
Receive device reputation score
Incorporate into risk model

KYC/AML Service (ComplyAdvantage)

Protocol: REST API
Screen customer against sanctions lists
Check PEP (Politically Exposed Person) status
Monitor for adverse media
Return risk rating

Security & Compliance

Failure Modes & Fallbacks

| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| ML inference service down | Rule-based scoring | Degraded accuracy (AUC 0.78 vs 0.94), increased false positives | 99.9% |
| Feature store unavailable | Use cached features (stale up to 1 hour) | Slightly outdated features, minimal accuracy impact | 99.95% |
| Payment gateway timeout | Retry 3x with exponential backoff, then queue for async processing | Increased latency (up to 5 s), eventual consistency | 99.5% |
| Guardrail agent fails | Fail-safe: block transaction | False positive (customer inconvenience), but no compliance risk | 100% |
| Database unavailable | Read from replica, then degrade to read-only mode | Cannot write new decisions; historical data is read-only | 99.99% |
| High false positive rate | Dynamic threshold adjustment | Temporarily lower recall (some fraud may slip through), but customer experience is maintained | N/A |
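The "cached features, stale up to 1 hour" fallback can be sketched as a cache that records when each entry was fetched. `StaleFeatureCache` and the injectable clock are hypothetical; in the described architecture this role belongs to the Redis feature cache, whose client API is not shown here.

```python
import time
from typing import Any, Callable


class StaleFeatureCache:
    """Serve live features when possible; on feature-store failure, serve cached
    features no older than `max_staleness_s` (1 hour, per the table above)."""

    def __init__(self, max_staleness_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_staleness_s = max_staleness_s
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, customer_id: str, fetch: Callable[[str], Any]) -> tuple[Any, bool]:
        """Return (features, is_stale). Re-raises if nothing fresh enough is cached."""
        try:
            features = fetch(customer_id)
        except Exception:
            entry = self._entries.get(customer_id)
            if entry is not None and self._clock() - entry[0] <= self.max_staleness_s:
                return entry[1], True
            raise  # caller falls back further, e.g. conservative scoring
        self._entries[customer_id] = (self._clock(), features)
        return features, False
```

Returning an `is_stale` flag lets the risk scorer or evaluator record that a decision was made on degraded inputs.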
System Architecture
┌──────────────┐
│   Planner    │ ← Orchestrates workflow
└──────┬───────┘
       │
   ┌───┴────┬──────────┬───────────┬──────────┐
   │        │          │           │          │
┌──▼───┐ ┌─▼────┐  ┌──▼──────┐ ┌─▼────┐  ┌──▼──────┐
│Risk  │ │Exec  │  │Evaluator│ │Guard │  │Decision │
│Scorer│ │Agent │  │ Agent   │ │rail  │  │ Agent   │
└──┬───┘ └─┬────┘  └──┬──────┘ └─┬────┘  └──┬──────┘
   │       │          │           │          │
   └───────┴──────────┴───────────┴──────────┘
                      │
                  ┌───▼────┐
                  │  Core  │
                  │Banking │
                  └────────┘

🔄 Agent Collaboration Flow

1. Planner Agent: receives the transaction, analyzes its type (card/ACH/wire), and selects scoring models
2. Risk Scorer Agent: computes fraud probability using ensemble models → returns score + factors
3. Executor Agent: aggregates risk signals and enriches them with device/geo data → returns final risk score
4. Evaluator Agent: validates score quality, checks drift, and generates SHAP explanations
5. Guardrail Agent: enforces PCI-DSS, AML checks, and bias filters → flags policy violations
6. Decision Agent: makes the approve/decline/review decision → routes to core banking or an analyst
7. Planner Agent: logs the decision, triggers notifications, and updates the customer risk profile
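The seven steps above can be sketched as a sequential pipeline over a shared state dict. Every agent body here is a hypothetical placeholder (fixed scores, stub compliance and SHAP outputs), and the plain-Python loop deliberately avoids guessing at the LangGraph API that the larger deployment tiers use.

```python
from typing import Any, Callable

State = dict[str, Any]


# Hypothetical stand-ins for the six agents; each reads the shared state and adds to it.
def planner(s: State) -> State:
    return {**s, "models": ["xgboost_fraud_v3"]}


def risk_scorer(s: State) -> State:
    return {**s, "score": 0.87 if s["txn"]["amount"] > 1_000 else 0.05}


def executor(s: State) -> State:
    bump = 0.05 if s["txn"]["new_device"] else 0.0  # device/geo enrichment
    return {**s, "final_score": min(1.0, s["score"] + bump)}


def evaluator(s: State) -> State:
    return {**s, "explanation": ["amount", "new_device"]}  # placeholder for SHAP output


def guardrail(s: State) -> State:
    return {**s, "compliance_ok": True}  # placeholder for PCI-DSS/AML checks


def decision(s: State) -> State:
    if not s["compliance_ok"] or s["final_score"] >= 0.85:
        action = "decline"
    elif s["final_score"] >= 0.5:
        action = "review"
    else:
        action = "approve"
    return {**s, "action": action}


def planner_finalize(s: State) -> State:
    return {**s, "logged": True}  # log decision, trigger notifications


PIPELINE: list[Callable[[State], State]] = [
    planner, risk_scorer, executor, evaluator, guardrail, decision, planner_finalize,
]


def run(txn: dict[str, Any]) -> State:
    state: State = {"txn": txn, "trace": []}
    for agent in PIPELINE:
        state = agent(state)
        state["trace"] = state["trace"] + [agent.__name__]  # audit trail of agents run
    return state
```

Because each agent returns a new dict, the `trace` list doubles as a simple audit trail of which agents ran, in order.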

🎭 Agent Types

  • Reactive Agent (complexity: Low; stateless): Risk Scorer responds to features and returns a score
  • Reflexive Agent (complexity: Medium; reads state): Executor uses context + history for aggregation
  • Deliberative Agent (complexity: High; stateful): Evaluator plans its validation strategy based on model type
  • Orchestrator Agent (complexity: Highest; full state management): Planner coordinates all agents, handles retries, and manages state

📈 Levels of Autonomy

  • L1 Tool: human calls, agent responds (e.g., Monday's prompts)
  • L2 Chained Tools: sequential execution (e.g., Risk Scorer → Executor)
  • L3 Agent: makes decisions, can loop (e.g., Evaluator with drift detection)
  • L4 Multi-Agent: agents collaborate autonomously (e.g., the full fraud detection system)

RAG vs Fine-Tuning

Fraud patterns evolve rapidly. RAG allows daily updates with new fraud cases without retraining. Fine-tuning would require weekly retraining ($5K/week).
✅ RAG (chosen): cost $200/mo; updated daily; new fraud cases are added to the vector DB
❌ Fine-tuning: cost $20K/mo; updated weekly; requires retraining the XGBoost + neural net models
Implementation: Vector DB (Pinecone) with historical fraud cases, merchant reputation data, device fingerprints. Retrieved during risk scoring for similar pattern matching.
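The retrieval step reduces to nearest-neighbor search over case embeddings. This brute-force cosine scan is a stand-in for the Pinecone query (whose client API is not shown), and the case IDs and 3-dimensional embeddings are made up for illustration; real embeddings would come from an embedding model and have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query: Sequence[float], cases: list[dict], k: int = 3) -> list[tuple[float, str]]:
    """Brute-force nearest fraud cases by cosine similarity (stand-in for a vector DB query)."""
    scored = [(cosine(query, c["embedding"]), c["id"]) for c in cases]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    cases = [  # made-up 3-d embeddings for illustration
        {"id": "card_testing_burst", "embedding": [1.0, 0.0, 0.0]},
        {"id": "account_takeover", "embedding": [0.0, 1.0, 0.0]},
        {"id": "merchant_bust_out", "embedding": [0.7, 0.7, 0.0]},
    ]
    print(top_k_similar([1.0, 0.1, 0.0], cases, k=2))
```

A brute-force scan is O(n) per query; a vector database replaces it with an approximate index so retrieval stays within the latency budget as the case library grows.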

Hallucination Detection

LLMs can hallucinate fraud reasons (fake merchant names, false transaction patterns), so findings pass through four layers of checks:
  • L1: Confidence scores (< 0.8 = flag for review)
  • L2: Cross-reference the merchant database (verify the merchant exists)
  • L3: Logical consistency (e.g., the transaction amount matches the merchant category)
  • L4: Human analyst review queue

Result: 0.1% hallucination rate, 100% caught before customer impact
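The four layers can be sketched as a flagging function plus a router. The merchant table, the per-category amount ceilings, and the `LlmFinding` record are hypothetical; only the 0.8 confidence cutoff and the layer structure come from the list above.

```python
from dataclasses import dataclass

# Hypothetical reference data: merchant id -> category, and per-category amount ceilings.
MERCHANT_DB = {"m_123": "grocery", "m_456": "electronics"}
CATEGORY_MAX_AMOUNT = {"grocery": 2_000.0, "electronics": 20_000.0}


@dataclass
class LlmFinding:
    merchant_id: str  # merchant the LLM cites in its fraud reasoning
    amount: float
    confidence: float


def hallucination_flags(f: LlmFinding) -> list[str]:
    flags = []
    if f.confidence < 0.8:                          # L1: confidence
        flags.append("L1: low confidence")
    category = MERCHANT_DB.get(f.merchant_id)
    if category is None:                            # L2: merchant must exist
        flags.append("L2: unknown merchant")
    elif f.amount > CATEGORY_MAX_AMOUNT[category]:  # L3: logical consistency
        flags.append("L3: amount inconsistent with merchant category")
    return flags


def route(f: LlmFinding, review_queue: list[tuple[LlmFinding, list[str]]]) -> str:
    flags = hallucination_flags(f)
    if flags:
        review_queue.append((f, flags))             # L4: human analyst review
        return "human_review"
    return "accept"
```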

Evaluation Framework

  • AUC-ROC: 0.94 (target: 0.92+)
  • Precision: 0.89 (target: 0.85+)
  • Recall: 0.92 (target: 0.90+)
  • False positive rate: 1.8% (target: <2%)
  • Latency p95: 85 ms (target: <100 ms)

Testing: shadow mode scored 10K real transactions in parallel with manual review; champion/challenger A/B tests new models on 10% of traffic.
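A sketch of computing the offline metrics above from labeled scores (1 = fraud) and checking them against the targets. AUC is computed as the probability that a random fraud case outscores a random legitimate one, with ties counted as half. This pairwise form is O(n·m) and fine for illustration; `sklearn.metrics.roc_auc_score` computes the same quantity efficiently at shadow-mode scale. The 0.5 decision threshold is an assumption.

```python
def evaluate(labels: list[int], scores: list[float], threshold: float = 0.5) -> dict[str, float]:
    """Precision, recall, and FPR at `threshold`, plus threshold-free AUC-ROC (1 = fraud)."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and not p)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and not p)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # AUC = P(random fraud case scores above random legit case); ties count half.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "auc_roc": wins / (len(pos) * len(neg)) if pos and neg else float("nan"),
    }


# Targets from the evaluation list above.
MIN_TARGETS = {"auc_roc": 0.92, "precision": 0.85, "recall": 0.90}
MAX_FPR = 0.02


def meets_targets(m: dict[str, float]) -> bool:
    return all(m[k] >= v for k, v in MIN_TARGETS.items()) and m["false_positive_rate"] < MAX_FPR
```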

Dataset Curation

1. Collect: 500K transactions (historical data + synthetic fraud cases)
2. Label: 100K labeled ($50K)
3. Balance: 50/50 fraud/legit split (SMOTE oversampling + undersampling)
4. Augment: +20K edge cases (adversarial examples, rare merchant categories)

Result: 120K high-quality examples, balanced classes, inter-annotator agreement (Cohen's kappa) of 0.88
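Cohen's kappa measures how much two labelers agree beyond what chance alone would produce: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is expected chance agreement from each rater's label frequencies. This is a minimal two-rater sketch; returning 1.0 when p_e = 1 (both raters used one identical label throughout) is a convention chosen here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e): observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for everything (convention)
    return (p_o - p_e) / (1 - p_e)
```

For example, two labelers who agree on 3 of 4 transactions, where one labels half the items fraud and the other a quarter, get p_o = 0.75, p_e = 0.5, and κ = 0.5, well below the 0.88 reported for this dataset.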

Agentic RAG

Agent iteratively retrieves based on reasoning
Transaction from new merchant → RAG retrieves similar merchant fraud patterns → Agent reasons 'need customer transaction history' → RAG retrieves customer profile → Agent reasons 'need device reputation' → RAG retrieves device data → Final risk score with full context
💡 This is not one-shot retrieval: the agent decides what else it needs to know, which improves accuracy by 12% over static retrieval.
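The loop above can be sketched as "ask what is missing, fetch it, repeat" with a hard step cap. In the article the reasoning step is the agent (an LLM); here `rule_policy` is a hypothetical rule-based stand-in, and the retriever names and results are made up.

```python
from typing import Any, Callable, Optional

Retriever = Callable[[dict[str, Any]], Any]
Policy = Callable[[dict[str, Any], dict[str, Any]], Optional[str]]


def iterative_retrieve(txn: dict[str, Any], retrievers: dict[str, Retriever],
                       next_need: Policy, max_steps: int = 5) -> dict[str, Any]:
    """Agentic retrieval loop: ask the policy what is missing, fetch it, repeat."""
    context: dict[str, Any] = {}
    for _ in range(max_steps):          # hard cap so the agent cannot loop forever
        need = next_need(txn, context)  # reasoning step (the agent, in the article)
        if need is None or need in context:
            break
        context[need] = retrievers[need](txn)
    return context


def rule_policy(txn: dict[str, Any], context: dict[str, Any]) -> Optional[str]:
    """Hypothetical stand-in for the agent's 'what else do I need?' reasoning."""
    if "similar_merchant_fraud" not in context:
        return "similar_merchant_fraud"
    if txn.get("new_merchant") and "customer_history" not in context:
        return "customer_history"
    if "device_reputation" not in context:
        return "device_reputation"
    return None


if __name__ == "__main__":
    retrievers = {  # made-up retrieval results
        "similar_merchant_fraud": lambda t: ["case_0172", "case_0935"],
        "customer_history": lambda t: {"txns_30d": 12, "avg_amount": 64.0},
        "device_reputation": lambda t: 0.2,
    }
    ctx = iterative_retrieve({"new_merchant": True}, retrievers, rule_policy)
    print(list(ctx))
```

The `need in context` check also stops a policy that keeps asking for the same thing, which is a cheap guard against a looping LLM.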

Explainable AI (XAI)
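The evaluator's tools include `explainer.shap_values()`, but the article does not show how explanations are computed. For tree ensembles like XGBoost, TreeSHAP (for example, the `shap` library's `TreeExplainer`) is the usual choice. The simpler linear case below, used purely as an illustration, has an exact closed form when features are treated as independent: φᵢ = wᵢ(xᵢ − E[xᵢ]). The contributions sum to f(x) − E[f(X)], which is the additivity property that makes SHAP explanations auditable. The weights and feature values are hypothetical.

```python
def linear_shap(weights: dict[str, float], x: dict[str, float],
                background_mean: dict[str, float]) -> dict[str, float]:
    """Exact SHAP values for f(x) = b + sum(w_i * x_i) with independent features:
    phi_i = w_i * (x_i - E[x_i])."""
    return {name: w * (x[name] - background_mean[name]) for name, w in weights.items()}


def linear_margin(bias: float, weights: dict[str, float], x: dict[str, float]) -> float:
    return bias + sum(w * x[name] for name, w in weights.items())


if __name__ == "__main__":
    # Hypothetical logistic-regression weights (log-odds units) and feature values.
    weights = {"velocity_1h": 0.8, "geo_distance_km": 0.002, "account_age_days": -0.001}
    bias = -4.0
    mean = {"velocity_1h": 1.0, "geo_distance_km": 50.0, "account_age_days": 900.0}
    txn = {"velocity_1h": 6.0, "geo_distance_km": 1200.0, "account_age_days": 30.0}
    phi = linear_shap(weights, txn, mean)
    # Additivity: contributions sum to f(txn) - f(mean), which equals f(txn) - E[f(X)] for linear f.
    print(phi, sum(phi.values()),
          linear_margin(bias, weights, txn) - linear_margin(bias, weights, mean))
```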

Tech Stack Summary

  • LLMs: GPT-4 Turbo, Claude 3 Sonnet
  • ML Models: XGBoost, TensorFlow (neural nets), scikit-learn (rule-based)
  • Orchestration: LangGraph, Apache Airflow
  • Feature Store: Tecton, Feast
  • Model Registry: MLflow, SageMaker Model Registry
  • Database: Aurora PostgreSQL (transactions), DynamoDB (cases), MongoDB (unstructured)
  • Cache: Redis, ElastiCache
  • Streaming: Kafka, Kinesis
  • Compute: Kubernetes (EKS/GKE), Lambda
  • Monitoring: Datadog, Prometheus, Grafana
  • Logging: ELK Stack, Splunk
  • Tracing: Jaeger, AWS X-Ray
  • Security: Vault (secrets), HSM (keys), WAF (Cloudflare)
  • Data Lake: S3 + Iceberg, Snowflake
🏗️ Need a Fraud Detection System?

We'll design your production architecture, select the right ML models, and ensure PCI-DSS compliance. From 1K to 1M+ transactions/day.

© 2026 Randeep Bhatia. All Rights Reserved.

No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.