From transaction stream to fraud verdict in under 100ms.
Monday showed 3 prompts for fraud detection. Tuesday automated them with agents. Wednesday mapped team workflows. Today: complete production architecture. ML agents, real-time scoring, decision orchestration, and scaling from 1K to 1M+ transactions/day with full PCI-DSS compliance.
Key Assumptions
System Requirements
Functional
- Real-time transaction ingestion from payment gateways
- ML-based risk scoring with feature engineering
- Multi-agent decision orchestration (planner, executor, evaluator, guardrail)
- Rule engine for policy-based checks
- Case management for fraud analyst review
- Automated blocking/flagging with override capability
- Customer notification and dispute handling
Non-Functional (SLOs)
💰 Cost Targets: $0.002 per transaction; $0.15 per 1K ML inferences; $0.023 per GB-month of storage
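To put the cost targets in perspective, the arithmetic below works out the daily spend they imply at both ends of the stated scaling range (1K and 1M transactions/day). It is a rough sketch: it assumes exactly one ML inference per transaction, which the targets themselves do not state.

```python
# Cost targets taken from the figures above (USD).
PER_TRANSACTION_USD = 0.002
ML_INFERENCE_PER_1K_USD = 0.15


def daily_budget(transactions_per_day: int) -> dict:
    """Return the daily spend implied by the targets (illustrative arithmetic only)."""
    total = transactions_per_day * PER_TRANSACTION_USD
    # Assumption: exactly one ML inference per transaction.
    inference = transactions_per_day / 1000 * ML_INFERENCE_PER_1K_USD
    return {"total_usd": round(total, 2), "ml_inference_usd": round(inference, 2)}


for volume in (1_000, 1_000_000):
    print(volume, daily_budget(volume))
```

Under that assumption, inference accounts for 7.5% of the per-transaction budget ($0.15 of every $2.00 spent per 1K transactions), leaving the rest for storage, streaming, and agent orchestration.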
Agent Layer
planner
L3: Decomposes transaction analysis into subtasks and selects the appropriate scoring models
🔧 feature_store.get_customer_features(), model_registry.select_model(transaction_type), rule_engine.get_applicable_rules()
⚡ Recovery: Fall back to rule-based scoring if ML is unavailable; use cached customer features if the feature store is down; default to a conservative risk tier on error
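The planner's recovery chain can be sketched as nested fallbacks. This is a minimal illustration, not the actual planner: `Plan`, `plan_transaction`, and the injected `fetch_features` and `ml_available` callables are hypothetical names, and the feature store outage is assumed to surface as a `ConnectionError`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Plan:
    scorer: str      # "ml" or "rules"
    features: dict   # customer features handed to the executor
    risk_tier: str   # "normal" or "conservative"


def plan_transaction(
    customer_id: str,
    fetch_features: Callable[[str], dict],
    feature_cache: Dict[str, dict],
    ml_available: Callable[[], bool],
) -> Plan:
    """Pick a scoring path, degrading step by step instead of failing."""
    try:
        try:
            features = fetch_features(customer_id)
            feature_cache[customer_id] = features  # refresh cache on success
        except ConnectionError:
            # Feature store down: reuse the last known features.
            # Raises KeyError when this customer was never cached.
            features = feature_cache[customer_id]
        scorer = "ml" if ml_available() else "rules"
        return Plan(scorer, features, "normal")
    except Exception:
        # Anything unexpected: rule-based scoring in the conservative tier.
        return Plan("rules", {}, "conservative")
```

Each fallback is strictly more cautious than the one before it, so a cascading failure ends in the conservative tier rather than in an unhandled error.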
executor
L2: Runs risk scoring models, aggregates signals, and computes the final risk score
🔧 ml_inference.predict(model_id, features), feature_engineering.compute_velocity(), device_fingerprint.check_reputation(), geo_scoring.analyze_location()
⚡ Recovery: Retry with exponential backoff (max 3 attempts); circuit breaker on the inference service; fall back to an ensemble of simpler models
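A minimal sketch of the executor's first two recovery steps, retry with exponential backoff and a circuit breaker, is shown below. The class and function names, the failure threshold, the reset interval, and the 10 ms base delay are all assumptions chosen for illustration; with a sub-100ms budget, real delays would need to be tuned against the latency SLO.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when calls are short-circuited because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("inference circuit is open")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result


def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 3,
                       base_delay_s: float = 0.01,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn up to max_attempts times, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying an open circuit only wastes latency budget
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

A caller would typically combine the two as `retry_with_backoff(lambda: breaker.call(predict))` and switch to the simpler-model ensemble when either raises.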
evaluator
L3: Validates risk score quality, checks for model drift, and ensures explainability
🔧 drift_detector.check_distribution(), explainer.shap_values(), quality_metrics.compute_auc_roc(), bias_detector.check_fairness()
⚡ Recovery: Queue for offline evaluation if real-time evaluation fails; use the historical baseline if the drift detector is unavailable; generate a simplified explanation on SHAP timeout
guardrail
L4: Enforces PCI-DSS policies, checks compliance rules, and prevents biased decisions
🔧 pci_validator.check_card_data(), aml_checker.screen_transaction(), bias_filter.check_protected_attributes(), policy_engine.evaluate_rules()
⚡ Recovery: Fail-safe: block the transaction if guardrails fail; require human review on policy uncertainty; audit-log all override attempts
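The fail-safe behaviour described for the guardrail agent amounts to failing closed: a check that crashes is treated the same as a check that rejects. The sketch below illustrates that idea only; `run_guardrails`, the three-valued check contract, and the `no_raw_card_number` check are hypothetical and are not part of any PCI-DSS tooling.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger("guardrail")

# A check returns True (pass), False (violation) or None (policy uncertain).
Check = Callable[[dict], Optional[bool]]


def run_guardrails(transaction: dict, checks: Iterable[Check]) -> str:
    """Return "pass", "review" or "block"; any violation or crash blocks."""
    needs_review = False
    for check in checks:
        try:
            outcome = check(transaction)
        except Exception:
            # Fail closed: a crashing guardrail must never approve.
            logger.exception("guardrail %s crashed", getattr(check, "__name__", check))
            return "block"
        if outcome is False:
            return "block"
        if outcome is None:
            needs_review = True  # uncertain policy: route to a human
    return "review" if needs_review else "pass"


def no_raw_card_number(transaction: dict) -> bool:
    """Hypothetical check: only tokenized card references may appear."""
    return "card_number" not in transaction
```

Blocking takes precedence over review, so an uncertain policy result can never mask a hard violation found by a later check.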
risk_scorer
L2: Domain-specific fraud risk scoring with ensemble models
🔧 xgboost_model.predict(), neural_net.infer(), rule_based_scorer.evaluate(), ensemble_aggregator.combine()
⚡ Recovery: Use the rule-based fallback if ML is unavailable; degrade to a simpler model on timeout; score conservatively when features are unavailable
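One way to combine the ensemble with its recovery rules is a weighted average over whichever models respond, with the rule-based scorer as the last resort. This is a sketch under assumptions: the weights, the `CONSERVATIVE_SCORE` value, and the function signature are illustrative, and the real `ensemble_aggregator.combine()` may work differently.

```python
from typing import Callable, Dict, Iterable, Tuple

Scorer = Callable[[dict], float]
CONSERVATIVE_SCORE = 0.9  # assumption: treated as high risk downstream


def ensemble_score(
    features: dict,
    ml_scorers: Dict[str, Tuple[Scorer, float]],  # name -> (scorer, weight)
    rule_scorer: Scorer,
    required: Iterable[str],
) -> Tuple[float, str]:
    """Return (score, source) where source is "ml", "rules" or "conservative"."""
    if any(name not in features for name in required):
        return CONSERVATIVE_SCORE, "conservative"
    weighted_sum = 0.0
    weight_total = 0.0
    for scorer, weight in ml_scorers.values():
        try:
            weighted_sum += weight * scorer(features)
            weight_total += weight
        except Exception:
            continue  # drop the failed model, keep the rest
    if weight_total == 0.0:
        return rule_scorer(features), "rules"
    # Renormalize so the remaining models still produce a score in [0, 1].
    return weighted_sum / weight_total, "ml"
```

Returning the source alongside the score lets the decision agent and the audit trail record when a verdict was produced in degraded mode.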
decision
L4: Makes the final approve/decline/review decision with human-in-the-loop support
🔧 decision_tree.evaluate(), threshold_manager.get_dynamic_threshold(), escalation_router.check_criteria(), message_generator.create_customer_notification()
⚡ Recovery: Escalate to a human analyst on low confidence; default to decline on system uncertainty; log all decisions for the audit trail
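The decision agent's three outcomes and its recovery rules can be expressed as a small threshold function. The threshold values and the `decide` signature below are hypothetical; in the described system the thresholds would come from `threshold_manager.get_dynamic_threshold()` rather than constants.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Thresholds:
    review: float = 0.5          # hypothetical values; tune per portfolio
    decline: float = 0.8
    min_confidence: float = 0.7


def decide(
    risk_score: Optional[float],
    confidence: float,
    audit_log: List[dict],
    thresholds: Thresholds = Thresholds(),
) -> str:
    """Map a risk score to "approve", "review" or "decline" and log the outcome."""
    if risk_score is None or not 0.0 <= risk_score <= 1.0:
        decision = "decline"      # system uncertainty: no usable score
    elif confidence < thresholds.min_confidence:
        decision = "review"       # low confidence: escalate to an analyst
    elif risk_score >= thresholds.decline:
        decision = "decline"
    elif risk_score >= thresholds.review:
        decision = "review"
    else:
        decision = "approve"
    audit_log.append(
        {"risk_score": risk_score, "confidence": confidence, "decision": decision}
    )
    return decision
```

The range check also rejects NaN, since every comparison against NaN is false, so a corrupted score falls into the decline branch instead of being approved.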
ML Layer
Feature Store
Update: Real-time (streaming) + batch (daily aggregates)
- customer_transaction_velocity_1h
- customer_transaction_velocity_24h
- merchant_fraud_rate_7d
- device_reputation_score
- geo_distance_from_home
- account_age_days
- avg_transaction_amount_30d
- card_decline_rate_7d
- time_since_last_transaction_min
- cross_border_transaction_flag
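Velocity features such as customer_transaction_velocity_1h are typically maintained as sliding-window counts on the streaming path. The sketch below shows one in-memory way to do that; `VelocityTracker` is a hypothetical name, timestamps are assumed to arrive in order per customer, and a production feature store would keep this state in a shared store rather than in process memory.

```python
from collections import deque
from typing import Deque, Dict


class VelocityTracker:
    """Counts events per customer inside a trailing time window."""

    def __init__(self, window_s: int):
        self.window_s = window_s
        self._events: Dict[str, Deque[float]] = {}

    def record(self, customer_id: str, ts: float) -> int:
        """Record a transaction at time ts (seconds) and return the window count."""
        events = self._events.setdefault(customer_id, deque())
        events.append(ts)
        cutoff = ts - self.window_s
        # Evict everything at or before the cutoff: the window is (ts - w, ts].
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


velocity_1h = VelocityTracker(window_s=3600)
for ts in (0, 1800, 3599, 3601):
    print(ts, velocity_1h.record("cust-1", ts))
```

Because each timestamp is appended once and evicted at most once, the amortized cost per transaction is constant, which matters on the real-time path.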
Model Registry
Strategy: Semantic versioning (major.minor.patch), A/B testing for new versions
- xgboost_fraud_v3
- neural_net_fraud_v2
- rule_based_v1
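The registry strategy, semantic versioning plus A/B testing, can be illustrated with deterministic hash bucketing: each customer is always routed to the same model version for the duration of a test. The version strings, the salt, and the function names are assumptions made for this sketch.

```python
import hashlib
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Parse "major.minor.patch" into a tuple that compares numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def ab_bucket(customer_id: str, salt: str = "fraud-ab") -> int:
    """Map a customer to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def select_version(customer_id: str, stable: str, candidate: str,
                   candidate_pct: int) -> str:
    """Route candidate_pct percent of customers to a newer candidate version."""
    if parse_semver(candidate) <= parse_semver(stable):
        return stable  # never A/B test against an older or identical version
    return candidate if ab_bucket(customer_id) < candidate_pct else stable
```

Comparing parsed tuples rather than raw strings avoids the classic ordering mistake where "3.10.0" sorts before "3.9.2". Changing the salt reshuffles customers between buckets, which is useful when starting a new experiment.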
Observability Stack
Real-time monitoring, tracing & alerting
Deployment Variants
Startup Architecture
Fast to deploy, cost-efficient, scales to 100 competitors
Infrastructure
Risks & Mitigations
⚠️ Model drift due to evolving fraud patterns (High)
✓ Mitigation: Continuous monitoring with PSI, automated retraining pipeline, A/B testing of new models, weekly model performance review
⚠️ High false positive rate impacts customer experience (Medium)
✓ Mitigation: Dynamic threshold tuning, customer feedback loop, whitelisting for trusted customers, manual review queue for borderline cases
⚠️ PCI-DSS compliance violation (card data leak) (Low)
✓ Mitigation: Tokenization, HSM for key storage, annual PCI audit, penetration testing, employee training, data access auditing
⚠️ Latency exceeds 100ms SLA (Medium)
✓ Mitigation: Feature caching, model optimization (quantization), circuit breakers, auto-scaling, load testing, CDN for static assets
⚠️ Agent hallucination leads to false fraud accusation (Low)
✓ Mitigation: Guardrail agent with fact-checking, human review for high-confidence declines, explainability with SHAP, audit trail
⚠️ Vendor lock-in (AWS/GCP) (Medium)
✓ Mitigation: Multi-cloud strategy (Kubernetes for portability), open-source tools (Kafka, MLflow), vendor-agnostic APIs, annual contract review
⚠️ Data privacy violation (GDPR/CCPA) (Low)
✓ Mitigation: Data minimization, anonymization, right-to-deletion workflow, data residency compliance, DPO oversight, regular audits
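The drift mitigation above relies on PSI (Population Stability Index), which compares the binned distribution of current scores against a baseline: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the baseline and current proportions in bin i. The sketch below is one simple way to compute it, assuming scores in [0, 1], equal-width bins, and a small floor for empty bins; the binning scheme used in the actual pipeline is not specified in this document.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""

    def proportions(scores: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for score in scores:
            index = min(int(score * bins), bins - 1)  # 1.0 goes in the last bin
            counts[index] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(count / len(scores), eps) for count in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]
shifted = [min(score + 0.3, 0.99) for score in baseline]
print(psi(baseline, baseline), psi(baseline, shifted))
```

Every term of the sum is non-negative, so PSI is zero only when the two binned distributions match. A commonly cited rule of thumb treats values below 0.1 as stable and values above 0.25 as a significant shift, though the right alert threshold depends on the portfolio.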
Evolution Roadmap
Progressive transformation from MVP to scale
Phase 1: MVP (0-3 months)
Phase 2: Scale (3-6 months)
Phase 3: Enterprise (6-12 months)
Complete Systems Architecture
9-layer production architecture for fraud detection
- Presentation: 4 components
- API Gateway: 4 components
- Agent Layer: 6 components
- ML Layer: 5 components
- Integration: 5 components
- Data: 5 components
- External: 5 components
- Observability: 5 components
- Security: 5 components
Real-Time Transaction Flow
Automated data flow every hour
Transaction Processing Flow
From payment gateway to decision in <100ms
Key Integrations
Payment Gateways (Stripe/Adyen)
Card Networks (Visa/Mastercard)
Core Banking System
Device Fingerprinting (Sift/Forter)
KYC/AML Service (ComplyAdvantage)
Security & Compliance
Failure Modes & Fallbacks
| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| ML inference service down | → Rule-based scoring | Degraded accuracy (78% vs 94% AUC), increased false positives | 99.9% |
| Feature store unavailable | → Use cached features (stale up to 1 hour) | Slightly outdated features, minimal accuracy impact | 99.95% |
| Payment gateway timeout | → Retry 3x with exponential backoff → Queue for async processing | Increased latency (up to 5 sec), eventual consistency | 99.5% |
| Guardrail agent fails | → Fail-safe: block transaction | False positive (customer inconvenience), but no compliance risk | 100% |
| Database unavailable | → Read from replica → Degrade to read-only mode | Cannot write new decisions, read historical data only | 99.99% |
| High false positive rate | → Dynamic threshold adjustment | Temporarily lower precision, but maintain customer experience | N/A |
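The feature store row in the table allows cached features that are "stale up to 1 hour". A minimal sketch of a cache that enforces that bound is shown below; `FeatureCache` and its method names are hypothetical, and the injectable clock exists only so the staleness rule can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

MAX_STALENESS_S = 3600  # "stale up to 1 hour" from the table above


class FeatureCache:
    """Keeps the last known features per customer with a staleness bound."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def put(self, customer_id: str, features: dict) -> None:
        self._entries[customer_id] = (self._clock(), features)

    def get_if_fresh(self, customer_id: str) -> Optional[dict]:
        """Return cached features, or None if missing or older than the bound."""
        entry = self._entries.get(customer_id)
        if entry is None:
            return None
        stored_at, features = entry
        if self._clock() - stored_at > MAX_STALENESS_S:
            return None  # too old: caller should score conservatively instead
        return features
```

Returning `None` for expired entries forces the caller to choose an explicit fallback, rather than silently scoring on features that are older than the documented limit.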
              ┌──────────────┐
              │   Planner    │ ← Orchestrates workflow
              └──────┬───────┘
                     │
   ┌────────┬────────┼───────────┬────────┐
   │        │        │           │        │
┌──▼───┐ ┌──▼───┐ ┌──▼──────┐ ┌──▼───┐ ┌──▼──────┐
│Risk  │ │Exec  │ │Evaluator│ │Guard │ │Decision │
│Scorer│ │Agent │ │  Agent  │ │rail  │ │  Agent  │
└──┬───┘ └──┬───┘ └──┬──────┘ └──┬───┘ └──┬──────┘
   │        │        │           │        │
   └────────┴────────┼───────────┴────────┘
                     │
                 ┌───▼────┐
                 │  Core  │
                 │Banking │
                 └────────┘
🔄 Agent Collaboration Flow
🎭 Agent Types
Reactive Agent
Low: Risk Scorer - Responds to features, returns score
Reflexive Agent
Medium: Executor - Uses context + history for aggregation
Deliberative Agent
High: Evaluator - Plans validation strategy based on model type
Orchestrator Agent
Highest: Planner - Coordinates all agents, handles retries, manages state
📈 Levels of Autonomy
RAG vs Fine-Tuning
Hallucination Detection
Evaluation Framework
Dataset Curation
Agentic RAG
Explainable AI (XAI)
Tech Stack Summary
2026 Randeep Bhatia. All Rights Reserved.
No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.