From prompts to a production credit risk system.
Monday: 3 core prompts for credit assessment. Tuesday: automated code for data extraction and scoring. Wednesday: team workflows across risk, compliance, and engineering. Thursday: complete technical architecture with AI agents, ML pipelines, real-time scoring, and regulatory compliance for 100,000+ applications daily.
Key Assumptions
System Requirements
Functional
- Ingest applicant data from web forms, APIs, and partner integrations with schema validation
- Extract and engineer 200+ features from structured and unstructured data sources
- Score applications in real-time using ensemble ML models and business rules
- Generate explainable credit decisions with SHAP values and reason codes
- Orchestrate multi-step workflows: data validation → feature extraction → scoring → decisioning → adverse action
- Support manual review queues for edge cases and compliance spot-checks
- Maintain full audit trail of all decisions, model versions, and data lineage for 7 years
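Intake validation is the first gate in this pipeline. A minimal sketch of that check, assuming Pydantic v2 and purely illustrative field names (ApplicantPayload, ssn_last4); the real schema would mirror the web-form and partner-API contracts:

```python
# Sketch: schema validation at ingest. Field names are illustrative assumptions.
from pydantic import BaseModel, Field, ValidationError

class ApplicantPayload(BaseModel):
    applicant_id: str
    annual_income_usd: float = Field(gt=0)
    requested_amount_usd: float = Field(gt=0)
    employment_tenure_months: int = Field(ge=0)
    ssn_last4: str = Field(pattern=r"^\d{4}$")   # full SSN never stored in this payload

def ingest(raw: dict) -> ApplicantPayload | None:
    """Validate an inbound application; return None (and log) on schema failure."""
    try:
        return ApplicantPayload(**raw)
    except ValidationError as exc:
        print(f"Schema validation failed: {exc.errors()}")  # replace with structured logging
        return None
```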
Non-Functional (SLOs)
💰 Cost Targets: $0.15 per application, $0.001 per feature compute, $0.02 per model inference, $500 monthly infrastructure per 1,000 applications
Agent Layer
planner
L4: Decompose the credit application into subtasks and coordinate agent execution
🔧 Tools: task_decomposer, dependency_resolver, agent_selector
⚡ Recovery: Retry with exponential backoff (3 attempts); route to the manual review queue if planning fails; log failure context for debugging
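A minimal sketch of the planner's retry-then-escalate recovery path; planner.decompose and the queue interface are hypothetical stand-ins:

```python
import random
import time

def with_retry(fn, max_attempts=3, base_delay_s=0.5):
    """Retry a call with exponential backoff; re-raise after the final attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Backoff with jitter: ~0.5s, 1s, 2s.
            time.sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.1))

def plan_application(application_id, planner, manual_review_queue):
    try:
        return with_retry(lambda: planner.decompose(application_id))
    except Exception as exc:
        # Planning failed after all retries: log context and park the case for a human.
        manual_review_queue.put({"application_id": application_id, "error": repr(exc)})
        return None
```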
executor
L3: Execute the credit workflow: validation → feature extraction → scoring → decisioning
🔧 Tools: data_validator, feature_extractor, model_inference_service, decision_engine
⚡ Recovery: Checkpoint intermediate results to resume on failure; fall back to rule-based scoring if ML inference fails; queue for manual review if a critical step fails
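A checkpointed executor sketch under those recovery rules; the stage list, checkpoint store, and rule-based scorer are assumed interfaces:

```python
def run_workflow(app_id, stages, checkpoint_store, rule_based_score):
    """Run validation → feature extraction → scoring → decisioning,
    checkpointing after each stage so a failed run resumes where it stopped."""
    state = checkpoint_store.load(app_id) or {}
    for name, stage_fn in stages:
        if name in state:                 # stage already completed in a prior attempt
            continue
        try:
            state[name] = stage_fn(app_id, state)
        except Exception as exc:
            if name == "scoring":
                state[name] = rule_based_score(app_id, state)   # degrade to rules
            else:
                state["error"] = repr(exc)
                checkpoint_store.save(app_id, state)
                raise                     # critical step failed; caller routes to manual review
        checkpoint_store.save(app_id, state)
    return state
```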
evaluator
L3: Validate outputs at each stage: data quality, feature validity, score sanity, compliance
🔧 Tools: schema_validator, statistical_test_runner, drift_detector, explainability_checker
⚡ Recovery: Flag low-confidence outputs for human review; trigger alerts if the validation failure rate exceeds 5%; automatically roll back the model if drift is detected
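One way the 5% alert threshold could be enforced, as a rolling-window check (sketch; the alerter interface is an assumption):

```python
from collections import deque

class ValidationMonitor:
    """Track recent validation outcomes; alert when the failure rate exceeds 5%."""
    def __init__(self, alerter, window=1000, threshold=0.05):
        self.outcomes = deque(maxlen=window)
        self.alerter = alerter
        self.threshold = threshold

    def record(self, passed: bool) -> None:
        self.outcomes.append(passed)
        if len(self.outcomes) < 100:                      # wait for a meaningful sample
            return
        rate = self.outcomes.count(False) / len(self.outcomes)
        if rate > self.threshold:
            self.alerter.fire(f"Validation failure rate {rate:.1%} exceeds 5%")
```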
guardrail
L4: Enforce compliance policies (FCRA, ECOA), detect bias, and prevent prohibited-basis discrimination
🔧 Tools: pii_redactor, bias_detector, prohibited_basis_checker, adverse_action_generator
⚡ Recovery: Block the decision if a policy violation is detected (hard stop); generate an adverse action notice with reason codes; escalate to a compliance officer for review
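The pii_redactor step has to run before any text reaches an external LLM. A regex-based sketch (patterns are illustrative, not exhaustive):

```python
import re

SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def redact_pii(text: str) -> str:
    """Replace SSNs, phone numbers, and emails with placeholder tokens
    before the text is sent to an external LLM."""
    text = SSN_RE.sub("[SSN]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return EMAIL_RE.sub("[EMAIL]", text)
```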
feature_extraction
L2: Extract and engineer 200+ features from raw applicant data and external sources
🔧 Tools: credit_bureau_api, bank_statement_parser, income_estimator, debt_to_income_calculator, feature_store_writer
⚡ Recovery: Use cached features if an external API fails (max 24-hour staleness); impute missing features with the population median; flag incomplete feature sets for manual review
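A sketch of the cache-then-impute fallback, assuming a simple key-value cache interface and precomputed population medians:

```python
import time

MAX_STALENESS_S = 24 * 3600   # cached external features may be at most 24 hours old

def get_feature(name, applicant_id, fetch_fn, cache, population_medians):
    """Return (value, provenance): fresh from the source, cached within 24h,
    or imputed with the population median when nothing usable is available."""
    try:
        value = fetch_fn(applicant_id)
        cache.set(name, applicant_id, value, ts=time.time())
        return value, "fresh"
    except Exception:
        cached = cache.get(name, applicant_id)
        if cached and time.time() - cached["ts"] <= MAX_STALENESS_S:
            return cached["value"], "cached"
        return population_medians[name], "imputed"   # flagged downstream for review
```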
scoring
L2: Generate a credit score using an ensemble of ML models and business rules
🔧 Tools: model_inference_service, ensemble_aggregator, explainability_engine, rule_engine
⚡ Recovery: Fall back to the single best model if the ensemble fails; use rule-based scoring if all ML models fail; route to manual underwriting if confidence < 70%
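An ensemble-with-degradation sketch; each model is assumed to expose a predict call returning a single default probability, and ensemble agreement stands in for a real confidence estimate:

```python
def score_application(app_id, features, models, rule_based_score, manual_review_queue):
    probs = []
    for model in models:                       # e.g. XGBoost, LightGBM, neural net
        try:
            probs.append(float(model.predict(features)))
        except Exception:
            continue                           # skip failed models, keep the rest
    if probs:
        score = sum(probs) / len(probs)
        confidence = 1.0 - (max(probs) - min(probs))   # high disagreement -> low confidence
    else:
        score, confidence = rule_based_score(features), 0.5   # all ML models down
    if confidence < 0.70:
        manual_review_queue.put({"application_id": app_id, "score": score})
    return score, confidence
```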
explainability
L2: Generate human-readable explanations for credit decisions (required for adverse actions)
🔧 Tools: shap_interpreter, reason_code_mapper, natural_language_generator, regulatory_template_engine
⚡ Recovery: Use template-based explanations if LLM generation fails; require human review if explanation confidence < 80%; log all explanations for the audit trail
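A sketch of the SHAP-to-reason-code path, assuming the tree-based XGBoost scorer and an illustrative reason-code map:

```python
import numpy as np
import shap

REASON_CODES = {   # illustrative subset of an FCRA reason-code mapping
    "utilization_rate": "Proportion of balances to credit limits is too high",
    "num_delinquencies_24mo": "Delinquency on accounts",
    "num_inquiries_6mo": "Too many recent credit inquiries",
    "debt_to_income_ratio": "Level of debt relative to income is too high",
}

def adverse_action_reasons(model, X_row, feature_names, top_k=4):
    """Rank features by their SHAP push toward higher predicted default risk
    and map the top contributors to reason codes."""
    contribs = np.asarray(shap.TreeExplainer(model).shap_values(X_row)).reshape(-1)
    reasons = []
    for idx in np.argsort(contribs)[::-1]:       # largest positive contribution first
        name = feature_names[idx]
        if contribs[idx] > 0 and name in REASON_CODES:
            reasons.append(REASON_CODES[name])
        if len(reasons) == top_k:
            break
    return reasons
```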
ML Layer
Feature Store
Update: Real-time for transactional data, daily batch for credit bureau data, hourly for alternative data
- credit_score
- num_tradelines
- total_debt
- debt_to_income_ratio
- num_inquiries_6mo
- oldest_account_age_months
- num_delinquencies_24mo
- utilization_rate
- income_stability_score
- employment_tenure_months
- rent_payment_history
- utility_payment_history
- bank_balance_avg_3mo
- transaction_velocity
- fraud_risk_score
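How those features and cadences might be declared as a lightweight spec feeding the feature store (a sketch; the dataclass and labels are assumptions, not a specific feature-store API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSpec:
    name: str
    source: str      # "transactional", "credit_bureau", or "alternative"
    refresh: str     # matches the update cadence stated above
    dtype: str

FEATURES = [
    FeatureSpec("credit_score",         "credit_bureau", "daily_batch", "int"),
    FeatureSpec("utilization_rate",     "credit_bureau", "daily_batch", "float"),
    FeatureSpec("bank_balance_avg_3mo", "transactional", "real_time",   "float"),
    FeatureSpec("rent_payment_history", "alternative",   "hourly",      "float"),
    FeatureSpec("fraud_risk_score",     "transactional", "real_time",   "float"),
]
```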
Model Registry
Strategy: Semantic versioning (major.minor.patch), Git-backed model artifacts, MLflow tracking
- credit_score_xgboost
- credit_score_lightgbm
- credit_score_neural_net
- fraud_detector
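A registration sketch for the MLflow-backed registry; exact signatures vary by MLflow version, and the tracking URI, metrics, and stage-promotion policy are deployment-specific assumptions:

```python
import mlflow
import mlflow.xgboost

def register_credit_model(model, params: dict, metrics: dict):
    """Log a trained XGBoost credit model and register it under the
    registry name used above."""
    with mlflow.start_run() as run:
        mlflow.log_params(params)              # e.g. max_depth, learning_rate
        mlflow.log_metrics(metrics)            # e.g. AUC, KS on the holdout set
        mlflow.xgboost.log_model(model, artifact_path="model")
        model_uri = f"runs:/{run.info.run_id}/model"
        return mlflow.register_model(model_uri, "credit_score_xgboost")
```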
Observability Stack
Real-time monitoring, tracing & alerting
Deployment Variants
Startup Architecture
Fast to deploy, cost-efficient, and able to scale toward the 100,000+ applications-per-day target
Infrastructure
Risks & Mitigations
⚠️ Model bias leading to discriminatory lending (ECOA violation)
Likelihood: Medium. Mitigation: Bias testing in every model release (demographic parity, equalized odds), guardrail agent enforcing fairness thresholds, quarterly audits by the compliance team, explainability for all denials, and model governance committee approval required for production deployment.
⚠️ Credit bureau API outage causing application backlog
Likelihood: Low (99.5% SLA from bureaus). Mitigation: 24-hour cache for credit reports, multi-bureau redundancy (if Experian fails, try Equifax), fallback to rule-based scoring, auto-scaling message queue to absorb the backlog, and SLA monitoring with PagerDuty alerts.
⚠️ Model drift due to economic changes (e.g., recession, interest rate hikes)
Likelihood: High (economic cycles are inevitable). Mitigation: Daily drift detection (PSI, KS test), automated alerts if drift exceeds 0.25, quarterly model retraining on recent data, A/B testing before full rollout, business rule overrides during economic shocks, and stress testing with recession scenarios.
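A minimal PSI check matching the 0.25 alert threshold above (equal-width bins for brevity; production jobs often use decile edges from the training distribution):

```python
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """PSI between the training-time and recent score distributions."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.linspace(min(baseline.min(), current.min()),
                        max(baseline.max(), current.max()), bins + 1)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    base_pct = np.clip(base_pct, 1e-6, None)   # avoid log(0)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# Daily drift job (sketch): alert and consider retraining when drift is significant.
# if population_stability_index(train_scores, todays_scores) > 0.25: page_on_call()
```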
⚠️ Data breach exposing PII (SSNs, credit reports)
Likelihood: Low (with proper security controls). Mitigation: Encryption at rest (AES-256) and in transit (TLS 1.3), PII redaction before LLM calls, role-based access controls (RBAC), audit logging with 7-year retention, annual penetration testing, SOC 2 Type II certification, and an incident response plan with < 1 hour detection time.
⚠️ LLM hallucination in adverse action notices (FCRA violation)
Likelihood: Medium (LLMs hallucinate roughly 5% of the time). Mitigation: Four-layer hallucination detection (confidence, fact-check, consistency, human review), template-based fallback, 100% of explanations validated against ground truth, compliance officer spot-checks on a 10% sample, and a zero-tolerance policy for inaccurate notices.
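One way the four layers could be wired for adverse action notices (a sketch with assumed inputs: an LLM confidence score, two independent drafts, and the model's reason codes):

```python
import random

def validate_adverse_action_notice(draft, second_draft, reason_codes,
                                   llm_confidence, template_notice,
                                   spot_check_rate=0.10):
    # Layer 1: confidence gate on the generation itself.
    if llm_confidence < 0.80:
        return template_notice, "human_review"
    # Layer 2: fact-check -- every model-produced reason code must appear in the draft.
    if not all(code.lower() in draft.lower() for code in reason_codes):
        return template_notice, "human_review"
    # Layer 3: consistency -- two independent drafts must cite the same reasons.
    if any((code.lower() in draft.lower()) != (code.lower() in second_draft.lower())
           for code in reason_codes):
        return template_notice, "human_review"
    # Layer 4: route a 10% random sample to the compliance officer queue.
    return draft, ("spot_check" if random.random() < spot_check_rate else "approved")
```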
⚠️ Vendor lock-in to a single LLM provider (OpenAI, Anthropic)
Likelihood: Medium (providers change pricing and terms). Mitigation: Multi-provider architecture (GPT-4 + Claude), an abstraction layer for LLM calls, prompt versioning in Git, fallback to open-source models (Llama 3, Mistral), quarterly cost analysis, and contract negotiations with volume commitments.
⚠️ Regulatory changes requiring model retraining (e.g., new FCRA rules)
Likelihood: Medium (regulations evolve). Mitigation: Modular architecture (easy to swap components), RAG for policy enforcement (instant updates), legal team monitoring of regulatory changes, compliance committee review of model changes, a 3-month buffer for major regulatory updates, and partnerships with RegTech vendors.
Evolution Roadmap
Progressive transformation from MVP to scale
Phase 1: MVP (0-3 months)
Phase 2: Scale (3-6 months)
Phase 3: Enterprise (6-12 months)
Complete Systems Architecture
9-layer architecture from presentation to security
- Presentation: 4 components
- API Gateway: 4 components
- Agent Layer: 7 components
- ML Layer: 5 components
- Integration: 5 components
- Data: 6 components
- External: 5 components
- Observability: 6 components
- Security: 6 components
Sequence Diagram - Credit Application Flow
Automated data flow every hour
Data Flow - Application to Decision
Applicant submission → credit decision in under 500 ms
Key Integrations
Credit Bureaus (Experian, Equifax, TransUnion)
Bank Statement Parsing (Plaid, Yodlee, MX)
Identity Verification (Jumio, Onfido, Persona)
Fraud Detection (Sift, Forter, Riskified)
LLM APIs (OpenAI GPT-4, Anthropic Claude)
Security & Compliance
Failure Modes & Recovery
| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| Credit bureau API timeout or rate limit | Use cached credit report (max 24-hour staleness) → if no cache, route to manual underwriting queue | Degraded accuracy but no downtime | 99.5% (credit bureau SLA is 99.0%) |
| ML model inference service down | Fall back to rule-based scoring (credit score + DTI + employment) → if rules fail, manual review | Lower approval rate (rule-based is more conservative), increased manual review volume | 99.9% (multi-model redundancy) |
| Feature extraction agent fails (missing data, API errors) | Impute missing features with population median → flag application for review if >20% of features are missing | Reduced model confidence, higher manual review rate | 99.8% |
| Guardrail agent detects policy violation (e.g., prohibited-basis discrimination) | Block decision immediately (hard stop) → route to compliance officer for manual review → generate incident report | Application delayed but compliance maintained (critical for regulatory adherence) | 100% (zero tolerance for policy violations) |
| Database unavailable (PostgreSQL primary down) | Fail over to read replica (read-only mode) → queue write operations → promote replica to primary | Read-only mode for 2-5 minutes; write operations queued and replayed after recovery | 99.95% (automated failover) |
| LLM API fails (OpenAI/Anthropic outage) | Use template-based adverse action notices (pre-approved regulatory language) → flag for human review | Generic explanations instead of personalized ones; compliance maintained | 99.9% (multi-provider redundancy) |
| Message queue overload (SQS/Pub/Sub backlog) | Auto-scale workers (horizontal scaling) → if backlog >10K messages, throttle new submissions → alert ops team | Increased latency (queue processing time), potential user-facing delays | 99.9% |
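Every row in the table follows the same primary → fallback → terminal-action shape, which can be captured in one helper (sketch; the callables in the example are hypothetical):

```python
def run_with_fallbacks(steps, on_exhausted):
    """Try each (name, callable) in order; hand off to a terminal action
    (manual review, ops alert) only if every option in the chain fails."""
    for name, step in steps:
        try:
            return name, step()
        except Exception:
            continue                      # degrade to the next option in the chain
    return "exhausted", on_exhausted()

# Example, mirroring the first table row (callables are illustrative):
# outcome, report = run_with_fallbacks(
#     [("bureau_api", pull_bureau_report), ("cache_24h", read_cached_report)],
#     on_exhausted=route_to_manual_underwriting,
# )
```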
RAG vs Fine-Tuning for Credit Policy Enforcement
Hallucination Detection in Credit Explanations
Evaluation Framework for Credit Models
Dataset Curation for Credit Risk
Agentic RAG for Dynamic Feature Engineering
Model Ensemble Strategy
Tech Stack Summary
© 2026 Randeep Bhatia. All Rights Reserved.
No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.