From prompts to a production-grade patient intake system.
Monday showed 3 core prompts. Tuesday automated the workflow. Wednesday mapped team roles. Today: complete technical architecture. Multi-agent orchestration, FHIR integration, HIPAA compliance, and scaling from 100 to 10,000 patients per day. This is the production system design that powers modern healthcare automation.
Key Assumptions
System Requirements
Functional
- Extract 47+ structured fields from free-text patient narratives
- Validate completeness against EHR requirements (demographics, insurance, medical history)
- Generate contextual follow-up questions for missing critical data
- Detect and redact PHI before LLM processing (names, SSNs, MRNs)
- Transform extracted data to FHIR R4 bundles for EHR ingestion
- Maintain audit trail for all PHI access (7-year retention)
- Support multi-language intake (English, Spanish minimum)
Non-Functional (SLOs)
Cost Targets: $0.15 per intake; monthly infrastructure $500 (startup) or $5,000 (enterprise)
Agent Layer
planner
L3: Decomposes the intake request into subtasks and selects the appropriate tools and agents.
Tools: task_decomposer, agent_selector, context_analyzer
Recovery: If task decomposition fails, fall back to a simple sequential flow. If agent selection is uncertain, route to the human review queue. Retry with a simplified plan (max 2 retries).
executor
L2: Extracts structured data from the patient narrative using an LLM.
Tools: claude_api, gpt4_api, schema_validator, rag_retriever
Recovery: If the LLM API fails, switch to a backup LLM (GPT-4 → Claude → Gemini). If extraction confidence < 0.7, flag for human review. If schema validation fails, retry with a clarified prompt. Max 3 retries with exponential backoff.
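The executor's recovery rules above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the `Provider` callable signature, the `base_delay_s` value, and the reading of "max 3 retries" as three attempts per provider are all assumptions.

```python
import time
from typing import Callable

# Hypothetical provider signature: takes the narrative, returns (fields, confidence).
Provider = Callable[[str], tuple[dict, float]]

CONFIDENCE_THRESHOLD = 0.7  # from the recovery rule above
MAX_ATTEMPTS = 3            # assumed reading of "max 3 retries"


def extract_with_failover(
    narrative: str,
    providers: list[tuple[str, Provider]],
    base_delay_s: float = 0.5,  # assumed; the source gives no delay values
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try each provider in order, retrying each with exponential backoff."""
    for name, call in providers:
        for attempt in range(MAX_ATTEMPTS):
            try:
                fields, confidence = call(narrative)
            except Exception:
                # Backoff doubles each time: base, 2x base, 4x base.
                sleep(base_delay_s * (2 ** attempt))
                continue
            return {
                "provider": name,
                "fields": fields,
                "confidence": confidence,
                # Low-confidence extractions go to the human review queue.
                "needs_human_review": confidence < CONFIDENCE_THRESHOLD,
            }
    # Every provider exhausted its attempts: hand off to a human.
    return {"provider": None, "fields": {}, "confidence": 0.0,
            "needs_human_review": True}
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; in production the default `time.sleep` (or an async equivalent) would apply.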
evaluator
L3: Validates completeness and quality of extracted data against EHR requirements.
Tools: field_validator, completeness_checker, clinical_rule_engine, icd10_validator
Recovery: If validation rules fail, use the fallback rule set. If a critical field is missing, escalate to urgent review. If quality score < 0.8, trigger re-extraction.
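One way to picture the evaluator's decision logic is the sketch below. The field names and the definition of the quality score (fraction of required fields present) are assumptions made for illustration; the source does not specify either.

```python
QUALITY_THRESHOLD = 0.8  # from the recovery rule above

# Hypothetical field names; the source does not list the actual EHR schema.
CRITICAL_FIELDS = {"patient_name", "date_of_birth", "insurance_id"}
REQUIRED_FIELDS = CRITICAL_FIELDS | {"allergies", "medications", "chief_complaint"}


def evaluate(extracted: dict) -> dict:
    """Decide what to do with an extraction based on completeness."""
    # Treat None, empty strings, and empty lists as "not provided".
    present = {k for k, v in extracted.items() if v not in (None, "", [])}
    missing = sorted(REQUIRED_FIELDS - present)
    # Assumed quality metric: share of required fields that are present.
    quality = len(REQUIRED_FIELDS & present) / len(REQUIRED_FIELDS)

    if CRITICAL_FIELDS - present:
        action = "urgent_review"   # a critical field is missing
    elif quality < QUALITY_THRESHOLD:
        action = "re_extract"      # quality below threshold
    else:
        action = "accept"
    return {"action": action, "quality": quality, "missing": missing}
```

A real clinical rule engine would apply many more checks (code validity, logical consistency), so this only covers the completeness branch.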
guardrail
L4: PHI detection, redaction, safety checks, and policy enforcement.
Tools: comprehend_medical, phi_detector, safety_classifier, policy_engine
Recovery: If PHI detection fails, block processing entirely (fail-safe). If the safety score falls below threshold, escalate to the compliance team. If a policy violation is detected, halt and audit. Zero tolerance for PHI leakage.
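The fail-safe rule above (block everything if PHI detection fails) is worth seeing in code, because the easy mistake is to let text pass through when the detector errors out. The sketch below is an assumption-laden illustration: `regex_detector` is a stand-in for the real detection service, its two patterns are simplistic examples that do not catch names, and `PHIDetectionError` is a hypothetical exception type.

```python
import re
from typing import Callable

Span = tuple[int, int, str]  # (start, end, label)


class PHIDetectionError(RuntimeError):
    """Raised when PHI detection is unavailable; callers must stop processing."""


# Illustrative patterns only; real PHI detection needs far broader coverage.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MRN_RE = re.compile(r"\bMRN[:\s#]*\d{6,10}\b", re.IGNORECASE)


def regex_detector(text: str) -> list[Span]:
    """Stand-in detector returning character spans of suspected PHI."""
    spans = [(m.start(), m.end(), "SSN") for m in SSN_RE.finditer(text)]
    spans += [(m.start(), m.end(), "MRN") for m in MRN_RE.finditer(text)]
    return spans


def redact_or_block(text: str,
                    detector: Callable[[str], list[Span]] = regex_detector) -> str:
    """Redact detected PHI, or raise so nothing reaches the LLM."""
    try:
        spans = detector(text)
    except Exception as exc:
        # Fail closed: never return the unredacted text on detector failure.
        raise PHIDetectionError("PHI detection unavailable; processing blocked") from exc
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

The key design choice is that the only two outcomes are redacted text or an exception; there is no code path that returns the original input after a detector failure.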
question_generator
L3: Generates contextual follow-up questions for missing critical data.
Tools: gpt4_api, clinical_context_retriever, question_ranker
Recovery: If question generation fails, use template-based questions. If no clinical context is available, generate generic questions. Max 5 questions per iteration.
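The template fallback and the five-question cap described above might look like the following sketch. The template wording, the field names, and the `generate` callable (standing in for the LLM-backed generator) are hypothetical.

```python
from typing import Callable, Optional

MAX_QUESTIONS = 5  # from the recovery rule above

# Hypothetical templates; a real deployment would maintain these per field.
TEMPLATES = {
    "allergies": "Do you have any known drug or food allergies?",
    "medications": "Which medications are you currently taking?",
    "insurance_id": "What is the member ID on your insurance card?",
}
GENERIC = "Could you provide your {field}?"


def follow_up_questions(
    missing_fields: list[str],
    generate: Optional[Callable[[list[str]], list[str]]] = None,
) -> list[str]:
    """Return at most MAX_QUESTIONS questions, preferring generated ones."""
    questions: list[str] = []
    if generate is not None:
        try:
            questions = generate(missing_fields)
        except Exception:
            questions = []  # generation failed: fall through to templates
    if not questions:
        # Template for known fields, generic wording for everything else.
        questions = [
            TEMPLATES.get(f, GENERIC.format(field=f.replace("_", " ")))
            for f in missing_fields
        ]
    return questions[:MAX_QUESTIONS]
```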
orchestrator
L4: Coordinates all agents, manages workflow state, and handles routing decisions.
Tools: state_manager, routing_engine, decision_tree, retry_handler
Recovery: If an agent fails, route to a backup agent or the human queue. If the workflow is stuck, time out after 30s and escalate. If a loop is detected, break after 3 iterations. Maintain workflow state so a failed workflow can resume.
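A rough sketch of the orchestrator's loop breaker and timeout, assuming a workflow modeled as a `step` function that takes and returns a state dict. The state keys (`done`, `status`, `iterations`) are hypothetical, and this version only checks the deadline between steps; it does not interrupt a step that is already running.

```python
import time
from typing import Callable

MAX_ITERATIONS = 3  # loop breaker, from the recovery rule above
TIMEOUT_S = 30.0    # workflow timeout, from the recovery rule above


def run_workflow(
    step: Callable[[dict], dict],
    state: dict,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Run step() until done, the iteration cap, or the timeout is hit."""
    started = clock()
    for iteration in range(1, MAX_ITERATIONS + 1):
        state = step(state)
        state["iterations"] = iteration
        if state.get("done"):
            state["status"] = "completed"
            return state
        if clock() - started > TIMEOUT_S:
            state["status"] = "escalated_timeout"
            return state
    # Iteration cap reached without completion: break the loop and escalate.
    state["status"] = "escalated_loop"
    return state
```

Because the function always returns the latest state rather than raising, the caller can persist it and resume or hand it to a human reviewer.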
ML Layer
Feature Store
Update: Real-time for online features, daily batch for historical aggregates
- patient_intake_history_count
- avg_extraction_confidence
- missing_fields_frequency
- question_answer_rate
- session_completion_time_seconds
- phi_entity_density
- clinical_complexity_score
- language_detected
- source_channel
Model Registry
Strategy: Semantic versioning with A/B testing for production rollout
- extraction_model
- phi_detector
- question_generator
Observability Stack
Real-time monitoring, tracing & alerting
Deployment Variants
Startup Architecture
Fast to deploy, cost-efficient, scales to 100 patients per day
Infrastructure
Risks & Mitigations
PHI leakage to LLM provider
Low. Mitigation: Mandatory PHI redaction before LLM processing. Fail-safe: block all processing if PHI detection fails. AWS Comprehend Medical for detection. Audit trail for all PHI access. Regular penetration testing.
LLM hallucination causes medical error
Medium. Mitigation: Multi-layer validation: confidence thresholds, drug database cross-reference, logical consistency checks, human review queue. Current hallucination rate: 0.3%, 100% caught before EHR write. Shadow mode testing before production rollout.
EHR integration downtime
Medium. Mitigation: Multi-LLM failover. Retry queue with exponential backoff. Alerting within 5 minutes. Manual override process. SLA: 99.0% (allows 7.2 hours/month downtime).
Cost overrun from LLM usage
Medium. Mitigation: Cost per intake target: $0.15. Monitoring and alerting at $0.20. Automatic throttling at $0.25. Optimize prompts to reduce token usage. Use cheaper models for non-critical tasks (e.g., GPT-3.5 for question generation).
Compliance audit failure
Low. Mitigation: 7-year audit trail retention. Regular internal audits. Third-party HIPAA audit annually. SOC 2 Type II certification. Dedicated compliance officer. Incident response plan tested quarterly.
Agent loop / infinite retry
Low. Mitigation: Circuit breaker after 3 iterations. Timeout after 30 seconds. Monitoring for loop detection. Automatic escalation to human review. Workflow state persistence for recovery.
Data residency violation (cross-border transfer)
Low. Mitigation: US-based AWS regions only (us-east-1, us-west-2). No cross-region replication. Patient consent for data processing. Regular audits of data flows. DLP (Data Loss Prevention) policies.
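The cost-overrun mitigation listed above names three thresholds: a $0.15 target, alerting at $0.20, and throttling at $0.25 per intake. A minimal sketch of how those could drive a decision is shown here; computing the figure as a simple average over a window of intakes is an assumption, as are the function and label names.

```python
TARGET_USD = 0.15    # cost-per-intake target from the mitigation above
ALERT_USD = 0.20     # alerting threshold
THROTTLE_USD = 0.25  # automatic throttling threshold


def cost_action(total_cost_usd: float, intake_count: int) -> str:
    """Map average cost per intake over some window to an action label."""
    if intake_count <= 0:
        return "ok"  # nothing processed yet, so nothing to act on
    per_intake = total_cost_usd / intake_count
    if per_intake >= THROTTLE_USD:
        return "throttle"
    if per_intake >= ALERT_USD:
        return "alert"
    if per_intake > TARGET_USD:
        return "over_target"  # above target but below the alert line
    return "ok"
```

For example, $22 spent across 100 intakes averages $0.22 and would trigger an alert, while $30 across the same volume would trigger throttling.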
Evolution Roadmap
Progressive transformation from MVP to scale
Phase 1: MVP (0-3 months)
Phase 2: Scale (3-6 months)
Phase 3: Enterprise (6-12 months)
Complete Systems Architecture
9-layer production architecture from patient portal to EHR
Presentation Layer
4 components
API Gateway Layer
4 components
Agent Layer
6 components
ML Layer
5 components
Integration Layer
4 components
Data Layer
5 components
External Services
6 components
Observability Layer
5 components
Security Layer
5 components
Complete Request Flow - Patient Intake to EHR
Automated data flow every hour
End-to-End Data Flow
Patient text → Sanitized → Extracted → Validated → Questions → FHIR → EHR
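To make the FHIR step of this flow concrete, here is a small sketch that wraps extracted demographics in a FHIR R4 transaction Bundle containing one Patient resource. The input keys (`family_name`, `given_name`, `birth_date`, `gender`) are hypothetical names for extracted fields; a real intake would also emit resources such as Coverage or AllergyIntolerance, which are omitted here.

```python
import json
import uuid


def to_fhir_bundle(extracted: dict) -> dict:
    """Build a minimal FHIR R4 transaction Bundle with one Patient."""
    patient = {
        "resourceType": "Patient",
        "name": [{
            "family": extracted["family_name"],
            "given": [extracted["given_name"]],
        }],
        "birthDate": extracted["birth_date"],  # FHIR date, e.g. "1980-04-12"
        "gender": extracted["gender"],  # male | female | other | unknown
    }
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [{
            # Temporary identifier the server resolves when it creates the resource.
            "fullUrl": f"urn:uuid:{uuid.uuid4()}",
            "resource": patient,
            "request": {"method": "POST", "url": "Patient"},
        }],
    }


if __name__ == "__main__":
    bundle = to_fhir_bundle({
        "family_name": "Rivera", "given_name": "Ana",
        "birth_date": "1980-04-12", "gender": "female",
    })
    print(json.dumps(bundle, indent=2))
```

The bundle is plain JSON, so it can be validated against a FHIR validator before being posted to an EHR endpoint.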
Key Integrations
Epic FHIR API
Cerner FHIR API
AWS Comprehend Medical
HL7 v2 (Legacy Systems)
Security & Compliance Architecture
Failure Modes & Recovery
| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| LLM API down (Anthropic outage) | Automatic failover to GPT-4, then Gemini. If all fail, queue for manual processing. | Degraded performance (slower), not broken | 99.5% (allows 3.6 hours/month downtime) |
| Extraction confidence < 0.7 | Flag for human review, send to intake coordinator queue | Quality maintained, slight delay | 99.9% (manual review within 2 hours) |
| EHR API timeout (Epic/Cerner down) | Retry 3x with exponential backoff. If fails, store in retry queue with 5-minute interval. | Eventual consistency (data arrives late) | 99.0% (allows 7.2 hours/month) |
| PHI detection service fails | BLOCK ALL PROCESSING. Fail-safe mode. No PHI to LLM under any circumstance. | System unavailable until PHI detection restored | 100% (zero tolerance for PHI leakage) |
| Database unavailable | Switch to read replica for read operations. Write operations queued. | Read-only mode, writes delayed | 99.9% |
| Agent loop detected (infinite retry) | Circuit breaker after 3 iterations. Escalate to human. | Prevents resource exhaustion | N/A (safety mechanism) |
| FHIR validation fails | Log error, retry with corrected mapping. If fails 3x, human review. | Data quality maintained | 99.5% |
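The downtime allowances in the SLA column follow from simple arithmetic over a 30-day month (720 hours). The helper below is a hypothetical name used only to show that calculation.

```python
HOURS_PER_MONTH = 30 * 24  # 720 hours, assuming a 30-day month


def allowed_downtime_hours(sla_percent: float) -> float:
    """Hours of downtime per month permitted by an availability SLA."""
    return round((100 - sla_percent) / 100 * HOURS_PER_MONTH, 2)


print(allowed_downtime_hours(99.5))  # 3.6
print(allowed_downtime_hours(99.0))  # 7.2
```

By the same calculation, a 99.9% target leaves about 0.72 hours (roughly 43 minutes) per month.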
             ┌─────────────────────────────────────────┐
             │           ORCHESTRATOR AGENT            │
             │    (Workflow coordination & routing)    │
             └────────────────────┬────────────────────┘
                                  │
      ┌─────────────┬─────────────┼─────────────┬─────────────┐
      │             │             │             │             │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│  PLANNER  │ │ GUARDRAIL │ │  INTAKE   │ │ VALIDATOR │ │ QUESTION  │
│   AGENT   │ │   AGENT   │ │   AGENT   │ │   AGENT   │ │   AGENT   │
│           │ │           │ │           │ │           │ │           │
│   Task    │ │    PHI    │ │  Extract  │ │   Check   │ │ Generate  │
│  Decomp   │ │  Detect   │ │  Fields   │ │ Complete  │ │ Follow-up │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │             │             │
      └─────────────┴─────────────┼─────────────┴─────────────┘
                                  │
                            ┌─────▼─────┐
                            │   FHIR    │
                            │  ADAPTER  │
                            └─────┬─────┘
                                  │
                            ┌─────▼─────┐
                            │ Epic EHR  │
                            └───────────┘
Agent Collaboration Flow
Agent Types
Reactive Agent
Low (Level 1): Intake Agent. Responds to input and returns output. No memory.
Reflexive Agent
Medium (Level 2): Validation Agent. Uses rules plus context. Limited decision-making.
Deliberative Agent
High (Level 3): Question Agent. Plans questions based on gaps and reasons about what to ask.
Orchestrator Agent
Highest (Level 4): Orchestrator. Makes routing decisions, handles loops, and manages workflow state.
Levels of Agent Autonomy
RAG vs Fine-Tuning Decision
Hallucination Detection
Evaluation Framework
Dataset Curation
Agentic RAG
Prompt Versioning
Complete Tech Stack
2026 Randeep Bhatia. All Rights Reserved.
No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.