From prompts to production-grade healthcare infrastructure.
Monday showed 3 prompts for patient intake. Tuesday automated the workflow. Wednesday mapped team roles. Today: complete technical architecture with 6 specialized agents, FHIR integration, HIPAA compliance, ML evaluation loops, and scaling patterns from startup to enterprise. This is the blueprint for 10,000+ patients per day.
Key Assumptions
- Processing 100-10,000 patient intake forms per day across multiple facilities
- HIPAA compliance mandatory: PHI encryption, audit logs, access controls, 7-year retention
- •Integration with Epic or Cerner EHR via FHIR R4 API
- •Medical terminology database (ICD-10, drug interactions) updated weekly
- •Startup deployment: AWS/GCP serverless → Enterprise: Multi-region Kubernetes with private networking
System Requirements
Functional
- Extract 47 structured fields from free-text patient narratives with 99%+ accuracy
- Validate completeness against EHR requirements and generate contextual follow-up questions
- Redact PHI before LLM processing using AWS Comprehend Medical or equivalent
- Transform extracted data to FHIR R4 bundles and POST to Epic/Cerner APIs
- Maintain audit trail of all PHI access with 7-year retention for HIPAA compliance
- Support iterative questioning loop until all 47 fields are complete or flagged for manual review
- Provide real-time confidence scores and flag low-confidence extractions for human review
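The FHIR transform step can be pictured with a small sketch. It assumes hypothetical extracted-field keys (`first_name`, `last_name`, `dob`) and covers only a Patient resource, not all 47 fields. A real Epic or Cerner integration would also need authentication and vendor-specific profiles, which are out of scope here.

```python
def to_fhir_bundle(fields: dict) -> dict:
    """Wrap extracted intake fields in a FHIR R4 transaction Bundle."""
    patient = {
        "resourceType": "Patient",
        # Hypothetical field keys; the real extraction schema may differ.
        "name": [{"family": fields["last_name"], "given": [fields["first_name"]]}],
        "birthDate": fields["dob"],  # FHIR dates use YYYY-MM-DD
    }
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        # Each entry carries the resource plus the HTTP request the server should run.
        "entry": [
            {"resource": patient, "request": {"method": "POST", "url": "Patient"}}
        ],
    }


bundle = to_fhir_bundle({"first_name": "Ana", "last_name": "Lopez", "dob": "1980-06-15"})
```

The resulting dictionary would be serialized to JSON and POSTed to the EHR's FHIR base URL.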
Non-Functional (SLOs)
💰 Cost Targets: $0.15 per patient intake; $150 per 1,000 patients per month
Agent Layer
planner
L4: Decomposes the intake task into subtasks, selects tools, and manages workflow state
🔧 TaskDecomposer, ContextRetriever, AgentRouter
⚡ Recovery: If decomposition fails, fall back to single-step extraction; if a tool is unavailable, route to the manual queue; retry with exponential backoff (3 attempts)
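The "retry with exponential backoff (3 attempts)" rule could look roughly like the helper below. The name `retry_with_backoff`, the 0.5-second base delay, and the injectable `sleep` are assumptions made for illustration; the source does not specify delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,  # assumed value, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: the caller routes to the manual queue
            sleep(base_delay * 2 ** attempt)  # 0.5 s, then 1.0 s
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.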
executor_intake
L2: Extracts 47 structured fields from free text using an LLM with RAG context
🔧 Claude API (primary), GPT-4 API (fallback), VectorDB (RAG retrieval), PromptStore (versioned prompts)
⚡ Recovery: If the LLM API times out, retry with the GPT-4 fallback; if confidence < 0.7, flag for human review; if JSON parsing fails, repair the output with the schema validator
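A minimal sketch of the timeout fallback and the 0.7 confidence rule follows. The `primary` and `fallback` callables are hypothetical stand-ins for the Claude and GPT-4 clients, and each is assumed to return a `(fields, confidence)` pair.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in the recovery rule

LLMCall = Callable[[str], Tuple[dict, float]]  # hypothetical client signature


@dataclass
class Extraction:
    fields: dict
    confidence: float
    provider: str
    needs_review: bool


def extract_with_fallback(text: str, primary: LLMCall, fallback: LLMCall) -> Extraction:
    """Try the primary model; on timeout use the fallback; flag low confidence."""
    try:
        fields, confidence = primary(text)
        provider = "primary"
    except TimeoutError:
        fields, confidence = fallback(text)
        provider = "fallback"
    return Extraction(fields, confidence, provider, confidence < CONFIDENCE_THRESHOLD)
```

A flagged result would go to the human review queue instead of the EHR write path.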
validator
L3: Checks completeness of the 47 fields, cross-references the drug database, and validates logical consistency
🔧 RuleEngine (47 field checks), RxNorm API (drug validation), LogicValidator (age/symptom consistency)
⚡ Recovery: If the drug API is down, skip drug validation and flag the record for later; if the rule engine fails, use LLM-based validation as a fallback
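The validator's "skip drug validation, flag for later" behaviour might be sketched as below. `drug_lookup` is a hypothetical callable standing in for an RxNorm query, and the field names are illustrative only.

```python
from typing import Callable, Iterable


def validate(fields: dict, required: Iterable[str], drug_lookup: Callable[[str], bool]) -> dict:
    """Report missing required fields and unknown drugs; defer drug checks on outage."""
    missing = [name for name in required if not fields.get(name)]
    flags = []
    try:
        unknown = [d for d in fields.get("medications", []) if not drug_lookup(d)]
    except ConnectionError:
        # Drug service is down: do not fail the intake, mark it for a later re-check.
        unknown = []
        flags.append("drug_validation_deferred")
    return {"missing": missing, "unknown_drugs": unknown, "flags": flags}
```

The `missing` list is what a question generator would consume next.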
question_generator
L3: Generates contextual follow-up questions for missing or incomplete fields
🔧 GPT-4 (question generation), TemplateLibrary (clinical question patterns), PriorityRanker (medical urgency)
⚡ Recovery: If generation fails, use template-based questions; if priority ranking fails, default to field order
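Both fallbacks (template questions, then plain field order) fit in a few lines. The templates and urgency scores here are invented for illustration; the real TemplateLibrary and PriorityRanker are not described in the source.

```python
from typing import Dict, List, Optional

# Hypothetical templates; a real library would cover all 47 fields.
TEMPLATES = {
    "allergies": "Do you have any known drug or food allergies?",
    "current_medications": "Which medications are you currently taking?",
}


def follow_up_questions(missing_fields: List[str], urgency: Optional[Dict[str, int]] = None) -> List[str]:
    """Return one question per missing field, most urgent first when ranks exist."""
    if urgency:
        # sorted() is stable, so equally urgent fields keep their original order.
        ordered = sorted(missing_fields, key=lambda f: -urgency.get(f, 0))
    else:
        ordered = list(missing_fields)  # ranking unavailable: default to field order
    return [TEMPLATES.get(f, f"Please provide your {f.replace('_', ' ')}.") for f in ordered]
```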
evaluator
L4: Runs quality checks on extracted data, including hallucination detection and confidence scoring
🔧 HallucinationDetector (cross-reference drug DB), ConfidenceScorer (ensemble model), HistoricalComparator (drift detection)
⚡ Recovery: If a hallucination is detected, block the EHR write and route to a human; if confidence is below the threshold, flag for review
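One simple form of the drug cross-reference treats an extracted medication as suspect when it is missing from the drug list or never appears in the patient's own text. This is an assumed heuristic, not the source's detector. The plain substring match is deliberately naive and would miss brand/generic synonyms.

```python
from typing import Iterable, List


def suspect_medications(extracted: Iterable[str], source_text: str, known_drugs: Iterable[str]) -> List[str]:
    """Return extracted drug names that are unknown or absent from the source text."""
    text = source_text.lower()
    known = {d.lower() for d in known_drugs}
    return [m for m in extracted if m.lower() not in known or m.lower() not in text]


def may_write_to_ehr(extracted: Iterable[str], source_text: str, known_drugs: Iterable[str]) -> bool:
    """Block the EHR write whenever any medication looks hallucinated."""
    return not suspect_medications(extracted, source_text, known_drugs)
```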
guardrail
L4: Performs PHI redaction, policy enforcement, and safety filtering before LLM processing
🔧 AWS Comprehend Medical (PHI detection), PolicyEngine (HIPAA rules), RedactionService (mask PHI)
⚡ Recovery: If PHI detection fails, block processing entirely (safety first), log the failure, and route to manual review
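The fail-closed rule is the important part of this agent. The sketch below uses a toy regex detector as a stand-in for AWS Comprehend Medical (an assumption made only so the example runs offline; two regexes are nowhere near adequate PHI coverage). What it illustrates is that any detector failure raises instead of letting raw text through.

```python
import re

# Toy patterns for illustration only; real PHI detection needs a proper service.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


class PHIDetectionError(RuntimeError):
    """Raised when detection fails; callers must not forward the text to an LLM."""


def regex_detector(text):
    """Return (start, end, label) spans for every pattern match."""
    return [
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def redact_or_block(text, detector=regex_detector):
    """Mask detected spans, or raise if the detector itself fails (fail closed)."""
    try:
        spans = detector(text)
    except Exception as exc:
        raise PHIDetectionError("PHI detection failed; processing blocked") from exc
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```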
ML Layer
Feature Store
Update: Real-time for online inference, daily batch for offline training
- patient_age_bin (derived from dob)
- symptom_count (extracted from text)
- medication_count
- prior_visit_count (from EHR)
- text_length_chars
- medical_term_density (ICD-10 matches per 100 words)
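Three of these features can be derived directly from the intake text and date of birth. In the sketch below, the 10-year bin width and the small `vocabulary` set (a stand-in for a real ICD-10 term matcher) are assumptions.

```python
import re
from datetime import date


def age_bin(dob: date, today: date, width: int = 10) -> str:
    """Bucket age in whole years into fixed-width bins such as '40-49'."""
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def medical_term_density(text: str, vocabulary: set) -> float:
    """Vocabulary matches per 100 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return 100.0 * sum(w in vocabulary for w in words) / len(words)


def intake_features(text: str, dob: date, today: date, vocabulary: set) -> dict:
    return {
        "patient_age_bin": age_bin(dob, today),
        "text_length_chars": len(text),
        "medical_term_density": medical_term_density(text, vocabulary),
    }
```

Binning the age instead of storing the raw date of birth also keeps a direct identifier out of the feature store.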
Model Registry
Strategy: Semantic versioning with A/B testing for new versions (10% traffic)
- extraction_model_claude
- extraction_model_gpt
- hallucination_detector
- question_ranker
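Sending 10% of traffic to a new model version is often done with a deterministic hash, so the same patient always lands on the same variant. The sketch below is one common approach and not necessarily what this system uses. The variant names are placeholders.

```python
import hashlib


def assign_variant(patient_id: str, candidate_share: float = 0.10) -> str:
    """Map an ID to a stable bucket in [0, 1) and compare it with the traffic share."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "candidate" if bucket < candidate_share else "stable"
```

Because assignment depends only on the ID, no routing state needs to be stored, and raising `candidate_share` only moves patients from the stable to the candidate variant.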
Observability
Metrics
- 📊 intake_requests_total
- 📊 extraction_latency_p95_ms
- 📊 extraction_accuracy_percent
- 📊 validation_gap_rate
- 📊 question_generation_latency_ms
- 📊 ehr_write_success_rate
- 📊 ehr_write_latency_ms
- 📊 phi_redaction_latency_ms
- 📊 hallucination_detection_rate
- 📊 agent_retry_count
- 📊 llm_token_usage_total
- 📊 cost_per_patient_usd
Dashboards
- 📈 ops_dashboard
- 📈 ml_dashboard
- 📈 compliance_dashboard
- 📈 cost_dashboard
Traces
✅ Enabled
Deployment Variants
🚀 Startup
Infrastructure:
- AWS Lambda + API Gateway (serverless)
- RDS PostgreSQL (single instance)
- ElastiCache Redis (single node)
- S3 for audit logs
- CloudWatch for observability
- AWS Secrets Manager
- Anthropic/OpenAI API (direct calls)
→ Minimal ops overhead - fully managed services
→ Pay-per-use pricing (~$50-200/month for 100-1K patients/day)
→ Single-region deployment (us-east-1 or eu-west-1)
→ AWS-managed encryption keys
→ Basic RBAC via IAM roles
→ CloudWatch dashboards for monitoring
🏢 Enterprise
Infrastructure:
- EKS (Kubernetes) in 3+ regions
- Aurora Global Database (multi-region)
- ElastiCache Redis cluster (multi-AZ)
- Kafka (MSK or self-hosted)
- Private VPC with VPC peering
- Transit Gateway for multi-region networking
- BYO KMS/HSM for encryption
- SSO/SAML integration (Okta/Azure AD)
- Dedicated audit infrastructure (separate AWS account)
- Multi-LLM failover (Claude + GPT + Gemini)
- Prometheus + Grafana + Jaeger
- PagerDuty for alerting
→ 99.99% uptime SLA with multi-region failover
→ Data residency per tenant (US/EU/APAC)
→ Private networking - no public endpoints
→ Customer-managed encryption keys (CMK)
→ Advanced RBAC with SSO/SAML
→ Dedicated security team access
→ Compliance certifications (SOC2, HITRUST)
→ Cost: $3K-8K/month for 10K+ patients/day
📈 Migration: Start with startup architecture. At 1K patients/day, migrate to Kubernetes with zero downtime using blue-green deployment. Add multi-region at 5K patients/day. Enable private networking and BYO KMS when enterprise contracts require it. Estimated migration time: 2-3 months with phased rollout.
Risks & Mitigations
⚠️ LLM hallucination leads to incorrect medical data in EHR
Medium (0.3% rate observed)
✓ Mitigation: Multi-layer hallucination detection (L1-L4). 100% detection rate in testing. Block EHR write if flagged. Human review queue for all flagged cases.
⚠️ PHI leakage to LLM provider
Low (if properly implemented)
✓ Mitigation: Mandatory PHI redaction before LLM processing. AWS Comprehend Medical with 99.5% detection. Audit all LLM requests. Zero-tolerance policy: block processing if PHI detection fails.
⚠️ EHR API downtime prevents patient intake
Medium (Epic/Cerner have ~99% uptime)
✓ Mitigation: Queue-based retry with exponential backoff. Store locally until EHR available. Alert on-call if queue >100. SLA: 99% write success within 30 minutes.
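The store-and-retry behaviour can be sketched as a small in-memory queue. The class name and the `send`/`alert` callables are hypothetical. A production version would persist pending writes durably and call `drain` on a backoff schedule.

```python
from collections import deque


class EHRWriteQueue:
    """Hold bundles while the EHR is unreachable and alert when the backlog grows."""

    def __init__(self, send, alert, alert_threshold=100):
        self.send = send          # callable that raises ConnectionError on failure
        self.alert = alert        # callable that receives the current queue depth
        self.alert_threshold = alert_threshold
        self.pending = deque()

    def submit(self, bundle):
        self.pending.append(bundle)
        if len(self.pending) > self.alert_threshold:
            self.alert(len(self.pending))

    def drain(self):
        """Send queued bundles in order; stop at the first failure."""
        delivered = 0
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                break  # EHR still down: keep order and try again later
            self.pending.popleft()
            delivered += 1
        return delivered
```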
⚠️ Model drift degrades extraction accuracy over time
Medium (medical terminology evolves)
✓ Mitigation: Weekly offline evaluation on 10K cases. Alert if accuracy <99%. RAG allows daily knowledge base updates without retraining. A/B test new models before full deployment.
⚠️ Cost overruns from LLM API usage
Medium (usage spikes during peak hours)
✓ Mitigation: Cost guardrails: $0.15 per patient target. Monitor token usage per request. Alert if monthly cost >$500. Implement caching for repeated extractions. Use cheaper models (GPT-3.5) for low-risk cases.
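Monitoring token usage against the $0.15 target comes down to simple arithmetic. The per-million-token prices in the usage example are placeholders, not quoted provider rates. Actual prices would come from configuration.

```python
COST_TARGET_USD = 0.15  # per-patient target from this section


def intake_cost_usd(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Cost of one intake given token counts and per-million-token prices."""
    return input_tokens / 1e6 * usd_per_m_input + output_tokens / 1e6 * usd_per_m_output


def over_budget(costs, target=COST_TARGET_USD):
    """Return the indexes of intakes whose cost exceeds the per-patient target."""
    return [i for i, cost in enumerate(costs) if cost > target]


# Placeholder prices: $3 per million input tokens, $15 per million output tokens.
example_cost = intake_cost_usd(3000, 800, usd_per_m_input=3.0, usd_per_m_output=15.0)
```

With those placeholder prices, a 3,000-token prompt and an 800-token response cost about $0.021, well under the target.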
⚠️ Kubernetes cluster failure in single region
Low (K8s has 99.95% uptime)
✓ Mitigation: Multi-region deployment (3+ regions). Global load balancer with health checks. Auto-failover to healthy region within 30 seconds. RTO: 1 minute. RPO: 0 (real-time replication).
⚠️ Insider threat: employee accesses patient data
Low (with proper controls)
✓ Mitigation: RBAC with least privilege. All PHI access logged with user, timestamp, IP. Real-time alerts on bulk downloads. Annual security training. Background checks for employees.
Evolution Roadmap
Phase 1: MVP (0-3 months)
- → Launch with 100 patients/day capacity
- → Single-region deployment (AWS us-east-1)
- → Basic HIPAA compliance (encryption, audit logs)
- → Manual review queue for low-confidence cases
Phase 2: Scale (3-6 months)
- → Scale to 1,000 patients/day
- → Add Cerner integration
- → Implement queue-based processing
- → Advanced observability (Datadog)
Phase 3: Enterprise (6-12 months)
- → Scale to 10,000+ patients/day
- → Multi-region deployment (3+ regions)
- → Enterprise security (SSO, BYO KMS)
- → 99.99% uptime SLA
Complete Systems Architecture
9-layer architecture from patient portal to EHR persistence
End-to-End Request Flow with Timing
Patient Intake System - Agent Orchestration
6 Components
Patient Intake System - External Integrations
10 Components
Complete Data Flow
Patient text → EHR in 8 steps with timing
Scaling Patterns
Key External Integrations
Epic EHR (FHIR R4)
Cerner EHR (FHIR R4)
AWS Comprehend Medical
RxNorm Drug Database
ICD-10 Code Service
Security & Compliance Architecture
Failure Modes & Recovery
Failure | Fallback | Impact | SLA |
---|---|---|---|
LLM API timeout or rate limit | Retry with exponential backoff (3 attempts) → Switch to fallback LLM (GPT if Claude down) → If all fail, route to manual queue | Degraded performance, not broken. Manual queue handles overflow. | 99.5% availability (allows 3.6 hours downtime/month) |
Low confidence extraction (<0.7) | Flag for human review. Do not send to EHR. Notify clinician. | Quality maintained. Patient waits for human review (~15 min). | 99.9% accuracy maintained |
EHR API timeout or 5xx error | Retry with exponential backoff (5 attempts over 10 minutes) → Queue for later retry → Alert on-call if queue >100 | Eventual consistency. Data written within 30 minutes. | 99.0% write success within 30 minutes |
PHI detection service down | BLOCK all processing. Do not send to LLM. Route to manual queue. | Safety first. System degraded but compliant. | 100% PHI protection (zero tolerance) |
Database connection pool exhausted | Read from replica for read-only operations. Queue writes. Scale up connection pool. | Read-only mode for 1-2 minutes during scale-up. | 99.9% database availability |
Hallucination detected (fake drug name) | Block EHR write. Flag for human review. Log hallucination for model improvement. | Quality maintained. Patient data integrity protected. | 0.5% hallucination rate, 100% caught |
Kubernetes node failure | K8s auto-reschedules pods to healthy nodes. Load balancer routes around failed node. | Minimal. 30-60 second latency spike during rescheduling. | 99.95% uptime |
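The availability figures in the SLA column translate into downtime budgets with one line of arithmetic. The helper below assumes a 30-day month, which matches the "3.6 hours downtime/month" quoted for 99.5%.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted per period at a given availability percentage."""
    return (100.0 - availability_pct) / 100.0 * days * 24 * 60


budget_99_5 = allowed_downtime_minutes(99.5) / 60   # hours per 30-day month
budget_99_95 = allowed_downtime_minutes(99.95)      # minutes per 30-day month
budget_99_99 = allowed_downtime_minutes(99.99)      # minutes per 30-day month
```

On that basis 99.5% allows 3.6 hours per month, 99.95% about 21.6 minutes, and 99.99% about 4.3 minutes.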