Patient Intake System Architecture 🏗️

HIPAA-compliant multi-agent system scaling from 100 to 10,000 patients/day

June 12, 2025
⚕️ Healthcare · 🏗️ Architecture · 📊 Scalable · 🔒 HIPAA · 🤖 Multi-Agent

From prompts to production-grade healthcare infrastructure.

Monday showed 3 prompts for patient intake. Tuesday automated the workflow. Wednesday mapped team roles. Today: the complete technical architecture, with 6 agents (planner, intake executor, validator, question generator, evaluator, guardrail), FHIR integration, HIPAA compliance, ML evaluation loops, and scaling patterns from startup to enterprise. This is the blueprint for 10,000+ patients per day.

Key Assumptions

  • Processing 100-10,000 patient intake forms per day across multiple facilities
  • HIPAA compliance mandatory: PHI encryption, audit logs, access controls, 7-year retention
  • Integration with Epic or Cerner EHR via FHIR R4 API
  • Medical terminology database (ICD-10, drug interactions) updated weekly
  • Startup deployment: AWS/GCP serverless → Enterprise: Multi-region Kubernetes with private networking

System Requirements

Functional

  • Extract 47 structured fields from free-text patient narratives with 99%+ accuracy
  • Validate completeness against EHR requirements and generate contextual follow-up questions
  • Redact PHI before LLM processing using AWS Comprehend Medical or equivalent
  • Transform extracted data to FHIR R4 bundles and POST to Epic/Cerner APIs
  • Maintain audit trail of all PHI access with 7-year retention for HIPAA compliance
  • Support iterative questioning loop until all 47 fields are complete or flagged for manual review
  • Provide real-time confidence scores and flag low-confidence extractions for human review

Non-Functional (SLOs)

  • Latency (p95): 5000 ms
  • Freshness: 1 min
  • Availability: 99.5%
  • Extraction accuracy: 99%
  • Hallucination rate: 0.5%

💰 Cost Targets: $0.15 per patient intake; $150 per month per 1,000 patients

Agent Layer

planner

L4

Decomposes intake task into subtasks, selects tools, manages workflow state

🔧 TaskDecomposer, ContextRetriever, AgentRouter

⚡ Recovery: If decomposition fails, fall back to single-step extraction; if a tool is unavailable, route to the manual queue; retry with exponential backoff (3 attempts).
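The "retry with exponential backoff (3 attempts)" rule can be sketched as a small helper. This is a minimal illustration, not the system's actual code: the function name `retry_with_backoff`, the 0.5 s starting delay, and the injectable `sleep` parameter are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,            # "3 attempts" from the recovery rule
    base_delay_s: float = 0.5,    # assumed starting delay, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: the caller can route to the manual queue
            sleep(base_delay_s * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a production version would likely add jitter and catch only retryable error types.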

executor_intake

L2

Extracts 47 structured fields from free text using LLM with RAG context

🔧 Claude API (primary), GPT-4 API (fallback), VectorDB (RAG retrieval), PromptStore (versioned prompts)

⚡ Recovery: If the LLM API times out, retry with the GPT-4 fallback; if confidence < 0.7, flag for human review; if JSON parsing fails, use the schema validator to repair the output.
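A rough sketch of the fallback and confidence rules, assuming each provider is wrapped in a plain callable that returns `(fields, confidence)`. The names `Extraction`, `extract_with_fallback`, and `route` are hypothetical; only the 0.7 threshold and the primary-then-fallback order come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

CONFIDENCE_THRESHOLD = 0.7  # from the recovery rule

ProviderCall = Callable[[str], Tuple[Dict[str, object], float]]


@dataclass
class Extraction:
    fields: Dict[str, object]
    confidence: float
    provider: str


def extract_with_fallback(
    text: str, providers: Sequence[Tuple[str, ProviderCall]]
) -> Extraction:
    """Try each provider in order; move to the next one only on a timeout."""
    last_error = None
    for name, call in providers:
        try:
            fields, confidence = call(text)
            return Extraction(fields, confidence, name)
        except TimeoutError as exc:
            last_error = exc
    raise RuntimeError("all LLM providers timed out") from last_error


def route(extraction: Extraction) -> str:
    """Low-confidence results go to a human instead of the validator."""
    if extraction.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"
    return "validator"
```

The provider callables would wrap the real Claude and GPT-4 clients; they are left abstract here so the routing logic can be read on its own.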

validator

L3

Checks completeness of 47 fields, cross-references drug database, validates logical consistency

🔧 RuleEngine (47 field checks), RxNorm API (drug validation), LogicValidator (age/symptom consistency)

⚡ Recovery: If the drug API is down, skip drug validation and flag the record for later; if the rule engine fails, use LLM-based validation as a fallback.
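The completeness check and the "ask again or escalate" decision might look like the sketch below. The four field names stand in for the full list of 47, and the cap of three question rounds is an assumption; neither is specified in the source.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical subset of the 47 required fields.
REQUIRED_FIELDS = ("date_of_birth", "chief_complaint", "current_medications", "allergies")
MAX_QUESTION_ROUNDS = 3  # assumed cap before manual review


def is_missing(value: Any) -> bool:
    """None and blank strings count as missing; an empty list is an explicit 'none'."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return False


def find_gaps(extracted: Dict[str, Any], required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return required field names that are absent or blank, in schema order."""
    return [name for name in required if is_missing(extracted.get(name))]


def next_action(gaps: List[str], rounds_asked: int) -> str:
    """Decide whether to continue, ask follow-ups, or hand off to a human."""
    if not gaps:
        return "evaluate"
    if rounds_asked >= MAX_QUESTION_ROUNDS:
        return "manual_review"
    return "ask_follow_up"
```

Treating an empty list as present is a deliberate choice in this sketch: "no known allergies" is a valid answer, whereas a blank string usually means the field was never filled.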

question_generator

L3

Generates contextual follow-up questions for missing/incomplete fields

🔧 GPT-4 (question generation), TemplateLibrary (clinical question patterns), PriorityRanker (medical urgency)

⚡ Recovery: If generation fails, use template-based questions; if priority ranking fails, default to field order.

evaluator

L4

Quality checks on extracted data, hallucination detection, confidence scoring

🔧 HallucinationDetector (cross-reference drug DB), ConfidenceScorer (ensemble model), HistoricalComparator (drift detection)

⚡ Recovery: If a hallucination is detected, block the EHR write and route to a human; if confidence is below the threshold, flag for review.
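One simple form of the drug-database cross-reference is a set lookup: any extracted medication name that is not in the known vocabulary is treated as a possible hallucination and blocks the EHR write. The function names and the in-memory `known_drugs` set are illustrative assumptions; a real detector would query the drug database and handle brand names, misspellings, and dosage strings.

```python
from typing import Iterable, List, Set


def find_unknown_medications(medications: Iterable[str], known_drugs: Set[str]) -> List[str]:
    """Return extracted medication names that do not appear in the vocabulary."""
    vocabulary = {name.casefold() for name in known_drugs}
    return [m for m in medications if m.strip().casefold() not in vocabulary]


def may_write_to_ehr(medications: Iterable[str], known_drugs: Set[str]) -> bool:
    """Block the EHR write as soon as one medication cannot be verified."""
    return not find_unknown_medications(medications, known_drugs)
```

For example, with a vocabulary of `{"lisinopril", "metformin"}`, an extraction containing a made-up name such as "Zyntrovax" would be returned by `find_unknown_medications` and the write would be held for human review.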

guardrail

L4

PHI redaction, policy enforcement, safety filters before LLM processing

🔧 AWS Comprehend Medical (PHI detection), PolicyEngine (HIPAA rules), RedactionService (mask PHI)

⚡ Recovery: If PHI detection fails, block processing entirely (safety first), log the failure, and route to manual review.
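The "block processing entirely" rule is a fail-closed pattern: if redaction raises for any reason, the LLM is never called. A minimal sketch, assuming the redactor and the LLM client are passed in as callables; `guarded_llm_call` and its parameters are hypothetical names.

```python
from typing import Callable, List, Optional


def guarded_llm_call(
    intake_id: str,
    text: str,
    redact: Callable[[str], str],
    call_llm: Callable[[str], str],
    manual_queue: List[str],
) -> Optional[str]:
    """Send text to the LLM only after redaction succeeds; otherwise fail closed."""
    try:
        sanitized = redact(text)
    except Exception:
        # Queue only the intake ID so the raw narrative is not copied around.
        manual_queue.append(intake_id)
        return None
    return call_llm(sanitized)
```

The important property is that there is no code path from a redaction failure to `call_llm`; the unredacted text never leaves this function.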

ML Layer

Feature Store

Update: Real-time for online inference, daily batch for offline training

  • patient_age_bin (derived from dob)
  • symptom_count (extracted from text)
  • medication_count
  • prior_visit_count (from EHR)
  • text_length_chars
  • medical_term_density (ICD-10 matches per 100 words)
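Three of the listed features can be derived from the narrative and date of birth alone. The sketch below is one possible reading: the 10-year bin width, the word tokenizer, and the small `terms` set standing in for an ICD-10 term list are assumptions, not details from the source.

```python
import re
from datetime import date
from typing import Dict, Set, Union


def age_bin(dob: date, today: date) -> str:
    """Bucket age into 10-year bins so the raw date of birth is not a feature."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def medical_term_density(text: str, terms: Set[str]) -> float:
    """Matches per 100 words against a term list (stand-in for ICD-10 matching)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    matches = sum(1 for word in words if word in terms)
    return 100.0 * matches / len(words)


def build_features(text: str, dob: date, today: date, terms: Set[str]) -> Dict[str, Union[str, int, float]]:
    return {
        "patient_age_bin": age_bin(dob, today),
        "text_length_chars": len(text),
        "medical_term_density": medical_term_density(text, terms),
    }
```

The remaining features (`symptom_count`, `medication_count`, `prior_visit_count`) depend on the extraction output and the EHR, so they are left out of this sketch.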

Model Registry

Strategy: Semantic versioning with A/B testing for new versions (10% traffic)

  • extraction_model_claude
  • extraction_model_gpt
  • hallucination_detector
  • question_ranker
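Routing 10% of traffic to a new model version is commonly done by hashing a stable identifier, so the same session always sees the same variant. This is a sketch of that general technique, not the registry's actual mechanism; `assign_variant` and the use of a session ID as the key are assumptions.

```python
import hashlib


def assign_variant(session_id: str, candidate_share: float = 0.10) -> str:
    """Deterministically map a session to 'candidate' or 'stable'."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # First 8 bytes as an integer, scaled to a value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "candidate" if bucket < candidate_share else "stable"
```

Because the assignment depends only on the ID, a patient who returns to answer follow-up questions stays on the same model version for the whole intake session.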

Observability

Metrics

  • 📊 intake_requests_total
  • 📊 extraction_latency_p95_ms
  • 📊 extraction_accuracy_percent
  • 📊 validation_gap_rate
  • 📊 question_generation_latency_ms
  • 📊 ehr_write_success_rate
  • 📊 ehr_write_latency_ms
  • 📊 phi_redaction_latency_ms
  • 📊 hallucination_detection_rate
  • 📊 agent_retry_count
  • 📊 llm_token_usage_total
  • 📊 cost_per_patient_usd

Dashboards

  • 📈 ops_dashboard
  • 📈 ml_dashboard
  • 📈 compliance_dashboard
  • 📈 cost_dashboard

Traces

✅ Enabled

Deployment Variants

🚀 Startup

Infrastructure:

  • AWS Lambda + API Gateway (serverless)
  • RDS PostgreSQL (single instance)
  • ElastiCache Redis (single node)
  • S3 for audit logs
  • CloudWatch for observability
  • AWS Secrets Manager
  • Anthropic/OpenAI API (direct calls)

Minimal ops overhead - fully managed services

Pay-per-use pricing (~$50-200/month for 100-1K patients/day)

Single-region deployment (us-east-1 or eu-west-1)

AWS-managed encryption keys

Basic RBAC via IAM roles

CloudWatch dashboards for monitoring

🏢 Enterprise

Infrastructure:

  • EKS (Kubernetes) in 3+ regions
  • Aurora Global Database (multi-region)
  • ElastiCache Redis cluster (multi-AZ)
  • Kafka (MSK or self-hosted)
  • Private VPC with VPC peering
  • Transit Gateway for multi-region networking
  • BYO KMS/HSM for encryption
  • SSO/SAML integration (Okta/Azure AD)
  • Dedicated audit infrastructure (separate AWS account)
  • Multi-LLM failover (Claude + GPT + Gemini)
  • Prometheus + Grafana + Jaeger
  • PagerDuty for alerting

99.99% uptime SLA with multi-region failover

Data residency per tenant (US/EU/APAC)

Private networking - no public endpoints

Customer-managed encryption keys (CMK)

Advanced RBAC with SSO/SAML

Dedicated security team access

Compliance certifications (SOC2, HITRUST)

Cost: $3K-8K/month for 10K+ patients/day

📈 Migration: Start with startup architecture. At 1K patients/day, migrate to Kubernetes with zero downtime using blue-green deployment. Add multi-region at 5K patients/day. Enable private networking and BYO KMS when enterprise contracts require it. Estimated migration time: 2-3 months with phased rollout.

Risks & Mitigations

⚠️ LLM hallucination leads to incorrect medical data in EHR

Medium (0.3% rate observed)

✓ Mitigation: Multi-layer hallucination detection (L1-L4). 100% detection rate in testing. Block EHR write if flagged. Human review queue for all flagged cases.

⚠️ PHI leakage to LLM provider

Low (if properly implemented)

✓ Mitigation: Mandatory PHI redaction before LLM processing. AWS Comprehend Medical with 99.5% detection. Audit all LLM requests. Zero-tolerance policy: block processing if PHI detection fails.

⚠️ EHR API downtime prevents patient intake

Medium (Epic/Cerner have ~99% uptime)

✓ Mitigation: Queue-based retry with exponential backoff. Store locally until EHR available. Alert on-call if queue >100. SLA: 99% write success within 30 minutes.

⚠️ Model drift degrades extraction accuracy over time

Medium (medical terminology evolves)

✓ Mitigation: Weekly offline evaluation on 10K cases. Alert if accuracy <99%. RAG allows daily knowledge base updates without retraining. A/B test new models before full deployment.

⚠️ Cost overruns from LLM API usage

Medium (usage spikes during peak hours)

✓ Mitigation: Cost guardrails: $0.15 per patient target. Monitor token usage per request. Alert if monthly cost >$500. Implement caching for repeated extractions. Use cheaper models (GPT-3.5) for low-risk cases.
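A per-intake cost check against the $0.15 target could be sketched as follows. The token prices are passed in as parameters because actual provider pricing is not given in the source; the figures in the usage note are placeholders only.

```python
from typing import Iterable, Tuple

PER_PATIENT_TARGET_USD = 0.15  # cost target from the guardrail above


def call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
) -> float:
    """Cost of one LLM call given token counts and per-1K-token prices."""
    return (input_tokens / 1000) * usd_per_1k_input + (output_tokens / 1000) * usd_per_1k_output


def intake_cost_usd(calls: Iterable[Tuple[int, int]], usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Total cost of all LLM calls made for one patient intake."""
    return sum(call_cost_usd(i, o, usd_per_1k_input, usd_per_1k_output) for i, o in calls)


def over_target(cost_usd: float) -> bool:
    return cost_usd > PER_PATIENT_TARGET_USD
```

With placeholder prices of $0.003 and $0.015 per 1K input and output tokens, `intake_cost_usd([(6000, 1500), (2000, 500)], 0.003, 0.015)` comes to about $0.054, which `over_target` would report as within budget.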

⚠️ Kubernetes cluster failure in single region

Low (K8s has 99.95% uptime)

✓ Mitigation: Multi-region deployment (3+ regions). Global load balancer with health checks. Auto-failover to healthy region within 30 seconds. RTO: 1 minute. RPO: 0 (real-time replication).

⚠️ Insider threat: employee accesses patient data

Low (with proper controls)

✓ Mitigation: RBAC with least privilege. All PHI access logged with user, timestamp, IP. Real-time alerts on bulk downloads. Annual security training. Background checks for employees.

Evolution Roadmap

Phase 1: MVP (0-3 months)

  • Launch with 100 patients/day capacity
  • Single-region deployment (AWS us-east-1)
  • Basic HIPAA compliance (encryption, audit logs)
  • Manual review queue for low-confidence cases

Phase 2: Scale (3-6 months)

  • Scale to 1,000 patients/day
  • Add Cerner integration
  • Implement queue-based processing
  • Advanced observability (Datadog)

Phase 3: Enterprise (6-12 months)

  • Scale to 10,000+ patients/day
  • Multi-region deployment (3+ regions)
  • Enterprise security (SSO, BYO KMS)
  • 99.99% uptime SLA

Complete Systems Architecture

9-layer architecture from patient portal to EHR persistence

Presentation

  • Patient Web Portal (React)
  • Tablet App (React Native)
  • SMS Gateway (Twilio)

API Gateway

  • Load Balancer (ALB/Cloud LB)
  • Rate Limiter (Redis)
  • Auth Service (OIDC/SAML)
  • API Gateway (Kong/Apigee)

Agent Layer

  • Planner Agent (LangGraph)
  • Intake Executor Agent (Claude/GPT)
  • Validation Agent (Rule Engine + LLM)
  • Question Generator Agent (GPT/Claude)
  • Evaluator Agent (Quality Checks)
  • Guardrail Agent (PHI Redaction)
  • Orchestrator (LangGraph Supervisor)

ML Layer

  • Feature Store (Feast/Tecton)
  • Model Registry (MLflow)
  • Offline Training (Batch)
  • Online Inference (Real-time)
  • Evaluation Service (Metrics)
  • Prompt Store (Versioned)

Integration

  • PHI Handler (Comprehend Medical)
  • EHR Adapter (FHIR Mapper)
  • Epic API Client (OAuth 2.0)
  • Cerner API Client (OAuth 2.0)
  • Drug Database API (RxNorm)

Data

  • PostgreSQL (Patient Data)
  • Redis (Cache + Queue)
  • S3 (Audit Logs)
  • Vector DB (RAG - Pinecone/Weaviate)

External

  • Epic FHIR API
  • Cerner FHIR API
  • AWS Comprehend Medical
  • RxNorm Drug API
  • ICD-10 Code Service

Observability

  • Metrics (Prometheus/Datadog)
  • Logs (CloudWatch/ELK)
  • Traces (Jaeger/Honeycomb)
  • Dashboards (Grafana)
  • Alerting (PagerDuty)

Security

  • KMS (Encryption Keys)
  • WAF (DDoS Protection)
  • RBAC (IAM/Okta)
  • Secrets Manager (Vault/AWS Secrets)
  • Audit Service (Compliance)

End-to-End Request Flow with Timing

Participants: Patient, API Gateway, Guardrail Agent, Planner Agent, Intake Agent, Validation Agent, Evaluator Agent, Question Agent, EHR Adapter, Epic API

Messages, in order:

  1. POST /intake with free text
  2. Pre-process: PHI detection
  3. Sanitized text + task plan request
  4. Execute extraction with prompt v2.3
  5. Extracted JSON (47 fields attempted)
  6. Gap analysis: 5 fields missing
  7. Quality check: confidence=0.92, incomplete
  8. Generate follow-up questions for gaps
  9. Return 5 contextual questions to patient
  10. POST /intake/continue with answers
  11. Resume session with new data
  12. Merge answers + re-validate
  13. Complete: all 47 fields present
  14. Approved: format FHIR bundle
  15. POST FHIR R4 Patient + Observation bundle
  16. 201 Created with patient ID
  17. 200 OK: Intake complete

Patient Intake System - Agent Orchestration

6 components: Planner Agent, Guardrail Agent, Intake Executor Agent, Validation Agent, Question Generator Agent, Evaluator Agent

Message pairs:

  • [RPC] Raw patient input → [Response] Sanitized data
  • [RPC] Extract fields task → [Response] 47 structured fields
  • [RPC] Validate extraction → [Response] Validation results
  • [RPC] Missing fields list → [Response] Follow-up questions
  • [RPC] Quality check request → [Response] Confidence scores
  • [Event] ML feedback

Patient Intake System - External Integrations

10 components: Core Intake System, Patient Web Portal, Mobile App, EHR System (FHIR), Drug Database API, Vector Database, Audit Log Storage, Notification Service, Provider Dashboard, ML Training Pipeline

Connections:

  • [HTTPS] Patient intake data
  • [HTTPS] Mobile intake submission
  • [WebSocket] Real-time questions
  • [REST] FHIR bundles
  • [REST] Patient history
  • [REST] Medication validation
  • [REST] Drug interactions
  • [gRPC] RAG queries
  • [gRPC] Context embeddings
  • [Event] Audit events
  • [Event] Notification triggers
  • [Webhook] Delivery status
  • [HTTPS] Review actions
  • [WebSocket] Queue updates
  • [Kafka] Training data stream
  • [REST] Model updates

Complete Data Flow

Patient text → EHR in 16 steps with timing

  1. Patient Portal (0ms): Submits intake form → Free text narrative (500-2000 words)
  2. API Gateway (50ms): Authenticates + rate limits → Validated request
  3. Guardrail Agent (800ms): Detects and redacts PHI → Sanitized text + PHI entity log
  4. Planner Agent (200ms): Decomposes task, routes to Intake Agent → Task plan + tool selection
  5. Intake Executor Agent (3000ms): Extracts 47 fields using LLM + RAG → JSON with confidence scores
  6. Validation Agent (1500ms): Checks completeness + drug interactions → Gap list (5 missing fields)
  7. Evaluator Agent (500ms): Quality check + hallucination detection → Quality score (0.92), incomplete flag
  8. Planner Agent (100ms): Routes to Question Generator (incomplete) → Decision: generate follow-ups
  9. Question Generator Agent (2500ms): Generates 5 contextual questions → Prioritized question list
  10. API Gateway (100ms): Returns questions to patient → JSON response (5 questions)
  11. Patient Portal (variable, patient time): Patient answers questions → Answer text for 5 questions
  12. Planner Agent (1000ms): Merges answers + re-validates → Updated JSON (47 fields complete)
  13. Evaluator Agent (500ms): Final approval check → Approved for EHR write
  14. EHR Adapter (800ms): Formats FHIR R4 bundle → Patient + Observation + Condition resources
  15. Epic API (1200ms): Persists to EHR database → 201 Created with patient ID
  16. Audit Service (100ms): Logs PHI access event → Audit trail entry (7yr retention)
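Adding up the per-step timings gives a rough latency budget for each pass. The sketch below only sums the nominal figures from the steps above, assuming the stages run sequentially; the split into a first pass (steps 2-10) and a second pass (steps 12-16) and the stage labels are choices made for this example.

```python
from typing import Dict

SLO_P95_MS = 5000  # latency target from the non-functional requirements

# Steps 2-10: from request arrival to returning follow-up questions.
FIRST_PASS_MS: Dict[str, int] = {
    "API Gateway (auth + rate limit)": 50,
    "Guardrail Agent": 800,
    "Planner Agent (plan)": 200,
    "Intake Executor Agent": 3000,
    "Validation Agent": 1500,
    "Evaluator Agent": 500,
    "Planner Agent (route)": 100,
    "Question Generator Agent": 2500,
    "API Gateway (return questions)": 100,
}

# Steps 12-16: from receiving answers to the audit log entry.
SECOND_PASS_MS: Dict[str, int] = {
    "Planner Agent (merge + re-validate)": 1000,
    "Evaluator Agent (final approval)": 500,
    "EHR Adapter": 800,
    "Epic API": 1200,
    "Audit Service": 100,
}


def total_ms(stages: Dict[str, int]) -> int:
    """Sequential sum of stage timings."""
    return sum(stages.values())


def slowest_stage(stages: Dict[str, int]) -> str:
    """Name of the stage with the largest nominal timing."""
    return max(stages, key=lambda name: stages[name])


first_total = total_ms(FIRST_PASS_MS)    # 8750
second_total = total_ms(SECOND_PASS_MS)  # 3600
```

Under these nominal figures the first pass sums to 8750 ms and the second to 3600 ms, with the Intake Executor as the largest single stage. Whether the 5000 ms p95 target applies to a whole pass or to an individual stage is not stated in the source, so the comparison is left to the reader.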

Scaling Patterns

Volume: 0-100 patients/day
Pattern: Serverless Monolith
Architecture:
  • API Gateway (AWS API Gateway / Cloud Run)
  • Serverless functions (Lambda / Cloud Functions)
  • Managed PostgreSQL (RDS / Cloud SQL)
  • Redis (ElastiCache / Memorystore)
  • S3 for audit logs
Cost: $50-100/month
Latency: 5-8 seconds (cold start risk)

Volume: 100-1,000 patients/day
Pattern: Queue-Based Processing
Architecture:
  • Load balancer (ALB / Cloud LB)
  • API servers (ECS / Cloud Run)
  • Message queue (SQS / Pub/Sub)
  • Worker pool (ECS tasks / Cloud Run jobs)
  • Managed DB + Read replicas
  • Redis cluster
Cost: $200-400/month
Latency: 3-5 seconds

Volume: 1,000-10,000 patients/day
Pattern: Multi-Agent Orchestration
Architecture:
  • Kubernetes cluster (EKS / GKE)
  • LangGraph orchestrator (containerized)
  • Agent pool (auto-scaling pods)
  • Message bus (Kafka / Pub/Sub)
  • Multi-region DB (Aurora Global / Spanner)
  • Vector DB cluster (Pinecone / Weaviate)
  • Observability stack (Prometheus + Grafana)
Cost: $800-1,500/month
Latency: 2-4 seconds

Volume: 10,000+ patients/day
Pattern: Enterprise Multi-Region
Architecture:
  • Multi-region Kubernetes (EKS in 3+ regions)
  • Global load balancer (Route 53 / Cloud CDN)
  • Event streaming (Kafka / Confluent)
  • Multi-LLM failover (Claude + GPT + Gemini)
  • Replicated DB (Aurora Global Database)
  • Private networking (VPC peering / Transit Gateway)
  • BYO KMS/HSM for encryption
  • SSO/SAML integration (Okta / Azure AD)
  • Dedicated audit infrastructure
Cost: $3,000-8,000/month
Latency: 1-3 seconds

Key External Integrations

Epic EHR (FHIR R4)

Protocol: HL7 FHIR R4
Extract JSON from intake
Map to FHIR resources (Patient, Observation, Condition)
Bundle resources into FHIR transaction
POST to Epic FHIR endpoint with OAuth token
Handle 201 Created or retry on 5xx
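The mapping and bundling steps can be illustrated with plain dictionaries. This is a shape-only sketch: the intake field names (`family_name`, `given_name`, `date_of_birth`, `conditions`) are hypothetical, only Patient and Condition are shown, and Epic and Cerner each constrain which resources and interactions they accept, so a real adapter would follow their published profiles.

```python
import uuid
from typing import Any, Dict, List

ICD10_CM_SYSTEM = "http://hl7.org/fhir/sid/icd-10-cm"


def build_transaction_bundle(intake: Dict[str, Any]) -> Dict[str, Any]:
    """Map extracted intake fields to a FHIR R4 transaction Bundle (as a dict)."""
    patient_url = f"urn:uuid:{uuid.uuid4()}"
    patient = {
        "resourceType": "Patient",
        "name": [{"family": intake["family_name"], "given": [intake["given_name"]]}],
        "birthDate": intake["date_of_birth"],  # YYYY-MM-DD
    }
    entries: List[Dict[str, Any]] = [
        {"fullUrl": patient_url, "resource": patient,
         "request": {"method": "POST", "url": "Patient"}}
    ]
    for code, display in intake.get("conditions", []):
        condition = {
            "resourceType": "Condition",
            # Points at the Patient entry in this same bundle.
            "subject": {"reference": patient_url},
            "code": {"coding": [{"system": ICD10_CM_SYSTEM, "code": code, "display": display}]},
        }
        entries.append(
            {"fullUrl": f"urn:uuid:{uuid.uuid4()}", "resource": condition,
             "request": {"method": "POST", "url": "Condition"}}
        )
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}
```

The temporary `urn:uuid:` values let the Condition reference the Patient before the server has assigned real IDs; a FHIR server processing the transaction is expected to rewrite them.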

Cerner EHR (FHIR R4)

Protocol: HL7 FHIR R4
Similar to Epic but with Cerner-specific extensions
Map to Cerner FHIR profiles
POST to Cerner Millennium endpoint

AWS Comprehend Medical

Protocol: AWS SDK (boto3 / AWS SDK for JS)
Send patient text to DetectPHI API
Receive PHI entities (names, dates, MRNs, etc.)
Redact entities before LLM processing
Log redacted entities for audit trail
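Once DetectPHI returns its entities, redaction is a matter of replacing character spans. The sketch below works on the entity list only, so it runs without AWS credentials; it relies on the `BeginOffset`, `EndOffset`, and `Type` keys that appear in Comprehend Medical entity results, and assumes the entities do not overlap. The function name and the choice to log offsets rather than the matched text are assumptions.

```python
from typing import Any, Dict, List, Tuple


def redact_phi(text: str, entities: List[Dict[str, Any]]) -> Tuple[str, List[Dict[str, Any]]]:
    """Replace each PHI span with a [TYPE] placeholder and build an audit log.

    `entities` is expected to look like the "Entities" list returned by
    boto3's comprehendmedical client, e.g. client.detect_phi(Text=text)["Entities"].
    """
    redacted = text
    audit_log: List[Dict[str, Any]] = []
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        begin, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:begin] + f"[{entity['Type']}]" + redacted[end:]
        # Log type and position only, so the audit trail itself holds no PHI.
        audit_log.append({"type": entity["Type"], "begin": begin, "end": end})
    audit_log.reverse()  # restore document order
    return redacted, audit_log
```

For the input "John Smith was seen on 2025-06-01." with one NAME and one DATE entity, the result would read "[NAME] was seen on [DATE].".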

RxNorm Drug Database

Protocol: REST API (NLM RxNorm API)
Extract medication names from intake
Query RxNorm for standardized codes (RxCUI)
Check drug-drug interactions
Flag high-risk combinations
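The RxCUI lookup step might be sketched as a URL builder plus a response parser. The base URL and the `rxcui.json?name=` endpoint are from the public RxNav REST API; the response shape (`idGroup.rxnormId`) is how that endpoint is understood to respond and should be checked against the current NLM documentation. No network call is made here, and interaction checking is left out because it needs a separate data source.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

RXNAV_BASE = "https://rxnav.nlm.nih.gov/REST"


def rxcui_lookup_url(medication_name: str) -> str:
    """Build the RxNav URL that looks up RxCUIs by drug name."""
    return f"{RXNAV_BASE}/rxcui.json?{urlencode({'name': medication_name})}"


def parse_rxcuis(payload: Dict[str, Any]) -> List[str]:
    """Pull RxCUI strings out of a lookup response; empty list if none matched."""
    return list(payload.get("idGroup", {}).get("rxnormId", []))
```

A name that returns no RxCUI is a natural signal for the validator: it is either a misspelling to clarify with the patient or a candidate for the hallucination check.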

ICD-10 Code Service

Protocol: REST API (WHO ICD API or custom)
Extract symptoms from intake
Query ICD API for matching codes
Attach codes to FHIR Condition resources

Security & Compliance Architecture

Failure Modes & Recovery

Failure: LLM API timeout or rate limit
Fallback: Retry with exponential backoff (3 attempts) → Switch to fallback LLM (GPT if Claude down) → If all fail, route to manual queue
Impact: Degraded performance, not broken. Manual queue handles overflow.
SLA: 99.5% availability (allows 3.6 hours downtime/month)

Failure: Low confidence extraction (<0.7)
Fallback: Flag for human review. Do not send to EHR. Notify clinician.
Impact: Quality maintained. Patient waits for human review (~15 min).
SLA: 99.9% accuracy maintained

Failure: EHR API timeout or 5xx error
Fallback: Retry with exponential backoff (5 attempts over 10 minutes) → Queue for later retry → Alert on-call if queue >100
Impact: Eventual consistency. Data written within 30 minutes.
SLA: 99.0% write success within 5 minutes

Failure: PHI detection service down
Fallback: BLOCK all processing. Do not send to LLM. Route to manual queue.
Impact: Safety first. System degraded but compliant.
SLA: 100% PHI protection (zero tolerance)

Failure: Database connection pool exhausted
Fallback: Read from replica for read-only operations. Queue writes. Scale up connection pool.
Impact: Read-only mode for 1-2 minutes during scale-up.
SLA: 99.9% database availability

Failure: Hallucination detected (fake drug name)
Fallback: Block EHR write. Flag for human review. Log hallucination for model improvement.
Impact: Quality maintained. Patient data integrity protected.
SLA: 0.5% hallucination rate, 100% caught

Failure: Kubernetes node failure
Fallback: K8s auto-reschedules pods to healthy nodes. Load balancer routes around failed node.
Impact: Minimal. 30-60 second latency spike during rescheduling.
SLA: 99.95% uptime

Advanced ML Engineering Patterns

RAG vs Fine-Tuning Decision

Hallucination Detection Pipeline

Evaluation Framework

Dataset Curation & Labeling

Agentic RAG (Iterative Retrieval)

Prompt Engineering & Versioning

Technology Stack

  • LLMs: Claude 3.5 Sonnet (primary), GPT-4 (fallback), Gemini (future)
  • Agent Orchestration: LangGraph (primary), CrewAI (evaluated), Custom framework (fallback)
  • Database: PostgreSQL (RDS/Aurora), Redis (ElastiCache)
  • Message Queue: Redis (startup), SQS (mid-tier), Kafka (enterprise)
  • Compute: Lambda (startup), ECS (mid-tier), EKS (enterprise)
  • Vector DB: Pinecone (managed), Weaviate (self-hosted), pgvector (embedded)
  • Observability: CloudWatch (startup), Datadog (mid-tier), Prometheus+Grafana+Jaeger (enterprise)
  • Security: AWS KMS (startup), BYO KMS/HSM (enterprise), Vault (secrets)
  • PHI Detection: AWS Comprehend Medical (primary), Presidio (open-source fallback)
  • EHR Integration: HAPI FHIR (Java), fhir-kit-client (Node.js)