Patient Intake System Architecture: HIPAA-Compliant AI Design

From prompts to production-grade healthcare infrastructure.

Monday showed 3 prompts for patient intake. Tuesday automated the workflow. Wednesday mapped team roles. Today: complete technical architecture with 4 specialized agents, FHIR integration, HIPAA compliance, ML evaluation loops, and scaling patterns from startup to enterprise. This is the blueprint for 10,000+ patients per day.

Key Assumptions

•Processing 100-10,000 patient intake forms per day across multiple facilities
•HIPAA compliance mandatory: PHI encryption, audit logs, access controls, 7-year retention
•Integration with Epic or Cerner EHR via FHIR R4 API
•Medical terminology database (ICD-10, drug interactions) updated weekly
•Startup deployment: AWS/GCP serverless → Enterprise: Multi-region Kubernetes with private networking

System Requirements

Functional

Extract 47 structured fields from free-text patient narratives with 99%+ accuracy
Validate completeness against EHR requirements and generate contextual follow-up questions
Redact PHI before LLM processing using AWS Comprehend Medical or equivalent
Transform extracted data to FHIR R4 bundles and POST to Epic/Cerner APIs
Maintain audit trail of all PHI access with 7-year retention for HIPAA compliance
Support iterative questioning loop until all 47 fields are complete or flagged for manual review
Provide real-time confidence scores and flag low-confidence extractions for human review

Non-Functional (SLOs)

latency p95 ms5000

freshness min1

availability percent99.5

extraction accuracy percent99

hallucination rate percent0.5

💰 Cost Targets: {"per_patient_intake_usd":0.15,"per_1000_patients_monthly_usd":150}

Agent Layer

planner

Decomposes intake task into subtasks, selects tools, manages workflow state

🔧 TaskDecomposer, ContextRetriever, AgentRouter

⚡ Recovery: If decomposition fails: fallback to single-step extraction, If tool unavailable: route to manual queue, Retry with exponential backoff (3 attempts)

executor_intake

Extracts 47 structured fields from free text using LLM with RAG context

🔧 Claude API (primary), GPT-4 API (fallback), VectorDB (RAG retrieval), PromptStore (versioned prompts)

⚡ Recovery: If LLM API timeout: retry with GPT-4 fallback, If confidence < 0.7: flag for human review, If JSON parse fails: use schema validator to fix

validator

Checks completeness of 47 fields, cross-references drug database, validates logical consistency

🔧 RuleEngine (47 field checks), RxNorm API (drug validation), LogicValidator (age/symptom consistency)

⚡ Recovery: If drug API down: skip drug validation, flag for later, If rule engine fails: use LLM-based validation as fallback

question_generator

Generates contextual follow-up questions for missing/incomplete fields

🔧 GPT-4 (question generation), TemplateLibrary (clinical question patterns), PriorityRanker (medical urgency)

⚡ Recovery: If generation fails: use template-based questions, If priority ranking fails: default to field order

evaluator

Quality checks on extracted data, hallucination detection, confidence scoring

🔧 HallucinationDetector (cross-reference drug DB), ConfidenceScorer (ensemble model), HistoricalComparator (drift detection)

⚡ Recovery: If hallucination detected: block EHR write, route to human, If confidence < threshold: flag for review

guardrail

PHI redaction, policy enforcement, safety filters before LLM processing

🔧 AWS Comprehend Medical (PHI detection), PolicyEngine (HIPAA rules), RedactionService (mask PHI)

⚡ Recovery: If PHI detection fails: block processing entirely (safety first), Log failure and route to manual review

ML Layer

Feature Store

Update: Real-time for online inference, daily batch for offline training

• patient_age_bin (derived from dob)
• symptom_count (extracted from text)
• medication_count
• prior_visit_count (from EHR)
• text_length_chars
• medical_term_density (ICD-10 matches per 100 words)

Model Registry

Strategy: Semantic versioning with A/B testing for new versions (10% traffic)

• extraction_model_claude
• extraction_model_gpt
• hallucination_detector
• question_ranker

Observability

Metrics

📊 intake_requests_total
📊 extraction_latency_p95_ms
📊 extraction_accuracy_percent
📊 validation_gap_rate
📊 question_generation_latency_ms
📊 ehr_write_success_rate
📊 ehr_write_latency_ms
📊 phi_redaction_latency_ms
📊 hallucination_detection_rate
📊 agent_retry_count
📊 llm_token_usage_total
📊 cost_per_patient_usd

Dashboards

📈 ops_dashboard
📈 ml_dashboard
📈 compliance_dashboard
📈 cost_dashboard

Traces

✅ Enabled

Deployment Variants

🚀 Startup

Infrastructure:

• AWS Lambda + API Gateway (serverless)
• RDS PostgreSQL (single instance)
• ElastiCache Redis (single node)
• S3 for audit logs
• CloudWatch for observability
• AWS Secrets Manager
• Anthropic/OpenAI API (direct calls)

→ Minimal ops overhead - fully managed services

→ Pay-per-use pricing (~$50-200/month for 100-1K patients/day)

→ Single-region deployment (us-east-1 or eu-west-1)

→ AWS-managed encryption keys

→ Basic RBAC via IAM roles

→ CloudWatch dashboards for monitoring

🏢 Enterprise

Infrastructure:

• EKS (Kubernetes) in 3+ regions
• Aurora Global Database (multi-region)
• ElastiCache Redis cluster (multi-AZ)
• Kafka (MSK or self-hosted)
• Private VPC with VPC peering
• Transit Gateway for multi-region networking
• BYO KMS/HSM for encryption
• SSO/SAML integration (Okta/Azure AD)
• Dedicated audit infrastructure (separate AWS account)
• Multi-LLM failover (Claude + GPT + Gemini)
• Prometheus + Grafana + Jaeger
• PagerDuty for alerting

→ 99.99% uptime SLA with multi-region failover

→ Data residency per tenant (US/EU/APAC)

→ Private networking - no public endpoints

→ Customer-managed encryption keys (CMK)

→ Advanced RBAC with SSO/SAML

→ Dedicated security team access

→ Compliance certifications (SOC2, HITRUST)

→ Cost: $3K-8K/month for 10K+ patients/day

📈 Migration: Start with startup architecture. At 1K patients/day, migrate to Kubernetes with zero downtime using blue-green deployment. Add multi-region at 5K patients/day. Enable private networking and BYO KMS when enterprise contracts require it. Estimated migration time: 2-3 months with phased rollout.

Risks & Mitigations

⚠️ LLM hallucination leads to incorrect medical data in EHR

Medium (0.3% rate observed)

✓ Mitigation: Multi-layer hallucination detection (L1-L4). 100% detection rate in testing. Block EHR write if flagged. Human review queue for all flagged cases.

⚠️ PHI leakage to LLM provider

Low (if properly implemented)

✓ Mitigation: Mandatory PHI redaction before LLM processing. AWS Comprehend Medical with 99.5% detection. Audit all LLM requests. Zero-tolerance policy: block processing if PHI detection fails.

⚠️ EHR API downtime prevents patient intake

Medium (Epic/Cerner have ~99% uptime)

✓ Mitigation: Queue-based retry with exponential backoff. Store locally until EHR available. Alert on-call if queue >100. SLA: 99% write success within 30 minutes.

⚠️ Model drift degrades extraction accuracy over time

Medium (medical terminology evolves)

✓ Mitigation: Weekly offline evaluation on 10K cases. Alert if accuracy <99%. RAG allows daily knowledge base updates without retraining. A/B test new models before full deployment.

⚠️ Cost overruns from LLM API usage

Medium (usage spikes during peak hours)

✓ Mitigation: Cost guardrails: $0.15 per patient target. Monitor token usage per request. Alert if monthly cost >$500. Implement caching for repeated extractions. Use cheaper models (GPT-3.5) for low-risk cases.

⚠️ Kubernetes cluster failure in single region

Low (K8s has 99.95% uptime)

✓ Mitigation: Multi-region deployment (3+ regions). Global load balancer with health checks. Auto-failover to healthy region within 30 seconds. RTO: 1 minute. RPO: 0 (real-time replication).

⚠️ Insider threat: employee accesses patient data

Low (with proper controls)

✓ Mitigation: RBAC with least privilege. All PHI access logged with user, timestamp, IP. Real-time alerts on bulk downloads. Annual security training. Background checks for employees.

Evolution Roadmap

Phase 1: MVP (0-3 months)

Months 0-3

→ Launch with 100 patients/day capacity
→ Single-region deployment (AWS us-east-1)
→ Basic HIPAA compliance (encryption, audit logs)
→ Manual review queue for low-confidence cases

Phase 2: Scale (3-6 months)

Months 3-6

→ Scale to 1,000 patients/day
→ Add Cerner integration
→ Implement queue-based processing
→ Advanced observability (Datadog)

Phase 3: Enterprise (6-12 months)

Months 6-12

→ Scale to 10,000+ patients/day
→ Multi-region deployment (3+ regions)
→ Enterprise security (SSO, BYO KMS)
→ 99.99% uptime SLA

Complete Systems Architecture

9-layer architecture from patient portal to EHR persistence

Presentation

Patient Web Portal (React)

Tablet App (React Native)

SMS Gateway (Twilio)

API Gateway

Load Balancer (ALB/Cloud LB)

Rate Limiter (Redis)

Auth Service (OIDC/SAML)

API Gateway (Kong/Apigee)

Agent Layer

Planner Agent (LangGraph)

Intake Executor Agent (Claude/GPT)

Validation Agent (Rule Engine + LLM)

Question Generator Agent (GPT/Claude)

Evaluator Agent (Quality Checks)

Guardrail Agent (PHI Redaction)

Orchestrator (LangGraph Supervisor)

ML Layer

Feature Store (Feast/Tecton)

Model Registry (MLflow)

Offline Training (Batch)

Online Inference (Real-time)

Evaluation Service (Metrics)

Prompt Store (Versioned)

Integration

PHI Handler (Comprehend Medical)

EHR Adapter (FHIR Mapper)

Epic API Client (OAuth 2.0)

Cerner API Client (OAuth 2.0)

Drug Database API (RxNorm)

Data

PostgreSQL (Patient Data)

Redis (Cache + Queue)

S3 (Audit Logs)

Vector DB (RAG - Pinecone/Weaviate)

External

Epic FHIR API

Cerner FHIR API

AWS Comprehend Medical

RxNorm Drug API

ICD-10 Code Service

Observability

Metrics (Prometheus/Datadog)

Logs (CloudWatch/ELK)

Traces (Jaeger/Honeycomb)

Dashboards (Grafana)

Alerting (PagerDuty)

Security

KMS (Encryption Keys)

WAF (DDoS Protection)

RBAC (IAM/Okta)

Secrets Manager (Vault/AWS Secrets)

Audit Service (Compliance)

End-to-End Request Flow with Timing

Patient Intake System - Agent Orchestration

6 Components

HTTP

REST

gRPC

Event

Stream

WebSocket

Patient Intake System - External Integrations

10 Components

HTTP

REST

gRPC

Event

Stream

WebSocket

Complete Data Flow

Patient text → EHR in 8 steps with timing

Patient Portal0ms

Submits intake form → Free text narrative (500-2000 words)

API Gateway50ms

Authenticates + rate limits → Validated request

Guardrail Agent800ms

Detects and redacts PHI → Sanitized text + PHI entity log

Planner Agent200ms

Decomposes task, routes to Intake Agent → Task plan + tool selection

Intake Executor Agent3000ms

Extracts 47 fields using LLM + RAG → JSON with confidence scores

Validation Agent1500ms

Checks completeness + drug interactions → Gap list (5 missing fields)

Evaluator Agent500ms

Quality check + hallucination detection → Quality score (0.92), incomplete flag

Planner Agent100ms

Routes to Question Generator (incomplete) → Decision: generate follow-ups

Question Generator Agent2500ms

Generates 5 contextual questions → Prioritized question list

API Gateway100ms

Returns questions to patient → JSON response (5 questions)

Patient PortalVariable (patient time)

Patient answers questions → Answer text for 5 questions

Planner Agent1000ms

Merges answers + re-validates → Updated JSON (47 fields complete)

Evaluator Agent500ms

Final approval check → Approved for EHR write

EHR Adapter800ms

Formats FHIR R4 bundle → Patient + Observation + Condition resources

Epic API1200ms

Persists to EHR database → 201 Created with patient ID

Audit Service100ms

Logs PHI access event → Audit trail entry (7yr retention)

Scaling Patterns

Volume

0-100 patients/day

Pattern

Serverless Monolith

Architecture

• API Gateway (AWS API Gateway / Cloud Run)

• Serverless functions (Lambda / Cloud Functions)

• Managed PostgreSQL (RDS / Cloud SQL)

• Redis (ElastiCache / Memorystore)

• S3 for audit logs

Cost

$50-100/month

5-8 seconds (cold start risk)

Volume

100-1,000 patients/day

Pattern

Queue-Based Processing

Architecture

• Load balancer (ALB / Cloud LB)

• API servers (ECS / Cloud Run)

• Message queue (SQS / Pub/Sub)

• Worker pool (ECS tasks / Cloud Run jobs)

• Managed DB + Read replicas

• Redis cluster

Cost

$200-400/month

3-5 seconds

Volume

1,000-10,000 patients/day

Pattern

Multi-Agent Orchestration

Architecture

• Kubernetes cluster (EKS / GKE)

• LangGraph orchestrator (containerized)

• Agent pool (auto-scaling pods)

• Message bus (Kafka / Pub/Sub)

• Multi-region DB (Aurora Global / Spanner)

• Vector DB cluster (Pinecone / Weaviate)

• Observability stack (Prometheus + Grafana)

Cost

$800-1,500/month

2-4 seconds

Volume

10,000+ patients/day

Pattern

Enterprise Multi-Region

Architecture

• Multi-region Kubernetes (EKS in 3+ regions)

• Global load balancer (Route 53 / Cloud CDN)

• Event streaming (Kafka / Confluent)

• Multi-LLM failover (Claude + GPT + Gemini)

• Replicated DB (Aurora Global Database)

• Private networking (VPC peering / Transit Gateway)

• BYO KMS/HSM for encryption

• SSO/SAML integration (Okta / Azure AD)

• Dedicated audit infrastructure

Cost

$3,000-8,000/month

1-3 seconds

Key External Integrations

Epic EHR (FHIR R4)

Protocol: HL7 FHIR R4

Extract JSON from intake

Map to FHIR resources (Patient, Observation, Condition)

Bundle resources into FHIR transaction

POST to Epic FHIR endpoint with OAuth token

Handle 201 Created or retry on 5xx

Cerner EHR (FHIR R4)

Protocol: HL7 FHIR R4

Similar to Epic but with Cerner-specific extensions

Map to Cerner FHIR profiles

POST to Cerner Millennium endpoint

AWS Comprehend Medical

Protocol: AWS SDK (boto3 / AWS SDK for JS)

Send patient text to DetectPHI API

Receive PHI entities (names, dates, MRNs, etc.)

Redact entities before LLM processing

Log redacted entities for audit trail

RxNorm Drug Database

Protocol: REST API (NLM RxNorm API)

Extract medication names from intake

Query RxNorm for standardized codes (RxCUI)

Check drug-drug interactions

Flag high-risk combinations

ICD-10 Code Service

Protocol: REST API (WHO ICD API or custom)

Extract symptoms from intake

Query ICD API for matching codes

Attach codes to FHIR Condition resources

Security & Compliance Architecture

Failure Modes & Recovery

Failure	Fallback	Impact	SLA
LLM API timeout or rate limit	Retry with exponential backoff (3 attempts) → Switch to fallback LLM (GPT if Claude down) → If all fail, route to manual queue	Degraded performance, not broken. Manual queue handles overflow.	99.5% availability (allows 3.6 hours downtime/month)
Low confidence extraction (<0.7)	Flag for human review. Do not send to EHR. Notify clinician.	Quality maintained. Patient waits for human review (~15 min).	99.9% accuracy maintained
EHR API timeout or 5xx error	Retry with exponential backoff (5 attempts over 10 minutes) → Queue for later retry → Alert on-call if queue >100	Eventual consistency. Data written within 30 minutes.	99.0% write success within 5 minutes
PHI detection service down	BLOCK all processing. Do not send to LLM. Route to manual queue.	Safety first. System degraded but compliant.	100% PHI protection (zero tolerance)
Database connection pool exhausted	Read from replica for read-only operations. Queue writes. Scale up connection pool.	Read-only mode for 1-2 minutes during scale-up.	99.9% database availability
Hallucination detected (fake drug name)	Block EHR write. Flag for human review. Log hallucination for model improvement.	Quality maintained. Patient data integrity protected.	0.5% hallucination rate, 100% caught
Kubernetes node failure	K8s auto-reschedules pods to healthy nodes. Load balancer routes around failed node.	Minimal. 30-60 second latency spike during rescheduling.	99.95% uptime

Advanced ML Engineering Patterns

RAG vs Fine-Tuning Decision

Hallucination Detection Pipeline

Evaluation Framework

Dataset Curation & Labeling

Agentic RAG (Iterative Retrieval)

Prompt Engineering & Versioning

Technology Stack

LLMs

Claude 3.5 Sonnet (primary), GPT-4 (fallback), Gemini (future)

Agent Orchestration

LangGraph (primary), CrewAI (evaluated), Custom framework (fallback)

Database

PostgreSQL (RDS/Aurora), Redis (ElastiCache)

Message Queue

Redis (startup), SQS (mid-tier), Kafka (enterprise)

Compute

Lambda (startup), ECS (mid-tier), EKS (enterprise)

Vector DB

Pinecone (managed), Weaviate (self-hosted), pgvector (embedded)

Observability

CloudWatch (startup), Datadog (mid-tier), Prometheus+Grafana+Jaeger (enterprise)

Security

AWS KMS (startup), BYO KMS/HSM (enterprise), Vault (secrets)

PHI Detection

AWS Comprehend Medical (primary), Presidio (open-source fallback)

EHR Integration

HAPI FHIR (Java), fhir-kit-client (Node.js)

🏗️

Need a Custom Healthcare AI System?

We build production-grade AI systems for healthcare. HIPAA-compliant, scalable, and integrated with your EHR.

Patient Intake System Architecture 🏗️

From prompts to production-grade healthcare infrastructure.

Key Assumptions

System Requirements

Functional

Non-Functional (SLOs)

Agent Layer

planner

executor_intake

validator

question_generator

evaluator

guardrail

ML Layer

Feature Store

Model Registry

Observability

Metrics

Dashboards

Traces

Deployment Variants

🚀 Startup

🏢 Enterprise

Risks & Mitigations

⚠️ LLM hallucination leads to incorrect medical data in EHR

⚠️ PHI leakage to LLM provider

⚠️ EHR API downtime prevents patient intake

⚠️ Model drift degrades extraction accuracy over time

⚠️ Cost overruns from LLM API usage

⚠️ Kubernetes cluster failure in single region

⚠️ Insider threat: employee accesses patient data

Evolution Roadmap

Phase 1: MVP (0-3 months)

Phase 2: Scale (3-6 months)

Phase 3: Enterprise (6-12 months)

Complete Systems Architecture

End-to-End Request Flow with Timing

Patient Intake System - Agent Orchestration

Patient Intake System - External Integrations

Complete Data Flow

Scaling Patterns

Key External Integrations

Epic EHR (FHIR R4)

Cerner EHR (FHIR R4)

AWS Comprehend Medical

RxNorm Drug Database

ICD-10 Code Service

Security & Compliance Architecture

Failure Modes & Recovery

Advanced ML Engineering Patterns

RAG vs Fine-Tuning Decision

Hallucination Detection Pipeline

Evaluation Framework

Dataset Curation & Labeling

Agentic RAG (Iterative Retrieval)

Prompt Engineering & Versioning

Technology Stack

Need a Custom Healthcare AI System?