
Contract Review System Architecture 🏗️

From 10 to 10,000 contracts/day with SOC2 compliance

July 17, 2025
21 min read
⚖️ Legal Tech · 🏗️ Architecture · 📊 Scalable · 🔒 SOC2
🎯 This Week's Journey

From prompts to a production contract review system.

Monday: 3 core prompts for clause extraction, risk scoring, and recommendations. Tuesday: automated multi-step pipeline. Wednesday: team workflows across legal, compliance, and business teams. Thursday: complete technical architecture with 4 specialized agents, ML evaluation, and scaling patterns for enterprise legal operations.

📋

Key Assumptions

1. Process 10-10,000 contracts per day (MSAs, NDAs, vendor agreements)
2. Average contract: 15-50 page PDF, 5,000-20,000 words
3. SOC2 Type II compliance required (audit logs, encryption, access control)
4. Support for multi-tenant enterprise deployment with data isolation
5. Integration with DocuSign, Salesforce, and internal contract management systems
6. 95%+ clause extraction accuracy target with <5% false positive rate on risk scoring

System Requirements

Functional

  • Extract 25+ clause types (termination, liability, indemnity, IP, confidentiality, payment, renewal)
  • Risk score each clause (0-100) with confidence intervals and reasoning
  • Generate redline recommendations with legal justification
  • Support batch processing (upload 100 contracts) and real-time API
  • Multi-language support (English, Spanish, French, German, Mandarin)
  • Version control: track contract revisions and clause changes over time
  • Human-in-the-loop review queue for low-confidence extractions (<80%)

Non-Functional (SLOs)

  • Latency (p95): 45,000 ms
  • Freshness: 5 min
  • Availability: 99.5%
  • Accuracy: 95%
  • False positive rate: 5%

💰 Cost Targets: $0.85 per contract, $0.06 per page, $1,200/month infrastructure

Agent Layer

planner

L4

Decomposes contract review into tasks, selects tools, routes to specialized agents

🔧 Tools: contract_classifier (determines contract type), page_counter (estimates processing time), language_detector

⚡ Recovery: if OCR fails → retry with enhanced preprocessing; if classifier uncertain → route to human review queue; if timeout → split into chunks and process in parallel

extractor

L3

Extract 25+ clause types from contract PDF using OCR + LLM

🔧 Tools: aws_textract (OCR), gpt4_extraction (structured output), clause_deduplicator

⚡ Recovery: if OCR quality <70% → flag for manual review; if LLM extraction confidence <80% → route to Validator for double-check; if parsing error → retry with fallback prompt
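The Extractor's recovery rules reduce to two threshold checks. The following is a minimal sketch, assuming OCR quality and extraction confidence are both reported as percentages; `route_extraction` and the `Route` labels are hypothetical names chosen for illustration, not part of the system described here.

```python
from enum import Enum


class Route(Enum):
    MANUAL_REVIEW = "manual_review"  # OCR quality too low to trust the text
    VALIDATOR = "validator"          # extraction needs a double-check
    CONTINUE = "continue"            # no recovery action needed


# Thresholds taken from the recovery rules above.
OCR_QUALITY_MIN = 70.0
EXTRACTION_CONFIDENCE_MIN = 80.0


def route_extraction(ocr_quality: float, confidence: float) -> Route:
    """Pick the recovery route for one extraction result (inputs in percent)."""
    if ocr_quality < OCR_QUALITY_MIN:
        return Route.MANUAL_REVIEW
    if confidence < EXTRACTION_CONFIDENCE_MIN:
        return Route.VALIDATOR
    return Route.CONTINUE
```

Checking OCR quality first is a deliberate ordering in this sketch: a low-quality scan makes the LLM confidence score itself unreliable, so there is little point in double-checking the extraction.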

validator

L2

Validate extracted clauses against schema, check completeness, flag anomalies

🔧 Tools: schema_validator (JSON schema check), completeness_checker (25 required clause types), anomaly_detector (unusual clause language)

⚡ Recovery: if missing critical clauses → trigger re-extraction with focused prompt; if anomaly detected → flag for legal review; if validation fails → route to human queue with context

risk_scorer

L3

Score each clause 0-100 for legal risk using RAG + historical precedent

🔧 Tools: rag_retriever (legal precedent database), risk_model (fine-tuned classifier), confidence_estimator

⚡ Recovery: if RAG retrieval empty → fall back to rule-based scoring; if model confidence <70% → flag for attorney review; if contradictory scores → ensemble vote across 3 models
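The three recovery rules for the risk scorer can be sketched as one function. This is an illustration under stated assumptions: "contradictory" is taken to mean a spread of more than 20 points between model scores, the "ensemble vote" is taken to be the median, and confidence is a 0-1 value. None of those details are specified in the article, and `score_with_recovery` is a hypothetical name.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class RiskResult:
    score: float          # 0-100
    method: str           # "rules", "model", or "ensemble"
    needs_attorney: bool  # True when model confidence is below 70%


def score_with_recovery(
    precedents: Sequence[str],
    model_scores: Sequence[float],  # assumed: one score per model, three models
    confidence: float,              # assumed: 0.0-1.0
    rule_score: float,
    spread_limit: float = 20.0,     # assumed definition of "contradictory"
) -> RiskResult:
    """Apply the three recovery rules to one clause."""
    if not precedents:
        # Empty RAG retrieval: fall back to rule-based scoring.
        return RiskResult(rule_score, "rules", needs_attorney=False)
    needs_attorney = confidence < 0.70
    if max(model_scores) - min(model_scores) > spread_limit:
        # Contradictory scores: use the median as the ensemble vote.
        return RiskResult(median(model_scores), "ensemble", needs_attorney)
    return RiskResult(model_scores[0], "model", needs_attorney)
```

The median is used here because a single outlier model cannot drag the result, which is the usual motivation for voting across an odd number of scorers.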

recommender

L3

Generate redline suggestions with legal justification for high-risk clauses

🔧 Tools: redline_generator (GPT-4 with legal prompt), policy_matcher (company-specific rules), precedent_finder (similar contracts)

⚡ Recovery: if recommendation unclear → provide multiple options; if policy conflict → escalate to legal team; if generation fails → fall back to template-based suggestions

guardrail

L4

Final safety check: PII redaction, policy compliance, output validation

🔧 Tools: pii_detector (NER model), policy_checker (SOC2 compliance rules), output_validator (schema + safety)

⚡ Recovery: if PII detected → redact and log; if policy violation → block output and alert; if validation fails → route to manual review queue
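The "redact and log" step of the guardrail can be illustrated with a small sketch. The article's pii_detector is an NER model; the regular expressions below are a stand-in chosen only so the example runs on its own, and `redact_pii` and the placeholder format are assumptions.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; the system described above uses an NER model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> Tuple[str, List[str]]:
    """Replace matches with a [LABEL] placeholder and return audit entries."""
    audit: List[str] = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            audit.append(f"redacted {count} {label}")
    return text, audit


clean, log = redact_pii("Contact jane@example.com, SSN 123-45-6789.")
```

Returning the audit entries alongside the redacted text keeps the logging requirement visible to the caller, which matters when every PII access has to appear in an audit log.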

ML Layer

Feature Store

Update: Daily batch + real-time for new contracts

  • clause_length_chars
  • clause_complexity_score (Flesch-Kincaid)
  • entity_count (parties, dates, amounts)
  • historical_risk_score_avg (per clause type)
  • jurisdiction_risk_modifier
  • industry_standard_deviation

Model Registry

Strategy: Semantic versioning (major.minor.patch), blue-green deployment

  • clause_extractor
  • risk_classifier
  • embeddings
  • reranker

Observability Stack

Real-time monitoring, tracing & alerting

Pipeline: Sources (apps, services, infra) → Collection (9 metrics) → Processing (aggregate & transform) → Dashboards (4 views) → Alerts (enabled)

Signals: metrics (9), structured logs, distributed traces

Metrics include:

  • contract_processing_latency_p95_ms
  • clause_extraction_accuracy_percent
  • risk_scoring_latency_p95_ms
  • llm_api_success_rate
  • queue_depth
  • worker_utilization_percent

Deployment Variants

🚀

Startup Architecture

Fast to deploy, cost-efficient, scales to 100 contracts/day

Infrastructure

  • AWS Lambda (serverless functions)
  • API Gateway (managed API)
  • RDS PostgreSQL (single-AZ)
  • S3 (PDF storage)
  • OpenAI API (GPT-4)
  • AWS Textract (OCR)
  • CloudWatch (logs + metrics)

→ Single-tenant deployment
→ Synchronous processing (no queue)
→ Manual scaling (adjust Lambda concurrency)
→ Cost: $300/mo for 10-100 contracts/day
→ Deploy time: 1 week
→ No multi-region, no SSO, basic auth (API keys)

Risks & Mitigations

⚠️ LLM hallucinations (fake clauses, incorrect risk scores)

Medium

✓ Mitigation: 4-layer hallucination detection (confidence scores, cross-reference database, logical consistency, human review queue). 98% catch rate.

⚠️ OCR quality issues (poor scans, handwritten notes)

Medium

✓ Mitigation: Pre-process PDFs (deskew, denoise). Flag low-quality scans (<70%) for manual review. Provide scan quality feedback to users.

⚠️ Model drift (accuracy degrades over time)

High

✓ Mitigation: Monthly drift detection. Automatic retraining if accuracy drops >3%. Rollback to previous model version. Weekly evaluation on 1K-contract test set.
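The drift rule is a comparison between a baseline accuracy and the latest evaluation run. A minimal sketch, assuming accuracies are percentages and that "drops >3%" means more than 3 percentage points (the article does not say whether the drop is absolute or relative); `check_drift` and its return labels are hypothetical.

```python
def check_drift(baseline_accuracy: float, current_accuracy: float,
                max_drop_points: float = 3.0) -> str:
    """Return the action for one evaluation run.

    Accuracies are percentages (for example 96.2). The drop is measured in
    percentage points, which is an assumption about the 3% rule.
    """
    drop = baseline_accuracy - current_accuracy
    if drop > max_drop_points:
        # Serve the previous model version while retraining runs.
        return "rollback_and_retrain"
    return "ok"
```

For example, with a 96.2% baseline, a weekly run at 94.0% stays within tolerance, while a run at 92.9% would trigger the rollback and retraining path.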

⚠️ PII leakage (sensitive data sent to LLM)

Low

✓ Mitigation: Pre-redact PII via AWS Comprehend before LLM processing. Audit logs for all PII access. Encrypt PII at rest (KMS). Block processing if PII detection fails.

⚠️ Cost overruns (LLM API costs spike)

Medium

✓ Mitigation: Cost tracking per contract. Budget alerts (>$1,000/day). Rate limiting (10 req/sec per tenant). Optimize prompts to reduce tokens. Cache common extractions.
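Per-tenant rate limiting at 10 req/sec is commonly implemented as a token bucket. The sketch below is one way to do it in process memory; the class name, the burst size, and the injectable clock are assumptions made for illustration, and a multi-instance deployment would need shared state (for example in Redis) instead of a local dictionary.

```python
import time
from collections import defaultdict
from typing import Callable, Dict


class TenantRateLimiter:
    """Token bucket per tenant: `rate` requests/second, bursts up to `burst`."""

    def __init__(self, rate: float = 10.0, burst: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # Every tenant starts with a full bucket.
        self._tokens: Dict[str, float] = defaultdict(lambda: burst)
        self._last: Dict[str, float] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        elapsed = now - self._last.get(tenant_id, now)
        self._last[tenant_id] = now
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, self._tokens[tenant_id] + elapsed * self.rate)
        if tokens < 1.0:
            self._tokens[tenant_id] = tokens
            return False  # caller would answer with HTTP 429
        self._tokens[tenant_id] = tokens - 1.0
        return True
```

Passing the clock in as a parameter keeps the limiter testable without sleeping.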

⚠️ Scalability bottleneck (queue overflow during peak)

Medium

✓ Mitigation: Auto-scaling workers (2-20 based on queue depth). Kafka for high-throughput event streaming. Throttle API (429 responses) if queue >10K. Priority queue for urgent contracts.
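The scaling and throttling rules can be expressed as two small functions. This is a sketch under an assumption: the article gives the worker range (2-20) and the throttle threshold (10K) but not the scaling formula, so the linear rule and the `contracts_per_worker` knob below are hypothetical.

```python
import math

MIN_WORKERS, MAX_WORKERS = 2, 20
THROTTLE_QUEUE_DEPTH = 10_000


def desired_workers(queue_depth: int, contracts_per_worker: int = 100) -> int:
    """Scale linearly with backlog, clamped to the 2-20 worker range.

    `contracts_per_worker` is an assumed tuning knob, not a documented value.
    """
    needed = math.ceil(queue_depth / contracts_per_worker)
    return max(MIN_WORKERS, min(MAX_WORKERS, needed))


def should_throttle(queue_depth: int) -> bool:
    """True when the API should start answering with 429 responses."""
    return queue_depth > THROTTLE_QUEUE_DEPTH
```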

⚠️ Vendor lock-in (OpenAI API dependency)

High

✓ Mitigation: Multi-LLM architecture (GPT-4, Claude, Gemini). Abstract LLM calls via interface. Easy to swap providers. Test failover monthly.
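Abstracting LLM calls behind an interface can be as simple as treating each provider as a callable and trying them in order. The sketch below assumes that shape; `complete_with_failover` and `AllProvidersFailed` are hypothetical names, and real provider SDK calls would be wrapped inside the callables rather than shown here.

```python
from typing import Callable, List, Tuple

# A provider is any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider errors; the caller should queue a retry."""


def complete_with_failover(
    prompt: str, providers: List[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Because the chain is just an ordered list, a monthly failover test can be run by putting a deliberately failing callable at the front and checking which provider answers.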

🧬

Evolution Roadmap

Progressive transformation from MVP to scale

🌱 Phase 1: MVP (months 0-3, weeks 1-12)

1. Deploy startup architecture (Lambda + RDS + OpenAI)
2. Support 10-100 contracts/day
3. Basic clause extraction (10 clause types)
4. Simple risk scoring (rule-based)
5. Human review queue for low-confidence extractions

🌿 Phase 2: Scale (months 4-6)

1. Scale to 100-1,000 contracts/day
2. Add queue + workers (SQS + ECS)
3. Expand to 25 clause types
4. RAG-based risk scoring (legal precedent DB)
5. Multi-language support (English, Spanish)

🌳 Phase 3: Enterprise (months 7-12)

1. Scale to 1,000-10,000 contracts/day
2. Multi-region deployment (US, EU, APAC)
3. Multi-tenant with data isolation
4. SSO/SAML integration (Okta, Auth0)
5. SOC2 Type II certification

🚀 Production Ready
🏗️

Complete Systems Architecture

9-layer architecture from ingestion to compliance

1. Presentation (4 components)
  • Web App (React)
  • API Dashboard
  • Review Queue UI
  • Analytics Portal

2. API Gateway (4 components)
  • Load Balancer (ALB)
  • Rate Limiter (Kong/Tyk)
  • Auth Proxy (OAuth 2.0)
  • Request Router

3. Agent Layer (7 components)
  • Planner Agent
  • Extractor Agent
  • Validator Agent
  • Risk Scorer Agent
  • Recommender Agent
  • Guardrail Agent
  • Orchestrator (LangGraph)

4. ML Layer (5 components)
  • Feature Store (Feast)
  • Model Registry (MLflow)
  • Embedding Service (OpenAI/Cohere)
  • Reranker (Cohere Rerank)
  • Evaluation Pipeline

5. Integration (4 components)
  • DocuSign Adapter
  • Salesforce Connector
  • SFTP Ingestion
  • Webhook Handler

6. Data (4 components)
  • PostgreSQL (contracts, clauses)
  • S3 (PDF storage)
  • Vector DB (Pinecone/Weaviate)
  • Redis (cache, queue)

7. External (4 components)
  • OpenAI API (GPT-4)
  • Anthropic API (Claude 3.5)
  • Cohere API (embeddings, rerank)
  • AWS Textract (OCR)

8. Observability (4 components)
  • CloudWatch Logs
  • Datadog Metrics
  • Sentry Errors
  • LangSmith Traces

9. Security (4 components)
  • AWS KMS (encryption)
  • Vault (secrets)
  • Audit Logger
  • PII Redactor

🔄

Sequence Diagram - Contract Review Flow

Automated data flow every hour

Participants: User, API Gateway, Planner Agent, Extractor Agent, Validator Agent, Risk Scorer, Recommender, Guardrail, Database

1. POST /contracts (PDF upload)
2. Route to processing pipeline
3. extract_clauses(pdf)
4. OCR + LLM extraction (25 clause types)
5. JSON (clauses + confidence scores)
6. Schema validation + completeness check
7. Valid clauses → score_risk()
8. RAG retrieval + risk scoring (0-100 per clause)
9. Scored clauses → generate_redlines()
10. Generate redline suggestions + legal reasoning
11. Final output → safety_check()
12. PII redaction + policy compliance
13. Save results + audit log
14. 200 OK + contract_id + review_url

Data Flow

Contract upload → reviewed output in 45 seconds

1. User (0s): Uploads contract PDF (15-50 pages) → PDF bytes
2. API Gateway (0.2s): Auth check + rate limit + route to Planner → Authenticated request
3. Planner Agent (0.7s): Classify contract type → Create task plan → Route to Extractor → Task plan JSON
4. Extractor Agent (15.7s): OCR (Textract) → LLM extraction (GPT-4) → 25 clause types → Clauses JSON (25 objects)
5. Validator Agent (18.2s): Schema validation → Completeness check → Anomaly detection → Validation report
6. Risk Scorer Agent (30.2s): RAG retrieval (legal precedent) → Risk model → Score each clause → Risk scores (0-100 per clause)
7. Recommender Agent (42.2s): Generate redlines for high-risk clauses → Legal reasoning → Redline recommendations (5-10 suggestions)
8. Guardrail Agent (44.2s): PII detection → Redaction → Policy compliance check → Sanitized output
9. Database (44.8s): Save contract + clauses + risks + recommendations + audit log → Persisted records
10. User (45.0s): Receives review URL + notification → 200 OK + review_id
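The timestamps in the data flow read as cumulative elapsed time, so the per-stage cost is the difference between consecutive entries. The sketch below makes that arithmetic explicit; treating each timestamp as the moment a stage finishes is an assumption, and `stage_durations` is a hypothetical helper.

```python
from typing import List, Tuple

# (stage, cumulative seconds), copied from the data flow above.
TIMELINE: List[Tuple[str, float]] = [
    ("User", 0.0), ("API Gateway", 0.2), ("Planner Agent", 0.7),
    ("Extractor Agent", 15.7), ("Validator Agent", 18.2),
    ("Risk Scorer Agent", 30.2), ("Recommender Agent", 42.2),
    ("Guardrail Agent", 44.2), ("Database", 44.8), ("User (response)", 45.0),
]


def stage_durations(timeline: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Convert cumulative timestamps into per-stage durations in seconds."""
    out = []
    prev = 0.0
    for stage, t in timeline:
        out.append((stage, round(t - prev, 1)))
        prev = t
    return out


durations = stage_durations(TIMELINE)
# The three slowest stages (ties keep pipeline order).
slowest = sorted(durations, key=lambda d: d[1], reverse=True)[:3]
```

Under that reading, the Extractor (15.0 s), Risk Scorer (12.0 s), and Recommender (12.0 s) account for 39 of the 45 seconds, which is why the LLM-bound stages are the natural targets for caching and parallelism.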

Tier 1: Serverless Monolith
Volume: 10-100 contracts/day
Architecture:
  • API Gateway (AWS API Gateway)
  • Lambda functions (Python + LangChain)
  • S3 (PDF storage)
  • RDS PostgreSQL (contracts, clauses)
  • OpenAI API (GPT-4)
  • AWS Textract (OCR)
Cost & Performance: $300/mo, 45-60 sec per contract

Tier 2: Queue + Workers
Volume: 100-1,000 contracts/day
Architecture:
  • API Gateway (rate limiting)
  • SQS (message queue)
  • ECS Fargate (worker containers)
  • RDS PostgreSQL (multi-AZ)
  • ElastiCache Redis (cache + session)
  • S3 (PDF + audit logs)
  • CloudWatch (logging + metrics)
Cost & Performance: $800/mo, 30-45 sec per contract

Tier 3: Multi-Agent Orchestration (Recommended)
Volume: 1,000-10,000 contracts/day
Architecture:
  • ALB (load balancer)
  • ECS Fargate (agent services)
  • LangGraph (orchestration)
  • Kafka (event streaming)
  • RDS Aurora (multi-region)
  • Pinecone (vector DB for RAG)
  • Cohere (embeddings + rerank)
  • Datadog (observability)
Cost & Performance: $2,500/mo, 20-30 sec per contract

Tier 4: Enterprise Multi-Region
Volume: 10,000+ contracts/day
Architecture:
  • Global Accelerator (multi-region routing)
  • EKS (Kubernetes)
  • Kafka (multi-region)
  • Aurora Global Database
  • Pinecone (enterprise tier)
  • Multi-LLM (GPT-4, Claude, Gemini)
  • Private VPC + PrivateLink
  • KMS + HSM (encryption)
  • SSO/SAML (Okta/Auth0)
Cost & Performance: $8,000+/mo, 15-25 sec per contract

Key Integrations

DocuSign

Protocol: REST API + OAuth 2.0
1. Contract signed in DocuSign → webhook triggers review
2. Fetch PDF via DocuSign API
3. Process through agent pipeline
4. Post review results back to DocuSign envelope

Salesforce

Protocol: REST API + OAuth 2.0
1. Opportunity closed → trigger contract review
2. Fetch contract from Salesforce Files
3. Review + risk scoring
4. Update Opportunity with risk score + recommendations

Internal Contract Management System

Protocol: SFTP + REST API
1. Batch upload via SFTP (nightly)
2. Process queue (SQS)
3. Workers pull contracts → review
4. Push results back via REST API

AWS Textract

Protocol: AWS SDK
1. Upload PDF to S3
2. Trigger Textract async job
3. Poll for completion (5-15 sec)
4. Extract text + bounding boxes
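The start-then-poll pattern for Textract can be sketched as follows. The `client` argument is expected to behave like `boto3.client("textract")`, whose `start_document_text_detection` and `get_document_text_detection` operations are used here; the function name, the polling interval, the timeout, and the decision to read only the first result page are assumptions for illustration. The PDF is assumed to be in S3 already.

```python
import time
from typing import Any, Callable, Dict, List


def run_textract_job(client: Any, bucket: str, key: str,
                     poll_seconds: float = 2.0, timeout_seconds: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> List[Dict[str, Any]]:
    """Start an async text-detection job and poll until it finishes.

    Returns LINE blocks (text plus bounding box) from the first result page;
    following NextToken for long documents is left out of this sketch.
    """
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    waited = 0.0
    while waited <= timeout_seconds:
        result = client.get_document_text_detection(JobId=job["JobId"])
        status = result["JobStatus"]
        if status == "SUCCEEDED":
            return [
                {"text": b["Text"], "box": b["Geometry"]["BoundingBox"]}
                for b in result["Blocks"] if b["BlockType"] == "LINE"
            ]
        if status == "FAILED":
            raise RuntimeError(f"Textract job {job['JobId']} failed")
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"Textract job {job['JobId']} exceeded {timeout_seconds}s")
```

Polling keeps the example short; Textract can also publish job completion to an SNS topic, which avoids polling at higher volumes.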

Security & Compliance

Failure Modes & Recovery

Failure | Fallback | Impact | SLA
LLM API down (OpenAI outage) | Failover to Claude 3.5 → then Gemini → then queue for retry | Degraded latency (+5-10 sec), not broken | 99.5%
OCR quality low (<70%) | Flag for manual review → human uploads higher-quality scan | Requires human intervention | 99.0%
Extraction confidence low (<80%) | Route to Validator for double-check → if still low, human review queue | Quality maintained, slight latency increase | 99.9%
Risk scoring model drift (accuracy drops >3%) | Trigger retraining pipeline → roll back to previous model version | Temporary accuracy degradation | 99.0%
Database unavailable | Read from replica → queue writes → retry on primary recovery | Read-only mode, writes delayed | 99.9%
PII detection fails | Block processing → alert security team → manual review | Safety first, processing halted | 100%
Queue overflow (>10K contracts pending) | Scale workers 2x → if still overwhelmed, throttle API (429 responses) | Increased latency, some requests rejected | 99.0%
System Architecture
┌─────────────────┐
│  Orchestrator   │ ← LangGraph coordinator
│   (LangGraph)   │
└────────┬────────┘
         │
    ┌────┴────┬─────────┬───────────┬───────────┬───────────┐
    │         │         │           │           │           │
┌───▼───┐ ┌───▼───┐ ┌───▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Planner│ │Extract│ │Validate│ │RiskScore│ │Recommend│ │Guardrail│
│ Agent │ │ Agent │ │ Agent  │ │  Agent  │ │  Agent  │ │  Agent  │
└───┬───┘ └───┬───┘ └───┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
    │         │         │           │           │           │
    └─────────┴─────────┴─────┬─────┴───────────┴───────────┘
                              │
                        ┌─────▼──────┐
                        │   RAG DB   │ ← Legal precedent
                        │  (Vector)  │
                        └─────┬──────┘
                              │
                        ┌─────▼──────┐
                        │ PostgreSQL │ ← Contracts, clauses
                        └────────────┘

🔄 Agent Collaboration Flow

1. Planner Agent: Receives contract PDF → Classifies type (MSA/NDA/SOW) → Creates task plan → Routes to Extractor
2. Extractor Agent: OCR via Textract → LLM extraction (25 clause types) → Returns JSON to Validator
3. Validator Agent: Schema validation → Completeness check → Anomaly detection → Flags low-confidence extractions
4. Risk Scorer Agent: RAG retrieval (legal precedent) → Risk model inference → Scores each clause 0-100 → Flags high-risk (>80)
5. Recommender Agent: Generates redline suggestions for high-risk clauses → Provides legal reasoning → Prioritizes by risk
6. Guardrail Agent: PII detection & redaction → Policy compliance check → Final validation → Approve or flag for manual review
7. Orchestrator: Aggregates results → Saves to database → Sends notification → Updates audit log

🎭 Agent Types

Reactive Agent

Low

Extractor Agent - Responds to PDF input, returns clauses

Stateless

Reflexive Agent

Medium

Validator Agent - Uses rules + context to validate

Reads contract context

Deliberative Agent

High

Risk Scorer Agent - Reasons over precedent, plans scoring strategy

Stateful (RAG context)

Orchestrator Agent

Highest

Planner + Guardrail - Makes routing decisions, handles failures, ensures safety

Full state management

📈 Levels of Autonomy

L1 (Tool): Human calls, agent responds
→ Monday's prompts (manual copy-paste)
L2 (Chained Tools): Sequential execution, no decisions
→ Tuesday's automation (fixed pipeline)
L3 (Agent): Makes decisions, can loop, retries
→ Risk Scorer (RAG retrieval + reasoning)
L4 (Multi-Agent): Agents collaborate, self-correct, escalate
→ Full system (Planner → Extractor → Validator → Scorer → Recommender → Guardrail)

RAG vs Fine-Tuning

Legal precedent changes frequently (new cases, regulations). RAG allows daily updates without retraining. Fine-tuning would require quarterly retraining at $10K+ per cycle.

✅ RAG (Chosen)
  • Cost: $400/mo (Pinecone + Cohere)
  • Update: Daily
  • How: Add new legal docs to vector DB

❌ Fine-Tuning
  • Cost: $10K+ per retrain
  • Update: Quarterly
  • How: Retrain entire model on 50K+ contracts

Implementation: Vector DB (Pinecone) with 100K legal precedents (cases, regulations, company policies). Cohere Embed v3 for embeddings. Cohere Rerank for precision (70% → 92%).

Hallucination Detection

LLMs hallucinate clauses, risk scores, or legal reasoning. Four detection layers:
L1: Confidence scores (<0.7 = flag for review)
L2: Cross-reference with legal database (verify clause exists in precedent)
L3: Logical consistency checks (e.g., termination clause can't be both 30-day and 90-day)
L4: Human review queue (attorney validates flagged outputs)
0.8% hallucination rate, 98% caught before attorney review
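The logical consistency layer can be illustrated with the article's own example of a termination clause that is both 30-day and 90-day. The sketch below is one narrow check, assuming each extracted clause is a dict with hypothetical "type" and "text" keys; a real consistency layer would cover many more rule types.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Matches "30 days", "30-day", "90 day", and similar.
DAY_PERIOD = re.compile(r"(\d+)[- ]days?", re.IGNORECASE)


def find_conflicting_periods(clauses: List[Dict[str, str]]) -> Dict[str, Set[int]]:
    """Return clause types whose extracted texts mention more than one day count."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for clause in clauses:
        for days in DAY_PERIOD.findall(clause["text"]):
            seen[clause["type"]].add(int(days))
    return {ctype: periods for ctype, periods in seen.items() if len(periods) > 1}


extracted = [
    {"type": "termination", "text": "Either party may terminate on 30 days notice."},
    {"type": "termination", "text": "Termination requires a 90-day written notice."},
    {"type": "payment", "text": "Invoices are due within 45 days."},
]
conflicts = find_conflicting_periods(extracted)
```

A conflict found this way is not proof of a hallucination, since contracts can legitimately contain several periods; it is a signal to send the clause to the review queue.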

Evaluation Framework

  • Clause Extraction Accuracy: 96.2% (target: 95%+)
  • Risk Scoring Accuracy: 93.5% (target: 90%+)
  • Recommendation Acceptance Rate: 85.3% (target: 80%+)
  • False Positive Rate (Risk): 3.2% (target: <5%)
  • Latency p95: 45 sec (target: <60 sec)

Testing: shadow mode on 500 contracts, run in parallel with attorney review. Cohen's Kappa: 0.89 (strong agreement).
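Cohen's kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater's label frequencies. A small sketch of the calculation follows; the labels in the example are made up for illustration and are not the article's evaluation data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels: system vs attorney on ten clauses.
system = ["high", "high", "low", "low", "high", "low", "low", "high", "low", "low"]
attorney = ["high", "high", "low", "low", "low", "low", "low", "high", "low", "high"]
kappa = cohens_kappa(system, attorney)
```

In this toy example the raters agree on 8 of 10 items (p_o = 0.8) and chance agreement is 0.52, giving a kappa of about 0.58, noticeably lower than the raw 80% agreement suggests.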

Dataset Curation

1. Collect: 50K contracts, de-identified, from clients + public sources
2. Clean: 42K usable after removing duplicates and low-quality scans
3. Label: 42K labeled ($210K)
4. Augment: +8K synthetic, with GPT-4 generating edge cases (unusual clauses, ambiguous language)
→ 50K high-quality examples. Cohen's Kappa: 0.91 (near-perfect agreement).

Agentic RAG

Risk Scorer Agent iteratively retrieves based on reasoning
Contract mentions 'unlimited liability' → RAG retrieves precedent → Agent reasons 'need jurisdiction-specific cap laws' → RAG retrieves state-specific statutes → Risk score adjusted based on jurisdiction context
💡 Not one-shot retrieval. Agent decides what else it needs to know. Improves risk scoring accuracy by 12%.

Multi-LLM Ensemble

Tech Stack Summary

  • LLMs: GPT-4 (OpenAI), Claude 3.5 (Anthropic), Gemini (Google)
  • Orchestration: LangGraph (agent coordination), LangChain (chains)
  • Database: PostgreSQL (RDS/Aurora), Redis (ElastiCache)
  • Vector DB: Pinecone (enterprise tier) or Weaviate
  • Queue: SQS (startup) or Kafka (enterprise)
  • Compute: Lambda (startup) or ECS/EKS (enterprise)
  • OCR: AWS Textract
  • Embeddings: Cohere Embed v3
  • Reranker: Cohere Rerank
  • Monitoring: CloudWatch (startup), Datadog (enterprise)
  • Security: AWS KMS, Secrets Manager, Comprehend (PII detection)
  • Auth: OAuth 2.0 (OIDC), SAML 2.0 (SSO)


© 2026 Randeep Bhatia. All Rights Reserved.

No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.