From prompts to production contact center.
Monday: 3 core prompts for voice/chat support. Tuesday: automated agents. Wednesday: team workflows. Thursday: complete technical architecture. Real-time transcription, sentiment analysis, agent assist, and omnichannel orchestration at 10K+ concurrent sessions.
Key Assumptions
System Requirements
Functional
- Real-time voice transcription (streaming ASR)
- Sentiment analysis on every message/utterance
- Agent assist: knowledge retrieval + suggested responses
- Automatic escalation based on sentiment/keywords
- Omnichannel routing (voice, chat, SMS, email)
- Call recording & transcript storage (7yr retention)
- Quality scoring & post-call analytics
Non-Functional (SLOs)
💰 Cost Targets: {"per_session_usd":0.15,"per_minute_voice_usd":0.02,"per_agent_assist_usd":0.005}
Agent Layer
planner
L3Decomposes customer request into sub-tasks, selects appropriate agents
🔧 intent_classifier, entity_extractor, task_decomposer
⚡ Recovery: Fallback to rule-based routing if LLM fails, Default to human handoff if confidence <0.6
transcription
L2Real-time speech-to-text with speaker diarization
🔧 deepgram_api, speaker_diarization_model
⚡ Recovery: Retry with exponential backoff, Fallback to Google Speech-to-Text if primary fails, Buffer audio for replay on recovery
sentiment
L2Analyze sentiment and detect escalation triggers
🔧 sentiment_classifier, emotion_detector, keyword_matcher
⚡ Recovery: Use rule-based sentiment if model fails, Default to neutral sentiment on error
assist
L3Generate suggested responses and retrieve knowledge articles
🔧 rag_retriever, response_generator, kb_search
⚡ Recovery: Return top KB articles only if generation fails, Cache previous successful responses for similar queries
escalation
L3Route to supervisor or specialized agent based on rules
🔧 escalation_rule_engine, agent_availability_checker, priority_scorer
⚡ Recovery: Default to supervisor queue if no specialized agent available, Notify customer of escalation delay
guardrail
L4PII redaction, policy checks, safety filters
🔧 pii_detector, content_filter, policy_validator
⚡ Recovery: Block entire message if PII detection fails, Log for manual review
ML Layer
Feature Store
Update: Real-time for session features, daily batch for historical
- • customer_lifetime_value
- • avg_session_duration
- • historical_sentiment_avg
- • escalation_history_count
- • product_usage_metrics
Model Registry
Strategy: Semantic versioning with A/B testing
- • sentiment_classifier
- • intent_classifier
- • response_generator
Observability Stack
Real-time monitoring, tracing & alerting
0 activeDeployment Variants
Startup Architecture
Fast to deploy, cost-efficient, scales to 100 competitors
Infrastructure
Risks & Mitigations
⚠️ ASR accuracy degrades for accents/background noise
High✓ Mitigation: Use multiple ASR providers (Deepgram + Google). Train custom model on customer recordings. Offer text fallback option.
⚠️ LLM costs spiral with high volume
Medium✓ Mitigation: Cache common responses (50% hit rate). Use smaller models for simple queries. Set per-session cost limits ($0.15). Monitor daily spend.
⚠️ PII leakage to LLM
Low✓ Mitigation: Redact before sending to LLM. Use AWS Comprehend Medical. Audit all prompts. HIPAA compliance review. Incident response plan.
⚠️ Agent assist suggestions are wrong/harmful
Medium✓ Mitigation: Human-in-the-loop: agent must approve. Fact-check against product catalog. Log all suggestions for review. A/B test new models in shadow mode.
⚠️ Sentiment model bias (race, gender)
Medium✓ Mitigation: Audit training data for bias. Test on diverse demographics. Use fairness metrics (equalized odds). Regular bias reviews.
⚠️ System overload during peak (Black Friday)
High✓ Mitigation: Auto-scaling with 2x headroom. Queue overflow to manual. Load testing before peak. Graceful degradation (disable non-critical features).
⚠️ Vendor lock-in (Deepgram, OpenAI)
Medium✓ Mitigation: Multi-provider architecture. Abstraction layer for ASR/LLM. Test failover quarterly. Negotiate volume discounts + SLAs.
Evolution Roadmap
Progressive transformation from MVP to scale
Phase 1: MVP (0-3 months)
Phase 2: Scale (3-6 months)
Phase 3: Enterprise (6-12 months)
Complete Systems Architecture
9-layer architecture for production voice & chat support
Presentation
4 components
API Gateway
4 components
Agent Layer
7 components
ML Layer
5 components
Integration
4 components
Data
4 components
External
4 components
Observability
5 components
Security
5 components
Sequence Diagram - Voice Call Flow
Automated data flow every hour
Data Flow - Voice Call
Key Integrations
WebRTC/Telephony
Speech-to-Text
CRM (Salesforce/Zendesk)
Knowledge Base
Security & Compliance
Failure Modes & Fallbacks
| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| ASR service down | Switch to backup ASR (Google Speech-to-Text), buffer audio for replay | Degraded latency (+500ms), no data loss | 99.9% |
| LLM API timeout | Return cached responses for common queries, fallback to rule-based assist | Lower quality suggestions, no generation | 99.5% |
| Sentiment model fails | Use rule-based sentiment (keyword matching), default to neutral | Reduced accuracy, manual escalation may be needed | 99.0% |
| Database unavailable | Read from replica, write to queue for eventual consistency | Read-only mode, delayed writes | 99.9% |
| PII detection fails | Block all processing, route to manual review | Safety first, session paused | 100% |
| WebRTC gateway down | Route to backup gateway in another AZ, notify customer of reconnect | Brief interruption (<5s), no data loss | 99.9% |
| Knowledge base search fails | Return top articles from cache, skip reranking | Lower relevance, manual search may be needed | 99.5% |
RAG vs Fine-Tuning for Agent Assist
Hallucination Detection
Evaluation Framework
Dataset Curation
Agentic RAG
Real-Time Model Serving
Tech Stack Summary
Need Architecture Review?
We'll audit your contact center architecture, identify bottlenecks, and show you how to scale to 10K+ concurrent sessions with AI.
2026 Randeep Bhatia. All Rights Reserved.
No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.