From prompts to a production retention engine.
Monday: 3 core prompts (churn prediction, engagement scoring, campaign generation). Tuesday: automated agent code. Wednesday: team workflows (Data → Growth → Product). Thursday: the complete technical architecture: agents, ML pipeline, scaling patterns, and compliance for 10,000+ daily users.
Key Assumptions
System Requirements
Functional
- Ingest user events (pageviews, feature usage, purchases) from multiple sources
- Compute engagement scores in real-time (<500ms) and churn risk daily
- Generate personalized retention campaigns (email, in-app, push) via LLM agents
- A/B test campaign variants and measure lift in retention metrics
- Provide dashboards for Growth team (churn trends, campaign performance, cohort analysis)
- Support multi-tenancy for enterprise customers (data isolation, custom models)
- Audit trail for all ML predictions and campaign decisions (HIPAA compliance)
Non-Functional (SLOs)
Cost Targets:
- Per user per month: $0.15
- ML inference per 1,000 users: $2.50
- Campaign generation per user: $0.03
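These targets can be enforced at runtime with a simple budget guard. The sketch below is illustrative, not part of the described system; `within_budget` is a hypothetical helper that scales each target by the active user count.

```python
# Cost targets from the SLO section, in USD.
COST_TARGETS = {
    "per_user_per_month_usd": 0.15,
    "ml_inference_per_1k_users_usd": 2.5,
    "campaign_generation_per_user_usd": 0.03,
}

def within_budget(spend_usd, users, metric="per_user_per_month_usd"):
    """Return True if spend stays at or under the target for `users` users.

    The per-1k-users metric is scaled by users/1000; the others scale
    linearly per user.
    """
    target = COST_TARGETS[metric]
    if metric == "ml_inference_per_1k_users_usd":
        return spend_usd <= target * users / 1000
    return spend_usd <= target * users
```

A guard like this would typically feed the cost-alerting path described under Risks & Mitigations (budget alerts, fallback to templates when exceeded).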
Agent Layer
planner_agent
Autonomy level L3 — decomposes retention tasks, selects tools, decides campaign strategy
Tools: segment_classifier, campaign_recommender, timing_optimizer
Recovery: if a tool fails, retry 3× with exponential backoff; if all retries fail, route to the manual review queue; log the failure to the audit trail with context
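The retry-with-backoff policy can be sketched as follows. This is a minimal illustration, not the system's actual code; `call_with_retry` and its parameters are hypothetical, and the `sleep` hook exists only to make the sketch testable.

```python
import random
import time

def call_with_retry(tool, *args, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Retry a tool call up to `max_retries` times with jittered
    exponential backoff (0.5s, 1s, 2s, ...).

    On exhaustion the exception propagates so the caller can route the
    task to the manual-review queue and log it to the audit trail.
    """
    for attempt in range(max_retries):
        try:
            return tool(*args)
        except Exception:
            if attempt == max_retries - 1:
                raise  # caller routes to manual review + audit log
            sleep(base_delay * (2 ** attempt) * (1 + random.random() * 0.1))
```

Jitter spreads retries out so that many agents recovering from the same outage do not hammer the tool in lockstep.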
executor_agent
Autonomy level L2 — executes retention workflows (score computation, campaign generation, delivery)
Tools: feature_store.get_features(), ml_model.predict(), llm_api.generate_campaign(), email_api.send()
Recovery: if the feature fetch fails, use cached features (staleness <1hr); if the ML model fails, fall back to rule-based scoring; if the LLM API fails, use a pre-generated template; if the email send fails, queue for retry (max 3×)
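The executor's recovery ladder is a fallback chain: try the primary path, then each degraded path in order. A minimal sketch, with `fallback_chain` as a hypothetical combinator:

```python
def fallback_chain(*strategies):
    """Return a callable that tries each strategy in order and returns
    the first successful result.

    Mirrors the executor's recovery ladder (live features -> cached
    features -> rule-based scoring). Raises the last error only if
    every rung fails.
    """
    def run(*args, **kwargs):
        last_err = None
        for strategy in strategies:
            try:
                return strategy(*args, **kwargs)
            except Exception as err:
                last_err = err
        raise last_err
    return run
```

For example, `get_features = fallback_chain(fetch_live, fetch_cached)` degrades to cached features only when the live fetch raises.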
evaluator_agent
Autonomy level L3 — validates campaign quality, checks for hallucinations, measures effectiveness
Tools: content_classifier (toxicity, relevance), brand_checker, hallucination_detector, engagement_predictor
Recovery: if the quality score is <0.6, reject and regenerate; if a hallucination is detected, flag for human review; if a brand violation is found, auto-reject with an explanation
guardrail_agent
Autonomy level L4 — enforces PII redaction, policy compliance, safety filters
Tools: pii_detector (AWS Comprehend/Presidio), policy_engine (OPA/custom rules), content_filter (profanity, sensitive topics)
Recovery: if PII is detected, auto-redact and log; if a policy is violated, block delivery and alert; if redaction fails, route to manual review
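The redaction step can be sketched with plain regexes. Production would use AWS Comprehend or Presidio as named above; the patterns below are a hypothetical illustration covering only emails and US-style phone numbers.

```python
import re

# Illustrative patterns only; real PII detection (Comprehend/Presidio)
# covers far more entity types (names, addresses, SSNs, ...).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact_pii(text):
    """Replace detected PII with typed placeholders before the LLM call,
    so no raw identifiers reach the provider."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Keeping typed placeholders (`[EMAIL]`, `[PHONE]`) rather than deleting the spans lets the LLM still produce grammatical output and lets the audit trail record what was redacted.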
campaign_generator_agent
Autonomy level L2 — generates personalized retention campaigns using an LLM
Tools: llm_api (GPT-4/Claude), template_selector, personalization_engine
Recovery: if the LLM times out, use a cached template with dynamic fields; if generation fails, fall back to a rule-based template; if content is too generic, retry with more context
scoring_agent
Autonomy level L2 — real-time engagement scoring and churn prediction
Tools: feature_store.get_online_features(), ml_model.predict() (XGBoost/LightGBM), feature_importance_explainer
Recovery: if the feature fetch fails, use the last known score (staleness <24hr); if model inference fails, fall back to a rule-based heuristic; if a score anomaly is detected, flag for review
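The rule-based heuristic is spelled out in the Failure Modes table (engagement = events_last_7d / 10). A minimal sketch of the model-then-heuristic fallback; `score_engagement` is a hypothetical wrapper, not the system's API:

```python
def rule_based_engagement(features):
    """Heuristic fallback when model inference fails:
    engagement = events_last_7d / 10, clamped to [0, 1]."""
    return min(features.get("events_last_7d", 0) / 10, 1.0)

def score_engagement(features, model=None):
    """Prefer the ML model; fall back to the heuristic on any failure.

    In production the failure would also be flagged for review per the
    scoring agent's recovery policy; here we only fall back.
    """
    if model is not None:
        try:
            return model.predict(features)
        except Exception:
            pass
    return rule_based_engagement(features)
```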
ML Layer
Feature Store
Update: Real-time (streaming) + Daily batch refresh
- user_tenure_days
- events_last_7d
- events_last_30d
- feature_usage_breadth (distinct features used)
- feature_usage_depth (avg usage per feature)
- days_since_last_login
- purchase_count_lifetime
- support_tickets_last_90d
- cohort_retention_rate
- referral_count
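When these features are fetched online, the scoring path must also enforce the <1hr staleness cap from the recovery policy. A hypothetical sketch (the feature names come from the list above; `to_vector` and its staleness check are illustrative, not the feature store's API):

```python
import time

# Fixed column order so the vector matches the model's training schema.
FEATURE_ORDER = [
    "user_tenure_days", "events_last_7d", "events_last_30d",
    "days_since_last_login", "purchase_count_lifetime",
]

def to_vector(features, fetched_at, now=None, max_staleness_s=3600):
    """Assemble the model input vector from a feature dict.

    Raises if the features are older than the staleness cap, so the
    caller can fall back to the last known score. Missing features
    default to 0.
    """
    if now is None:
        now = time.time()
    if now - fetched_at > max_staleness_s:
        raise ValueError("features stale; fall back to last known score")
    return [features.get(name, 0) for name in FEATURE_ORDER]
```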
Model Registry
Strategy: Semantic versioning (major.minor.patch), Git-backed
- churn_classifier_v3
- engagement_scorer_v2
- campaign_recommender_v1
Observability Stack
Real-time monitoring, tracing & alerting
Deployment Variants
Startup Architecture
Fast to deploy, cost-efficient, scales to 100 competitors
Infrastructure
Risks & Mitigations
⚠️ Model accuracy degrades over time (concept drift)
Likelihood: High (user behavior changes) → Mitigation: weekly offline evaluation; auto-retrain if accuracy drops >5%; A/B test new models before full rollout.
⚠️ LLM generates inappropriate content (toxicity, bias)
Likelihood: Medium (LLMs are imperfect) → Mitigation: Guardrail Agent with toxicity filter; human review for flagged content; content policy enforcement via OPA.
⚠️ PII leakage to LLM provider
Likelihood: Low (with proper redaction) → Mitigation: PII detection before the LLM call; audit trail for all LLM requests; DPA with the LLM provider; fail closed if redaction fails.
⚠️ Cost overruns from LLM API usage
Likelihood: Medium (usage spikes) → Mitigation: rate limiting per user; cost budgets with alerts; fallback to templates if budget exceeded; monthly cost review.
⚠️ Feature store staleness (outdated scores)
Likelihood: Medium (infrastructure failures) → Mitigation: cache with TTL <1hr; monitoring for staleness; alert if staleness >30min; fallback to rule-based scoring on cache miss.
⚠️ Email deliverability issues (spam filters)
Likelihood: Medium (reputation management) → Mitigation: SPF/DKIM/DMARC setup; monitor bounce/complaint rates; warm up IP addresses; use a reputable email service (SendGrid).
⚠️ Agent orchestration failures (infinite loops, deadlocks)
Likelihood: Low (with proper testing) → Mitigation: circuit breakers for agents; max retry limits (3×); timeout for each agent step (5s); dead-letter queue for failed tasks.
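The circuit-breaker mitigation for agent steps can be sketched as below. This is an illustrative implementation under assumed parameters (3 failures to open, 30s cooldown), not the system's code; the injectable `clock` exists only to make the sketch testable.

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures the circuit opens and
    calls are rejected until `cooldown_s` elapses, preventing retry
    storms and runaway agent loops. After the cooldown, one probe call
    is allowed through (half-open)."""

    def __init__(self, threshold=3, cooldown_s=30, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                # Rejected calls would go to the dead-letter queue.
                raise RuntimeError("circuit open")
            self.opened_at = None  # half-open: allow one probe
            self.failures = 0
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Wrapping each agent step in a breaker plus a per-step timeout bounds the blast radius of a misbehaving downstream dependency.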
Evolution Roadmap
Progressive transformation from MVP to scale
Phase 1: Foundation (0-3 months)
Phase 2: Scale (3-6 months)
Phase 3: Enterprise (6-12 months)
Complete Systems Architecture
9-layer architecture from user events to retention campaigns
Presentation
4 components
API Gateway
4 components
Agent Layer
6 components
ML Layer
6 components
Integration
5 components
Data
5 components
External
4 components
Observability
5 components
Security
5 components
Sequence Diagram - Retention Campaign Flow
Automated data flow every hour
Data Flow - Event to Campaign
User event → Scoring → Campaign generation → Delivery in <2 seconds
Key Integrations
Product Analytics (Segment/Amplitude)
Email Service (SendGrid/AWS SES)
CRM (Salesforce/HubSpot)
LLM API (OpenAI/Anthropic)
Feature Store (Feast/Tecton)
Security & Compliance
Failure Modes & Fallbacks
| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| LLM API down (OpenAI/Anthropic) | → Use cached templates with dynamic personalization | Degraded personalization, campaigns still sent | 99.5% |
| ML model inference timeout | → Use rule-based scoring (engagement = events_last_7d / 10) | Lower accuracy, no churn prediction | 99.0% |
| Feature store unavailable | → Use cached features (staleness <1hr acceptable) | Slightly outdated scores, campaigns still sent | 99.5% |
| Email delivery service down | → Queue campaigns for retry (max 3× over 24hr) | Delayed delivery, no data loss | 99.0% |
| PII detection service fails | → Block all campaign generation until service recovers | No campaigns sent (safety first) | 100% compliance |
| Kafka broker down | → Buffer events in API server (max 1hr) | Delayed feature updates, eventual consistency | 99.5% |
| Database connection pool exhausted | → Read from replica, queue writes | Read-only mode for dashboards | 99.0% |
Agent Collaboration Flow

```
                          ┌──────────────┐
                          │ Orchestrator │ ← Coordinates all agents
                          │  (LangGraph) │
                          └──────┬───────┘
                                 │
     ┌──────────┬──────────┬─────┴────┬──────────┬──────────┐
     │          │          │          │          │          │
┌────▼────┐┌────▼────┐┌────▼────┐┌────▼────┐┌────▼────┐┌────▼────┐
│ Planner ││ Scorer  ││Campaign ││Evaluator││Guardrail││Executor │
│  Agent  ││  Agent  ││  Agent  ││  Agent  ││  Agent  ││  Agent  │
└────┬────┘└────┬────┘└────┬────┘└────┬────┘└────┬────┘└────┬────┘
     │          │          │          │          │          │
     └──────────┴──────────┴─────┬────┴──────────┴──────────┘
                                 │
                          ┌──────▼──────┐
                          │   Actions   │
                          │ (Email/Push)│
                          └─────────────┘
```
Agent Types
Reactive Agent
Low — Scoring Agent: responds to an event, returns a score
Reflexive Agent
Medium — Campaign Generator: uses context (user profile, past campaigns)
Deliberative Agent
High — Planner Agent: plans multi-step campaign strategy
Orchestrator Agent
Highest — Main Orchestrator: coordinates all agents, handles loops/retries
Levels of Autonomy
RAG vs Fine-Tuning for Campaign Generation
Hallucination Detection for Campaigns
Evaluation Framework
Dataset Curation
Agentic RAG for Campaign Personalization
Multi-Armed Bandit for Campaign Optimization
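Campaign-variant selection via a multi-armed bandit can be sketched with epsilon-greedy, one of the simplest bandit policies (the source does not specify which algorithm is used; Thompson sampling or UCB are common alternatives). The class and its parameters below are illustrative.

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy selection over campaign variants: explore a random
    variant with probability eps, otherwise exploit the variant with the
    best observed reward (e.g. measured retention lift)."""

    def __init__(self, variants, eps=0.1, rng=None):
        self.eps = eps
        self.rng = rng or random.Random()
        self.counts = {v: 0 for v in variants}
        self.values = {v: 0.0 for v in variants}  # running mean reward

    def select(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, variant, reward):
        """Fold one observed reward into the variant's running mean."""
        self.counts[variant] += 1
        n = self.counts[variant]
        self.values[variant] += (reward - self.values[variant]) / n
```

Compared with a fixed A/B split, a bandit shifts traffic toward the winning variant during the experiment, which matters when each impression is a real retention opportunity.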
Tech Stack Summary
Need Architecture Review?
We'll audit your retention system, identify bottlenecks, and show you how to scale 10x while cutting costs 30%.
© 2026 Randeep Bhatia. All Rights Reserved.
No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.