
Omni-Channel Personalization System Architecture 🏗️

From 10K to 10M users/day with real-time ML and multi-agent orchestration

September 18, 2025
19 min read
🛍️ Retail · 🏗️ Architecture · 🤖 Multi-Agent · 📊 ML Infrastructure · 🔒 GDPR-Compliant

🎯 This Week's Journey

From prompts to production personalization platform.

Monday: 3 core prompts (customer profiling, product matching, journey orchestration). Tuesday: automation code (agents + ML pipelines). Wednesday: team workflows (data science, engineering, ops). Thursday: complete technical architecture. Agents, ML layers, scaling patterns, GDPR compliance, and startup → enterprise evolution.

📋 Key Assumptions

1. 10K-10M active users/day across web, mobile, email, and SMS channels
2. Real-time personalization (< 100ms p95 latency for recommendations)
3. GDPR compliance required (EU data residency, PII redaction, right to deletion)
4. Multi-tenant architecture for enterprise (isolated data per customer)
5. Integration with existing e-commerce platforms (Shopify, Salesforce Commerce Cloud, custom)

System Requirements

Functional

  • Ingest user events (page views, clicks, purchases) from 5+ channels in real-time
  • Build unified customer profile (demographics, behavior, preferences, purchase history)
  • Generate personalized recommendations (products, content, offers) in < 100ms
  • Orchestrate multi-step journeys (abandoned cart, post-purchase, re-engagement)
  • A/B test personalization strategies with statistical significance tracking
  • Provide explainability for recommendations (why this product?)
  • Support batch campaigns (email, push) and real-time triggers (web, app)

Non-Functional (SLOs)

  • Latency (p95): 100 ms
  • Freshness: 5 min
  • Availability: 99.9%
  • Recommendation quality (NDCG): 0.85
  • Click-through rate lift: 15%

💰 Cost Targets: $0.05 per user per month; $0.10 per 1K ML inference requests; $0.20 per GB in the feature store
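The SLO targets above can be checked mechanically against observed metrics. The following is a minimal sketch, not the platform's actual monitoring code; the `Slo` class and `slo_violations` function are hypothetical names, and the split into "lower is better" and "higher is better" metrics is an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Slo:
    """SLO targets copied from the list above (names are illustrative)."""
    latency_p95_ms: float = 100
    freshness_min: float = 5
    availability_percent: float = 99.9
    recommendation_quality_ndcg: float = 0.85
    click_through_rate_lift_percent: float = 15


# Assumption: for these metrics a lower value is better; for the rest, higher is better.
LOWER_IS_BETTER = {"latency_p95_ms", "freshness_min"}


def slo_violations(observed: dict, slo: Slo = Slo()) -> list:
    """Return the names of SLOs that the observed metrics miss."""
    violations = []
    for name, target in asdict(slo).items():
        value = observed.get(name)
        if value is None:
            continue  # metric not reported in this window; nothing to compare
        missed = value > target if name in LOWER_IS_BETTER else value < target
        if missed:
            violations.append(name)
    return violations
```

Calling `slo_violations({"latency_p95_ms": 120, "recommendation_quality_ndcg": 0.88})` flags only the latency target, since the NDCG value is above its floor.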

Agent Layer

planner

L4

Decomposes user request into subtasks, selects tools, coordinates execution

🔧 cache_lookup, profile_agent, recommendation_agent, guardrail_agent

⚡ Recovery: If profile_agent fails → use cached profile (stale < 1hr), If recommendation_agent fails → fallback to popular products, If guardrail_agent fails → block request (safety first)
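The planner's three recovery rules can be sketched as a fallback chain. This is an illustration under assumptions, not the production planner: the agents are passed in as plain callables, `POPULAR_PRODUCTS` and the cache entry shape are hypothetical, and falling back to popular products when no usable profile exists is an assumption the rules above do not spell out.

```python
import time

POPULAR_PRODUCTS = ["sku-1", "sku-2", "sku-3"]  # hypothetical fallback list
MAX_PROFILE_STALENESS_S = 3600  # "stale < 1hr"


def plan_request(user_id, profile_agent, recommendation_agent, guardrail_agent,
                 profile_cache, now=time.time):
    """Run profile -> recommend -> validate with the recovery rules above."""
    try:
        profile = profile_agent(user_id)
    except Exception:
        # Rule 1: use the cached profile only if it is less than one hour old.
        entry = profile_cache.get(user_id)
        fresh = (entry is not None
                 and now() - entry["fetched_at"] < MAX_PROFILE_STALENESS_S)
        profile = entry["profile"] if fresh else None

    items = None
    if profile is not None:
        try:
            items = recommendation_agent(profile)
        except Exception:
            items = None
    if items is None:
        # Rule 2: fall back to popular products (also used when no profile
        # is available, which is an assumption of this sketch).
        items = list(POPULAR_PRODUCTS)

    try:
        return guardrail_agent(items)
    except Exception:
        # Rule 3: fail closed. None signals that the request is blocked.
        return None
```

The guardrail is the only step that fails closed; every other failure degrades to a less personalized answer.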

executor

L3

Executes the plan from Planner, manages state, handles retries

🔧 profile_agent.get_profile(), recommendation_agent.generate(), guardrail_agent.validate()

⚡ Recovery: Retry failed steps 3x with exponential backoff, If step fails after retries → skip step and log, If critical step fails → abort and return error
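A minimal sketch of the executor's retry policy follows. It reads "retry 3x" as one initial attempt plus up to three retries, which is an interpretation; `run_step`, `CriticalStepError`, and the 50 ms base delay are hypothetical choices.

```python
import logging
import time

log = logging.getLogger(__name__)


class CriticalStepError(RuntimeError):
    """Raised when a critical step still fails after all retries."""


def run_step(step, *, critical=False, retries=3, base_delay_s=0.05,
             sleep=time.sleep):
    """Call step(); on failure retry with exponential backoff.

    Non-critical steps are skipped (None is returned) after the last retry;
    critical steps abort by raising CriticalStepError.
    """
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == retries:
                if critical:
                    raise CriticalStepError("critical step failed") from exc
                log.warning("step skipped after %d retries: %s", retries, exc)
                return None
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** attempt)
```

Injecting `sleep` keeps the function testable without real delays. With a 100 ms latency target, retries of this kind probably only fit background or batch paths, so the delay values here are placeholders.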

evaluator

L2

Validates output quality, checks business rules, logs metrics

🔧 diversity_check, relevance_score, policy_compliance, explainability_check

⚡ Recovery: If quality_score < 0.7 → trigger re-generation, If policy violation → block recommendation, If explainability missing → log warning but allow

guardrail

L1

Enforces safety, privacy, and compliance policies

🔧 pii_detector (Presidio), banned_products_filter, age_gate_check, gdpr_consent_check, diversity_enforcer

⚡ Recovery: If PII detected → redact and log incident, If banned product → remove from list, If GDPR violation → block entire request
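The guardrail recovery rules can be sketched as a single filter pass. Note the stand-in: this example uses a simple email regex instead of Presidio so that it stays self-contained, and it does not reflect Presidio's API. `apply_guardrails`, `GdprViolation`, and the recommendation dict shape are hypothetical.

```python
import logging
import re

log = logging.getLogger(__name__)

# Stand-in for a real PII detector: matches email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class GdprViolation(Exception):
    """Raised to block the entire request."""


def apply_guardrails(recommendations, banned_ids, has_consent):
    """Apply the three recovery rules to a list of recommendation dicts."""
    if not has_consent:
        # GDPR violation: block the whole request rather than filter it.
        raise GdprViolation("no consent recorded for personalization")

    cleaned = []
    for rec in recommendations:
        if rec["product_id"] in banned_ids:
            continue  # banned product: remove from the list
        text, hits = EMAIL_RE.subn("[REDACTED]", rec["explanation"])
        if hits:
            # PII detected: redact and log the incident.
            log.warning("PII redacted for product %s", rec["product_id"])
        cleaned.append({**rec, "explanation": text})
    return cleaned
```

The asymmetry is deliberate: banned products and PII are repaired item by item, while a consent failure stops everything.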

profile

L2

Builds unified customer profile from all data sources

🔧 feature_store.get_features(), event_aggregator, crm_sync, cdp_lookup

⚡ Recovery: If feature_store unavailable → compute features on-the-fly (slower), If CRM sync fails → use cached CRM data (stale < 24hr), If CDP unavailable → skip CDP enrichment

recommendation

L3

Generates personalized product recommendations using ML models

🔧 model_registry.load_model(), collaborative_filter, content_filter, llm_reranker, diversity_optimizer

⚡ Recovery: If ML model fails → fallback to rule-based recommendations, If LLM reranker fails → use raw model scores, If diversity optimizer fails → return top N by score

ML Layer

Feature Store

Update: Real-time (streaming) for behavioral features, batch (daily) for demographics

  • user_demographics (age, gender, location)
  • behavioral_features (page_views_7d, purchases_30d, avg_order_value)
  • preference_features (favorite_categories, brand_affinity)
  • engagement_features (email_open_rate, app_usage_frequency)
  • product_features (category, price_tier, popularity_score)
  • contextual_features (time_of_day, device_type, channel)

Model Registry

Strategy: Semantic versioning (major.minor.patch), A/B test new versions before rollout

  • collaborative_filter
  • content_filter
  • llm_reranker

Observability Stack

Real-time monitoring, tracing & alerting

Pipeline: Sources (apps, services, infra) → Collection (10 metrics) → Processing (aggregate & transform) → Dashboards (5 views) → Alerts (enabled)

Signals: 📊 Metrics (10), 📝 Logs (structured), 🔗 Traces (distributed)

Tracked metrics include:

  • recommendation_latency_p95_ms
  • recommendation_quality_ndcg
  • agent_execution_time_ms
  • cache_hit_rate
  • model_inference_time_ms
  • feature_store_latency_ms

Deployment Variants

🚀

Startup Architecture

Fast to deploy, cost-efficient, suited to MVP and early growth

Infrastructure

Serverless (Lambda/Cloud Functions)
Managed databases (DynamoDB/Firestore)
Managed cache (ElastiCache/Memorystore)
SaaS feature store (Tecton Cloud)
OpenAI API (no self-hosted models)
Fast to ship, low operational overhead
Pay-per-use pricing (cost scales with usage)
Single-tenant (one customer per deployment)
Good for MVP and early growth (< 1M users/day)

Risks & Mitigations

⚠️ LLM API costs spiral out of control

High

✓ Mitigation: Set cost guardrails ($500/day max). Use cheaper models for non-critical tasks (GPT-3.5 for reranking, GPT-4 only for explainability). Cache aggressively (TTL=5min). Monitor cost per user.

⚠️ Model drift degrades recommendations

Medium

✓ Mitigation: Automated drift detection (KL divergence > 0.1 triggers alert). Weekly model retraining. A/B test new models before rollout. Monitor NDCG daily.
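The drift check in this mitigation can be sketched with a plain KL divergence over histograms. The source does not say which distributions are compared, so this example assumes two histograms of the same feature (current window vs. training baseline) and computes KL(current || baseline); the function names are hypothetical.

```python
import math

DRIFT_THRESHOLD = 0.1  # "KL divergence > 0.1 triggers alert"


def _normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(current_counts, baseline_counts, eps=1e-12):
    """KL(current || baseline) in nats, from two histograms with equal bins."""
    p = _normalize(current_counts)
    q = _normalize(baseline_counts)
    # eps guards against empty baseline bins; bins with p == 0 contribute 0.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def drift_detected(current_counts, baseline_counts):
    return kl_divergence(current_counts, baseline_counts) > DRIFT_THRESHOLD
```

A shift from a 50/50 split to 90/10 gives a divergence of roughly 0.37 and trips the alert, while 51/49 stays far below the threshold.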

⚠️ PII leakage to LLM

Low

✓ Mitigation: Guardrail agent with Presidio PII detection. Redact before sending to LLM. Audit all LLM requests. Encrypt data at rest and in transit. Regular security audits.

⚠️ Cold-start problem (new users, no data)

High

✓ Mitigation: Use content-based filtering (product attributes) instead of collaborative filtering. Show popular products. Ask for explicit preferences (quiz). Synthetic data augmentation.

⚠️ Recommendation bias (filter bubble)

Medium

✓ Mitigation: Diversity optimizer (no more than 3 products from same category). Exploration bonus (10% of recommendations are random). Monitor diversity score.
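A rough sketch of this mitigation: cap each category at three items and reserve 10% of the slots for random exploration. The `diversify` function, the item dict shape, and the choice to apply the category cap to exploration picks as well are assumptions of this example.

```python
import random
from collections import Counter


def diversify(ranked, catalog, n=10, max_per_category=3, explore_frac=0.1,
              rng=None):
    """Pick n items: top-ranked under a per-category cap, plus random picks."""
    rng = rng or random.Random()
    n_explore = round(n * explore_frac)
    picked, per_category = [], Counter()

    # Exploit: walk the ranked list, skipping categories that hit the cap.
    for item in ranked:
        if len(picked) == n - n_explore:
            break
        if per_category[item["category"]] < max_per_category:
            picked.append(item)
            per_category[item["category"]] += 1

    # Explore: fill the remaining slots at random from the wider catalog,
    # still respecting the cap (an assumption of this sketch).
    chosen = {item["id"] for item in picked}
    while len(picked) < n:
        pool = [item for item in catalog
                if item["id"] not in chosen
                and per_category[item["category"]] < max_per_category]
        if not pool:
            break
        item = rng.choice(pool)
        picked.append(item)
        chosen.add(item["id"])
        per_category[item["category"]] += 1
    return picked
```

Passing a seeded `random.Random` makes the exploration slot reproducible in tests and in offline replays.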

⚠️ Vendor lock-in (AWS/GCP)

Medium

✓ Mitigation: Use open-source tools where possible (Feast, MLflow, Kafka). Abstract cloud-specific services (S3 → object storage interface). Multi-cloud strategy for critical services.

⚠️ Team lacks ML expertise

Medium

✓ Mitigation: Hire ML engineer or consultant. Use managed services (Tecton, Databricks). Invest in training. Start simple (rule-based) and iterate.

🧬

Evolution Roadmap

Progressive transformation from MVP to scale

🌱 Phase 1: MVP (0-3 months)

1. Ship basic personalization (rule-based + popular products)
2. Integrate with e-commerce platform (Shopify)
3. Collect event data (page views, clicks, purchases)
4. Deploy on serverless (Lambda + DynamoDB)

🌿 Phase 2: ML-Powered (3-6 months)

1. Train collaborative filtering model (ALS)
2. Deploy feature store (Feast)
3. Add LLM reranker (GPT-4)
4. Implement A/B testing framework
5. Scale to 100K users/day

🌳 Phase 3: Multi-Agent + Enterprise (6-12 months)

1. Build multi-agent system (planner, executor, evaluator, guardrail)
2. Deploy on Kubernetes (multi-tenant)
3. Add advanced ML (Two-Tower model, diversity optimizer)
4. Implement GDPR compliance (PII redaction, data residency)
5. Scale to 1M+ users/day

🚀 Production Ready

🏗️
🏗️

Complete Systems Architecture

9-layer architecture from presentation to security

1. Presentation (4 components)

  • Web App (React/Next.js)
  • Mobile App (React Native)
  • Email Client (SendGrid/SES)
  • SMS Gateway (Twilio)

2. API Gateway (4 components)

  • Load Balancer (ALB/Nginx)
  • Rate Limiter (Redis)
  • Auth (OIDC/SAML)
  • API Gateway (Kong/Apigee)

3. Agent Layer (6 components)

  • Planner Agent (LangGraph)
  • Executor Agent (Orchestrator)
  • Evaluator Agent (Quality Check)
  • Guardrail Agent (PII Redaction)
  • Profile Agent (Customer 360)
  • Recommendation Agent (ML Model)

4. ML Layer (5 components)

  • Feature Store (Feast/Tecton)
  • Model Registry (MLflow)
  • Online Inference (TensorFlow Serving)
  • Offline Training (Spark/Databricks)
  • Evaluation Pipeline (Great Expectations)

5. Integration (4 components)

  • E-commerce API Adapter (Shopify/Salesforce)
  • CRM Connector (Salesforce/HubSpot)
  • CDP Sync (Segment/mParticle)
  • Analytics Export (GA4/Amplitude)

6. Data (5 components)

  • Event Stream (Kafka/Kinesis)
  • OLTP Database (PostgreSQL/Aurora)
  • OLAP Warehouse (Snowflake/BigQuery)
  • Vector DB (Pinecone/Weaviate)
  • Cache (Redis/Memcached)

7. External (4 components)

  • LLM APIs (OpenAI/Anthropic/Gemini)
  • E-commerce Platforms (Shopify/Commerce Cloud)
  • Payment Gateways (Stripe/Adyen)
  • Identity Providers (Auth0/Okta)

8. Observability (5 components)

  • Metrics (Prometheus/Datadog)
  • Logs (ELK/Splunk)
  • Traces (Jaeger/Honeycomb)
  • Dashboards (Grafana/Datadog)
  • Alerts (PagerDuty/Opsgenie)

9. Security (5 components)

  • WAF (Cloudflare/AWS WAF)
  • Secrets Manager (Vault/KMS)
  • PII Detection (Presidio/AWS Macie)
  • Audit Log (CloudTrail/Splunk)
  • RBAC (Keycloak/Okta)
🔄

Sequence Diagram - Personalized Recommendation Request

Automated data flow every hour

Participants: User, API Gateway, Planner Agent, Profile Agent, Recommendation Agent, Guardrail Agent, Feature Store, Model Registry, Cache

Messages, in order:

1. GET /recommendations?user_id=123
2. Route request + auth check
3. Check cache for user_id=123
4. Cache miss
5. Fetch customer profile
6. Get features (demographics, behavior, preferences)
7. Return feature vector (128 dims)
8. Profile + features
9. Generate recommendations
10. Load model v2.3 (collaborative filtering + LLM reranker)
11. Model artifacts
12. Run inference (top 20 products)
13. Recommendations + scores
14. Validate recommendations (policy, PII, safety)
15. Check: no banned products, no PII leakage, diversity score > 0.6
16. Approved (filtered to 15 products)
17. Store result (TTL=5min)
18. Return JSON (15 products + explainability)
19. 200 OK + recommendations
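The cache check, cache miss, and "Store result (TTL=5min)" messages in this sequence follow the cache-aside pattern. Below is a minimal in-memory sketch of that pattern; the real system uses Redis, and `TtlCache`, `get_recommendations`, and the `recs:` key prefix are hypothetical names for illustration.

```python
import time


class TtlCache:
    """Tiny in-memory stand-in for a cache with per-entry expiry."""

    def __init__(self, ttl_s=300.0, clock=time.monotonic):
        self._ttl_s = ttl_s  # 300 s matches TTL=5min
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl_s)


def get_recommendations(user_id, cache, compute):
    """Cache-aside: return the cached result, else compute and store it."""
    key = f"recs:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = compute(user_id)  # profile -> recommend -> validate pipeline
    cache.set(key, result)
    return result
```

Within the five-minute window a repeat request skips the whole agent pipeline, which is where most of the latency budget goes.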

Data Flow

User request → personalized recommendations in < 100ms

1. User (0 ms): requests recommendations. Output: user_id, channel, context
2. API Gateway (5 ms): authenticates, rate limits. Output: auth token validated
3. Planner Agent (8 ms): generates execution plan. Output: plan: profile → recommend → validate
4. Cache (Redis) (3 ms): checks cache. Output: cache miss
5. Profile Agent (25 ms): fetches customer profile. Output: demographics, behavior, preferences
6. Feature Store (15 ms): retrieves feature vector. Output: 128-dim vector
7. Recommendation Agent (35 ms): runs ML inference. Output: top 20 products + scores
8. Model Registry (10 ms): loads model v2.3. Output: collaborative filter + LLM reranker
9. Guardrail Agent (10 ms): validates recommendations. Output: filtered to 15 products (removed 5 for policy)
10. Evaluator Agent (8 ms): checks quality. Output: quality score: 0.88, diversity: 0.72
11. Cache (Redis) (3 ms): stores result. Output: TTL=5min
12. API Gateway (5 ms): returns response. Output: JSON (15 products + explainability)
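As a quick budget check, the per-step timings listed above can be summed. The source does not say which steps overlap or nest, so this sketch computes two readings: a strictly serial total, and a total that assumes the Feature Store and Model Registry times are already counted inside the agents that call them (that nesting is an assumption).

```python
# Step timings (ms) copied from the data flow above.
STEPS = [
    ("User", 0),
    ("API Gateway", 5),
    ("Planner Agent", 8),
    ("Cache (Redis)", 3),
    ("Profile Agent", 25),
    ("Feature Store", 15),
    ("Recommendation Agent", 35),
    ("Model Registry", 10),
    ("Guardrail Agent", 10),
    ("Evaluator Agent", 8),
    ("Cache (Redis)", 3),
    ("API Gateway", 5),
]

TARGET_MS = 100

# Reading 1: every step runs strictly one after another.
serial_total_ms = sum(ms for _, ms in STEPS)

# Reading 2 (assumption): these calls happen inside their parent agent's time.
NESTED = {"Feature Store", "Model Registry"}
nested_total_ms = sum(ms for name, ms in STEPS if name not in NESTED)

print(f"serial: {serial_total_ms} ms, nested: {nested_total_ms} ms, "
      f"target: {TARGET_MS} ms")
```

Read serially, the steps add up to 127 ms; under the nesting assumption they add up to 102 ms. Either way, staying under 100 ms at p95 on a cache miss would seem to depend on overlapping some steps or on cache hits, so it is worth measuring end to end rather than relying on the per-step figures.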
1. Serverless Monolith

Volume: 10K-100K users/day
Architecture:
  • API Gateway (AWS API Gateway)
  • Lambda functions (Python)
  • DynamoDB (user profiles)
  • ElastiCache (Redis for cache)
  • S3 (event logs)
  • OpenAI API (LLM reranker)
Cost & Performance: $500/mo, 80-120 ms p95

2. Queue + Workers

Volume: 100K-1M users/day
Architecture:
  • Load Balancer (ALB)
  • API server (Node.js/FastAPI)
  • Message queue (SQS/RabbitMQ)
  • Worker pool (ECS/Fargate)
  • PostgreSQL (RDS)
  • Redis (ElastiCache)
  • Feast (feature store)
Cost & Performance: $2K/mo, 60-100 ms p95

3. Multi-Agent Orchestration (Recommended)

Volume: 1M-10M users/day
Architecture:
  • Global Load Balancer (CloudFront/Cloudflare)
  • LangGraph (agent orchestrator)
  • Kafka (event streaming)
  • EKS/GKE (Kubernetes)
  • Snowflake (data warehouse)
  • Feast + Tecton (feature store)
  • MLflow (model registry)
  • TensorFlow Serving (inference)
Cost & Performance: $10K/mo, 40-80 ms p95

4. Enterprise Multi-Region

Volume: 10M+ users/day
Architecture:
  • Multi-region deployment (US, EU, APAC)
  • Kubernetes (EKS/GKE) with auto-scaling
  • Kafka + Flink (real-time processing)
  • Multi-model serving (A/B testing)
  • Replicated databases (Aurora Global)
  • CDN (CloudFront/Fastly)
  • Advanced observability (Datadog/New Relic)
Cost & Performance: $50K+/mo, 20-60 ms p95

Key Integrations

E-commerce Platform (Shopify/Salesforce Commerce Cloud)

Protocol: REST API + Webhooks
Sync product catalog (daily batch)
Real-time order events (webhooks)
Push recommendations to storefront (API)
Track conversions (pixel/SDK)

CRM (Salesforce/HubSpot)

Protocol: REST API
Sync customer profiles (hourly)
Update lead scores based on engagement
Trigger campaigns based on recommendations
Export campaign results

CDP (Segment/mParticle)

Protocol: HTTP Tracking API
Send user events (page views, clicks, purchases)
Receive enriched profiles
Sync audience segments
Export to data warehouse

Analytics (Google Analytics 4/Amplitude)

Protocol: Measurement Protocol / HTTP API
Track recommendation impressions
Track clicks and conversions
Export to BigQuery for analysis
Build custom dashboards

Email/SMS (SendGrid/Twilio)

Protocol: REST API
Send personalized email campaigns
Send SMS for abandoned cart
Track opens, clicks, unsubscribes
Sync suppression list

Security & Compliance

Failure Modes & Fallbacks

LLM API down (OpenAI/Anthropic)
  • Fallback: Use rule-based recommendations (popular products, trending items)
  • Impact: Degraded quality (no personalization), but system still works
  • SLA: 99.5%

Feature store unavailable
  • Fallback: Compute features on-the-fly (slower) or use cached features (stale < 1hr)
  • Impact: Increased latency (50-100ms) or slightly stale recommendations
  • SLA: 99.9%

ML model inference fails
  • Fallback: Use previous model version or rule-based recommendations
  • Impact: Slightly lower quality, but no downtime
  • SLA: 99.9%

Guardrail agent fails (PII detection)
  • Fallback: Block all recommendations (safety first)
  • Impact: No recommendations served (better safe than sorry)
  • SLA: 100%

Database unavailable
  • Fallback: Read from replica; write to queue for later processing
  • Impact: Read-only mode for profiles; writes delayed
  • SLA: 99.9%

Cache (Redis) down
  • Fallback: Direct database queries (slower)
  • Impact: Increased latency (2-3x), higher database load
  • SLA: 99.5%

API Gateway rate limit exceeded
  • Fallback: Return 429 (Too Many Requests) with retry-after header
  • Impact: User sees error; must retry
  • SLA: N/A (by design)

RAG vs Fine-Tuning

Product catalog changes daily (new arrivals, out of stock). RAG allows real-time updates. User preferences are more stable; fine-tuned embeddings capture long-term behavior.
✅ RAG (Chosen)
Cost: $200/mo (vector DB)
Update: Real-time (new products indexed immediately)
How: Embed products, store in Pinecone/Weaviate
❌ Fine-Tuning
Cost: $1K/mo (GPU training)
Update: Weekly (retrain user embeddings)
How: Fine-tune Two-Tower model on user-product interactions
Implementation: Vector DB (Pinecone) for product embeddings (updated hourly). Fine-tuned Two-Tower model for user embeddings (retrained weekly on Databricks).

Hallucination Detection

LLM reranker hallucinates product features (fake discounts, wrong categories)
  • L1: Confidence scores (< 0.7 = flag for review)
  • L2: Cross-reference product catalog (verify all attributes)
  • L3: Logical consistency checks (price < 0? category mismatch?)
  • L4: Human review queue for flagged recommendations
0.5% hallucination rate, 100% caught before serving
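The four detection layers can be sketched as one function that returns review flags for a single reranker output. This is an illustration only: `review_flags`, the claim dict shape, and the flag names are hypothetical, and a non-empty result stands for "send to the human review queue".

```python
MIN_CONFIDENCE = 0.7


def review_flags(claim, catalog):
    """Return the reasons a reranker claim should go to human review."""
    flags = []

    # L1: confidence score below 0.7 flags the claim for review.
    if claim["confidence"] < MIN_CONFIDENCE:
        flags.append("low_confidence")

    # L2: cross-reference every claimed attribute against the catalog.
    product = catalog.get(claim["product_id"])
    if product is None:
        flags.append("unknown_product")
    else:
        for attr in ("price", "category"):
            if claim.get(attr) != product[attr]:
                flags.append(f"{attr}_mismatch")

    # L3: logical consistency, independent of the catalog.
    if claim.get("price", 0) < 0:
        flags.append("negative_price")

    # L4: any flag means the recommendation is queued instead of served.
    return flags
```

Keeping the catalog lookup as the source of truth means a fabricated discount shows up as a `price_mismatch` even when the model reports high confidence.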

Evaluation Framework

  • NDCG@10: 0.88 (target: 0.85+)
  • Diversity Score: 0.72 (target: 0.70+)
  • CTR: 6.2% (target: 5%+)
  • Conversion Rate: 2.8% (target: 2%+)
  • Hallucination Rate: 0.5% (target: <1%)
Testing: A/B test: 10K users see the old system and 10K see the new system. Compare metrics for 2 weeks before full rollout.
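For reference, NDCG@10 can be computed as follows. This sketch uses the linear-gain form of DCG (relevance divided by a log2 position discount); some implementations use an exponential gain instead, and the source does not say which variant produced the figures reported here.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(position + 2)
               for position, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0
```

Here `relevances` is the list of relevance labels in the order the system ranked the items, for example 1 for a clicked product and 0 otherwise. A ranking that puts the only relevant item second scores about 0.63; putting it first scores 1.0.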

Dataset Curation

1. Collect: 100M events (page views, clicks, purchases). Stream from Kafka.
2. Clean: 85M usable (removed bots, duplicates). Spark job with bot detection.
3. Label: 85M implicit labels (click=1, no-click=0). Cost: $0 (implicit feedback).
4. Augment: +5M synthetic (cold-start users, edge cases). Generate synthetic users with GPT-4.
90M training examples. 80/10/10 split (train/val/test). Stratified by user segment.
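An 80/10/10 split stratified by user segment can be sketched in plain Python as below. At 90M examples this would run as a Spark job rather than in memory; `stratified_split` and the `key` callable are hypothetical names, and the rounding rule (floor for train and validation, remainder to test) is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(examples, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split examples into train/val/test, preserving each segment's share."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    groups = defaultdict(list)
    for example in examples:
        groups[key(example)].append(example)

    train, val, test = [], [], []
    for members in groups.values():
        rng.shuffle(members)
        n_train = int(len(members) * fractions[0])
        n_val = int(len(members) * fractions[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

Splitting inside each segment keeps small segments, such as cold-start users, represented in validation and test instead of landing mostly in train by chance.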

Agentic RAG

Agent iteratively retrieves based on reasoning
User views 'running shoes' → RAG retrieves similar products → Agent reasons 'user might need socks' → RAG retrieves complementary products → Agent generates bundle recommendation
💡 Not one-shot retrieval. Agent decides what else it needs to know. Leads to better cross-sell and upsell.
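The running-shoes example can be sketched as a retrieve-reason loop. Everything here is a toy stand-in: `CATALOG` replaces the vector DB, the hard-coded `COMPLEMENTS` map replaces the LLM's reasoning step, and `build_bundle` is a hypothetical name.

```python
CATALOG = [
    {"id": "p1", "name": "Trail running shoes", "category": "running shoes"},
    {"id": "p2", "name": "Road running shoes", "category": "running shoes"},
    {"id": "p3", "name": "Cushioned running socks", "category": "running socks"},
]

# Stand-in for agent reasoning ("user might need socks").
COMPLEMENTS = {"running shoes": "running socks"}


def retrieve(query):
    """Toy retrieval: exact category match instead of vector similarity."""
    return [p for p in CATALOG if p["category"] == query]


def next_query(query):
    """Toy reasoning step: decide what else is worth retrieving, if anything."""
    return COMPLEMENTS.get(query)


def build_bundle(viewed_category, max_hops=3):
    """Alternate retrieval and reasoning until the agent needs nothing more."""
    bundle, seen, query = [], set(), viewed_category
    for _ in range(max_hops):
        if query is None or query in seen:
            break  # nothing new to look up
        seen.add(query)
        bundle.extend(retrieve(query))
        query = next_query(query)
    return bundle
```

The `max_hops` bound and the `seen` set keep the loop from cycling, which matters once the reasoning step is a real model that can suggest the same category twice.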

Multi-Armed Bandit (MAB)

Tech Stack Summary

  • LLMs: OpenAI GPT-4, Anthropic Claude, Google Gemini
  • Agent Orchestration: LangGraph, CrewAI, or custom Python orchestrator
  • ML Framework: TensorFlow, PyTorch
  • Feature Store: Feast (open-source) or Tecton (managed)
  • Model Registry: MLflow
  • Model Serving: TensorFlow Serving, TorchServe, or Seldon Core
  • Vector Database: Pinecone, Weaviate, or pgvector (PostgreSQL extension)
  • Database: PostgreSQL (Aurora), DynamoDB
  • Cache: Redis (ElastiCache)
  • Message Queue: Kafka (MSK), SQS, or RabbitMQ
  • Compute: Lambda (serverless), ECS/Fargate (containers), EKS (Kubernetes)
  • Monitoring: Datadog, Prometheus + Grafana, CloudWatch
  • Security: AWS KMS, Presidio (PII detection), OPA (policy engine)

© 2026 Randeep Bhatia. All Rights Reserved.

No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.