
Product Description System Architecture 🏗️

From 100 to 100,000 SKUs/day with AI agents and ML pipeline

May 8, 2025
16 min read
🛒 E-commerce · 🏗️ Architecture · 📊 Scalable · 🤖 Multi-Agent · 🔒 PCI Compliant
🎯 This Week's Journey

From prompts to production-grade product catalog system.

Monday: 3 core prompts. Tuesday: automation code. Wednesday: team workflows. Thursday: complete technical architecture. Agents, ML pipeline, data flows, scaling patterns, and multi-tenant deployment for enterprise e-commerce.

📋

Key Assumptions

1. Catalog size: 1K-1M SKUs with daily updates
2. Traffic: 100-100K description generations per day
3. Compliance: PCI-DSS for payment data, GDPR for EU customers
4. Integration: Existing CMS (Shopify/Magento), PIM, and analytics platforms
5. Quality bar: 95%+ human-approved descriptions, <1% hallucination rate

System Requirements

Functional

  • Extract product attributes from raw data (images, specs, competitor text)
  • Generate SEO-optimized descriptions in 12+ languages
  • Validate outputs for brand voice, accuracy, and compliance
  • Integrate with CMS, PIM, inventory, and analytics systems
  • Support bulk processing (10K+ SKUs) and real-time single-SKU updates
  • A/B test descriptions and track conversion impact
  • Human-in-the-loop review queue for low-confidence outputs

Non-Functional (SLOs)

  • Latency (p95): 3000 ms
  • Freshness: 15 min
  • Availability: 99.9%
  • Accuracy: 95%
  • Hallucination rate: 1%

💰 Cost Targets: $0.05 per SKU; monthly infrastructure $500 (startup), $5,000 (enterprise)

Agent Layer

planner

L3

Decomposes SKU generation task into subtasks, selects tools

🔧 fetchProductData(), fetchCompetitorData(), selectLLM(), estimateCost()

⚡ Recovery: If data fetch fails → retry 3x with backoff; if no competitor data → use generic template; if cost exceeds budget → downgrade to cheaper LLM
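
The "retry 3x with backoff" recovery rule can be sketched as follows. This is a minimal illustration, not the article's implementation: `retry_with_backoff` is a hypothetical helper, "3x" is assumed to mean three attempts in total, and the 0.5 s base delay with doubling is an assumed schedule.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,            # assumption: "3x" means 3 attempts in total
    base_delay_s: float = 0.5,    # assumption: the source gives no delay values
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: the caller chooses a fallback
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. When the final attempt fails, the exception propagates so the planner can apply its next rule, such as the generic template.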

executor

L2

Executes generation plan, calls LLM, formats output

🔧 callLLM(), formatMarkdown(), extractKeywords(), translateText()

⚡ Recovery: If LLM timeout → retry with shorter prompt; if hallucination detected → regenerate with stricter prompt; if translation fails → fallback to English

evaluator

L3

Validates output quality, checks brand voice, detects hallucinations

🔧 checkFactualAccuracy(), scoreBrandVoice(), detectHallucination(), scoreSEO()

⚡ Recovery: If low confidence (<70%) → route to human review; if hallucination detected → flag and regenerate; if brand voice mismatch → suggest edits
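
The evaluator's recovery rules amount to a small routing decision. The sketch below is one possible reading: the `Evaluation` type, the `route` function, the returned labels, and the order in which the checks are applied are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    confidence: float      # 0.0-1.0 quality/confidence score
    hallucination: bool    # True if a hallucination was detected
    brand_voice_ok: bool   # True if the brand voice check passed


def route(ev: Evaluation, threshold: float = 0.70) -> str:
    """Map an evaluation result to the next step for the description."""
    if ev.hallucination:
        return "regenerate"      # flag and regenerate
    if ev.confidence < threshold:
        return "human_review"    # low confidence goes to the review queue
    if not ev.brand_voice_ok:
        return "suggest_edits"   # brand voice mismatch
    return "publish"
```

Checking for hallucinations first is a design choice: a fabricated claim should not reach a reviewer as an ordinary low-confidence item.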

guardrail

L4

Policy checks, PII redaction, safety filters, compliance validation

🔧 detectPII(), checkPolicyViolations(), redactSensitiveData(), validateCompliance()

⚡ Recovery: If PII detected → auto-redact and log; if policy violation → block publication and alert; if compliance fail → route to legal review
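
As a rough illustration of "auto-redact and log", the sketch below redacts two kinds of PII with regular expressions. The patterns, placeholder strings, and the `redact_pii` name are assumptions. A production guardrail would more likely use a dedicated PII service, and the card pattern here can also match long product codes such as 13-digit barcodes.

```python
import re

# Simplified patterns for illustration only; they are not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13-16 digits, optional separators


def redact_pii(text: str) -> tuple[str, dict[str, int]]:
    """Return the redacted text plus a count of findings per PII type."""
    text, emails = EMAIL_RE.subn("[REDACTED_EMAIL]", text)
    text, cards = CARD_RE.subn("[REDACTED_CARD]", text)
    return text, {"email": emails, "card": cards}
```

The returned counts give the caller something to write to the audit log without storing the sensitive values themselves.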

seo

L2

Optimize descriptions for search engines, inject keywords

🔧 analyzeKeywordDensity(), optimizeReadability(), generateMetaTags(), scorePageRank()

⚡ Recovery: If keyword stuffing detected → rebalance; if readability too low → simplify language; if meta tags missing → auto-generate
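
Keyword stuffing is usually judged from keyword density. A minimal sketch of that calculation is shown here; the tokenization rule and the `keyword_density` name are assumptions, and the source does not state what density counts as stuffing.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of words in `text` that belong to occurrences of `keyword`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = keyword.lower().split()
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    # Count every position where the full phrase appears word for word.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return hits * n / len(words)
```

For the text "Running shoes for trail running. These running shoes last." the phrase "running shoes" occurs twice in nine words, which gives a density of 4/9.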

translation

L2

Translate descriptions to 12+ languages with cultural adaptation

🔧 translateText(), adaptCulturalContext(), validateGrammar(), checkLocalCompliance()

⚡ Recovery: If translation API down → queue for later; if low quality score → human translator review; if cultural mismatch → adapt phrasing

ML Layer

Feature Store

Update: Real-time for inventory, daily batch for embeddings, weekly for analytics

  • product_category_embedding
  • brand_voice_vector
  • competitor_price_stats
  • historical_conversion_rate
  • user_engagement_metrics
  • seo_keyword_relevance
  • image_quality_score
  • inventory_velocity

Model Registry

Strategy: Semantic versioning with A/B testing before promotion

  • description_generator_v3
  • brand_voice_classifier
  • hallucination_detector
  • seo_scorer

Observability Stack

Real-time monitoring, tracing & alerting

Pipeline: Sources (apps, services, infra) → Collection (9 metrics) → Processing (aggregate & transform) → Dashboards (5 views) → Alerts (enabled)
Signals: 📊 Metrics (9), 📝 Logs (structured), 🔗 Traces (distributed)
description_generation_latency_p95_ms
llm_api_success_rate
hallucination_detection_rate
human_review_queue_depth
conversion_rate_by_description_version
cost_per_sku_usd

Deployment Variants

🚀

Startup Architecture

Fast to deploy, cost-efficient, scales to 100 SKUs/day

Infrastructure

Single AWS region (us-east-1)
Serverless (Lambda + API Gateway)
Managed services (RDS, ElastiCache, S3)
Claude API direct calls
Basic monitoring (CloudWatch)
Focus on speed to market
Manual review for all outputs initially
No multi-region, no custom VPC
Cost: $200-500/mo for 100 SKUs/day
Can scale to 1K SKUs/day without major changes

Risks & Mitigations

⚠️ LLM hallucinations publish false product claims

Medium

✓ Mitigation: 4-layer detection: confidence scores → spec cross-check → GPT-4 fact-check → human review. Block publication if any layer fails.

⚠️ Cost overruns from excessive LLM API usage

High

✓ Mitigation: Set hard budget limits ($0.10/SKU max), downgrade to cheaper models if exceeded, alert ops team at 80% threshold.
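
The budget rule can be expressed as a three-way decision. In this sketch the `budget_action` name and its return labels are assumptions. The source does not say whether the 80% alert applies to the per-SKU cap or to a monthly budget, so the function applies it to whichever cap is passed in.

```python
def budget_action(spent_usd: float, cap_usd: float = 0.10,
                  alert_ratio: float = 0.80) -> str:
    """Decide what to do given spend so far against a hard cap."""
    if spent_usd > cap_usd:
        return "downgrade"   # hard limit exceeded: switch to a cheaper model
    if spent_usd >= cap_usd * alert_ratio:
        return "alert"       # nearing the cap: notify the ops team
    return "ok"
```

Floats are used here for brevity. Real cost accounting would be safer with integer micro-dollars or `decimal.Decimal`, so that comparisons at the threshold are exact.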

⚠️ Data breach exposes customer PII or payment data

Low

✓ Mitigation: Encrypt all data at rest (AES-256), in transit (TLS 1.3), redact PII before LLM processing, audit logs for 7 years, annual penetration testing.

⚠️ CMS integration breaks after platform update

Medium

✓ Mitigation: Version-locked SDKs, adapter pattern for multi-CMS support, automated integration tests, fallback to manual queue if API fails.

⚠️ Model drift degrades quality over time

High

✓ Mitigation: Monitor quality metrics weekly (rolling 7-day window), alert if accuracy drops >3%, auto-rollback to previous model version, quarterly retraining.
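
A rolling-window drift check along these lines might look like the sketch below. `DriftMonitor` is a hypothetical class, and ">3%" is assumed to mean a drop of more than three percentage points in mean accuracy against a fixed baseline.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags drift when the rolling mean accuracy falls too far below baseline."""

    def __init__(self, baseline: float, window_days: int = 7,
                 max_drop: float = 0.03) -> None:
        self.baseline = baseline
        self.max_drop = max_drop
        self.window = deque(maxlen=window_days)  # keeps only the latest N days

    def record(self, daily_accuracy: float) -> bool:
        """Add one day's accuracy; return True if an alert should fire."""
        self.window.append(daily_accuracy)
        if len(self.window) < self.window.maxlen:
            return False  # not enough history for a full window yet
        return self.baseline - mean(self.window) > self.max_drop
```

With a 0.95 baseline and a window full of 0.95 days, five consecutive days at 0.90 are needed before the rolling mean drops by more than 0.03. A `True` result is where an auto-rollback would be triggered.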

⚠️ Vendor lock-in to single LLM provider

Medium

✓ Mitigation: Multi-provider strategy (Claude + GPT), abstraction layer for easy switching, test failover monthly, negotiate volume discounts.
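
The abstraction layer can be as small as a shared interface plus an ordered failover loop. The sketch below assumes a hypothetical `LLMProvider` protocol with a single `generate` method. Real provider SDKs differ, so each one would need a thin adapter that implements this interface.

```python
from typing import Protocol, Sequence


class LLMProvider(Protocol):
    """Hypothetical minimal interface that each provider adapter implements."""
    name: str

    def generate(self, prompt: str) -> str: ...


def generate_with_failover(providers: Sequence[LLMProvider],
                           prompt: str) -> tuple[str, str]:
    """Try providers in order; return (provider name, text) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.generate(prompt)
        except Exception as exc:
            errors.append(f"{provider.name}: {exc}")  # keep for diagnostics
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the text lets downstream metrics attribute quality and cost to the model that actually produced each description.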

⚠️ Compliance violations (PCI, GDPR, CCPA)

Low

✓ Mitigation: Annual compliance audits (SOC 2, PCI-DSS), data residency controls, automated PII redaction, legal review of all policies, incident response plan.

🧬

Evolution Roadmap

Progressive transformation from MVP to scale

🌱 Phase 1: MVP (0-3 months)

1. Launch with 100 SKUs/day capacity
2. Manual review for all outputs
3. Basic monitoring and alerting
4. Single CMS integration (Shopify)
🌿 Phase 2: Scale (3-6 months)

1. Scale to 1K SKUs/day
2. Reduce human review to 20%
3. Add SEO and Translation agents
4. Multi-CMS support (Shopify + Magento)
🌳 Phase 3: Enterprise (6-12 months)

1. Scale to 10K+ SKUs/day
2. Multi-region deployment
3. 99.99% uptime SLA
4. PCI-DSS Level 1 compliance
🚀Production Ready
🏗️

Complete Systems Architecture

9-layer architecture from presentation to security

1. Presentation (4 components): Admin Dashboard (React), Review Queue UI, Analytics Dashboard, Mobile App (React Native)
2. API Gateway (4 components): Load Balancer (ALB/NGINX), Rate Limiter (Redis), Auth Middleware (OIDC), API Gateway (Kong/Apigee)
3. Agent Layer (6 components): Planner Agent, Executor Agent, Evaluator Agent, Guardrail Agent, SEO Agent, Translation Agent
4. ML Layer (5 components): Feature Store (Feast), Model Registry (MLflow), Inference Service (TorchServe), Evaluation Service, Prompt Store (DynamoDB)
5. Integration (4 components): CMS Adapter (Shopify/Magento), PIM Connector, Inventory Service, Analytics Connector (GA4)
6. Data (4 components): PostgreSQL (products, metadata), Redis (cache, queue), S3 (images, logs), Vector DB (Pinecone/Weaviate)
7. External (4 components): LLM APIs (Claude/GPT), Translation API (DeepL), Image Analysis (AWS Rekognition), Payment Gateway (Stripe)
8. Observability (4 components): Metrics (Prometheus), Logs (CloudWatch/ELK), Traces (Jaeger), Dashboards (Grafana)
9. Security (4 components): IAM/RBAC, Secrets Manager (AWS KMS), Audit Log (CloudTrail), WAF (Cloudflare)
🔄

Request Flow - Single SKU Description Generation

Automated data flow every hour

Participants: User, API Gateway, Planner Agent, Executor Agent, Evaluator Agent, Guardrail Agent, CMS

1. POST /generate {sku, lang}
2. plan(sku, lang)
3. execute(plan) → fetch data, generate text
4. generate_description(features)
5. evaluate(description)
6. check_safety(description)
7. approved → save to DB
8. POST /products/{sku}/description
9. 200 OK {description, metadata}

Data Flow - End-to-End

From product data ingestion to CMS publication

1. Product Catalog (0s): New SKU added or updated → Raw product data (specs, images, category)
2. Data Ingestion (2s): Fetch from PIM, enrich with competitor data → Enriched product features
3. Feature Store (3s): Compute embeddings, stats, historical metrics → Feature vectors
4. Planner Agent (3.2s): Create execution plan, select tools → Plan JSON
5. Executor Agent (5.2s): Generate description via LLM → Raw description text
6. Evaluator Agent (5.7s): Score quality, detect hallucinations → Quality metrics
7. Guardrail Agent (5.9s): PII detection, policy checks → Safety report
8. SEO Agent (6.0s): Optimize keywords, readability → SEO-optimized text
9. Translation Agent (7.5s): Translate to target languages → Multi-language descriptions
10. Review Queue (7.5s): Route low-confidence to human review → Review task (if needed)
11. CMS Adapter (8.0s): Publish to Shopify/Magento → Published description
12. Analytics (ongoing): Track conversion, engagement → Performance metrics
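
The timestamps in the data flow above read as cumulative elapsed time, so the time spent in each stage is the difference between consecutive values. The sketch below does that arithmetic. It assumes each timestamp marks the moment a stage completes, which the source does not state explicitly.

```python
# (stage, cumulative seconds) copied from the data flow; Analytics is ongoing
# and has no timestamp, so it is left out.
STAGES = [
    ("Product Catalog", 0.0), ("Data Ingestion", 2.0), ("Feature Store", 3.0),
    ("Planner Agent", 3.2), ("Executor Agent", 5.2), ("Evaluator Agent", 5.7),
    ("Guardrail Agent", 5.9), ("SEO Agent", 6.0), ("Translation Agent", 7.5),
    ("Review Queue", 7.5), ("CMS Adapter", 8.0),
]


def stage_durations(stages: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Seconds spent in each stage, rounded to avoid float noise."""
    return [
        (name, round(t - prev_t, 1))
        for (_, prev_t), (name, t) in zip(stages, stages[1:])
    ]


durations = dict(stage_durations(STAGES))
```

Under that reading, Data Ingestion and the Executor Agent each take 2.0 s and Translation takes 1.5 s, so those three stages account for 5.5 s of the 8.0 s total.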
1. Volume: 0-100 SKUs/day
   Pattern: Monolith + Serverless
   Architecture: Single API server, Claude API direct calls, PostgreSQL, Redis cache
   Cost & Performance: $200/mo, 5-8s p95
2. Volume: 100-1K SKUs/day
   Pattern: Queue + Workers
   Architecture: API server, Message queue (SQS/Redis), Worker processes (Lambda/ECS), PostgreSQL + read replicas, Redis cache + queue
   Cost & Performance: $800/mo, 3-5s p95
3. Volume: 1K-10K SKUs/day
   Pattern: Multi-Agent Orchestration
   Architecture: Load balancer (ALB), Agent framework (LangGraph), Message bus (Kafka/SQS), Serverless agents (Lambda), Managed DB (RDS/Aurora), Vector DB (Pinecone), Feature store (Feast)
   Cost & Performance: $3K/mo, 2-4s p95
Recommended
4. Volume: 10K-100K SKUs/day
   Pattern: Enterprise Multi-Region
   Architecture: Global load balancer, Container orchestration (K8s/ECS), Event streaming (Kafka), Multi-LLM failover, Replicated DB (Aurora Global), Distributed cache (ElastiCache), Multi-region feature store, Real-time ML inference
   Cost & Performance: $15K+/mo, 1-3s p95

Key Integrations

CMS (Shopify/Magento)

Protocol: REST API + Webhooks
Listen for product.created webhook
Generate description
POST /products/{id}/description
Receive confirmation
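
The webhook flow above can be sketched as a handler that verifies the payload, generates a description, and builds the follow-up request. This assumes the platform signs each webhook body with HMAC-SHA256 and sends the digest base64-encoded, which is how Shopify documents its webhooks; check your own platform's documentation. The function names and the `generate` callback are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable, Optional


def signature_is_valid(secret: bytes, body: bytes, signature_b64: str) -> bool:
    """Compare the received signature with one computed over the raw body."""
    expected = base64.b64encode(hmac.new(secret, body, hashlib.sha256).digest())
    return hmac.compare_digest(expected, signature_b64.encode("utf-8"))


def handle_product_created(
    secret: bytes, body: bytes, signature_b64: str,
    generate: Callable[[dict], str],
) -> tuple[int, Optional[dict]]:
    """Return (HTTP status, outgoing request description or None)."""
    if not signature_is_valid(secret, body, signature_b64):
        return 401, None  # reject unsigned or tampered payloads
    product = json.loads(body)
    return 200, {
        "method": "POST",
        "path": f"/products/{product['id']}/description",
        "body": {"description": generate(product)},
    }
```

The handler returns a description of the outgoing request instead of sending it, which keeps the sketch free of network code and easy to test.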

PIM (Product Information Management)

Protocol: GraphQL API
Query product by SKU
Fetch attributes, specs, images
Enrich with competitor data
Return enriched features

Inventory System

Protocol: REST API
Check stock levels
Adjust description urgency (low stock → prioritize)
Update availability status

Analytics (Google Analytics 4)

Protocol: Measurement Protocol
Track description_generated event
Send conversion data (add_to_cart, purchase)
Correlate with description version

Payment Gateway (Stripe)

Protocol: REST API
Track revenue per product
Correlate with description A/B test
Calculate ROI per description

Security & Compliance

Failure Modes & Fallbacks

Failure | Fallback | Impact | SLA
LLM API down (Claude/GPT) | Switch to backup LLM (GPT → Claude or vice versa) | Slight quality variance, 10% slower | 99.9%
Low-confidence generation (<70% quality score) | Route to human review queue | Delayed publication (1-4 hours) | 100% accuracy maintained
Hallucination detected | Block publication, regenerate with stricter prompt | 2x latency for affected SKU | <1% hallucination rate
CMS API timeout (Shopify/Magento) | Retry 3x with backoff, then queue for later | Delayed publication (up to 15 min) | 99.5%
Database unavailable (PostgreSQL) | Switch to read replica for reads, queue writes | Read-only mode, write latency +5 min | 99.9%
PII detected in output | Auto-redact, log incident, block publication | Regeneration required | 100% PII protection
Cost budget exceeded (>$0.10/SKU) | Downgrade to cheaper LLM or use template | Lower quality, faster generation | Cost control maintained

RAG vs Fine-Tuning

Product data changes daily (RAG ideal). Brand voice is stable (fine-tuning ideal). Combine for best of both.
✅ RAG (Chosen)
Cost: $200/mo (Pinecone)
Update: Real-time
How: Vector DB with product embeddings
❌ Fine-Tuning
Cost: $2K one-time + $100/mo inference
Update: Quarterly
How: Fine-tune GPT-3.5 on 5K brand-approved descriptions
Implementation: RAG retrieves top 5 similar products → Fine-tuned model generates in brand voice → Evaluator checks consistency
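
The "retrieve top 5 similar products" step is a nearest-neighbor search over embeddings. The sketch below does it in plain Python with cosine similarity over an in-memory dict. In the architecture described here a vector database would perform this search, and the `top_k` name and data layout are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], index: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Return the k SKUs whose embeddings are most similar to the query."""
    scored = sorted(
        ((cosine(query, vec), sku) for sku, vec in index.items()),
        reverse=True,
    )
    return [(sku, score) for score, sku in scored[:k]]
```

The retrieved products would then be passed as context to the generation model, with the evaluator checking the result for consistency.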

Hallucination Detection

LLMs invent fake product features (e.g., 'waterproof' when not)
L1: Confidence scores (<0.7 = flag)
L2: Cross-reference product spec database
L3: GPT-4 fact-checking (Does description match specs?)
L4: Human review for flagged items
0.3% hallucination rate in production, 100% caught before publication
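
The layered check can be sketched as a function that reports the first automated layer to flag a description. Everything here is illustrative: `first_failing_layer` is a hypothetical name, the spec cross-check is reduced to a lookup of watched claim words, and the LLM fact-check is an injected callable so the sketch does not depend on any provider API.

```python
from typing import Callable, Optional


def first_failing_layer(
    description: str,
    confidence: float,
    spec_claims: set[str],               # claims the product spec supports
    watched_claims: set[str],            # claim words worth checking
    fact_check: Callable[[str], bool],   # stand-in for the LLM fact-check
) -> Optional[str]:
    """Return the first layer that flags the description, or None if all pass."""
    if confidence < 0.7:
        return "L1_confidence"
    text = description.lower()
    unsupported = {c for c in watched_claims if c in text and c not in spec_claims}
    if unsupported:
        return "L2_spec_cross_check"
    if not fact_check(description):
        return "L3_fact_check"
    return None
```

Any non-`None` result would block publication and send the item to the fourth layer, human review. With a spec that supports only "breathable", a description containing "waterproof" is flagged at the second layer.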

Evaluation Framework

Factual Accuracy: 99.2% (target: 99%+)
Brand Voice Alignment: 0.92 (target: 0.90+ cosine similarity)
SEO Score: 83 (target: 80+/100)
Conversion Rate Lift: +7.2% (target: +5%)
Hallucination Rate: 0.3% (target: <1%)
Testing: Shadow mode: 5K SKUs parallel with manual, then gradual rollout (10% → 50% → 100%)
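
A gradual rollout needs a stable way to decide which SKUs are in the current percentage. One common approach, sketched here as an assumption since the source does not describe its mechanism, is to hash the SKU into one of 100 buckets so that a SKU admitted at 10% stays admitted at 50% and 100%.

```python
import hashlib


def in_rollout(sku: str, percent: int) -> bool:
    """True if this SKU falls inside the current rollout percentage."""
    digest = hashlib.sha256(sku.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0-99
    return bucket < percent
```

Because the bucket depends only on the SKU, the decision is reproducible across processes and restarts, which makes conversion comparisons between the two groups easier to trust.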

Dataset Curation

1. Collect: 50K existing descriptions - Scrape from CMS, deduplicate
2. Clean: 40K usable - Remove duplicates, low-quality, outdated
3. Label: 10K labeled - ($25K)
4. Augment: +5K synthetic - GPT-4 generates variations, human validates
15K high-quality training examples (Cohen's Kappa: 0.89 inter-annotator agreement)
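
Cohen's Kappa measures how much two annotators agree beyond what chance would produce: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each annotator's label frequencies. A small sketch of the calculation follows; the function name and the example labels are illustrative and are not the article's data.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label.
    expected = sum(
        count_a[label] * count_b[label] for label in set(count_a) | set(count_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)
```

For example, with annotator A giving `["ok", "ok", "bad", "bad"]` and annotator B giving `["ok", "ok", "bad", "ok"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.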

Agentic RAG

Agents iteratively retrieve based on reasoning, not one-shot
Product is 'running shoes' → RAG retrieves similar shoes → Agent reasons 'need cushioning tech details' → RAG retrieves cushioning specs → Agent reasons 'need competitor comparison' → RAG retrieves competitor data → Final description with full context
💡 Multi-hop reasoning, richer context, 15% better quality vs single-shot RAG
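
The iterative retrieval described above can be sketched as a bounded loop. Both callables are stand-ins: `retrieve` would query the vector database and `next_query` would be an LLM call that inspects the context gathered so far and either names the next thing to look up or returns `None`. The names and the hop limit are assumptions.

```python
from typing import Callable, Optional


def agentic_retrieve(
    initial_query: str,
    retrieve: Callable[[str], list[str]],
    next_query: Callable[[list[str]], Optional[str]],
    max_hops: int = 3,   # assumption: cap hops to bound cost and latency
) -> list[str]:
    """Gather context over several retrieval hops instead of a single shot."""
    context: list[str] = []
    query: Optional[str] = initial_query
    for _ in range(max_hops):
        if query is None:
            break  # the agent decided it has enough context
        context.extend(retrieve(query))
        query = next_query(context)
    return context
```

The hop limit matters in practice: without it, an agent that keeps asking for more context would add latency and LLM cost to every SKU.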

Model Distillation

Tech Stack Summary

LLMs: Claude 3.5 Sonnet (primary), GPT-4 (fallback), GPT-3.5 fine-tuned (cost-optimized)
Orchestration: LangGraph (agent framework), Temporal (workflow engine), Celery (task queue)
Database: PostgreSQL (primary), Redis (cache + queue), Pinecone (vector DB)
Queue: SQS (AWS), Kafka (enterprise), Redis (startup)
Compute: Lambda (serverless agents), ECS (containerized workers), K8s (enterprise)
Monitoring: Prometheus (metrics), Grafana (dashboards), Jaeger (traces), Sentry (errors)
Security: AWS KMS (encryption), Secrets Manager (secrets), WAF (Cloudflare), Comprehend (PII detection)
ML Platform: Feast (feature store), MLflow (model registry), TorchServe (inference), Weights & Biases (experiment tracking)
Integration: Shopify/Magento SDKs, GraphQL (PIM), REST APIs (inventory, analytics)

© 2026 Randeep Bhatia. All Rights Reserved.

No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.