← Wednesday's Workflows

Product Description System Architecture 🏗️

From 100 to 100,000 products/day with SEO optimization

June 26, 2025
🛒 E-commerce · 🏗️ Architecture · 📊 Scalable · 🔍 SEO-First

From prompts to production content pipeline.

Monday: 3 prompts for product descriptions. Tuesday: automated generation code. Wednesday: team workflows for content ops. Thursday: complete technical architecture. Agents, ML pipeline, SEO optimization, and CMS integration for 100K+ products daily.

Key Assumptions

  • Catalog size: 1K-500K SKUs, growing 5-10% monthly
  • Update frequency: New products daily, refreshes quarterly
  • SEO requirements: Target keywords, readability scores, meta tags
  • Brand consistency: Voice guidelines, prohibited terms, templates
  • Integration: Shopify/Magento/BigCommerce or custom CMS API

System Requirements

Functional

  • Generate product descriptions from attributes (title, specs, images)
  • SEO optimization: keyword density, meta descriptions, alt text
  • Brand voice enforcement: tone, style, prohibited words
  • Bulk processing: 1K+ products in single batch
  • CMS integration: Push to Shopify, Magento, or custom API
  • A/B testing: Multiple variants per product
  • Quality scoring: Readability, uniqueness, keyword coverage

Non-Functional (SLOs)

  • Latency (p95): 3,000 ms
  • Freshness: 60 min
  • Availability: 99.5%
  • Minimum quality score: 85

💰 Cost Targets: $0.02 per product, $15 per 1K-product batch, $500/month infrastructure
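To make the per-product target concrete, the arithmetic below projects spend at a given daily volume. This is a rough sketch: the 30-day month and the function name are assumptions, and $0.02 is the stated target, not a measured cost.

```python
PER_PRODUCT_USD = 0.02  # per-product cost target stated above


def projected_spend(products_per_day: int, days: int = 30) -> float:
    """Projected spend in USD over `days` (30-day month is an assumption)."""
    return round(PER_PRODUCT_USD * products_per_day * days, 2)


def daily_spend(products_per_day: int) -> float:
    """Projected spend in USD for a single day at the target rate."""
    return projected_spend(products_per_day, days=1)
```

At 100 products/day the target works out to about $60/month. At 100K products/day it is about $2,000/day, so per-product cost matters far more than fixed infrastructure at the upper end of the range.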

Agent Layer

planner

L3

Decompose product description task into steps: template selection, content generation, SEO optimization, quality check

🔧 template_selector, keyword_research_api, brand_policy_checker

⚡ Recovery: If no template found → fall back to generic template; if keyword API fails → use cached keywords from last 24h

executor

L2

Execute the plan: generate description, apply template, call LLM, format output

🔧 openai_api, template_engine, image_analyzer

⚡ Recovery: If LLM timeout → retry 3x with exponential backoff; if generation fails → queue for human review

seo_agent

L3

Optimize description for search: keyword density, readability, meta tags, structured data

🔧 semrush_api, readability_scorer, keyword_density_analyzer

⚡ Recovery: If SEMrush API down → use cached keyword data; if density too low → regenerate with keyword boost

evaluator

L4

Validate output quality: brand voice, readability, uniqueness, keyword coverage

🔧 plagiarism_checker, brand_voice_classifier, readability_api

⚡ Recovery: If score < 80 → trigger regeneration with feedback; if plagiarism detected → block and flag

guardrail

L4

Policy enforcement: prohibited terms, legal compliance, safety filters

🔧 prohibited_terms_checker, legal_compliance_api, toxicity_detector

⚡ Recovery: If violation found → block publish and queue for review; if legal API down → default to conservative blocking

template_agent

L2

Select and populate category-specific templates with product data

🔧 template_db, variable_extractor, style_matcher

⚡ Recovery: If no category template → use generic; if missing variables → mark as optional
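The executor's recovery rule (retry 3x with exponential backoff, then hand off to human review) can be sketched as below. This is a minimal illustration, not the system's actual code: `retry_with_backoff`, the 1-second base delay, and the 10% jitter are all assumptions, and `call` stands in for whatever function wraps the LLM request.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; on TimeoutError retry up to `max_retries` more times."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == max_retries:
                # Retries exhausted: re-raise so the caller can queue
                # the product for human review.
                raise
            # Delay doubles each attempt (1s, 2s, 4s) plus up to 10% jitter.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. The jitter spreads retries out so a batch of workers does not hit the API again at the same instant.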

ML Layer

Feature Store

Update: Daily batch + real-time on product update

  • product_attribute_embeddings
  • historical_conversion_rate
  • category_avg_quality_score
  • brand_voice_vector
  • competitor_keyword_density
  • seasonal_keyword_trends

Model Registry

Strategy: Blue-green deployment with 10% canary

  • gpt-4o-mini
  • brand_voice_classifier
  • readability_scorer
  • keyword_ranker

Observability

Metrics

  • 📊 generation_success_rate
  • 📊 llm_latency_p95_ms
  • 📊 quality_score_avg
  • 📊 approval_rate_percent
  • 📊 regeneration_rate_percent
  • 📊 cost_per_product_usd
  • 📊 seo_score_avg
  • 📊 keyword_density_avg

Dashboards

  • 📈 ops_dashboard
  • 📈 ml_dashboard
  • 📈 cost_dashboard
  • 📈 quality_dashboard

Traces

✅ Enabled

Deployment Variants

🚀 Startup

Infrastructure:

  • Vercel/Netlify for frontend
  • Serverless functions (Lambda/Cloud Run)
  • Managed PostgreSQL (Supabase/Neon)
  • Redis Cloud
  • OpenAI API (pay-as-you-go)
  • Shopify integration (OAuth app)

→ Quick to ship, low upfront cost
→ Auto-scaling with serverless
→ Managed services reduce ops burden
→ Cost: ~$100-500/mo depending on volume

🏢 Enterprise

Infrastructure:

  • Kubernetes (EKS/GKE) for control plane
  • VPC isolation + private subnets
  • Aurora PostgreSQL (multi-AZ)
  • Redis Cluster (ElastiCache)
  • BYO LLM (self-hosted or Azure OpenAI)
  • KMS/HSM for encryption
  • SSO/SAML integration
  • Multi-region deployment
  • Dedicated support + SLA

→ Full control over infrastructure
→ Data residency compliance (GDPR, SOC2)
→ Private networking, no public internet
→ Cost: $5K-20K/mo depending on scale

📈 Migration: Start with the startup stack. Migrate to enterprise when you (1) exceed 10K products/day, (2) need data residency, (3) require a 99.9% SLA, or (4) need a custom LLM or private deployment. Migration path: lift-and-shift to containers → VPC setup → multi-region replication → BYO LLM.

Risks & Mitigations

⚠️ LLM generates off-brand content

Medium

✓ Mitigation: Multi-layer validation: brand voice classifier (94% accuracy), human review queue for low-confidence outputs, regular fine-tuning on approved content

⚠️ Hallucinated product features

Medium

✓ Mitigation: Attribute validation against product DB, fact-checking layer, confidence scoring, 100% human review for high-value products (>$500)

⚠️ SEO keyword stuffing (Google penalty)

Low

✓ Mitigation: Keyword density limits (2-3%), readability scoring (Flesch Reading Ease >60), A/B test against human-written baseline

⚠️ API rate limits (OpenAI, Shopify)

High

✓ Mitigation: Multi-LLM failover, exponential backoff, queue-based retry, rate limiter (10 req/sec), caching for 24h

⚠️ Cost overruns (LLM API costs)

Medium

✓ Mitigation: Cost tracking per product, alerts at $1K/day, auto-throttle at $5K/day, monthly budget caps, cheaper models for low-priority products

⚠️ Data privacy violation (PII in descriptions)

Low

✓ Mitigation: PII detection + redaction, no customer data in prompts, audit logs (2yr retention), privacy-by-design

⚠️ Competitor trademark infringement

Medium

✓ Mitigation: Trademark database check, prohibited terms blocklist (500+ brands), legal review for high-risk categories, guardrail agent enforcement
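The keyword density limit from the keyword-stuffing mitigation can be checked with a few lines of code. The sketch below is one simple way to compute density, not the system's actual analyzer: it counts the words covered by exact occurrences of the keyword phrase as a share of all words. The tokenization and both function names are assumptions.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Percent of words in `text` covered by occurrences of `keyword`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = re.findall(r"[a-z0-9']+", keyword.lower())
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    # Slide a window of the phrase length across the text and count matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return 100.0 * hits * n / len(words)


def within_density_limit(text: str, keyword: str, max_pct: float = 3.0) -> bool:
    """True when the keyword stays at or below the upper density limit."""
    return keyword_density(text, keyword) <= max_pct
```

A draft that fails `within_density_limit` would be a candidate for regeneration before it reaches the evaluator. A density below the lower bound would instead trigger the SEO agent's "regenerate with keyword boost" recovery.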

Evolution Roadmap

Phase 1: MVP (0-3 months), Q1 2025
  • Launch with 3 core agents (Executor, Evaluator, Guardrail)
  • Shopify integration only
  • 100-500 products/day capacity
  • 90% quality score target

Phase 2: Scale (3-6 months), Q2 2025
  • Add Planner, SEO, Template agents
  • Multi-platform support (Magento, BigCommerce)
  • 1K-10K products/day capacity
  • A/B testing framework
  • 95% approval rate

Phase 3: Enterprise (6-12 months), Q3-Q4 2025
  • 10K-100K products/day capacity
  • Multi-region deployment
  • 99.9% SLA
  • Custom LLM support
  • Enterprise security (SSO, RBAC, audit)

Complete Systems Architecture

End-to-end layer view

  • Presentation: Admin Dashboard, Bulk Upload UI, Preview Portal
  • API Gateway: Load Balancer, Rate Limiter, Auth (API Keys)
  • Agent Layer: Planner Agent, Executor Agent, Evaluator Agent, Guardrail Agent, SEO Agent, Template Agent
  • ML Layer: Feature Store, Model Registry, Prompt Store, Evaluation Engine
  • Integration: Shopify Adapter, SEMrush Connector, Image Analyzer
  • Data: PostgreSQL (products), Redis (cache), S3 (assets)
  • External: OpenAI API, Shopify API, SEMrush API
  • Observability: Metrics (Prometheus), Logs (CloudWatch), Traces (Jaeger)
  • Security: API Auth, Secrets Manager, Audit Logs

Sequence Diagram - Product Description Flow

Participants: User, API, Planner, Executor, SEO Agent, Evaluator, Shopify

Messages, in order:
  1. POST /generate {product_id, attributes}
  2. plan_task(product_data)
  3. execute_plan(steps, context)
  4. optimize_keywords(description_draft)
  5. optimized_description + meta_tags
  6. validate(description, quality_gates)
  7. quality_score: 92, approved
  8. PUT /products/{id} (description, meta)
  9. 200 OK, product_updated

E-commerce Product Description - Agent Orchestration

7 Components: Orchestrator, Planner Agent, Template Agent, Executor Agent, SEO Agent, Guardrail Agent, Evaluator Agent

Request → response pairs (RPC):
  • Product data + requirements → Execution plan
  • Category + product attributes → Selected template
  • Template + product data → Generated description
  • Raw description + keywords → Optimized content + metadata
  • Content for validation → Compliance status + filtered content
  • Final description + criteria → Quality scores + approval

E-commerce Product Description - External Integrations

9 Components: Core System, Product Catalog DB, CMS Platform, LLM Service, SEO Analytics, Admin Dashboard, Message Queue, CDN Storage, Monitoring Service

Connections:
  • [REST] Product data feed
  • [HTTP] Generation prompts
  • [Response] Generated content
  • [REST] Published descriptions + metadata
  • [HTTP] Batch requests + config
  • [WebSocket] Real-time status updates
  • [Event] Product description jobs
  • [Event] Job assignments
  • [S3] Description archives
  • [REST] Performance feedback
  • [Event] Metrics + logs

Data Flow - Product to Published Description

End-to-end flow in roughly 5 seconds (the per-step timings below sum to 5,000 ms)

  1. User (0 ms): Submits product data → SKU, title, attributes, images
  2. API Gateway (50 ms): Validates request, rate limits → Authenticated request
  3. Planner Agent (500 ms): Plans task, selects template, fetches keywords → Execution plan + target keywords
  4. Template Agent (200 ms): Populates category template → Structured template with variables
  5. Executor Agent (2000 ms): Generates description via LLM → Draft description (300-500 words)
  6. SEO Agent (800 ms): Optimizes keywords, creates meta tags → Optimized description + meta tags
  7. Evaluator Agent (600 ms): Validates quality, brand voice → Quality score: 92, approved
  8. Guardrail Agent (300 ms): Checks prohibited terms, legal compliance → No violations, approved
  9. Shopify Adapter (500 ms): Formats for Shopify API, publishes → Product updated in CMS
  10. Audit Logger (50 ms): Logs transaction → Audit trail created
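Summing the per-step figures above gives the sequential latency budget and shows where the time goes. The sketch assumes the steps run strictly one after another with no overlap, which the flow does not state explicitly. The helper names are illustrative.

```python
from typing import Dict

# Per-step timings in milliseconds, copied from the flow above.
STEP_LATENCY_MS: Dict[str, int] = {
    "User": 0,
    "API Gateway": 50,
    "Planner Agent": 500,
    "Template Agent": 200,
    "Executor Agent": 2000,
    "SEO Agent": 800,
    "Evaluator Agent": 600,
    "Guardrail Agent": 300,
    "Shopify Adapter": 500,
    "Audit Logger": 50,
}


def total_latency_ms(steps: Dict[str, int]) -> int:
    """Sequential end-to-end latency: the sum of all step timings."""
    return sum(steps.values())


def slowest_step(steps: Dict[str, int]) -> str:
    """Name of the step that contributes the most latency."""
    return max(steps, key=lambda name: steps[name])


def share_of_total(steps: Dict[str, int], name: str) -> float:
    """Fraction of total latency spent in the named step."""
    return steps[name] / total_latency_ms(steps)
```

With these figures the total is 5,000 ms and the Executor's LLM call accounts for 40% of it, so the LLM call is the first place to look when tuning latency.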

Scaling Patterns

0-100 products/day
  • Pattern: Synchronous API
  • Architecture: Single API server, direct LLM calls, PostgreSQL, Redis cache
  • Cost: $100/mo
  • Latency: 4-5s

100-1K products/day
  • Pattern: Queue + Workers
  • Architecture: API server, Redis queue, 3-5 worker processes, PostgreSQL, S3 for assets
  • Cost: $400/mo
  • Latency: 3-4s

1K-10K products/day
  • Pattern: Multi-Agent Orchestration
  • Architecture: Load balancer, LangGraph orchestrator, SQS message bus, Lambda functions, RDS + ElastiCache, S3 + CloudFront
  • Cost: $1500/mo
  • Latency: 2-3s

10K-100K products/day
  • Pattern: Enterprise Multi-Region
  • Architecture: Global load balancer, Kubernetes cluster, Kafka event streaming, multi-LLM fallback, Aurora PostgreSQL (multi-region), Redis Cluster, CDN for assets
  • Cost: $5000+/mo
  • Latency: 1-2s
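The volume tiers above map naturally onto a lookup. The sketch below picks a pattern from daily volume. The ranges above share their endpoints (100, 1K, 10K), so assigning a boundary value to the smaller tier is an assumption, as are the `Tier` type and `pick_tier` name.

```python
from typing import List, NamedTuple


class Tier(NamedTuple):
    max_products_per_day: int
    pattern: str
    monthly_cost: str


# Upper bounds, patterns, and costs are taken from the tiers listed above.
TIERS: List[Tier] = [
    Tier(100, "Synchronous API", "$100/mo"),
    Tier(1_000, "Queue + Workers", "$400/mo"),
    Tier(10_000, "Multi-Agent Orchestration", "$1500/mo"),
    Tier(100_000, "Enterprise Multi-Region", "$5000+/mo"),
]


def pick_tier(products_per_day: int) -> Tier:
    """Return the smallest tier whose upper bound covers the volume."""
    if products_per_day < 0:
        raise ValueError("volume must be non-negative")
    for tier in TIERS:
        if products_per_day <= tier.max_products_per_day:
            return tier
    raise ValueError("volume exceeds the largest tier described here")
```

For example, `pick_tier(5_000).pattern` yields the multi-agent orchestration tier. In practice the switch between tiers is a migration decision rather than a runtime one, so this is best read as a planning aid.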

Key Integrations

Shopify API

Protocol: REST + GraphQL
Fetch product data via GET /products/{id}
Generate description
Update via PUT /products/{id} with description + meta
Webhook notification on success

SEMrush API

Protocol: REST
Query keyword difficulty + search volume
Get competitor keyword analysis
Return top 10 target keywords
Cache results for 24h
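The "top 10 keywords, cached for 24h" behavior can be sketched as a small TTL cache. This is an in-memory illustration under stated assumptions: `fetch` is a hypothetical stand-in for the real SEMrush request (no SEMrush client API is shown), and a production setup would more likely keep entries in Redis.

```python
import time
from typing import Callable, Dict, List, Tuple


class KeywordCache:
    """Caches the top keywords per search term for a fixed time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],  # hypothetical keyword API wrapper
        ttl_s: float = 24 * 3600,           # 24h, as described above
        top_n: int = 10,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._top_n = top_n
        self._clock = clock
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, term: str) -> List[str]:
        now = self._clock()
        entry = self._store.get(term)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # still fresh: skip the API call
        keywords = self._fetch(term)[: self._top_n]
        self._store[term] = (now, keywords)
        return keywords
```

Injecting `clock` makes expiry easy to test. The same store could also serve stale entries when the API is down, matching the "use cached keywords" fallback, though that path is not implemented here.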

OpenAI API

Protocol: REST
Send prompt + product data
Stream response (SSE)
Parse JSON output
Handle rate limits (exponential backoff)

Image Analysis (AWS Rekognition)

Protocol: AWS SDK
Upload product image to S3
Call DetectLabels API
Extract features (color, style, objects)
Use in description generation

Security & Compliance

Failure Modes & Fallbacks

| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| OpenAI API down | Switch to backup LLM (Anthropic Claude) or queue for retry | Degraded performance, 10% slower | 99.5% |
| SEMrush API timeout | Use cached keywords from last 24h or generic keywords | Slightly lower SEO optimization | 99.0% |
| Quality score < 80 | Regenerate with feedback or queue for human review | Delayed publication, maintains quality | 95% auto-approval |
| Guardrail detects policy violation | Block publication, flag for review | Safety maintained, no bad content published | 100% enforcement |
| Shopify API rate limit | Exponential backoff, queue remaining products | Delayed sync, eventual consistency | 99.0% |
| Database connection loss | Read from replica, queue writes | Read-only mode for up to 5 minutes | 99.9% |
| Template not found for category | Use generic fallback template | Less customized output | 100% coverage |
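The first fallback above (primary LLM, then backup LLM, then queue for retry) can be sketched as an ordered provider chain. Everything named here is illustrative: the provider callables are hypothetical wrappers around real clients, and a plain list stands in for the retry queue.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, function that generates text)


def generate_with_failover(
    prompt: str,
    providers: Sequence[Provider],
    retry_queue: List[str],
) -> Optional[str]:
    """Try each provider in order; queue the prompt if all of them fail."""
    for _name, call in providers:
        try:
            return call(prompt)
        except Exception:
            # Broad catch keeps the sketch short; real code would catch
            # only the client's timeout and availability errors.
            continue
    retry_queue.append(prompt)
    return None
```

A caller would pass providers in priority order, for example the primary model first and the fallback second. A `None` result means the product is waiting in the queue, not that it was dropped.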

Multi-Agent Architecture

6 specialized agents collaborate autonomously

┌──────────────┐
│   Planner    │ ← Orchestrates all agents
└──────┬───────┘
       │
   ┌───┴────┬──────────┬──────────┬──────────┐
   │        │          │          │          │
┌──▼──┐  ┌─▼──┐  ┌────▼────┐  ┌──▼───┐  ┌──▼────┐
│Temp │  │Exec│  │   SEO   │  │Eval  │  │Guard  │
│Agent│  │utor│  │  Agent  │  │uator │  │rail   │
└─────┘  └────┘  └─────────┘  └──────┘  └───────┘
    │       │          │          │          │
    └───────┴──────────┴──────────┴──────────┘
                       │
                   ┌───▼────┐
                   │Shopify │
                   │Adapter │
                   └────────┘

Agent Collaboration Flow

  1. Planner: Receives product data, plans execution: template selection → generation → SEO → quality check
  2. Template Agent: Selects category-specific template, populates with product attributes
  3. Executor: Generates description via LLM using populated template → Returns draft
  4. SEO Agent: Optimizes keywords, creates meta tags, structured data → Returns optimized version
  5. Evaluator: Validates quality (brand voice, readability, uniqueness) → Returns score + approval
  6. Guardrail: Checks prohibited terms, legal compliance → Blocks if violation, else approves
  7. Planner: Decides the outcome. If approved, publish to CMS; otherwise regenerate with feedback or queue for human review
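The planner's final decision (publish, regenerate with feedback, or escalate to a human) is a bounded loop. The sketch below shows one way to wire it, with the generator, evaluator, and guardrail passed in as plain callables. The three-attempt cap and the status strings are assumptions, and the 80-point threshold is the evaluator's regeneration threshold.

```python
from typing import Callable, Dict, Optional, Tuple


def run_pipeline(
    product: Dict[str, str],
    generate: Callable[[Dict[str, str], Optional[str]], str],
    evaluate: Callable[[str], Tuple[float, str]],  # returns (score, feedback)
    guardrail_ok: Callable[[str], bool],
    max_attempts: int = 3,     # assumption: no cap is stated in the flow
    min_score: float = 80.0,
) -> Dict[str, object]:
    """Generate, score, and gate a description; regenerate on low scores."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        draft = generate(product, feedback)
        score, feedback = evaluate(draft)
        if score < min_score:
            continue  # regenerate, passing the evaluator's feedback along
        if not guardrail_ok(draft):
            # Policy violations are never retried automatically.
            return {"status": "blocked_for_review", "attempts": attempt}
        return {
            "status": "publish",
            "description": draft,
            "score": score,
            "attempts": attempt,
        }
    return {"status": "human_review", "attempts": max_attempts}
```

Capping attempts keeps a persistently low-scoring product from looping indefinitely and burning LLM budget. Those products fall through to the human review queue instead.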

Reactive Agent

Template Agent - Selects template based on category
Autonomy: Low · Stateless

Reflexive Agent

Executor Agent - Generates based on template + context
Autonomy: Medium · Reads context

Deliberative Agent

SEO Agent - Plans keyword strategy, optimizes iteratively
Autonomy: High · Stateful

Orchestrator Agent

Planner - Coordinates all agents, makes routing decisions
Autonomy: Highest · Full state management

Levels of Autonomy

  • L1 (Tool): Human calls, agent responds → Monday's prompts
  • L2 (Chained Tools): Sequential execution → Tuesday's code
  • L3 (Agent): Makes decisions, can loop → SEO Agent iterates on keywords
  • L4 (Multi-Agent): Agents collaborate autonomously → This system

Advanced ML/AI Patterns

Production ML engineering beyond basic API calls

RAG vs Fine-Tuning

Product catalogs change daily. RAG allows real-time updates without retraining. Fine-tuning would require weekly retraining at roughly 10x the monthly cost ($2K vs. $200).

✅ RAG (Chosen)
  • Cost: $200/mo (vector DB)
  • Update: Real-time
  • How: Embed product attributes, retrieve similar descriptions

❌ Fine-Tuning
  • Cost: $2K/mo (training + hosting)
  • Update: Weekly batch
  • How: Train on 10K+ product descriptions

Implementation: Pinecone vector DB with product embeddings. Retrieve top 5 similar products during generation for style consistency.
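The retrieval step (top 5 similar products by embedding) comes down to a nearest-neighbor search. The sketch below does it in memory with cosine similarity as a stand-in for the vector DB. It does not use the Pinecone client, and the function names and toy vectors are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],  # SKU -> product embedding
    k: int = 5,
) -> List[str]:
    """Return the SKUs of the k products most similar to the query."""
    ranked = sorted(catalog, key=lambda sku: cosine(query, catalog[sku]), reverse=True)
    return ranked[:k]
```

The descriptions of the returned SKUs would then be added to the generation prompt as style examples. A linear scan like this is fine for a few thousand products. A vector index earns its cost at catalog sizes in the hundreds of thousands.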

Hallucination Detection

LLMs hallucinate product features (fake specs, wrong materials)
  • L1: Confidence scoring (flag if < 0.8)
  • L2: Attribute validation against product DB
  • L3: Fact-checking via secondary LLM call
  • L4: Human review for low-confidence outputs
Hallucination rate: 0.5%, 100% caught before publication
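Attribute validation against the product DB can start as a simple cross-check. The sketch below flags any term from a controlled vocabulary (materials, for instance) that the description mentions but the product record does not. The vocabulary approach, the function name, and the sample data are assumptions, not the system's actual validator.

```python
import re
from typing import Dict, Iterable, List


def unsupported_claims(
    description: str,
    attributes: Dict[str, str],   # product record, e.g. {"sole": "rubber"}
    vocabulary: Iterable[str],    # terms that must be backed by the record
) -> List[str]:
    """Return vocabulary terms found in the description but not in the record."""
    claimed = description.lower()
    source_text = " ".join(str(v) for v in attributes.values()).lower()
    flagged = []
    for term in vocabulary:
        pattern = rf"\b{re.escape(term.lower())}\b"
        # Flag only when the draft uses the term and the product data does not.
        if re.search(pattern, claimed) and not re.search(pattern, source_text):
            flagged.append(term)
    return flagged
```

For a product whose record lists a canvas upper and a rubber sole, a draft mentioning a "leather upper" would come back with `["leather"]`. The draft can then be routed to the fact-checking or human review layers instead of being published.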

Evaluation Framework

  • Quality Score: 89.3 (target: 85+)
  • Brand Voice Match: 93.1% (target: 90%+)
  • SEO Score: 84.7 (target: 80+)
  • Human Approval Rate: 96.8% (target: 95%+)
  • Conversion Lift (A/B test): +7.2% (target: +5%)

Testing: Shadow mode with 500 products run in parallel against human-written descriptions, followed by a 2-week A/B test

Dataset Curation

  1. Collect: 50K product descriptions, scraped from top e-commerce sites
  2. Clean: 42K usable after removing duplicates and filtering low quality
  3. Label: 42K labeled ($21K)
  4. Augment: +8K synthetic, generated to cover edge cases (missing specs, unusual products)
→ 50K high-quality training examples (inter-annotator agreement: 0.89)

Agentic RAG

Agent iteratively retrieves based on reasoning
Product has 'organic cotton' → RAG retrieves sustainability claims → Agent reasons 'need certifications' → RAG retrieves GOTS/OEKO-TEX data → Description includes verified sustainability info
💡 Not one-shot retrieval. Agent decides what context is needed for accurate, compliant descriptions.

Multi-Variant Generation

Tech Stack Summary

  • LLMs: GPT-4o-mini (primary), Claude 3.5 Sonnet (fallback)
  • Orchestration: LangGraph (agent framework)
  • Database: PostgreSQL (products, descriptions), Redis (cache, queue)
  • Vector DB: Pinecone or Weaviate
  • Queue: Redis Queue (startup), SQS (scale), Kafka (enterprise)
  • Compute: Lambda/Cloud Run (serverless), ECS/K8s (enterprise)
  • CMS Integration: Shopify SDK, Magento REST API, custom adapters
  • SEO Tools: SEMrush API, Ahrefs API (optional)
  • Monitoring: Prometheus + Grafana, CloudWatch, Datadog
  • Security: AWS Secrets Manager, KMS, WAF