
Brand Storytelling System Architecture 🏗️

From 100 to 10,000 brand stories/month with AI consistency

July 31, 2025
📝 Marketing · 🏗️ Architecture · 📊 Scalable · 🎨 Brand Voice

From prompts to production storytelling system.

Monday: 3 core prompts for brand voice, content generation, and quality control. Tuesday: the automated story generator. Wednesday: team workflows for marketers, writers, and approvers. Thursday: the complete technical architecture, covering the multi-agent system, voice modeling, CMS integration, and quality assurance for 10,000+ stories monthly.

Key Assumptions

  • Generate 100-10,000 brand stories per month across channels
  • Voice model trained on 500-5,000 existing brand assets
  • CMS integration required (WordPress, Contentful, Sanity, etc.)
  • Human review for high-stakes content (exec comms, PR)
  • Multi-channel output (blog, social, email, ads)
  • Brand guidelines stored as structured data (tone, vocabulary, rules)
  • SOC2 compliance for enterprise customers

System Requirements

Functional

  • Voice model captures brand personality and tone
  • Content generator produces multi-format stories (blog, social, email)
  • Quality control validates brand guidelines and factual accuracy
  • CMS integration auto-publishes or queues for review
  • Review workflow routes content to appropriate approvers
  • Version control tracks content iterations and approvals
  • Analytics track performance and voice consistency metrics

Non-Functional (SLOs)

  • Latency (p95): 8,000 ms
  • Freshness: 5 min
  • Availability: 99.5%
  • Voice consistency score: 0.92
  • Quality score: 0.95

💰 Cost Targets: $0.15 per story · $150 per 1,000 stories · $500/month infrastructure

Agent Layer

planner (L4)

Decomposes content request into tasks, selects tools, routes workflow

🔧 VoiceModelAgent.load(), ContentGeneratorAgent.generate(), QualityControlAgent.validate(), CMSAdapter.publish()

⚡ Recovery: if voice model unavailable → use generic brand guidelines; if generation fails → retry 2x with fallback prompt; if CMS down → queue for manual publish
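
These recovery rules reduce to straightforward control flow. A minimal sketch; the agent classes, exceptions, and prompt builder are illustrative stand-ins, not a real SDK:

```python
# Minimal sketch of the planner's recovery rules. Agent classes, exceptions,
# and the prompt builder are illustrative stand-ins, not a real SDK.
class VoiceModelUnavailable(Exception): pass
class GenerationError(Exception): pass

GENERIC_GUIDELINES = {"tone": "professional", "vocabulary": "plain"}
manual_publish_queue: list = []

def build_prompt(request: dict, voice: dict) -> str:
    return f"Write a {request['format']} about {request['topic']} in a {voice['tone']} tone."

def plan_and_run(request: dict, voice_agent, generator, cms) -> str:
    # If voice model unavailable -> use generic brand guidelines
    try:
        voice = voice_agent.load(request["brand_id"])
    except VoiceModelUnavailable:
        voice = GENERIC_GUIDELINES

    # If generation fails -> retry 2x, then switch to a fallback prompt
    draft = None
    for prompt in (build_prompt(request, voice),
                   build_prompt(request, GENERIC_GUIDELINES)):
        for _ in range(2):
            try:
                draft = generator.generate(prompt)
                break
            except GenerationError:
                continue
        if draft is not None:
            break

    # If CMS down -> queue for manual publish
    try:
        return cms.publish(draft)
    except ConnectionError:
        manual_publish_queue.append(draft)
        return "queued_for_manual_publish"
```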

voice_model (L2)

Loads brand-specific voice profile, injects tone/vocabulary context

🔧 ModelRegistry.getVoiceModel(brand_id), VectorDB.retrieveExamples(format), FeatureStore.getBrandMetrics()

⚡ Recovery: if model not found → fall back to generic brand guidelines; if vector DB slow → use cached examples

content_generator (L3)

Generates brand-consistent content using voice model + prompt

🔧 OpenAI.createCompletion(voice_prompt + topic), PromptStore.getTemplate(format), FeatureStore.getAudienceProfile()

⚡ Recovery: if LLM timeout → retry with shorter prompt; if low quality → regenerate with stricter instructions; if API down → queue for batch processing
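
A hedged sketch of the timeout-then-shorter-prompt behavior using the OpenAI Python SDK (openai>=1.0); the model name and the prompt-shortening heuristic are assumptions:

```python
# Hedged sketch of generation with timeout retry, using the OpenAI Python SDK
# (openai>=1.0). Model name and the prompt-shortening heuristic are assumptions.
from openai import OpenAI, APITimeoutError

client = OpenAI()

def generate(voice_prompt: str, topic: str) -> str:
    prompt = f"{voice_prompt}\n\nTopic: {topic}"
    for _ in range(2):
        try:
            resp = client.chat.completions.create(
                model="gpt-4o",  # assumption; the real system uses a fine-tuned model
                messages=[{"role": "user", "content": prompt}],
                timeout=30,
            )
            return resp.choices[0].message.content
        except APITimeoutError:
            prompt = prompt[: len(prompt) // 2]  # retry with a shorter prompt
    raise RuntimeError("LLM unavailable; queue for batch processing")
```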

quality_control (L3)

Validates content against brand guidelines, checks factual accuracy

🔧 GuidelineValidator.check(content, rules), FactChecker.verify(claims), ReadabilityScorer.analyze(text)

⚡ Recovery: if fact-check API down → flag for manual review; if low score (<0.8) → trigger regeneration; if critical violation → block publish

guardrail (L2)

Safety checks, PII redaction, policy enforcement

🔧 PIIDetector.scan(content), PolicyEngine.enforce(rules), ToxicityClassifier.score(text)

⚡ Recovery: if PII detected → auto-redact and flag; if policy violation → block publish; if toxicity high → reject and alert
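
A toy stand-in for the redaction step; a production PIIDetector would use NER-based detection rather than these simplistic regexes:

```python
# Toy stand-in for PIIDetector.scan: regex-based redaction for common patterns.
# Production systems would use NER-based detection; these patterns are simplistic.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(content: str) -> tuple[str, bool]:
    flagged = False
    for label, pattern in PII_PATTERNS.items():
        content, hits = pattern.subn(f"[{label} REDACTED]", content)
        flagged = flagged or hits > 0
    return content, flagged  # flagged content is routed to review, per the rule above
```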

evaluator (L3)

Post-generation quality assessment, performance tracking

🔧 EvaluationEngine.score(content, metrics), DriftDetector.analyze(scores_over_time), AnalyticsAPI.getEngagement(content_id)

⚡ Recovery: if metrics unavailable → use quality score only; if drift detected → trigger model retraining alert

ML Layer

Feature Store

Update: Daily batch + real-time streaming for engagement metrics

  • brand_voice_embedding (768-dim)
  • historical_content_performance
  • audience_demographics
  • topic_trends
  • format_preferences
  • tone_consistency_score
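
If the feature store is Feast (listed in the tech stack), the scalar features above might be declared roughly as follows; the entity, source path, and TTL are assumptions, and the 768-dim embedding would typically live in the vector DB or use Feast's array types (version-dependent):

```python
# Hypothetical Feast declaration for the scalar features above; entity, source
# path, and TTL are assumptions, not the system's real config.
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, String

brand = Entity(name="brand", join_keys=["brand_id"])

daily_metrics = FileSource(
    path="s3://brand-features/daily/brand_metrics.parquet",  # placeholder path
    timestamp_field="event_timestamp",
)

brand_voice_features = FeatureView(
    name="brand_voice_features",
    entities=[brand],
    ttl=timedelta(days=2),  # daily batch plus headroom
    schema=[
        Field(name="tone_consistency_score", dtype=Float32),
        Field(name="historical_content_performance", dtype=Float32),
        Field(name="format_preferences", dtype=String),
    ],
    source=daily_metrics,
)
```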

Model Registry

Strategy: Semantic versioning (major.minor.patch), blue-green deployment

  • VoiceModel
  • QualityClassifier
  • ToxicityDetector
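
With MLflow as the registry (per the tech stack), registration plus alias-based promotion approximates the blue-green strategy; a hedged sketch where the run URI and alias names are placeholders:

```python
# Hedged sketch: MLflow (2.x) registration with alias-based promotion, which
# approximates blue-green cutover. Run URI and alias names are placeholders.
import mlflow
from mlflow import MlflowClient

version = mlflow.register_model("runs:/<run_id>/model", "VoiceModel")  # placeholder URI

client = MlflowClient()
client.set_registered_model_alias("VoiceModel", "challenger", version.version)  # "green"
# After shadow evaluation passes, cut traffic over:
client.set_registered_model_alias("VoiceModel", "champion", version.version)    # "blue" -> "green"
```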

Observability

Metrics

  • 📊 story_generation_latency_p95_ms
  • 📊 voice_consistency_score
  • 📊 quality_score_distribution
  • 📊 llm_cost_per_story_usd
  • 📊 human_approval_rate_percent
  • 📊 guideline_violation_rate
  • 📊 cms_publish_success_rate
  • 📊 agent_failure_rate
  • 📊 cache_hit_rate
  • 📊 drift_score
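
A few of these metrics, declared with prometheus_client; the names and histogram buckets are illustrative, not the system's actual definitions:

```python
# Illustrative prometheus_client declarations for a few of the metrics above;
# names and buckets are assumptions, not the system's actual config.
from prometheus_client import Counter, Gauge, Histogram

STORY_LATENCY = Histogram(
    "story_generation_latency_seconds",
    "End-to-end story generation latency",
    buckets=(1, 2, 4, 8, 16, 32),  # the 8s p95 target sits on a bucket boundary
)
VOICE_CONSISTENCY = Gauge("voice_consistency_score", "Rolling voice consistency score")
LLM_COST_USD = Counter("llm_cost_usd_total", "Cumulative LLM spend in USD")
GUIDELINE_VIOLATIONS = Counter("guideline_violations_total", "Guideline violations detected")
```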

Dashboards

  • 📈 ops_dashboard
  • 📈 ml_dashboard
  • 📈 cost_dashboard
  • 📈 quality_dashboard
  • 📈 user_activity_dashboard

Traces

✅ Enabled

Deployment Variants

🚀 Startup

Infrastructure:

  • Serverless (Lambda/Cloud Functions)
  • Managed PostgreSQL (RDS/Cloud SQL)
  • Managed Redis (ElastiCache/MemoryStore)
  • OpenAI API (no fine-tuning initially)
  • S3/GCS for media storage
  • CloudWatch/Stackdriver for observability

  • Single-region deployment
  • Synchronous processing (simple)
  • No custom fine-tuning (use GPT-4 with prompts)
  • Basic RBAC (3 roles)

Cost: $50-200/month for 100-1K stories

Deploy in 1-2 weeks

🏢 Enterprise

Infrastructure:

  • Kubernetes (EKS/GKE/AKS)
  • Multi-region PostgreSQL with read replicas
  • Redis cluster (multi-AZ)
  • Fine-tuned GPT-4 + multi-LLM failover
  • Vector DB cluster (Pinecone/Weaviate)
  • Kafka for event streaming
  • VPC with private subnets
  • BYO KMS/HSM for encryption
  • SAML/OIDC with SSO
  • Dedicated Prometheus + Grafana
  • Splunk or ELK for centralized logging

  • Multi-region active-active
  • Private networking (no public IPs)
  • Data residency controls (US/EU/Asia)
  • Advanced RBAC (10+ roles, custom policies)
  • SOC2 Type II compliant
  • 99.9% SLA with disaster recovery

Cost: $3,000-10,000/month for 10K+ stories

Deploy in 2-3 months

📈 Migration: Start with startup stack. At 1K stories/month, migrate to queue-based. At 5K, introduce Kubernetes and multi-region. At 10K, add Kafka and full enterprise features. Incremental migration with zero downtime.

Risks & Mitigations

⚠️ Voice model drift - Brand evolves, model becomes stale

Likelihood: High (quarterly brand updates)

✓ Mitigation: Automated drift detection (weekly). Retrain monthly or when drift >5%. Maintain 10K+ training examples.

⚠️ LLM hallucination - Fake facts, wrong product names

Likelihood: Medium (0.5% rate)

✓ Mitigation: 4-layer detection (confidence, fact-check, consistency, human review). Block publish if critical violation. 99% catch rate.

⚠️ CMS integration failure - API down, auth expired

Likelihood: Low (99% uptime)

✓ Mitigation: Retry logic (3x with backoff). Queue for manual publish. Multi-CMS support (failover to secondary).

⚠️ Cost overrun - LLM API costs spike with volume

Likelihood: Medium (unpredictable usage)

✓ Mitigation: Set cost guardrails ($0.20/story max). Alert if daily spend >$100. Use cheaper models (GPT-3.5) for drafts, GPT-4 for final.

⚠️ PII leakage - Training data contains customer PII

Likelihood: Low (strict data hygiene)

✓ Mitigation: PII detection in training pipeline. Auto-redact before fine-tuning. Audit all training data. No customer data in logs.

⚠️ Quality degradation at scale - More volume = lower quality

Likelihood: Medium (common scaling issue)

✓ Mitigation: Quality score threshold (0.9+). Human review for low scores. Continuous evaluation (weekly quality reports).

⚠️ Vendor lock-in - Dependent on single LLM provider

Likelihood: High (OpenAI primary)

✓ Mitigation: Multi-LLM architecture (GPT-4, Claude, Gemini). Abstract LLM calls behind interface. Test failover monthly.
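
The "abstract LLM calls behind an interface" mitigation can be as simple as ordered failover over provider callables; a hedged sketch, where each callable is an assumed wrapper around the OpenAI, Anthropic, or Gemini SDK:

```python
# Sketch of the abstraction-plus-failover mitigation: try providers in order,
# fall through on any provider error. Provider callables are assumed wrappers.
from typing import Callable

def call_with_failover(prompt: str, providers: list[Callable[[str], str]]) -> str:
    last_error: Exception | None = None
    for provider in providers:  # e.g. [gpt4_call, claude_call, gemini_call]
        try:
            return provider(prompt)
        except Exception as err:  # outage, rate limit, timeout
            last_error = err
    raise RuntimeError("all LLM providers failed") from last_error
```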

Evolution Roadmap

Phase 1: MVP (0-3 months, weeks 1-12)
  • Deploy serverless architecture
  • Fine-tune GPT-4 on 1,000 brand assets
  • Integrate with 1 CMS (WordPress)
  • Basic quality control (rule-based)
  • Support 100 stories/month

Phase 2: Scale (3-6 months, weeks 13-24)
  • Migrate to queue-based architecture
  • Fine-tune on 5,000 brand assets
  • Add 2 more CMS integrations (Contentful, Sanity)
  • Advanced quality control (ML classifier)
  • Support 1,000 stories/month
  • Add human review workflow

Phase 3: Enterprise (6-12 months, weeks 25-52)
  • Migrate to Kubernetes multi-region
  • Fine-tune on 10,000 brand assets
  • Multi-LLM failover (GPT-4, Claude, Gemini)
  • Agentic RAG for dynamic context
  • Support 10,000 stories/month
  • SOC2 Type II compliance
  • 99.9% SLA with disaster recovery

Complete Systems Architecture

9-layer view: Presentation to Security

Presentation: Content Dashboard, Review UI, Analytics Portal, Mobile App

API Gateway: Load Balancer, Rate Limiter, Auth Middleware, API Router

Agent Layer: Planner Agent, Voice Model Agent, Content Generator Agent, Quality Control Agent, Guardrail Agent, Evaluator Agent

ML Layer: Feature Store, Model Registry, Voice Model (Fine-tuned GPT), Quality Classifier, Prompt Store, Evaluation Engine

Integration: CMS Adapter, Style Guide DB, Review Tool API, Analytics API

Data: PostgreSQL (content, metadata), Vector DB (brand assets), Redis (cache, queue), S3 (media, logs)

External: OpenAI API, CMS (WordPress/Contentful), Style Guide Service, Review Platform

Observability: Metrics (Prometheus), Logs (CloudWatch), Traces (Jaeger), Dashboards (Grafana)

Security: OIDC/SAML Auth, RBAC Engine, KMS (secrets), Audit Logger, PII Redactor

Sequence Diagram - Story Generation Flow

  1. Marketer → API Gateway: POST /generate {topic, format, audience}
  2. API Gateway → Planner Agent: route to workflow
  3. Planner Agent → Voice Model Agent: load brand voice profile
  4. Voice Model Agent → Content Generator: voice context + prompt
  5. Content Generator → Quality Control: draft content (800 words)
  6. Quality Control → Evaluator: validate guidelines + facts (quality score: 0.94)
  7. Evaluator → CMS: publish or queue for review
  8. CMS → Marketer: 200 OK + content URL

Brand Storytelling - Agent Orchestration

Six agents, four capabilities each: Planner, Voice Model, Content Generator, Quality Control, Guardrail, and Evaluator.

Message flow:
  • [RPC] Brand profile request → [Response] Voice context
  • [RPC] Generation task + context → [Response] Draft content
  • [RPC] Validation request → [Response] Quality score + issues
  • [RPC] Safety check → [Response] Safety approval
  • [Event] Final content + metadata
  • [Feedback] Performance insights

Transports: HTTP, REST, gRPC, events, streams, WebSocket.

Brand Storytelling - External Integrations

Ten components, four capabilities each: Core System, CMS Platform, Brand Portal, Marketing Automation, Analytics Platform, DAM System, Social Media APIs, Approval Workflow, Brand Guidelines DB, and Content Calendar.

Message flow:
  • [HTTP] Content requests · [WebSocket] Status updates
  • [REST] Published content · [Webhook] Campaign triggers
  • [REST] Content assets · [Event] Content metadata
  • [REST] Performance data · [REST] Brand profiles
  • [REST] Asset requests → [Response] Media files
  • [REST] Post content
  • [Webhook] Review requests · [Event] Approval decisions
  • [REST] Scheduled requests → [REST] Completion status

Transports: HTTP, REST, gRPC, events, streams, WebSocket.

Data Flow

Content request → Published story in 10 seconds

  1. Marketer (0s): submits content request → topic, format, audience
  2. API Gateway (50ms): authenticates and routes → validated request
  3. Planner Agent (100ms): creates task plan → workflow steps
  4. Voice Model Agent (500ms): loads brand voice → voice context + examples
  5. Content Generator (5s): generates draft → 800-word story
  6. Guardrail Agent (1s): safety checks → PII-redacted content
  7. Quality Control (2s): validates guidelines → quality score: 0.94
  8. Evaluator Agent (300ms): final assessment → overall score + decision
  9. CMS Adapter (800ms): publishes or queues → published URL or review link
  10. Audit Logger (50ms): records event → audit trail

Scaling Patterns

Volume: 0-100 stories/month · Pattern: Serverless Monolith
Architecture:
  • Single Lambda function
  • OpenAI API calls
  • PostgreSQL (managed)
  • S3 for media
Cost: $50/month · Latency: 8-12 sec

Volume: 100-1,000 stories/month · Pattern: Queue + Workers
Architecture:
  • API server (FastAPI/Express)
  • Redis queue (Bull/Celery)
  • Worker pool (3-5 workers)
  • PostgreSQL + Redis cache
  • Vector DB (Pinecone/Weaviate)
Cost: $200/month · Latency: 5-8 sec

Volume: 1,000-10,000 stories/month · Pattern: Multi-Agent Orchestration
Architecture:
  • Load balancer (ALB/nginx)
  • LangGraph orchestrator
  • Agent pool (auto-scaling)
  • Message bus (SQS/Kafka)
  • Multi-model inference (GPT-4 + Claude)
  • Managed PostgreSQL + Redis cluster
  • Vector DB cluster
Cost: $800/month · Latency: 3-5 sec

Volume: 10,000+ stories/month · Pattern: Enterprise Multi-Region
Architecture:
  • Kubernetes (EKS/GKE/AKS)
  • Multi-region deployment
  • Kafka event streaming
  • Multi-LLM failover (GPT-4, Claude, Gemini)
  • Replicated PostgreSQL (read replicas)
  • Distributed vector DB
  • CDN for media (CloudFront/Fastly)
Cost: $3,000+/month · Latency: 2-4 sec
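
The Queue + Workers tier maps naturally onto Celery over Redis, both named above. A minimal task sketch with the retry-with-backoff behavior used throughout; the broker URL is a placeholder, generate reuses the earlier generation sketch, and publish is a stub for the CMS adapter:

```python
# Minimal Celery-over-Redis worker for the Queue + Workers tier. Broker URL is a
# placeholder; generate/publish are assumed pipeline calls, stubbed here.
from celery import Celery

app = Celery("stories", broker="redis://localhost:6379/0")

def generate(voice_prompt: str, topic: str) -> str:  # stub: LLM generation step
    return f"Draft story about {topic}"

def publish(draft: str) -> dict:  # stub: CMS adapter call
    return {"status": "published"}

@app.task(bind=True, max_retries=3, retry_backoff=True)
def generate_story(self, request: dict) -> dict:
    try:
        draft = generate(request["voice_prompt"], request["topic"])
        return publish(draft)
    except Exception as err:
        raise self.retry(exc=err)  # 3x with exponential backoff
```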

Key Integrations

CMS Integration (WordPress/Contentful/Sanity)

Protocol: REST API + Webhooks
  1. Generate content
  2. Format to the CMS schema
  3. POST to the CMS API
  4. Receive the content ID
  5. Update status in the system
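
For WordPress, steps 3-5 map onto a single REST call; a sketch against the public WP REST API, where the site URL and application-password credentials are placeholders:

```python
# Hedged sketch of steps 3-5 for WordPress, via the public WP REST API.
# Site URL and application-password credentials are placeholders.
import requests

def publish_to_wordpress(title: str, body: str, status: str = "draft") -> int:
    resp = requests.post(
        "https://example.com/wp-json/wp/v2/posts",
        auth=("api_user", "application-password"),  # placeholder credentials
        json={"title": title, "content": body, "status": status},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["id"]  # content ID, recorded for status tracking (step 5)
```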

OpenAI Fine-tuning

Protocol: OpenAI API
  1. Collect brand assets (5,000+ examples)
  2. Format as JSONL training data
  3. Upload and create a job via the fine-tuning API
  4. Monitor the training job
  5. Deploy the fine-tuned model
  6. Update the model registry
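
A sketch of steps 2-3 with the OpenAI Python SDK (openai>=1.0); the chat-style JSONL format is the current fine-tuning format, while the base model name is an assumption:

```python
# Hedged sketch of steps 2-3 with the OpenAI Python SDK (openai>=1.0).
# Chat-style JSONL is the current format; the base model name is an assumption.
import json
from openai import OpenAI

client = OpenAI()

def start_fine_tune(examples: list[dict], base_model: str = "gpt-4o-2024-08-06") -> str:
    with open("brand_voice.jsonl", "w") as f:
        for ex in examples:  # each: {"prompt": ..., "completion": ...}
            f.write(json.dumps({"messages": [
                {"role": "system", "content": "You write in the brand voice."},
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]}) + "\n")
    upload = client.files.create(file=open("brand_voice.jsonl", "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(training_file=upload.id, model=base_model)
    return job.id  # monitor this job, then deploy and register the model (steps 4-6)
```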

Style Guide Database

Protocol: Internal API
  1. Fetch brand guidelines (tone, vocabulary, rules)
  2. Cache in Redis (TTL: 1 hour)
  3. Inject into the voice context
  4. Update the cache on guideline changes
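
A minimal version of this cache with redis-py; fetch_guidelines is a stub for the internal style-guide API:

```python
# Minimal guideline cache with redis-py; fetch_guidelines stubs the internal API.
import json
import redis

r = redis.Redis()

def fetch_guidelines(brand_id: str) -> dict:  # stand-in for the internal API call
    return {"tone": "confident", "banned_words": ["synergy"]}

def get_guidelines(brand_id: str) -> dict:
    key = f"guidelines:{brand_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    guidelines = fetch_guidelines(brand_id)
    r.setex(key, 3600, json.dumps(guidelines))  # TTL: 1 hour, per step 2
    return guidelines

def invalidate(brand_id: str) -> None:  # called on guideline changes (step 4)
    r.delete(f"guidelines:{brand_id}")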

Review Tools (Slack/Asana/Monday.com)

Protocol: Webhooks + REST API
  1. Quality score < threshold → route to review
  2. Create a task in the review tool
  3. Send a notification to the approver
  4. Poll for approval status
  5. On approval → publish to CMS
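
Steps 1 and 3 might look like this for Slack, via an incoming webhook; the webhook URL is a placeholder, and the 0.9 threshold matches the quality targets used elsewhere in this design:

```python
# Sketch of steps 1 and 3: threshold check plus a Slack incoming-webhook
# notification. Webhook URL is a placeholder; 0.9 matches the quality threshold.
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def route_for_review(content_id: str, quality_score: float, threshold: float = 0.9) -> str:
    if quality_score >= threshold:
        return "auto_publish"
    requests.post(
        SLACK_WEBHOOK,
        json={"text": f"Story {content_id} scored {quality_score:.2f}; review required."},
        timeout=5,
    )
    return "queued_for_review"
```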

Security & Compliance

Failure Modes & Fallbacks

  • LLM API down (OpenAI outage)
    Fallback: switch to Claude 3.5 Sonnet or Gemini Pro (multi-LLM failover)
    Impact: degraded voice consistency (no fine-tuned model), but system operational
    SLA: 99.5% availability
  • Voice model returns low confidence (<0.7)
    Fallback: use generic brand guidelines + flag for human review
    Impact: quality maintained, slower throughput
    SLA: 99.9% quality
  • CMS API timeout or down
    Fallback: queue content for retry (3x with backoff), then manual publish queue
    Impact: delayed publish, eventual consistency
    SLA: 99.0% auto-publish
  • Quality check detects critical guideline violation
    Fallback: block publish, route to review queue, alert content team
    Impact: safety first, no bad content published
    SLA: 100% compliance
  • PII detected in generated content
    Fallback: auto-redact PII, flag for review, do not publish
    Impact: privacy protected, delayed publish
    SLA: 100% PII protection
  • Database unavailable (primary down)
    Fallback: switch to read replica for reads, queue writes
    Impact: read-only mode, no new content generation
    SLA: 99.9% read availability
  • Vector DB slow or unavailable
    Fallback: use cached brand examples (Redis), degrade to generic prompts
    Impact: lower voice consistency, slower retrieval
    SLA: 99.5% availability

Advanced ML/AI Patterns

Production ML engineering beyond basic LLM calls

RAG vs Fine-Tuning for Voice Modeling

Brand voice is stable (it changes quarterly, not daily), and fine-tuning captures tone, vocabulary, and style better than RAG retrieval: RAG is good for facts; fine-tuning is good for style.

❌ RAG
  Cost: $100/mo · Update: daily · How: retrieve brand examples, inject into prompt

✅ Fine-Tuning (chosen)
  Cost: $500/mo · Update: monthly or quarterly · How: fine-tune GPT-4 on 5,000 brand assets
Implementation: Fine-tune GPT-4 on 5,000 curated brand assets (blog posts, emails, social). Retrain monthly or when voice drift detected (>5% consistency drop). Use RAG for factual content (product specs, recent news).

Hallucination Detection

LLMs hallucinate facts (fake stats, wrong product names, false claims)
  1. Confidence scores: flag if LLM confidence < 0.7 for factual claims
  2. Fact-checking API: cross-reference claims against the knowledge base
  3. Logical consistency: check for contradictions within the content
  4. Human review: queue high-stakes content (exec comms, PR) for manual review

Result: 0.5% hallucination rate; 99% caught before publish.
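
The four layers compose into a short decision pipeline; a sketch with stubbed fact-checking and consistency helpers:

```python
# Sketch of the four-layer pipeline; fact_check and has_contradictions are stubs
# for the fact-checking API and an NLI-style consistency check.
def fact_check(claim: str) -> bool:        # stub: cross-reference the knowledge base
    return True

def has_contradictions(text: str) -> bool:  # stub: pairwise claim comparison
    return False

def assess(draft: dict) -> str:
    # Layer 1: flag low-confidence factual claims (<0.7)
    if any(c["confidence"] < 0.7 for c in draft["claims"]):
        return "flag"
    # Layer 2: fact-check every claim
    if not all(fact_check(c["text"]) for c in draft["claims"]):
        return "flag"
    # Layer 3: internal consistency
    if has_contradictions(draft["text"]):
        return "flag"
    # Layer 4: high-stakes categories always get human review
    if draft["category"] in {"exec_comms", "pr"}:
        return "human_review"
    return "pass"
```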

Evaluation Framework

  • Voice Consistency: 0.94 (target: 0.92+)
  • Guideline Adherence: 97% (target: 95%+)
  • Factual Accuracy: 99.5% (target: 99%+)
  • Readability: Grade 9.2 (target: Grade 8-10)
  • Human Approval Rate: 92% (target: 90%+)

Testing (shadow mode): generate 500 stories in parallel with human writers and compare quality scores and engagement metrics.

Dataset Curation

  1. Collect: 10,000 brand assets (scrape blog, emails, social, ads)
  2. Clean: 8,000 usable (remove duplicates, low-quality, off-brand)
  3. Label: 8,000 labeled ($8,000 labeling cost)
  4. Augment: +2,000 synthetic (generate edge cases: new topics, formats)

Result: 10,000 high-quality examples for fine-tuning (inter-rater reliability: Cohen's kappa 0.89).

Agentic RAG for Dynamic Context

Agent iteratively retrieves based on reasoning, not one-shot
Topic: 'New product launch' → RAG retrieves product specs → Agent reasons 'need customer testimonials' → RAG retrieves testimonials → Agent reasons 'need competitive positioning' → RAG retrieves competitor analysis → Generate content with full context
💡 Not limited to initial retrieval. Agent decides what else it needs to know, retrieves iteratively.
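
A minimal loop for this pattern; retriever and llm are assumed interfaces, and max_hops bounds how many follow-up retrievals the agent may request:

```python
# Minimal iterative-retrieval loop; retriever and llm are assumed interfaces,
# and max_hops bounds the number of agent-directed follow-up retrievals.
def agentic_rag(topic: str, retriever, llm, max_hops: int = 4) -> str:
    context = retriever.search(topic)  # initial retrieval
    for _ in range(max_hops):
        gap = llm.ask(
            f"Given this context, what else is needed to write about '{topic}'? "
            f"Reply NONE if complete.\n\n{context}"
        )
        if gap.strip().upper() == "NONE":
            break
        context += "\n" + retriever.search(gap)  # agent-directed follow-up retrieval
    return llm.ask(f"Write the story about '{topic}' using:\n{context}")
```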

Model Drift Detection
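
Drift detection here reduces to the thresholds stated above: weekly checks, with retraining triggered on a >5% drop in voice consistency. A minimal sketch, assuming daily consistency scores pulled from the metrics store (Evidently, listed in the tech stack, covers richer distribution-level tests):

```python
# Rolling-window drift check per the thresholds above: compare the recent mean of
# voice-consistency scores against the prior window; >5% relative drop -> retrain.
from statistics import mean

def drift_detected(scores: list[float], window: int = 7, threshold: float = 0.05) -> bool:
    if len(scores) < 2 * window:
        return False  # not enough history yet
    baseline = mean(scores[-2 * window:-window])
    recent = mean(scores[-window:])
    return (baseline - recent) / baseline > threshold
```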

Tech Stack Summary

LLMs: Fine-tuned GPT-4 (primary), Claude 3.5 Sonnet (fallback), Gemini Pro (fallback)

Orchestration: LangGraph or CrewAI

Database: PostgreSQL (content, metadata, audit logs)

Vector DB: Pinecone or Weaviate

Cache/Queue: Redis (cache, queue), RabbitMQ or SQS (message queue)

Compute: Serverless (Lambda/Cloud Functions) for startup, Kubernetes (EKS/GKE) for enterprise

Monitoring: Prometheus + Grafana (metrics), CloudWatch/Stackdriver (logs), Jaeger (traces)

Security: AWS KMS/Azure Key Vault (secrets), Auth0/Okta (OIDC), Casbin (RBAC)

CMS Integration: WordPress REST API, Contentful Management API, Sanity HTTP API

ML Ops: MLflow (model registry), Feast (feature store), Evidently (drift detection)

Need Architecture Review?

We'll audit your content system, identify bottlenecks, and show you how to scale to 10,000+ stories/month with brand consistency.