Automate Brand Storytelling - Consistent Voice at Scale | Randeep Bhatia

The Problem

On Monday you tested the 3 prompts in ChatGPT. You saw how brand extraction → validation → generation works. But here's reality: your content team is writing 50+ posts per week across 8 channels. You can't manually review everything. Junior writers drift off-brand. Freelancers sound like robots. Your brand voice guide sits in a PDF no one reads.

• 4+ hours per day reviewing content for brand consistency
• 60% of content fails brand voice on first draft
• Can't scale beyond 20 pieces/week with manual review

See It Work

Watch the 3 prompts chain together automatically. This is what you'll build.

The Code

Three levels: start simple, add reliability, then scale to production. Pick where you are.

Level 1: Simple API Calls

Good for: 0-100 pieces/week | Setup time: 30 minutes

# Simple API Calls (0-100 pieces/week)
import openai
import json
from typing import Dict, List

class BrandVoiceGenerator:
    def __init__(self, api_key: str, brand_guide: str):
        self.client = openai.OpenAI(api_key=api_key)
        self.brand_guide = brand_guide
    
    def extract_brand_dna(self, brand_text: str) -> Dict:
        """Step 1: Extract brand voice attributes"""
        prompt = f"""Extract brand voice DNA from this description and format as JSON.
        
Include:
- brand_name
- target_audience (list)
- voice_attributes (tone list, avoid list, positioning)
- key_phrases (list)
- tone_words (list)
- content_patterns (uses_data bool, data_format, sentence_structure, hooks)

Brand description:
{brand_text}

Output valid JSON only."""
        
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        
        return json.loads(response.choices[0].message.content)
    
    def validate_content(self, content: str, brand_dna: Dict) -> Dict:
        """Step 2: Check if content matches brand voice"""
        prompt = f"""Review this content against brand voice guidelines. Identify violations.

Content to review:
{content}

Brand DNA:
{json.dumps(brand_dna, indent=2)}

Return JSON with:
- violations (list of {{issue, severity, why}})
- is_on_brand (boolean)
- confidence_score (0-1)

Be specific about what's wrong and why it violates the brand."""
        
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2
        )
        
        return json.loads(response.choices[0].message.content)
    
    def generate_content(self, brief: str, brand_dna: Dict, num_versions: int = 3) -> Dict:
        """Step 3: Generate on-brand content variations"""
        prompt = f"""Generate {num_versions} content variations for this brief, strictly following brand voice.

Brief:
{brief}

Brand DNA:
{json.dumps(brand_dna, indent=2)}

Return JSON with:
- versions (list of {{version, content, brand_score, why_works}})
- recommended (which version letter)
- reasoning (why that version is best)

Make each version genuinely different while staying on-brand."""
        
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        
        return json.loads(response.choices[0].message.content)
    
    def automate_brand_content(self, brief: str) -> Dict:
        """Chain all 3 steps together"""
        # Extract brand DNA from stored guide
        brand_dna = self.extract_brand_dna(self.brand_guide)
        
        # Generate content
        generated = self.generate_content(brief, brand_dna)
        
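        # Validate recommended version (fall back to the first version if the
        # model's "recommended" label doesn't match any version)
        best_content = next(
            (v['content'] for v in generated['versions']
             if v['version'] == generated['recommended']),
            generated['versions'][0]['content']
        )
        validation = self.validate_content(best_content, brand_dna)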
        
        return {
            'brand_dna': brand_dna,
            'generated': generated,
            'validation': validation,
            'final_content': best_content if validation['is_on_brand'] else None
        }

# Usage
generator = BrandVoiceGenerator(
    api_key="sk-proj-...",
    brand_guide=your_brand_guide_text
)

result = generator.automate_brand_content(
    brief="Write a LinkedIn post about our new focus timer feature"
)

if result['final_content']:
    print(f"Ready to publish:\n{result['final_content']}")
else:
    print(f"Needs revision. Issues: {result['validation']['violations']}")
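
Each step above passes the model's reply straight to json.loads, which raises if the model wraps its JSON in a Markdown code fence or adds a sentence around it. Below is a minimal sketch of a more forgiving parser, assuming the reply contains exactly one JSON object. The helper name parse_json_response is hypothetical and not part of the OpenAI SDK.

```python
import json
import re

# Three backticks, built programmatically so this snippet stays fence-safe
_FENCE = "`" * 3


def parse_json_response(raw: str) -> dict:
    """Parse a JSON object from a model reply that may include extra text.

    Hypothetical helper (not part of any SDK). Assumes the reply contains
    a single JSON object, possibly wrapped in a Markdown code fence.
    """
    text = raw.strip()

    # Strip a surrounding Markdown code fence, optionally tagged "json"
    fenced = re.match(rf"^{_FENCE}(?:json)?\s*(.*?)\s*{_FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span if prose surrounds the object
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(text[start:end + 1])
```

Swapping this in for the three json.loads calls keeps the rest of the class unchanged. A reply with no JSON object at all still raises json.JSONDecodeError, so the caller can decide whether to retry.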

Level 2: With Fine-Tuning & Error Handling

Good for: 100-1,000 pieces/week | Setup time: 2-3 hours

// With Fine-Tuning & Error Handling (100-1000 pieces/week)
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';
import { createHash } from 'crypto';

interface BrandDNA {
  brand_name: string;
  target_audience: string[];
  voice_attributes: any;
  key_phrases: string[];
  tone_words: string[];
  content_patterns: any;
}

interface GeneratedContent {
  versions: Array<{
    version: string;
    content: string;
    brand_score: number;
    why_works: string;
  }>;
  recommended: string;
  reasoning: string;
}

class BrandVoiceSystem {
  private openai: OpenAI;
  private anthropic: Anthropic;
  private fineTunedModel: string | null = null;
  private brandDNACache: Map<string, BrandDNA> = new Map();

  constructor(openaiKey: string, anthropicKey: string) {
    this.openai = new OpenAI({ apiKey: openaiKey });
    this.anthropic = new Anthropic({ apiKey: anthropicKey });
  }

  // Fine-tune a custom model on your brand voice
  async trainBrandModel(brandExamples: Array<{ input: string; output: string }>) {
    // Prepare training data in JSONL format
    const trainingData = brandExamples.map((ex) => ({
      messages: [
        { role: 'system', content: 'You are a brand voice expert.' },
        { role: 'user', content: ex.input },
        { role: 'assistant', content: ex.output },
      ],
    }));

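    // Upload training file. The SDK documents File, fs.ReadStream, or its toFile
    // helper for uploads rather than a bare Buffer (global File needs Node 20+).
    const jsonl = trainingData.map((d) => JSON.stringify(d)).join('\n');
    const file = await this.openai.files.create({
      file: new File([jsonl], 'training.jsonl'),
      purpose: 'fine-tune',
    });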

    // Create fine-tuning job
    const fineTune = await this.openai.fineTuning.jobs.create({
      training_file: file.id,
      model: 'gpt-4o-mini-2024-07-18',
      hyperparameters: {
        n_epochs: 3,
      },
    });

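    console.log(`Fine-tuning job started: ${fineTune.id}`);
    // fine_tuned_model is null until the job succeeds; call
    // refreshFineTunedModel(jobId) later to pick it up.
    return fineTune.id;
  }

  // Jobs take minutes to hours. Check back and store the model name once ready.
  async refreshFineTunedModel(jobId: string): Promise<string | null> {
    const job = await this.openai.fineTuning.jobs.retrieve(jobId);
    if (job.status === 'succeeded') {
      this.fineTunedModel = job.fine_tuned_model;
    }
    return this.fineTunedModel;
  }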

  // Extract brand DNA with caching
  async extractBrandDNA(brandText: string): Promise<BrandDNA> {
    const cacheKey = createHash('md5').update(brandText).digest('hex');

    if (this.brandDNACache.has(cacheKey)) {
      return this.brandDNACache.get(cacheKey)!;
    }

    const result = await this.retryWithBackoff(async () => {
      const response = await this.anthropic.messages.create({
        model: 'claude-3-5-sonnet-20241022',
        max_tokens: 2048,
        messages: [
          {
            role: 'user',
            content: `Extract brand voice DNA as JSON: ${brandText}`,
          },
        ],
      });

      const content = response.content[0];
      if (content.type !== 'text') throw new Error('Invalid response');
      return JSON.parse(content.text);
    });

    this.brandDNACache.set(cacheKey, result);
    return result;
  }

  // Validate content with detailed scoring
  async validateContent(content: string, brandDNA: BrandDNA) {
    return await this.retryWithBackoff(async () => {
      const response = await this.anthropic.messages.create({
        model: 'claude-3-5-sonnet-20241022',
        max_tokens: 1024,
        messages: [
          {
            role: 'user',
            content: `Validate this content against brand voice. Return JSON with violations, is_on_brand, confidence_score:\n\nContent: ${content}\n\nBrand DNA: ${JSON.stringify(brandDNA)}`,
          },
        ],
      });

      const contentBlock = response.content[0];
      if (contentBlock.type !== 'text') throw new Error('Invalid response');
      return JSON.parse(contentBlock.text);
    });
  }

  // Generate content using fine-tuned model if available
  async generateContent(
    brief: string,
    brandDNA: BrandDNA,
    numVersions: number = 3
  ): Promise<GeneratedContent> {
    const model = this.fineTunedModel || 'gpt-4';

    return await this.retryWithBackoff(async () => {
      const response = await this.openai.chat.completions.create({
        model,
        messages: [
          {
            role: 'system',
            content: `You are a brand voice expert. Generate content strictly following this brand DNA: ${JSON.stringify(brandDNA)}`,
          },
          {
            role: 'user',
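            // Ask for the JSON shape that GeneratedContent expects
            content: `Generate ${numVersions} on-brand variations for: ${brief}\n\nReturn JSON only with: versions (list of {version, content, brand_score, why_works}), recommended (version label), reasoning.`,
          },
        ],
        temperature: 0.7,
      });

      // An empty or malformed reply throws here, which triggers a retry
      const parsed = JSON.parse(response.choices[0].message.content ?? '');
      if (!Array.isArray(parsed.versions) || parsed.versions.length === 0) {
        throw new Error('Response is missing versions');
      }
      return parsed as GeneratedContent;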
    });
  }

  // Retry helper with exponential backoff
  private async retryWithBackoff<T>(
    fn: () => Promise<T>,
    maxRetries: number = 3
  ): Promise<T> {
    let lastError: Error | null = null;

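    for (let attempt = 0; attempt < maxRetries; attempt++) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      try {
        return await Promise.race([
          fn(),
          new Promise<never>((_, reject) => {
            timer = setTimeout(() => reject(new Error('Timeout')), 45000);
          }),
        ]);
      } catch (error) {
        lastError = error as Error;
        if (attempt < maxRetries - 1) {
          // Back off 1s, 2s, 4s, ... before the next attempt
          await new Promise((resolve) =>
            setTimeout(resolve, Math.pow(2, attempt) * 1000)
          );
        }
      } finally {
        // Clear the timeout so a finished call doesn't keep the process alive
        clearTimeout(timer);
      }
    }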

    throw lastError;
  }

  // Full automation pipeline
  async automateContent(brief: string, brandGuide: string) {
    const brandDNA = await this.extractBrandDNA(brandGuide);
    const generated = await this.generateContent(brief, brandDNA);

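    // Validate recommended version (fall back to the first version if the
    // model's "recommended" label doesn't match any version)
    const bestContent = (
      generated.versions.find((v) => v.version === generated.recommended) ??
      generated.versions[0]
    ).content;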

    const validation = await this.validateContent(bestContent, brandDNA);

    return {
      brandDNA,
      generated,
      validation,
      finalContent: validation.is_on_brand ? bestContent : null,
      needsReview: !validation.is_on_brand,
    };
  }
}

// Usage
const system = new BrandVoiceSystem(
  process.env.OPENAI_API_KEY!,
  process.env.ANTHROPIC_API_KEY!
);

// Optional: Train custom model on your brand examples
await system.trainBrandModel([
  {
    input: 'Write about our new feature',
    output: 'Your on-brand example content here...',
  },
  // ... 50-100 examples for best results
]);

// Generate content
const result = await system.automateContent(
  'LinkedIn post about focus timer feature',
  brandGuideText
);

if (result.finalContent) {
  console.log('Ready to publish:', result.finalContent);
} else {
  console.log('Needs review:', result.validation.violations);
}
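
A fine-tuning job takes minutes to hours, and the fine-tuned model name only exists once the job has succeeded. Below is a rough Python sketch of polling for it. It assumes the caller supplies a function that returns the job's current status and model name; with the OpenAI Python SDK that function would wrap client.fine_tuning.jobs.retrieve. The helper name, the 30-second interval, and the poll budget are illustrative assumptions.

```python
import time
from typing import Callable, Optional, Tuple

# (status, fine_tuned_model) as reported by the provider for a job
JobSnapshot = Tuple[str, Optional[str]]


def wait_for_fine_tuned_model(
    fetch_job: Callable[[], JobSnapshot],
    poll_seconds: float = 30.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job succeeds and return the fine-tuned model name.

    Hypothetical helper. `fetch_job` is assumed to return the job's current
    status string and its model name (None until training has finished).
    """
    for _ in range(max_polls):
        status, model = fetch_job()
        if status == "succeeded" and model:
            return model
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"Fine-tuning job ended with status: {status}")
        sleep(poll_seconds)
    raise TimeoutError("Fine-tuning job did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the function easy to test without waiting. In a real service this check would more likely run from a scheduled job than block a request.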

Level 3: Production Pattern with Multi-Model Orchestration

Good for: 1,000+ pieces/week | Setup time: 1 day

# Production Pattern with Multi-Model Orchestration (1000+ pieces/week)
from langgraph.graph import Graph, END
from typing import TypedDict, List, Dict
import openai
import anthropic
import asyncio
import json
from dataclasses import dataclass

@dataclass
class BrandVoiceConfig:
    brand_guide: str
    fine_tuned_model: str | None = None  # Python 3.10+ syntax; None falls back to the base model
    validation_threshold: float = 0.85
    max_iterations: int = 3

class ContentState(TypedDict):
    brief: str
    brand_dna: Dict
    generated_versions: List[Dict]
    selected_content: str
    validation_result: Dict
    iteration: int
    is_approved: bool
    feedback: str

class BrandVoiceOrchestrator:
    def __init__(self, config: BrandVoiceConfig):
        self.config = config
        self.openai = openai.OpenAI()
        self.anthropic = anthropic.Anthropic()
        self.graph = self._build_graph()
    
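    def _extract_brand_dna(self, state: ContentState) -> ContentState:
        """Node 1: Extract brand DNA (cached on the orchestrator after the first run)"""
        if state.get('brand_dna'):
            return state
        
        # Every brief starts with brand_dna=None, so reuse the DNA extracted
        # for an earlier brief instead of calling the API again
        cached = getattr(self, '_brand_dna_cache', None)
        if cached is None:
            response = self.anthropic.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=2048,
                messages=[{
                    "role": "user",
                    "content": f"Extract brand DNA as JSON: {self.config.brand_guide}"
                }]
            )
            cached = json.loads(response.content[0].text)
            self._brand_dna_cache = cached
        
        state['brand_dna'] = cached
        return state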
    
    def _generate_content(self, state: ContentState) -> ContentState:
        """Node 2: Generate multiple versions using best available model"""
        model = self.config.fine_tuned_model or "gpt-4"
        
        # If we have feedback from previous iteration, incorporate it
        brief = state['brief']
        if state.get('feedback'):
            brief += f"\n\nPrevious attempt feedback: {state['feedback']}"
        
        response = self.openai.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "system",
                    "content": f"Generate on-brand content. Brand DNA: {json.dumps(state['brand_dna'])}"
                },
                {
                    "role": "user",
                    # Ask for the JSON shape that the parsing below relies on
                    "content": (
                        f"Generate 3 variations for: {brief}\n\n"
                        "Return JSON only with: versions (list of {version, content})"
                    )
                }
            ],
            temperature=0.7
        )
        
        generated = json.loads(response.choices[0].message.content)
        state['generated_versions'] = generated['versions']  # List[Dict], as declared in ContentState
        state['selected_content'] = generated['versions'][0]['content']
        return state
    
    def _validate_brand_voice(self, state: ContentState) -> ContentState:
        """Node 3: Validate against brand guidelines and record the outcome"""
        response = self.anthropic.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"""Validate this content against brand voice. Return JSON:
                
Content: {state['selected_content']}

Brand DNA: {json.dumps(state['brand_dna'])}

Include: violations (list), is_on_brand (bool), confidence_score (0-1), feedback (string)"""
            }]
        )
        
        validation = json.loads(response.content[0].text)
        state['validation_result'] = validation
        
        # Update state in the node, not the router: changes made inside a
        # conditional-edge function may not be carried forward by the graph
        state['iteration'] += 1
        state['feedback'] = validation.get('feedback', '')
        # Approve only when the validator says on-brand AND is confident about it
        state['is_approved'] = bool(
            validation.get('is_on_brand')
            and validation.get('confidence_score', 0) >= self.config.validation_threshold
        )
        return state
    
    def _check_approval(self, state: ContentState) -> str:
        """Router: Decide if content is approved or needs regeneration"""
        if state['is_approved']:
            return "approved"
        
        # Max attempts reached - send to human review
        if state['iteration'] >= self.config.max_iterations:
            return "human_review"
        
        # Try again with feedback
        return "regenerate"
    
    def _human_review_node(self, state: ContentState) -> ContentState:
        """Node 4: Flag for human review"""
        state['is_approved'] = False
        # In production, this would send to Slack/Teams/review queue
        print(f"⚠️  Flagged for human review after {state['iteration']} attempts")
        print(f"Issues: {state['validation_result']['violations']}")
        return state
    
    def _build_graph(self) -> Graph:
        """Build LangGraph workflow"""
        graph = Graph()
        
        # Add nodes
        graph.add_node("extract_dna", self._extract_brand_dna)
        graph.add_node("generate", self._generate_content)
        graph.add_node("validate", self._validate_brand_voice)
        graph.add_node("human_review", self._human_review_node)
        
        # Add edges
        graph.set_entry_point("extract_dna")
        graph.add_edge("extract_dna", "generate")
        graph.add_edge("generate", "validate")
        
        # Conditional routing after validation
        graph.add_conditional_edges(
            "validate",
            self._check_approval,
            {
                "approved": END,
                "regenerate": "generate",
                "human_review": "human_review"
            }
        )
        graph.add_edge("human_review", END)
        
        return graph.compile()
    
    async def process_batch(self, briefs: List[str]) -> List[Dict]:
        """Process multiple content briefs in parallel"""
        tasks = [self.process_brief(brief) for brief in briefs]
        return await asyncio.gather(*tasks)
    
    async def process_brief(self, brief: str) -> Dict:
        """Process a single content brief through the graph"""
        initial_state = {
            "brief": brief,
            "brand_dna": None,
            "generated_versions": [],
            "selected_content": "",
            "validation_result": {},
            "iteration": 0,
            "is_approved": False,
            "feedback": ""
        }
        
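        # invoke() is blocking; run it in a worker thread so process_batch
        # actually overlaps briefs instead of running them one by one
        result = await asyncio.to_thread(self.graph.invoke, initial_state)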
        
        return {
            "brief": brief,
            "content": result['selected_content'],
            "approved": result['is_approved'],
            "iterations": result['iteration'],
            "validation": result['validation_result']
        }

# Usage
config = BrandVoiceConfig(
    brand_guide=your_brand_guide,
    fine_tuned_model="ft:gpt-4o-mini-2024-07-18:your-org:brand-voice:abc123",
    validation_threshold=0.85,
    max_iterations=3
)

orchestrator = BrandVoiceOrchestrator(config)

async def main():
    # Process single brief
    result = await orchestrator.process_brief(
        "Write LinkedIn post about new focus timer feature"
    )

    if result['approved']:
        print(f"✅ Auto-approved after {result['iterations']} iterations")
        print(result['content'])
    else:
        print("⚠️  Needs human review")
        print(f"Issues: {result['validation']['violations']}")

    # Process batch (100 briefs in parallel)
    briefs = [
        "LinkedIn post about feature X",
        "Twitter thread about customer story Y",
        "Blog intro for topic Z",
        # ... 97 more
    ]

    results = await orchestrator.process_batch(briefs)
    approved = [r for r in results if r['approved']]
    print(f"Auto-approved: {len(approved)}/{len(results)}")

# `await` only works inside a coroutine, so run everything through asyncio.run
asyncio.run(main())
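
Sending 100 briefs at once tends to trip provider rate limits. One common way to cap concurrency is asyncio.Semaphore, sketched below. The limit of 10 is an arbitrary assumption, and `worker` stands in for something like orchestrator.process_brief.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def gather_with_limit(
    items: List[str],
    worker: Callable[[str], Awaitable[T]],
    max_concurrent: int = 10,
) -> List[T]:
    """Run `worker` over all items with at most `max_concurrent` in flight.

    Results come back in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(item: str) -> T:
        # Wait for a free slot before starting the call
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))
```

Tune `max_concurrent` against your own rate limits. A failure in any single brief propagates out of gather, so wrap `worker` if one bad brief should not fail the batch.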

When to Level Up

Start: Simple API Calls

0-100 pieces/week

Sequential API calls to GPT-4/Claude
Manual brand guide as text input
Basic validation with print statements
Copy-paste approved content to CMS

Scale: Fine-Tuned Model + Automation

100-1,000 pieces/week

Fine-tune GPT-4o-mini on 50-100 brand examples
Automatic retries with exponential backoff
Brand DNA caching (avoid re-extraction)
Slack/Teams notifications for review queue
Direct CMS integration (WordPress/Contentful API)

Production: Multi-Model Orchestration

1,000-5,000 pieces/week

LangGraph workflow with conditional routing
Multi-model fallback (Claude → GPT-4 → Gemini)
Automatic regeneration with feedback loop (max 3 tries)
Human review queue for edge cases
Batch processing (100+ briefs in parallel)
A/B testing different brand voice variations
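
The multi-model fallback in the list above can be sketched as a loop over providers in priority order. This is an illustrative outline rather than a full client: each provider is assumed to be a plain function from prompt to text that raises on failure, and the names are placeholders.

```python
from typing import Callable, List, Tuple

# A provider is a (name, call) pair; `call` takes a prompt and returns text
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, text) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in practice, catch each SDK's specific error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the text makes it possible to log which model produced each piece, which matters when voices differ slightly between models.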

Enterprise: Multi-Brand Multi-Channel System

5,000+ pieces/week

Separate fine-tuned models per brand/sub-brand
Channel-specific voice adaptation (LinkedIn vs Twitter vs Blog)
Real-time brand drift monitoring dashboard
Automated localization for international markets
Integration with DAM systems for visual brand consistency
ML-powered brand evolution tracking over time

Marketing-Specific Gotchas

The code examples above cover the core pipeline, but marketing has unique challenges you need to handle.

Brand Voice Drift Over Time

Your brand voice evolves. Content from 2023 sounds different from content written in 2025. If you train on old examples, your AI generates outdated content. Solution: weight recent examples higher and retrain quarterly.

import datetime
from typing import List, Dict

def prepare_weighted_training_data(examples: List[Dict]) -> List[Dict]:
    """Weight recent examples higher to prevent brand drift"""
    now = datetime.datetime.now()
    weighted_examples = []
    
    for ex in examples:
        # Calculate age in months
        age_months = (now - ex['date']).days / 30
        
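        # Exponential decay: recent = 1.0, 12 months old = 0.5, 24+ months = 0.25
        # (min() keeps future-dated examples from getting a weight above 1.0)
        weight = max(0.25, min(1.0, 0.5 ** (age_months / 12)))
        
        # Duplicate examples based on weight: 4x for recent, 1x for old.
        # round() rather than int(): truncation would give a day-old example
        # (weight ~0.998) only 3 copies.
        num_copies = round(weight * 4)
        weighted_examples.extend([ex] * num_copies)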
    
    return weighted_examples

# Usage
training_data = prepare_weighted_training_data(brand_examples)
# Recent content appears 4x more often in training
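
How many copies an example gets depends on how the weight is converted to an integer. The short sketch below tabulates the decay formula above for a few ages and compares truncation (int) with rounding (round). The function name decay_weight is only for illustration.

```python
def decay_weight(age_months: float) -> float:
    """Half-life of 12 months, floored at 0.25 (same formula as above)."""
    return max(0.25, 0.5 ** (age_months / 12))


for age in (0, 1, 6, 12, 24, 36):
    weight = decay_weight(age)
    print(
        f"{age:>2} months: weight={weight:.2f} "
        f"truncated={int(weight * 4)} rounded={round(weight * 4)}"
    )
```

With truncation, a one-month-old example already drops to 3 copies, while rounding keeps it at 4. Both methods agree at 12 months (2 copies) and at 24 months or older (1 copy).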

Multi-Channel Voice Adaptation

Your LinkedIn voice isn't your Twitter voice. It's the same brand, but each channel needs a different tone, so don't use one generic setup for every channel. Train channel-specific variants or adapt the prompt per channel.

interface ChannelConfig {
  platform: string;
  max_length: number;
  tone_shift: string;
  formatting_rules: string[];
}

const CHANNEL_CONFIGS: Record<string, ChannelConfig> = {
  linkedin: {
    platform: 'LinkedIn',
    max_length: 3000,
    tone_shift: 'slightly more professional, use data/stats',
    formatting_rules: ['Use line breaks for readability', 'Bold key points', 'Add relevant hashtags (3-5)']
  },
  twitter: {
    platform: 'Twitter',
    max_length: 280,
    tone_shift: 'more casual, punchy, use thread format for longer ideas',
    formatting_rules: ['One idea per tweet', 'Use emojis sparingly', 'End with CTA or question']
  },
  blog: {
    platform: 'Blog',
    max_length: 10000,
    tone_shift: 'more depth, storytelling, educational',
    formatting_rules: ['H2/H3 headers', 'Code examples', 'Internal links']
  }
};

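// Assumes `openai` is an initialized OpenAI client,
// e.g. new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
async function generateForChannel(
  brief: string,
  channel: string,
  brandDNA: any
): Promise<string> {
  const config = CHANNEL_CONFIGS[channel];
  if (!config) throw new Error(`Unknown channel: ${channel}`);
  
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        // Include the formatting rules too; otherwise they are defined but never used
        content: `Generate content for ${config.platform}. Brand DNA: ${JSON.stringify(brandDNA)}. Tone shift: ${config.tone_shift}. Formatting rules: ${config.formatting_rules.join('; ')}. Max length: ${config.max_length} chars.`
      },
      { role: 'user', content: brief }
    ]
  });
  
  return response.choices[0].message.content || '';
}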

// Generate same brief for different channels
const linkedinPost = await generateForChannel(brief, 'linkedin', brandDNA);
const twitterThread = await generateForChannel(brief, 'twitter', brandDNA);
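
A prompt can ask for a length limit, but it can't enforce one. Here is the same per-channel idea sketched in Python, with an explicit length check to run after generation. The class and function names are assumptions for illustration, and the brand summary string is a placeholder for your extracted brand DNA.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChannelRules:
    platform: str
    max_length: int
    tone_shift: str
    formatting_rules: List[str] = field(default_factory=list)


def build_channel_prompt(rules: ChannelRules, brand_summary: str) -> str:
    """Assemble a system prompt that carries every channel constraint."""
    lines = [
        f"Generate content for {rules.platform}.",
        f"Brand voice: {brand_summary}",
        f"Tone shift: {rules.tone_shift}",
        f"Maximum length: {rules.max_length} characters.",
    ]
    lines += [f"- {rule}" for rule in rules.formatting_rules]
    return "\n".join(lines)


def fits_channel(text: str, rules: ChannelRules) -> bool:
    """Check the generated text against the channel's hard length limit."""
    return len(text) <= rules.max_length
```

If fits_channel returns False, one option is to regenerate with the overshoot stated in the feedback rather than truncating, since cutting text mid-sentence usually reads worse than a shorter rewrite.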

Handling Sensitive Topics & PR Crises

Your AI doesn't know when to shut up. During a PR crisis or sensitive topic, you need human review. Build a keyword blocklist that triggers mandatory review.

from typing import List, Dict
import re

class SensitiveTopicFilter:
    def __init__(self):
        # Keywords that trigger mandatory human review
        self.sensitive_keywords = [
            'layoff', 'firing', 'lawsuit', 'investigation',
            'controversy', 'scandal', 'apology', 'mistake',
            'data breach', 'security', 'privacy violation',
            'discrimination', 'harassment', 'complaint'
        ]
        
        # Competitors (never mention without approval)
        self.competitor_names = [
            'competitor_a', 'competitor_b', 'competitor_c'
        ]
    
    def requires_review(self, content: str) -> Dict:
        """Check if content needs human review before publishing"""
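        content_lower = content.lower()
        
        def mentions(term: str) -> bool:
            # Match at a word boundary so 'firing' doesn't trigger on 'misfiring',
            # while still catching plurals such as 'layoffs'
            return re.search(r'\b' + re.escape(term), content_lower) is not None
        
        # Check sensitive keywords
        found_sensitive = [kw for kw in self.sensitive_keywords if mentions(kw)]
        
        # Check competitor mentions (list entries must be lowercase)
        found_competitors = [comp for comp in self.competitor_names if mentions(comp)]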
        
        needs_review = bool(found_sensitive or found_competitors)
        
        return {
            'needs_review': needs_review,
            'reason': 'sensitive_topic' if found_sensitive else 'competitor_mention' if found_competitors else None,
            'flagged_terms': found_sensitive + found_competitors,
            'action': 'BLOCK_AUTO_PUBLISH' if needs_review else 'ALLOW'
        }

# Usage
topic_filter = SensitiveTopicFilter()  # avoid shadowing the built-in filter()
check = topic_filter.requires_review(generated_content)

if check['needs_review']:
    # Send to review queue, don't auto-publish
    send_to_slack_review_channel(
        content=generated_content,
        reason=check['reason'],
        flagged_terms=check['flagged_terms']
    )
else:
    # Safe to auto-publish
    publish_to_cms(generated_content)

Legal & Compliance Review for Regulated Industries

Finance, healthcare, legal industries can't auto-publish everything. Build a compliance layer that flags claims, statistics, or medical/financial advice for legal review.

interface ComplianceCheck {
  needs_legal_review: boolean;
  flagged_claims: string[];
  risk_level: 'low' | 'medium' | 'high';
  review_type: string[];
}

class ComplianceValidator {
  // Each pattern carries its own review type, so every flagged claim
  // (including performance claims) is routed to a reviewer
  private readonly CLAIM_PATTERNS: Array<{ type: string; pattern: RegExp }> = [
    { type: 'performance_claims', pattern: /\d+%\s+(increase|decrease|improvement|growth)/gi },
    { type: 'absolute_claims', pattern: /guaranteed|promise|ensure|100%/gi },
    { type: 'financial_claims', pattern: /\$[\d,]+\s+(saved|earned|profit|return)/gi },
    { type: 'medical_claims', pattern: /(cure|treat|diagnose|prevent)\s+\w+/gi },
  ];

  async validateContent(content: string): Promise<ComplianceCheck> {
    const flaggedClaims: string[] = [];
    const reviewTypes: Set<string> = new Set();

    // Check for unsubstantiated claims
    for (const { type, pattern } of this.CLAIM_PATTERNS) {
      const matches = content.match(pattern);
      if (matches) {
        flaggedClaims.push(...matches);
        reviewTypes.add(type);
      }
    }

    // Risk assessment
    let riskLevel: 'low' | 'medium' | 'high' = 'low';
    if (flaggedClaims.length > 3) riskLevel = 'high';
    else if (flaggedClaims.length > 0) riskLevel = 'medium';

    return {
      needs_legal_review: flaggedClaims.length > 0,
      flagged_claims: flaggedClaims,
      risk_level: riskLevel,
      review_type: Array.from(reviewTypes),
    };
  }
}

// Usage
const validator = new ComplianceValidator();
const check = await validator.validateContent(generatedContent);

if (check.needs_legal_review) {
  // Route to appropriate review queue
  await sendToLegalReview({
    content: generatedContent,
    risk_level: check.risk_level,
    flagged_claims: check.flagged_claims,
    review_types: check.review_type,
  });
} else {
  // Safe to publish
  await publishContent(generatedContent);
}

SEO Keyword Integration Without Keyword Stuffing

Marketing needs SEO keywords, but LLMs sometimes over-optimize and create unnatural content. Solution: Post-process to check keyword density and readability scores.

import re
from typing import Dict, List

class SEOValidator:
    def __init__(self, target_keywords: List[str]):
        self.target_keywords = [kw.lower() for kw in target_keywords]
        self.max_keyword_density = 0.03  # 3% max density
    
    def calculate_keyword_density(self, content: str) -> Dict:
        """Check if keywords are used naturally, not stuffed"""
        words = re.findall(r'\w+', content.lower())
        total_words = len(words)
        
        keyword_counts = {}
        for keyword in self.target_keywords:
            # Count exact phrase matches only (no stemming, plurals, or synonyms)
            keyword_words = keyword.split()
            count = 0
            
            # Sliding window to find multi-word keywords
            for i in range(len(words) - len(keyword_words) + 1):
                window = ' '.join(words[i:i+len(keyword_words)])
                if window == keyword:
                    count += 1
            
            density = count / total_words if total_words > 0 else 0
            keyword_counts[keyword] = {
                'count': count,
                'density': density,
                'is_stuffed': density > self.max_keyword_density
            }
        
        # Overall assessment
        is_keyword_stuffed = any(kw['is_stuffed'] for kw in keyword_counts.values())
        has_all_keywords = all(kw['count'] > 0 for kw in keyword_counts.values())
        
        return {
            'keyword_analysis': keyword_counts,
            'is_keyword_stuffed': is_keyword_stuffed,
            'has_all_keywords': has_all_keywords,
            'needs_revision': is_keyword_stuffed or not has_all_keywords,
            'total_words': total_words
        }
    
    def generate_seo_feedback(self, analysis: Dict) -> str:
        """Generate feedback for regeneration if needed"""
        feedback = []
        
        for keyword, data in analysis['keyword_analysis'].items():
            if data['is_stuffed']:
                feedback.append(
                    f"Keyword '{keyword}' appears {data['count']} times ({data['density']:.1%} density). "
                    f"Reduce to max 3% density. Use synonyms and natural variations."
                )
            elif data['count'] == 0:
                feedback.append(
                    f"Missing target keyword '{keyword}'. Include naturally in content."
                )
        
        return ' '.join(feedback) if feedback else 'SEO optimization looks good.'

# Usage
seo_validator = SEOValidator(
    target_keywords=['brand voice automation', 'AI content generation', 'marketing automation']
)

analysis = seo_validator.calculate_keyword_density(generated_content)

if analysis['needs_revision']:
    feedback = seo_validator.generate_seo_feedback(analysis)
    # Regenerate with SEO feedback
    regenerated = generate_content_with_feedback(brief, feedback)
else:
    # SEO looks good, proceed
    publish_content(generated_content)
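
The validator above covers keyword density but not readability. One option is the Flesch Reading Ease score, sketched below. The formula constants are the standard ones. The syllable counter is a crude vowel-group heuristic and an assumption of this sketch, so treat the scores as relative rather than exact.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
```

Compare the score of a regenerated draft against the previous one instead of enforcing a fixed cutoff. A drop after adding keywords is a hint that the text has become stiffer.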

Cost Calculator

Manual Brand Review Process

Content Manager reviewing drafts: $45/hour × 4 hours/day

Revision cycles (avg 2.3 per piece): 20 pieces/week × 15 min/revision

Brand guideline updates: 4 hours/month × $65/hour

Freelancer content that's off-brand: 30% rejection rate × $150/piece

Total: $4,960/month

Limitations:

• Can't scale beyond 20 pieces/week
• 60% of first drafts need revision
• 4+ hour daily review bottleneck
• Inconsistent voice across team members

Automated Brand Voice System

OpenAI API (GPT-4 + fine-tuning): 100 pieces/week × $0.15/piece

Claude API (validation): 100 pieces/week × $0.08/piece

Infrastructure (AWS/hosting): Lambda + S3 + CloudWatch

Human review (10% flagged): 10 pieces/week × 10 min × $45/hour

Total: $513/month
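
To adapt these figures to your own volume, the per-piece line items convert to monthly costs with simple arithmetic. The sketch below assumes 52/12 (about 4.33) weeks per month, which the list above does not specify. It leaves out infrastructure because no rate is given for that line.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: about 4.33 weeks in an average month


def monthly_cost(pieces_per_week: float, cost_per_piece: float) -> float:
    """Convert a per-piece cost at a weekly volume into a monthly figure."""
    return pieces_per_week * cost_per_piece * WEEKS_PER_MONTH


generation = monthly_cost(100, 0.15)             # OpenAI API line item
validation = monthly_cost(100, 0.08)             # Claude API line item
human_review = monthly_cost(10, (10 / 60) * 45)  # 10 min per piece at $45/hour

for label, amount in [
    ("Generation", generation),
    ("Validation", validation),
    ("Human review", human_review),
]:
    print(f"{label}: ${amount:,.2f}/month")
```

Under that assumption, human review is the largest of the three items, so the share of content flagged for review affects the total more than the API pricing does.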