Affiliate & Advocacy System Architecture - Scale to 10K | Randeep Bhatia

🎯 This Week's Journey

From prompts to production affiliate system.

Monday: 3 prompts for advocate identification, content generation, and performance tracking. Tuesday: automated code for referral link creation and commission calculation. Wednesday: team workflows for Growth, Marketing, and Finance. Thursday: complete technical architecture with 4 specialized agents, ML evaluation, and GDPR compliance for 10,000+ advocates daily.

📋

Key Assumptions

Track 100-10,000 active advocates concurrently

Real-time attribution (< 500ms click-to-conversion)

GDPR/CCPA compliance for EU/CA advocates

Multi-tier commission structures (5-25% range)

Integration with Stripe, PayPal, Wise for payouts

System Requirements

Functional

Identify high-potential advocates from customer data
Generate personalized referral content (emails, social posts, landing pages)
Track clicks, conversions, and attribution across channels
Calculate tiered commissions with fraud detection
Automate payouts via Stripe/PayPal/Wise
Provide advocate dashboards with real-time stats
Handle GDPR deletion requests within 30 days
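
The tiered commission requirement can be illustrated with a small sketch. Only the 5-25% range comes from the assumptions above; the revenue thresholds and the function names `commission_rate` and `commission_for_order` are hypothetical and would need to match the real commission policy.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical tiers: (minimum monthly referred revenue in USD, commission rate).
# Only the 5-25% range is taken from the document; the thresholds are invented.
TIERS = [
    (Decimal("0"), Decimal("0.05")),
    (Decimal("1000"), Decimal("0.15")),
    (Decimal("10000"), Decimal("0.25")),
]


def commission_rate(monthly_referred_usd: Decimal) -> Decimal:
    """Return the rate of the highest tier whose threshold has been reached."""
    rate = TIERS[0][1]
    for threshold, tier_rate in TIERS:
        if monthly_referred_usd >= threshold:
            rate = tier_rate
    return rate


def commission_for_order(order_usd: Decimal, monthly_referred_usd: Decimal) -> Decimal:
    """Commission for one referred order, rounded to whole cents."""
    raw = order_usd * commission_rate(monthly_referred_usd)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Decimal arithmetic is used here so that payout amounts are not affected by binary floating-point rounding.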

Non-Functional (SLOs)

Latency (p95): 500 ms

Freshness: 5 min

Availability: 99.5%

Attribution accuracy: 99.9%

💰 Cost Targets: $0.50 per advocate per month, $0.02 per conversion tracked, $0.25 per payout

Agent Layer

planner

Decompose high-level tasks into atomic actions

🔧 TaskDecomposer (LLM-based), ToolRegistry (maps actions to tools)

⚡ Recovery: if the decomposition is unclear, request human clarification; if a tool is unavailable, suggest an alternative action sequence.

executor

Execute action sequences with retry logic

🔧 CRM API, Payment Gateway API, Email Service API, Content Generation LLM

⚡ Recovery: retry with exponential backoff (3 attempts); on API timeout, queue for async processing; on critical failure, escalate to a human operator.

evaluator

Validate outputs for quality and business rules

🔧 Advocate Scoring Model, Content Quality Classifier, Business Rule Engine

⚡ Recovery: if quality is below threshold, flag for human review; if a model is unavailable, use fallback heuristics.

guardrail

Enforce safety, compliance, and fraud checks

🔧 Fraud Detection Model, PII Redaction Service, GDPR Compliance Checker

⚡ Recovery: if fraud risk is high, block the transaction and alert the ops team; if PII is detected, auto-redact and log the incident; on a compliance violation, halt the workflow and escalate.

content_generator

Create personalized referral content

🔧 Claude/GPT for text generation, DALL-E for image generation, Brand Guideline Validator

⚡ Recovery: if generation fails, use a template fallback; on a brand violation, regenerate with a stricter prompt.

attribution

Track clicks, conversions, and multi-touch attribution

🔧 Attribution Model (last-click, multi-touch), Fraud Detection (click farms, bots)

⚡ Recovery: if attribution is ambiguous, split credit proportionally; if fraud is detected, withhold the commission and flag for review.
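
The executor's recovery rule (three attempts with exponential backoff, then escalation) could look roughly like the sketch below. The helper name `call_with_backoff`, the 0.5 s base delay, and the injectable `sleep` parameter are assumptions made for illustration; a real executor would likely catch only transient error types rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,            # "3 attempts" from the executor recovery rule
    base_delay_s: float = 0.5,    # hypothetical base delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with delays of base, 2*base, 4*base, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                # Out of attempts: re-raise so the caller can queue or escalate.
                raise
            sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.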

ML Layer

Feature Store

Update: Hourly for near-real-time features, daily for batch features

• advocate_ltv_usd (customer lifetime value)
• advocate_purchase_frequency (orders per month)
• advocate_network_size (social followers estimate)
• advocate_engagement_score (email open rate, click rate)
• conversion_rate_7d (conversions / clicks, 7-day window)
• avg_order_value_usd (mean order value from referrals)
• fraud_risk_score (0-100, from historical patterns)
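
As an illustration of how one of these features might be derived, here is a minimal sketch of `conversion_rate_7d`. The input shape (two lists of event timestamps) and the clamp to 1.0 are assumptions, not details from the feature store itself.

```python
from datetime import datetime, timedelta
from typing import Iterable


def conversion_rate_7d(
    click_times: Iterable[datetime],
    conversion_times: Iterable[datetime],
    now: datetime,
) -> float:
    """Conversions divided by clicks over the trailing 7-day window."""
    window_start = now - timedelta(days=7)
    clicks = sum(1 for t in click_times if window_start <= t <= now)
    conversions = sum(1 for t in conversion_times if window_start <= t <= now)
    if clicks == 0:
        return 0.0
    # Assumption: clamp to 1.0, since a conversion inside the window may
    # belong to a click that happened before the window started.
    return min(conversions / clicks, 1.0)
```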

Model Registry

Strategy: Semantic versioning (MAJOR.MINOR.PATCH), git-backed

• advocate_scoring_v3
• fraud_detection_v2
• content_quality_classifier
• attribution_model

Observability Stack

Real-time monitoring, tracing & alerting

SOURCES

Apps, Services, Infra

COLLECTION

9 Metrics

PROCESSING

Aggregate & Transform

DASHBOARDS

4 Views

ALERTS

Enabled

📊 Metrics (9)

📝 Logs (Structured)

🔗 Traces (Distributed)

✓ advocate_signup_rate

✓ referral_click_rate

✓ conversion_rate

✓ commission_usd_total

✓ payout_success_rate

✓ fraud_detection_accuracy

Deployment Variants

🚀

Startup Architecture

Fast to deploy, cost-efficient, scales to roughly 500 advocates

Infrastructure

✓ Vercel (Next.js hosting)

✓ Supabase (PostgreSQL + Auth)

✓ Upstash (Redis)

✓ Anthropic API (Claude)

✓ Stripe (payments)

✓ SendGrid (email)

→ Single-region (us-east-1)

→ Managed services only

→ No custom VPC

→ Cost: ~$200/mo for 500 advocates

→ Deploy in 1 week

Risks & Mitigations

⚠️ LLM hallucination in content generation (fake stats, false claims)

Medium

✓ Mitigation: 4-layer hallucination detection (confidence scores, DB cross-reference, logical checks, human review). Target: < 1% hallucination rate.

⚠️ Fraud (click farms, fake conversions)

High

✓ Mitigation: Isolation Forest fraud detection model, IP geolocation checks, velocity limits (max 100 clicks/day per advocate), manual review for high-risk conversions.

⚠️ Attribution disputes (multiple advocates, same customer)

Medium

✓ Mitigation: Shapley value multi-touch attribution, transparent credit splitting, manual review queue for disputes (< 1% of conversions).

⚠️ GDPR compliance failure (data not deleted within 30 days)

Low

✓ Mitigation: Automated deletion workflow, audit trail, manual verification, quarterly compliance audits. SLA: 100% deletion within 30 days.

⚠️ Payment gateway failure (Stripe outage, insufficient funds)

Low

✓ Mitigation: Multi-gateway failover (Stripe → PayPal → Wise), retry logic (3x exponential backoff), finance team alert, advocate notification.

⚠️ LLM API cost explosion (10x traffic spike)

Medium

✓ Mitigation: Cost guardrails ($5K/day limit), auto-throttling at 80% budget, caching (50% cache hit rate), fallback to cheaper models (GPT-3.5) for non-critical tasks.

⚠️ Model drift (advocate scoring accuracy drops over time)

High

✓ Mitigation: Weekly drift detection (KL divergence), monthly retraining, A/B test new models (10% traffic), automatic rollback if accuracy < 95%.
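
The weekly drift check named in the last mitigation can be sketched as a KL divergence between two score histograms. The bin layout, the smoothing constant, and the 0.1 alert threshold are assumptions chosen for illustration; the source does not state them.

```python
import math
from typing import Sequence

DRIFT_THRESHOLD = 0.1  # hypothetical alert threshold, not from the document


def kl_divergence(p_counts: Sequence[float], q_counts: Sequence[float], eps: float = 1e-9) -> float:
    """KL(P || Q) over matching histogram bins, with additive smoothing."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def has_drifted(baseline_counts: Sequence[float], current_counts: Sequence[float],
                threshold: float = DRIFT_THRESHOLD) -> bool:
    """Flag drift when this week's score distribution diverges from the baseline."""
    return kl_divergence(current_counts, baseline_counts) > threshold
```

A positive result here would only trigger the retraining or rollback steps described above; it does not by itself measure accuracy.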

🧬

Evolution Roadmap

Progressive transformation from MVP to scale

🌱

Phase 1: MVP (Weeks 1-12, 0-3 months)

Launch with 100 advocates

Basic content generation (email, social)

Last-click attribution

Manual payout approval

🌿

Phase 2: Scale (Weeks 13-26, 3-6 months)

Scale to 1,000 advocates

Multi-touch attribution (Shapley value)

Automated fraud detection

Self-serve advocate dashboard

🌳

Phase 3: Enterprise (Weeks 27-52, 6-12 months)

Scale to 10,000 advocates

Multi-region deployment (US, EU, APAC)

99.95% SLA

White-label for enterprise customers

🚀Production Ready

🏗️

Complete Systems Architecture

End-to-end layer view with 4 agents and ML evaluation

🌐

Presentation

3 components

Advocate Dashboard (React)

Admin Portal (Next.js)

Public Referral Pages

⚙️

API Gateway

3 components

Load Balancer (ALB/CloudFlare)

Rate Limiter (Redis)

Auth (OAuth 2.0 + JWT)

💾

Agent Layer

4 components

Planner Agent (task decomposition)

Executor Agent (workflow orchestration)

Evaluator Agent (quality checks)

Guardrail Agent (fraud detection, PII redaction)

🔌

ML Layer

4 components

Feature Store (advocate metrics)

Model Registry (LLMs, classifiers)

Evaluation Loop (quality, cost, drift)

Prompt Store (versioned templates)

📊

Integration

4 components

CRM Connector (Salesforce, HubSpot)

Payment Gateway (Stripe, PayPal, Wise)

Email Service (SendGrid, Postmark)

Analytics (Segment, Amplitude)

🌐

Data

4 components

PostgreSQL (transactional)

Redis (caching, queues)

S3 (content assets)

TimescaleDB (time-series metrics)

⚙️

External

4 components

Anthropic/OpenAI APIs

Stripe API

CRM APIs

Email APIs

💾

Observability

4 components

Metrics (Prometheus/CloudWatch)

Logs (Loki/CloudWatch Logs)

Traces (Jaeger/X-Ray)

Dashboards (Grafana)

🔌

Security

4 components

KMS (encryption keys)

WAF (DDoS protection)

PII Redaction Service

Audit Log Store

🔄

Sequence Diagram - Advocate Onboarding Flow

Automated data flow every hour

Data Flow - Advocate Onboarding to First Payout

Customer (0 ms)

Requests to become advocate → Email, customer_id

API Gateway (50 ms)

Authenticates, rate limits → JWT token

Planner Agent (150 ms)

Decomposes task: fetch_crm, score, onboard → Action sequence

Executor Agent (450 ms)

Fetches customer data from CRM → Purchase history, LTV

Evaluator Agent (600 ms)

Scores advocate potential → Score: 87/100

Guardrail Agent (800 ms)

Checks fraud risk, PII compliance → Risk: 12/100, PII redacted

Executor Agent (1,100 ms)

Creates Stripe Connect account → Payout account_id

Content Generator Agent (4,100 ms)

Generates email, social posts → 3 content assets

Database (4,150 ms)

Saves advocate record → advocate_id, referral_code

Customer (4,200 ms)

Receives dashboard link → 200 OK + dashboard_url

Attribution Agent (4,250 ms)

Tracks first referral click → Click event logged

Attribution Agent (7 days, 604,800,000 ms)

Detects conversion (7 days later) → Conversion: $120 order

Executor Agent (7 days + 100 ms)

Calculates commission (15%) → $18 commission

Payment Gateway (7 days + 2,000 ms)

Initiates payout via Stripe → Payout: $18 → advocate
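
To see where the synchronous part of this flow spends its time, the cumulative timestamps above can be turned into per-step durations. This is a small sketch that assumes each timestamp marks the moment the step completes; the parenthetical labels on the two Executor steps are added here for readability.

```python
# Cumulative timestamps (ms) for the synchronous onboarding request,
# copied from the data flow above. Assumption: each value is the time
# at which that step finishes.
STEPS = [
    ("Customer request", 0),
    ("API Gateway", 50),
    ("Planner Agent", 150),
    ("Executor Agent (CRM fetch)", 450),
    ("Evaluator Agent", 600),
    ("Guardrail Agent", 800),
    ("Executor Agent (Stripe Connect)", 1100),
    ("Content Generator Agent", 4100),
    ("Database", 4150),
    ("Customer response", 4200),
]


def step_durations(steps):
    """Return (step name, elapsed ms since the previous step) pairs."""
    return [(name, t - prev_t) for (_, prev_t), (name, t) in zip(steps, steps[1:])]


def slowest_step(steps):
    """Return the (name, duration) pair with the largest duration."""
    return max(step_durations(steps), key=lambda item: item[1])
```

Under that assumption, content generation accounts for 3,000 of the 4,200 ms, which suggests it is the step to move off the request path if the 500 ms p95 target is meant to cover onboarding.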

Scaling Patterns

Volume

0-100 advocates/day

Pattern

Monolith

🏗️

Architecture

Single Next.js app

PostgreSQL (managed)

Redis (managed)

Anthropic/OpenAI APIs

Cost & Performance

Cost: $100/mo

Latency: 4-5 s

Volume

100-1K advocates/day

Pattern

Queue + Workers

🏗️

Architecture

API server (Node.js/Python)

Message queue (SQS/RabbitMQ)

Worker processes (3-5 instances)

PostgreSQL (replica for reads)

Redis (caching + queue)

Cost & Performance

Cost: $400/mo

Latency: 2-3 s

Volume

1K-10K advocates/day

Pattern

Multi-Agent Orchestration

🏗️

Architecture

Load balancer (ALB)

Agent framework (LangGraph)

Message bus (Kafka/EventBridge)

Serverless functions (Lambda/Cloud Run)

TimescaleDB (time-series metrics)

S3 (content assets)

Cost & Performance

Cost: $1,200/mo

Latency: 1-2 s

Recommended

Volume

10K+ advocates/day

Pattern

Enterprise Multi-Region

🏗️

Architecture

Kubernetes (EKS/GKE)

Kafka (event streaming)

Multi-LLM failover (Claude + GPT + Gemini)

Replicated DB (multi-region)

Global CDN (CloudFront/Cloudflare)

Dedicated fraud detection cluster

Cost & Performance

Cost: $5K+/mo

Latency: 500 ms-1 s

Key Integrations

CRM (Salesforce, HubSpot)

Protocol: REST API + OAuth 2.0

Fetch customer profile

Get purchase history

Calculate LTV

Update advocate status

Payment Gateway (Stripe Connect)

Protocol: REST API + webhook events

Create Connect account for advocate

Calculate commission

Initiate payout

Handle webhook (payout.paid, payout.failed)

Email Service (SendGrid, Postmark)

Protocol: REST API

Send advocate invitation

Send performance reports

Send payout notifications

Analytics (Segment, Amplitude)

Protocol: HTTP tracking API

Track advocate signup

Track referral clicks

Track conversions

Track payouts

PII Redaction (AWS Comprehend)

Protocol: AWS SDK

Detect PII in customer data

Redact before sending to LLM

Log redaction events
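
The redact-before-LLM step can be illustrated without any cloud dependency. The sketch below is a regex stand-in, not AWS Comprehend: the two patterns, the placeholder format, and the `redact_pii` name are all assumptions, and a production system would rely on the managed detector named above rather than hand-written patterns.

```python
import re
from typing import List, Tuple

# Simplified illustrative patterns; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> Tuple[str, List[str]]:
    """Replace detected PII with placeholders and return the redaction events."""
    events: List[str] = []

    def replacer(label: str):
        def repl(match: re.Match) -> str:
            events.append(label)  # log what was redacted, never the value itself
            return f"[{label}]"
        return repl

    text = EMAIL_RE.sub(replacer("EMAIL"), text)
    text = PHONE_RE.sub(replacer("PHONE"), text)
    return text, events
```

Returning the event labels alongside the text supports the "log redaction events" step without storing the sensitive values.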

Security & Compliance

Failure Modes & Fallbacks

Failure	Fallback	Impact	SLA
LLM API down (Anthropic outage)	Switch to GPT-4 (multi-LLM failover), queue for retry if both down	Degraded (slower response), not broken	99.5%
Content generation low quality (< 0.7 score)	Use template fallback, flag for human review	Quality maintained, manual review queue grows	99.0%
Fraud detection false positive	Manual review by ops team, temporary hold on payout	Delayed payout (24-48h), advocate notified	< 2% false positive rate
Stripe payout fails (insufficient funds)	Retry 3x with exponential backoff, escalate to finance team	Delayed payout, advocate notified via email	99.9% payout success
Database unavailable (RDS failover)	Switch to read replica (read-only mode), queue writes	Read-only for 2-5 min, writes queued	99.95% availability
Attribution ambiguous (multiple advocates, same customer)	Split credit proportionally (Shapley value), log for review	Fair attribution, potential disputes	< 1% disputed conversions
GDPR deletion request fails (data in 3rd-party CRM)	Delete from primary DB, log CRM deletion task, escalate	Partial deletion, compliance risk	100% deletion within 30 days
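
Several rows in this table share the same shape: try the primary provider, then fall through an ordered list of alternatives (Claude to GPT-4, or Stripe to PayPal to Wise). A minimal sketch of that pattern follows; `first_successful`, `AllProvidersFailed`, and the provider callables are hypothetical names, not part of any vendor SDK.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def first_successful(providers: Sequence[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try providers in order; return (provider name, result) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # a real system would catch provider-specific errors
            errors.append(f"{name}: {exc}")
    # Nothing worked: surface all errors so the caller can queue a retry or escalate.
    raise AllProvidersFailed("; ".join(errors))
```

The caller decides what happens when the chain is exhausted, which matches the "queue for retry" and "escalate" fallbacks in the table.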

System Architecture

                      ┌──────────────┐
                      │ Orchestrator │ ← Coordinates all agents
                      └──────┬───────┘
                             │
    ┌─────────┬─────────┬────┴────┬─────────┬─────────┐
┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│Plan   │ │Exec   │ │Eval   │ │Guard  │ │Content│ │Attrib │
│Agent  │ │Agent  │ │Agent  │ │Agent  │ │Agent  │ │Agent  │
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
    └─────────┴─────────┴────┬────┴─────────┴─────────┘
                             │
                          ┌──▼─────┐
                          │   DB   │
                          │  CRM   │
                          │Payments│
                          └────────┘

🔄 Agent Collaboration Flow

Orchestrator

Receives advocate signup request, routes to Planner Agent

Planner Agent

Decomposes task: [fetch_crm, score_advocate, check_fraud, create_payout_account, generate_content]

Executor Agent

Executes action sequence: fetches CRM data, creates Stripe account

Evaluator Agent

Scores advocate potential (87/100), validates against threshold (> 70)

Guardrail Agent

Checks fraud risk (12/100), redacts PII, validates GDPR consent

Content Generator Agent

Generates personalized email + social posts using RAG (retrieves similar advocates)

Orchestrator

Aggregates results, saves to DB, returns dashboard link to customer

Attribution Agent

Tracks referral clicks, attributes conversions using Shapley value, calculates commissions
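
The onboarding part of this flow can be sketched as a plain function with two gates. The action names follow the Planner's task list and the score threshold of 70 comes from the Evaluator step; the fraud limit of 50, the `tools` dictionary, and the returned status values are assumptions for illustration and do not represent LangGraph or any specific framework.

```python
from typing import Any, Callable, Dict

SCORE_THRESHOLD = 70    # Evaluator threshold from the flow above
FRAUD_RISK_LIMIT = 50   # hypothetical cutoff; the document does not state one


def onboard(customer_id: str, tools: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Run the planner's action sequence, stopping early when a gate fails."""
    profile = tools["fetch_crm"](customer_id)
    score = tools["score_advocate"](profile)
    if score <= SCORE_THRESHOLD:
        return {"status": "rejected", "reason": "score_below_threshold", "score": score}

    risk = tools["check_fraud"](profile)
    if risk >= FRAUD_RISK_LIMIT:
        return {"status": "held", "reason": "fraud_review", "fraud_risk": risk}

    account_id = tools["create_payout_account"](customer_id)
    content = tools["generate_content"](profile)
    return {
        "status": "onboarded",
        "score": score,
        "fraud_risk": risk,
        "payout_account_id": account_id,
        "content_assets": len(content),
    }
```

Placing both gates before the payout account is created avoids opening accounts for applicants who are later rejected or held.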

🎭 Agent Types

Reactive Agent

Low

Attribution Agent - Responds to click events, logs conversions

Stateless (event-driven)

Reflexive Agent

Medium

Evaluator Agent - Uses rules + context (advocate score > 70)

Reads context (thresholds)

Deliberative Agent

High

Content Generator Agent - Plans content strategy, retrieves examples via RAG

Stateful (RAG context)

Orchestrator Agent

Highest

Coordinator - Routes tasks, handles failures, retries

Full state management

📈 Levels of Autonomy

Tool

Human calls, agent responds

→ Monday's prompts (manual execution)

Chained Tools

Sequential execution (no branching)

→ Tuesday's code (fixed workflow)

Agent

Makes decisions, can loop, retry

→ Evaluator Agent (pass/fail routing)

Multi-Agent

Agents collaborate autonomously, adaptive workflows

→ This system (6 agents working together)

RAG vs Fine-Tuning

Advocate profiles and brand guidelines change frequently. RAG allows daily updates without retraining. Fine-tuning would require quarterly retraining ($10K+ per iteration).

✅ RAG (Chosen)

Cost: $100/mo

Update: Daily

How: Add new docs to vector DB (Pinecone)

❌ Fine-Tuning

Cost: $10K/quarter

Update: Quarterly

How: Retrain entire model on new data

Implementation: Vector DB (Pinecone/Weaviate) with advocate profiles, brand guidelines, past high-performing content. Retrieved during content generation (top 5 similar examples).
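
The "top 5 similar examples" retrieval can be sketched in pure Python. This stands in for a vector database query and does not use the Pinecone or Weaviate client APIs; the embeddings are assumed to be precomputed, and `cosine` and `top_k` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          docs: List[Tuple[str, Sequence[float]]],
          k: int = 5) -> List[str]:
    """Return the ids of the k documents most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

A brute-force scan like this is only reasonable for small corpora; a vector database replaces the sort with an approximate nearest-neighbour index.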

Hallucination Detection

LLMs hallucinate advocate stats (fake conversion numbers, false testimonials)

Confidence scores (< 0.7 = flag for review)

Cross-reference with DB (verify advocate stats)

Logical consistency checks (conversion rate can't exceed 100%)

Human review queue (ops team validates flagged content)

0.8% hallucination rate, 99.2% caught before publication
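
The first three checks are mechanical and can be sketched as one function whose output feeds the human review queue. The 0.7 confidence cutoff comes from the list above; the claim format (a dictionary of stat names to values) and the flag strings are assumptions.

```python
from typing import Dict, List


def review_flags(claim: Dict[str, int], db_stats: Dict[str, int],
                 confidence: float, min_confidence: float = 0.7) -> List[str]:
    """Return the reasons generated content should go to human review."""
    flags: List[str] = []

    # Layer 1: model confidence below the cutoff.
    if confidence < min_confidence:
        flags.append("low_confidence")

    # Layer 2: every claimed stat must match the database record.
    for key, claimed in claim.items():
        if key in db_stats and db_stats[key] != claimed:
            flags.append(f"db_mismatch:{key}")

    # Layer 3: logical consistency, e.g. conversions cannot exceed clicks.
    clicks, conversions = claim.get("clicks"), claim.get("conversions")
    if clicks is not None and conversions is not None and conversions > clicks:
        flags.append("conversion_rate_over_100pct")

    return flags  # a non-empty list means: send to the review queue
```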

Evaluation Framework

Advocate Scoring Accuracy: 96.3% (target: 95%+)

Content Quality Score: 0.84 (target: 0.8+)

Attribution Accuracy: 99.4% (target: 99%+)

Fraud False Positive Rate: 1.3% (target: < 2%)

Testing: Shadow mode: 500 real advocates processed in parallel with manual workflow. Accuracy measured against human-labeled ground truth.

Dataset Curation

Collect: 5K advocate profiles - Historical data + synthetic generation

Clean: 4.2K usable - Remove duplicates, incomplete profiles

Label: 4.2K labeled - Labeling cost: $8.4K

Augment: +1K synthetic - Edge case generation (low-engagement advocates, high-fraud-risk)

→ 5.2K high-quality examples (inter-rater agreement: 0.89 Cohen's Kappa)
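
For readers unfamiliar with the agreement figure, Cohen's kappa compares the observed agreement between two labelers with the agreement expected by chance. The sketch below computes it for two label lists; the label values in any example are made up and unrelated to the dataset above.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both raters must label the same non-empty set of items")

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```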

Agentic RAG

Agent iteratively retrieves based on reasoning

Advocate mentions 'fitness niche' → RAG retrieves fitness-related content examples → Agent reasons 'need engagement metrics' → RAG retrieves similar advocates' performance → Content generated with full context.

💡 Not one-shot retrieval. Agent decides what else it needs to know, retrieves iteratively until confident.

Multi-Touch Attribution
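
The risk and failure-mode sections name Shapley value as the method for splitting credit when several advocates touch the same customer. A minimal exact computation is sketched below. The characteristic function `value` (how much conversion value a given set of advocates would have produced together) is hypothetical and would have to be estimated from data; enumerating every ordering is only practical for a handful of advocates per conversion.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_credit(advocates: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Average each advocate's marginal contribution over all join orders."""
    credit = {a: 0.0 for a in advocates}
    orderings = list(permutations(advocates))
    for order in orderings:
        seen: FrozenSet[str] = frozenset()
        for advocate in order:
            with_advocate = seen | {advocate}
            credit[advocate] += value(with_advocate) - value(seen)
            seen = with_advocate
    return {a: total / len(orderings) for a, total in credit.items()}


# Hypothetical characteristic function for two advocates, A and B.
EXAMPLE_VALUES = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.2,
    frozenset({"A", "B"}): 1.0,
}
```

With these made-up values, A receives 0.7 of the credit and B receives 0.3, so an $18 commission would split into $12.60 and $5.40.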

Tech Stack Summary

LLMs

Claude 3.5 Sonnet (primary), GPT-4 (fallback), Gemini (tertiary)

Orchestration

LangGraph (agent framework), Temporal (workflow engine)

Database

PostgreSQL (transactional), TimescaleDB (time-series metrics)

Caching

Redis (session cache, queue), CloudFront (CDN)

Queue

SQS (simple), Kafka (high-throughput)

Compute

Lambda (serverless), EKS (containers for enterprise)

Monitoring

CloudWatch (AWS), Datadog (enterprise), Sentry (errors)

Security

AWS KMS (encryption), WAF (DDoS), Comprehend (PII detection)

Payments

Stripe Connect (primary), PayPal (fallback), Wise (international)

Analytics

Segment (event tracking), Amplitude (product analytics)

No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.
