
Competitive Intelligence System Architecture 🏗️

From 10 to 1,000 competitors with real-time monitoring and AI-powered insights

June 5, 2025
📊 Strategy · 🏗️ Architecture · 🤖 Multi-Agent · ⚡ Real-Time

From manual tracking to automated intelligence.

Monday: 3 prompts for competitive analysis. Tuesday: automated scraping and analysis code. Wednesday: team workflows across strategy, product, and sales. Thursday: the complete production architecture, with agents, ML pipelines, scaling patterns, and real-time alerting for 1,000+ competitors.

Key Assumptions

  • Monitor 10-1,000 competitors across web, social, job boards, press releases
  • Hourly scrapes for critical competitors, daily for tier-2, weekly for tier-3
  • GDPR-compliant: public data only, respect robots.txt, rate limiting
  • Multi-tenant SaaS: each customer has isolated data, configurable alert rules
  • 99.5% uptime SLA, <5min alert latency for critical signals

System Requirements

Functional

  • Web scraping: competitor websites, pricing pages, product pages, blogs, press releases
  • Data extraction: pricing tables, product features, job postings, funding announcements
  • Change detection: diff analysis, semantic similarity, anomaly detection
  • Alert generation: rule-based + AI-powered, routed to Slack/Email/Dashboard
  • Insight synthesis: weekly summaries, trend analysis, strategic recommendations
  • Search & query: natural language search over historical data
  • Dashboard: real-time updates, competitor profiles, alert history, trend charts

Non-Functional (SLOs)

  • Latency (p95): 3,000 ms
  • Freshness: 60 min
  • Availability: 99.5%
  • Alert latency: 5 min
  • Scrape success rate: 98%

💰 Cost Targets: $5 per competitor per month, $0.02 per alert, $0.50 per insight

Agent Layer

planner

L3

Decomposes monitoring tasks, prioritizes scraping targets, schedules work

🔧 Database query (competitor config), Redis queue (enqueue tasks)

⚡ Recovery: If DB unavailable: use cached config; if queue full: backpressure + delay

scraper

L2

Executes web scraping, handles anti-bot measures, stores raw HTML

🔧 Playwright (headless browser), BrightData proxy (rotation), S3 (store HTML)

⚡ Recovery: Retry 3x with exponential backoff; switch proxy on 429/403; fall back to cached version if all fail
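
The retry-and-rotate policy above can be sketched in a few lines. This is a minimal illustration, not the production scraper: `fetch` and `next_proxy` are hypothetical stand-ins for the real Playwright and BrightData calls.

```python
import random
import time

def scrape_with_retry(url, fetch, next_proxy, max_attempts=3, base_delay=1.0):
    """Retry up to max_attempts, rotating proxies on anti-bot responses."""
    proxy = next_proxy()
    for attempt in range(max_attempts):
        status, html = fetch(url, proxy)
        if status == 200:
            return html
        if status in (429, 403):
            proxy = next_proxy()  # burn the blocked proxy, take a fresh one
        # exponential backoff with jitter: ~1s, ~2s, ~4s by default
        time.sleep((2 ** attempt) * base_delay * random.uniform(0.75, 1.25))
    return None  # caller falls back to the cached snapshot
```

Returning `None` rather than raising keeps the fallback decision (serve the cached snapshot) in the caller, matching the recovery chain above.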

analyzer

L3

Extracts structured data, detects changes, generates insights

🔧 Claude API (extraction), Embedding service (semantic similarity), Diff algorithm (text comparison)

⚡ Recovery: If LLM fails: use rule-based extraction; if low confidence (<0.7): flag for human review
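
The rule-based fallback might look like this regex pass: a sketch that assumes pricing appears as `$N/mo` or `$N/year` on the page (real pages need more patterns), with a fixed lower confidence score reflecting the ~85% accuracy of the rule-based path.

```python
import re

# Matches "$1,299/mo", "$12,990 / year", etc. Illustrative only.
PRICE_RE = re.compile(
    r"\$\s?(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\s*/\s*(mo|month|yr|year)", re.I
)

def rule_based_pricing(html_text):
    """Extract (amount, period) pairs with a fixed, reduced confidence."""
    results = []
    for m in PRICE_RE.finditer(html_text):
        amount = float(m.group(1).replace(",", ""))
        period = "month" if m.group(2).lower() in ("mo", "month") else "year"
        results.append({"amount": amount, "period": period, "confidence": 0.85})
    return results
```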

alert

L2

Routes alerts based on rules, prioritizes, deduplicates

🔧 Rule engine (match conditions), Slack API, SendGrid API

⚡ Recovery: If Slack fails: fall back to email; if both fail: queue for retry + dashboard notification
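
A minimal sketch of that fallback chain. The channel senders here are hypothetical callables that return True on success; the real ones wrap the Slack and SendGrid APIs.

```python
def deliver_alert(alert, send_slack, send_email, enqueue_retry, notify_dashboard):
    """Try Slack, then email; if both fail, queue for retry and surface on the dashboard."""
    for channel, send in (("slack", send_slack), ("email", send_email)):
        try:
            if send(alert):
                return channel
        except Exception:
            pass  # a transport error counts as a delivery failure
    enqueue_retry(alert)     # both channels down: retry later
    notify_dashboard(alert)  # but make the alert visible now
    return "queued"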

evaluator

L3

Validates extraction quality, detects hallucinations, flags low confidence

🔧 Ground truth DB (labeled examples), Consistency checker (logical rules)

⚡ Recovery: If ground truth unavailable: use heuristics; if quality <80%: trigger human review pipeline

guardrail

L4

Enforces policy (robots.txt, rate limits), redacts PII, safety checks

🔧 robots.txt parser, PII detection service, Rate limiter

⚡ Recovery: If PII detection fails: block by default; if rate limit hit: queue + delay
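
A sketch of the pre-scrape check using Python's stdlib `urllib.robotparser` plus a per-domain minimum interval. The real rate limiter would live in Redis and be shared across workers; this in-process version only illustrates the fail-safe shape.

```python
import time
from urllib.robotparser import RobotFileParser

class ScrapeGuardrail:
    """Block anything robots.txt disallows or that violates the per-domain rate limit."""

    def __init__(self, robots_txt, min_interval_s=2.0):
        self.parser = RobotFileParser()
        self.parser.parse(robots_txt.splitlines())
        self.min_interval_s = min_interval_s
        self.last_hit = {}  # domain -> monotonic timestamp of last approved scrape

    def allow(self, domain, path, user_agent="ci-bot"):
        if not self.parser.can_fetch(user_agent, path):
            return False  # robots.txt says no
        now = time.monotonic()
        if now - self.last_hit.get(domain, float("-inf")) < self.min_interval_s:
            return False  # rate limit hit: caller queues the task and delays
        self.last_hit[domain] = now
        return True
```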

ML Layer

Feature Store

Update: Hourly for real-time features, daily for aggregates

  • competitor_activity_score (hourly)
  • pricing_volatility_7d
  • feature_release_velocity
  • hiring_momentum (jobs posted/week)
  • sentiment_score (press/social)
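
As one example, `pricing_volatility_7d` could be computed as a mean-normalized standard deviation over the trailing 7 daily price observations. This is a sketch of one plausible feature definition, not the production formula.

```python
from statistics import pstdev

def pricing_volatility_7d(daily_prices):
    """Coefficient of variation over the last 7 daily prices (0.0 = flat pricing)."""
    window = daily_prices[-7:]
    mean = sum(window) / len(window)
    return pstdev(window) / mean if mean else 0.0
```

Normalizing by the mean makes a $10 plan and a $1,000 plan comparable on the same dashboard.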

Model Registry

Strategy: Semantic versioning, blue-green deployment

  • extraction_llm
  • change_classifier
  • priority_scorer
  • embedding_model

Observability

Metrics

  • 📊 scrape_success_rate
  • 📊 scrape_latency_p95_ms
  • 📊 extraction_accuracy
  • 📊 change_detection_rate
  • 📊 alert_latency_ms
  • 📊 llm_cost_per_scrape_usd
  • 📊 queue_depth
  • 📊 worker_utilization_percent

Dashboards

  • 📈 ops_dashboard
  • 📈 ml_dashboard
  • 📈 cost_dashboard
  • 📈 customer_health_dashboard

Traces

✅ Enabled

Deployment Variants

🚀 Startup

Infrastructure:

  • AWS Lambda (scraper + analyzer)
  • RDS PostgreSQL (single instance, gp3)
  • Redis (ElastiCache single node)
  • S3 (HTML storage)
  • CloudWatch (logs + metrics)
  • Anthropic API (Claude)

  • Serverless-first: pay-per-use, no idle costs
  • Single region (us-east-1)
  • Managed services: no ops overhead
  • Ship in 2 weeks
  • Cost: $200-500/mo for 10-50 competitors

🏢 Enterprise

Infrastructure:

  • Kubernetes (EKS multi-region)
  • Aurora PostgreSQL (multi-region, read replicas)
  • Redis Cluster (sharded, multi-AZ)
  • S3 (versioned, cross-region replication)
  • Private VPC (isolated network)
  • BYO KMS/HSM (customer-managed keys)
  • SSO/SAML (Okta/Azure AD)
  • Dedicated LLM endpoint (AWS Bedrock or self-hosted)

  • Multi-tenant with VPC isolation per customer
  • Data residency: EU/US/APAC regions
  • 99.99% SLA with multi-region failover
  • Audit logs: 7-year retention, tamper-proof
  • Custom scraping rules per customer
  • Dedicated support: Slack channel + on-call

📈 Migration: Phase 1 (0-3mo): Migrate to ECS/EKS, add read replicas. Phase 2 (3-6mo): Multi-region deployment, SSO. Phase 3 (6-12mo): VPC isolation, BYO KMS, dedicated endpoints.

Risks & Mitigations

⚠️ Competitor sites block scraping (anti-bot measures) · Risk: High

✓ Mitigation: Multi-layered: (1) Respect robots.txt, (2) Rotate proxies (BrightData), (3) Use managed scraping (Firecrawl), (4) Fallback to search API (Google), (5) Human fallback (manual check).

⚠️ LLM hallucinations (fake competitor data) · Risk: Medium

✓ Mitigation: 4-layer validation: (1) Confidence scores, (2) Historical cross-reference, (3) Logical consistency, (4) Human review (2% sample). 0.5% hallucination rate, 100% caught.

⚠️ Cost overruns (LLM API costs) · Risk: Medium

✓ Mitigation: Cost guardrails: (1) Set max budget per tenant, (2) Rate limit scrapes, (3) Cache embeddings (reduce API calls), (4) Use cheaper models for non-critical (DeepSeek), (5) Monthly cost reviews.

⚠️ Data privacy violations (scraping PII) · Risk: Low

✓ Mitigation: Guardrail Agent: (1) Pre-scrape policy check (robots.txt), (2) Post-scrape PII scan (redact emails, phones), (3) Audit logs (7yr retention), (4) Fail-safe: block if PII detection fails.

⚠️ Scaling bottlenecks (DB/queue overload) · Risk: Medium

✓ Mitigation: Horizontal scaling: (1) Read replicas (DB), (2) Sharded Redis, (3) Autoscaling workers (K8s HPA), (4) Rate limiting (per tenant), (5) Load testing (monthly).

⚠️ Alert fatigue (too many low-value alerts) · Risk: High

✓ Mitigation: Smart prioritization: (1) ML-based priority scoring, (2) User feedback loop (thumbs up/down), (3) Configurable alert rules, (4) Daily digest (batched alerts), (5) Weekly tuning based on CTR.

⚠️ Model drift (accuracy degrades over time) · Risk: Medium

✓ Mitigation: Continuous evaluation: (1) Weekly eval on labeled data, (2) Monitor confidence distribution, (3) Track extraction success rate, (4) Auto-alert if accuracy drops >5%, (5) Quarterly model retraining.

Evolution Roadmap


Phase 1: MVP (0-3 months)

Weeks 1-12
  • Launch with 10-50 competitors
  • Basic scraping + extraction + alerts
  • Single-tenant, single-region
  • Serverless architecture (Lambda + RDS)

Phase 2: Scale (3-6 months)

Months 4-6
  • Scale to 100-500 competitors
  • Multi-tenant SaaS
  • Advanced ML (RAG, hallucination detection)
  • Self-service onboarding

Phase 3: Enterprise (6-12 months)

Months 7-12
  • Scale to 1,000+ competitors
  • Multi-region deployment
  • Enterprise features (SSO, RBAC, audit)
  • 99.99% SLA

Complete Systems Architecture

9-layer architecture from presentation to observability

Presentation
  • Web Dashboard (React)
  • Mobile App (React Native)
  • Email Alerts
  • Slack Bot

API Gateway
  • Load Balancer (ALB/CloudFlare)
  • Rate Limiter (per tenant)
  • Auth Service (OIDC/SAML)
  • API Gateway (Kong/AWS API Gateway)

Agent Layer
  • Planner Agent (task decomposition)
  • Scraper Agent (web data collection)
  • Analyzer Agent (change detection + insights)
  • Alert Agent (routing + prioritization)
  • Evaluator Agent (quality checks)
  • Guardrail Agent (policy enforcement)

ML Layer
  • Feature Store (derived metrics)
  • Model Registry (LLMs, classifiers)
  • Embedding Service (semantic search)
  • Evaluation Service (quality metrics)

Integration
  • Web Scraper (Playwright/Puppeteer)
  • Google Search API
  • LinkedIn API
  • Crunchbase API
  • Slack API
  • SendGrid (Email)

Data
  • PostgreSQL (structured data)
  • Vector DB (Pinecone/Weaviate)
  • Redis (cache + queue)
  • S3 (raw HTML snapshots)

External
  • Anthropic API (Claude)
  • OpenAI API (GPT-4)
  • Firecrawl (managed scraping)
  • BrightData (proxy network)

Observability
  • CloudWatch/Datadog (metrics)
  • Sentry (error tracking)
  • OpenTelemetry (traces)
  • Grafana (dashboards)

Security
  • AWS KMS (encryption)
  • WAF (bot protection)
  • Audit Log (7yr retention)
  • RBAC Service

Request Flow - Hourly Scrape Cycle

Scheduler → Planner: trigger hourly scrape for Competitor X
Planner → Scraper: scrape pricing page + blog
Scraper → Analyzer: HTML content (2 pages)
Analyzer: extract pricing, detect changes (detected: 15% price increase)
Analyzer → Alert: check alert rules, prioritize
Alert → Slack: POST alert to #competitive-intel
Slack → User: notification delivered

Competitive Intelligence - Agent Orchestration

7 components: an Orchestrator coordinating six agents via RPC calls and events:

  • Planner Agent: [RPC] generate monitoring plan / [Event] scraping schedule
  • Guardrail Agent: [RPC] validate targets / [Event] compliance status
  • Scraper Agent: [RPC] execute scraping tasks / [Event] raw HTML data
  • Analyzer Agent: [RPC] extract & analyze / [Event] structured insights
  • Evaluator Agent: [RPC] validate results / [Event] quality scores
  • Alert Agent: [RPC] trigger alerts / [Event] delivery confirmation

Protocols: HTTP, REST, gRPC, Event, Stream, WebSocket

Competitive Intelligence - External Integrations

10 components: the core system connected to nine external systems:

  • Competitor Websites: [HTTP] scraping requests
  • Social Media APIs: [REST] social monitoring
  • Web Dashboard: [WebSocket] real-time updates / live insights
  • Slack Workspace: [Webhook] alert notifications / [REST] user commands
  • Email Service: [SMTP] digest reports
  • Data Warehouse: [Event] intelligence data
  • CRM System: [REST] competitive insights / [Webhook] deal triggers
  • Proxy Network: [HTTP] proxied requests
  • LLM Provider: [REST] analysis requests / generated insights

End-to-End Data Flow

Hourly scrape cycle: 13 seconds from trigger to alert

 1. Scheduler (Cron), 0s: triggers hourly scrape job → timestamp
 2. Planner Agent, 0.1s: queries competitor config, enqueues tasks → 50 scrape tasks (prioritized)
 3. Guardrail Agent, 0.2s: pre-scrape policy check → 48 approved, 2 blocked
 4. Scraper Agent, 8s: scrapes 48 URLs (parallel) → HTML (avg 50KB/page)
 5. S3, 0.5s: stores raw HTML snapshots → 2.4MB total
 6. Analyzer Agent, 3s: extracts structured data (LLM) → pricing, features, jobs
 7. Analyzer Agent, 0.5s: diffs vs previous snapshot → 12 changes detected
 8. Evaluator Agent, 1s: validates extraction quality → 46 approved, 2 flagged
 9. Guardrail Agent, 0.2s: PII scan + redaction → 1 email redacted
10. PostgreSQL, 0.1s: writes changes to DB → 12 change records
11. Alert Agent, 0.3s: applies alert rules, prioritizes → 3 critical alerts
12. Slack API, 1s: posts alerts to #competitive-intel → 3 messages
13. User, 0.5s: receives notification → alert delivered

Scaling Tiers

Tier 1: Serverless Monolith
  • Volume: 10 competitors, 100 scrapes/day
  • Architecture: AWS Lambda (scraper + analyzer), RDS PostgreSQL (single instance), S3 (HTML storage), EventBridge (scheduler)
  • Cost: $200/mo · 10-15 sec/scrape

Tier 2: Queue + Workers
  • Volume: 100 competitors, 2,400 scrapes/day
  • Architecture: ECS Fargate (5 worker containers), Redis (task queue + cache), RDS PostgreSQL (multi-AZ), S3 + CloudFront (HTML + assets)
  • Cost: $800/mo · 5-8 sec/scrape

Tier 3: Multi-Agent Orchestration
  • Volume: 500 competitors, 12,000 scrapes/day
  • Architecture: Kubernetes (EKS, 10-20 pods), Redis Cluster (sharded), Aurora PostgreSQL (read replicas), Vector DB (Pinecone), message bus (Kafka)
  • Cost: $3,000/mo · 3-5 sec/scrape

Tier 4: Enterprise Multi-Region
  • Volume: 1,000+ competitors, 50,000+ scrapes/day
  • Architecture: multi-region Kubernetes (3 regions), global load balancer (CloudFlare), multi-region Aurora (cross-region replication), Kafka (multi-datacenter), managed scraping (Firecrawl Enterprise)
  • Cost: $10,000+/mo · 1-3 sec/scrape

Key Integrations

Web Scraping (Playwright + Firecrawl)

Protocol: HTTP/HTTPS + WebDriver
Flow: agent requests scrape → Guardrail checks robots.txt + rate limits → Playwright launches headless Chrome → page loads, JS executes → HTML extracted, stored to S3 → metadata logged

Google Search API (Competitor News)

Protocol: REST API
Flow: daily cron job → query 'Competitor X' + date range → parse results (title, URL, snippet) → store to DB → Analyzer extracts key events

Slack (Alert Delivery)

Protocol: Slack Web API
Flow: Alert Agent generates message → format as Slack Block Kit → POST to chat.postMessage → receive confirmation → log delivery

Crunchbase API (Funding Data)

Protocol: REST API
Flow: weekly sync → query competitor funding rounds → parse JSON (amount, date, investors) → store to DB → alert if new round detected

Security & Compliance

Failure Modes & Recovery

Failure: Scraper blocked (429/403)
  Fallback: Retry 3x with exponential backoff → switch proxy → fall back to cached snapshot
  Impact: Degraded freshness (stale data up to 24hr)
  SLA: 98% scrape success rate

Failure: LLM API down (Anthropic/OpenAI)
  Fallback: Switch to backup LLM (GPT → Claude or vice versa) → rule-based extraction → manual queue
  Impact: Reduced accuracy (rule-based ~85% vs LLM 99%)
  SLA: 99.5% uptime (multi-LLM redundancy)

Failure: Database unavailable
  Fallback: Read from replica → degrade to read-only mode → queue writes
  Impact: No new data ingestion, alerts delayed
  SLA: 99.9% uptime (multi-AZ RDS)

Failure: Extraction low confidence (<0.7)
  Fallback: Flag for human review → use previous snapshot → skip alert
  Impact: Missed changes until human review (4-24hr delay)
  SLA: 95% extraction confidence

Failure: Alert delivery failure (Slack/Email down)
  Fallback: Retry 3x → fallback channel (email if Slack fails) → dashboard notification
  Impact: Delayed alert (up to 5min)
  SLA: 99% alert delivery

Failure: PII detection service down
  Fallback: Block all processing (fail-safe) → queue for later
  Impact: No new data until service recovers
  SLA: 100% PII protection (safety first)

Failure: Web scraping service down (Firecrawl)
  Fallback: Switch to Playwright (self-hosted) → reduce scrape frequency
  Impact: Slower scraping (8sec → 15sec/page), lower success rate
  SLA: 95% uptime (multi-provider redundancy)

Multi-Agent Collaboration Architecture

6 specialized agents orchestrated via message bus

┌──────────────────────────────────────────────────┐
│            Message Bus (Redis Streams)           │
└───────┬──────────┬──────────┬──────────┬─────────┘
        │          │          │          │
   ┌────▼───┐  ┌──▼───┐  ┌───▼────┐  ┌──▼──────┐
   │Planner │  │Scraper│  │Analyzer│  │  Alert  │
   │ Agent  │  │ Agent │  │ Agent  │  │  Agent  │
   └────┬───┘  └──┬───┘  └───┬────┘  └──┬──────┘
        │         │          │          │
        └─────────┴──────────┴──────────┘
                  │          │
            ┌─────▼────┐  ┌──▼────────┐
            │Evaluator │  │ Guardrail │
            │  Agent   │  │   Agent   │
            └──────────┘  └───────────┘
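
The production bus is Redis Streams; the in-process stand-in below sketches the contract the six agents share: named streams, ordered messages, and per-consumer read positions (roughly the XADD/XREAD pattern). It is an illustration, not a Redis client.

```python
from collections import defaultdict

class MessageBus:
    """In-memory stand-in for Redis Streams: ordered streams, per-consumer offsets."""

    def __init__(self):
        self.streams = defaultdict(list)
        self.offsets = defaultdict(int)  # (stream, consumer) -> next unread index

    def publish(self, stream, message):
        """Append a message to a named stream (like XADD)."""
        self.streams[stream].append(message)

    def consume(self, stream, consumer):
        """Return every message this consumer has not yet seen (like XREAD)."""
        start = self.offsets[(stream, consumer)]
        batch = self.streams[stream][start:]
        self.offsets[(stream, consumer)] = len(self.streams[stream])
        return batch
```

Per-consumer offsets are what let the Analyzer and the Alert agent each read the same `scrape_complete` stream independently.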

Agent Collaboration Flow

 1. Planner: receives hourly trigger → queries competitor config → enqueues 50 scrape tasks (prioritized by tier)
 2. Guardrail: pre-scrape check validates robots.txt, rate limits → approves 48/50 (2 blocked)
 3. Scraper: dequeues tasks → scrapes 48 URLs (parallel, 5 workers) → stores HTML to S3 → publishes 'scrape_complete' events
 4. Analyzer: consumes events → extracts pricing/features (LLM) → compares to previous snapshot → detects 12 changes
 5. Evaluator: validates extractions → flags 2 low-confidence (<0.7) for review → approves 46/48
 6. Alert: receives 12 change events → applies alert rules → prioritizes 3 as 'critical' → routes to Slack
 7. Guardrail: post-extraction PII scan → redacts 1 email found in job posting → logs audit event
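
The post-extraction PII scan in step 7 might be sketched with two regexes, for emails and phone numbers. The production path uses a dedicated detection service, so treat these patterns as illustrative only.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text):
    """Redact emails and phone numbers in place; report counts for the audit log."""
    found = {"email": len(EMAIL_RE.findall(text)),
             "phone": len(PHONE_RE.findall(text))}
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text, found
```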

Reactive Agent

Scraper - Receives URL, returns HTML
Autonomy: Low · Stateless

Reflexive Agent

Alert - Uses rules + context to route
Autonomy: Medium · Reads config

Deliberative Agent

Analyzer - Plans extraction strategy, iterates on low confidence
Autonomy: High · Stateful (remembers previous snapshots)

Orchestrator Agent

Planner - Coordinates all agents, handles failures, rebalances work
Autonomy: Highest · Full state management

Levels of Autonomy

  • L1 Tool: human invokes, agent responds (Monday's prompts)
  • L2 Chained Tools: sequential execution (Scraper → Analyzer)
  • L3 Agent: makes decisions, loops, retries (Analyzer iterates on low confidence)
  • L4 Multi-Agent: agents collaborate autonomously (this system: 6 agents coordinated)

Advanced ML/AI Patterns

RAG vs Fine-Tuning

Competitor data changes daily (pricing, products, jobs). RAG allows real-time updates without retraining. Fine-tuning would require weekly retraining ($5K/mo) vs RAG ($200/mo).
✅ RAG (Chosen)
Cost: $200/mo (vector DB + embeddings)
Update: Real-time (add new docs to vector DB)
How: Retrieve relevant context → LLM generates
❌ Fine-Tuning
Cost: $5K/mo (training compute + data labeling)
Update: Weekly (retrain on new data)
How: Retrain entire model
Implementation: Vector DB (Pinecone) with competitor docs (pricing pages, press releases, job postings). Retrieved during extraction. Top-3 relevant docs passed to LLM as context.
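
The retrieval step reduces to cosine-similarity top-k. The toy vectors below stand in for the real embedding service and Pinecone index; only the ranking logic is the point.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_docs(query_vec, docs, k=3):
    """docs: list of (doc_id, embedding). Return the k best doc ids as LLM context."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```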

Hallucination Detection

LLMs hallucinate competitor data (fake pricing, non-existent products)
  • L1: Confidence scores (<0.7 = flag for review)
  • L2: Cross-reference historical data (price can't jump 10x overnight)
  • L3: Logical consistency (can't have a negative price)
  • L4: Human review queue (2% of extractions)

Result: 0.5% hallucination rate, 100% caught before alerting.
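
Layers L2 and L3 are simple enough to sketch directly. The thresholds are the ones named above (10x jump, negative price); everything else here is illustrative.

```python
def validate_extraction(new_price, previous_price=None, max_jump=10.0):
    """Flag extractions that fail cross-reference or consistency checks."""
    issues = []
    if new_price < 0:
        issues.append("negative_price")  # L3: logical consistency
    if previous_price and previous_price > 0:
        ratio = new_price / previous_price
        if not (1 / max_jump <= ratio <= max_jump):
            issues.append("implausible_jump")  # L2: historical cross-reference
    return {"valid": not issues, "issues": issues}
```

Failing checks route the extraction to the L4 human review queue instead of triggering an alert.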

Evaluation Framework

  • Extraction Accuracy: 99.2% (target: 99%+)
  • Change Detection Recall: 97.1% (target: 95%+)
  • Alert Relevance: 92.3% (target: 90%+)
  • False Positive Rate: 3.2% (target: <5%)
Testing: Shadow mode: Run new model in parallel with production for 1 week. Compare metrics. Auto-rollback if quality drops >5%.
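
The shadow-mode gate reduces to a metric comparison. The sketch below assumes every tracked metric is higher-is-better (invert the sign for a rate like false positives); the 5% threshold matches the auto-rollback rule above.

```python
def shadow_mode_verdict(prod_metrics, candidate_metrics, max_drop=0.05):
    """Promote the candidate only if no metric drops more than max_drop vs production."""
    regressions = {}
    for name, prod_value in prod_metrics.items():
        drop = (prod_value - candidate_metrics[name]) / prod_value
        if drop > max_drop:
            regressions[name] = round(drop, 3)
    return ("promote", regressions) if not regressions else ("rollback", regressions)
```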

Dataset Curation

 1. Collect: 5K competitor pages (scrape top 100 competitors)
 2. Clean: 4.2K usable (remove duplicates, broken pages)
 3. Label: 4.2K labeled (labeling cost: $21K)
 4. Augment: +800 synthetic (edge-case generation: missing fields, typos)

Result: 5K high-quality examples. Cohen's Kappa: 0.89 (strong agreement).

Agentic RAG

Agent iteratively retrieves based on reasoning. Not one-shot retrieval.
Competitor mentions 'new pricing tier' → RAG retrieves pricing history → Agent reasons 'need to compare to old tiers' → RAG retrieves old pricing page → Agent generates comparison → Alert with full context.
💡 Context-aware retrieval. Agent decides what additional info it needs. Reduces hallucinations by 40%.


Technology Stack

  • LLMs: Claude 3.5 Sonnet (primary), GPT-4 (backup), DeepSeek (cost optimization)
  • Agent Framework: LangGraph (orchestration), LangChain (tools), CrewAI (team collaboration)
  • Web Scraping: Playwright (headless browser), Firecrawl (managed scraping), BrightData (proxy network)
  • Database: PostgreSQL (Aurora), Redis (ElastiCache), Vector DB (Pinecone)
  • Message Queue: Redis Streams (startup), Kafka (enterprise)
  • Compute: AWS Lambda (startup), ECS Fargate (scale-up), EKS (enterprise)
  • Observability: Datadog (metrics + logs + traces), Sentry (error tracking), Grafana (dashboards)
  • Security: AWS KMS (encryption), WAF (bot protection), Auth0 (OIDC), Secrets Manager
  • CI/CD: GitHub Actions (CI), ArgoCD (CD), Terraform (IaC)

Need a Custom Competitive Intelligence System?

We'll architect your system, handle scaling, and integrate with your stack. From 10 to 1,000 competitors.