From manual tracking to automated intelligence.
Monday: three prompts for competitive analysis. Tuesday: automated scraping and analysis code. Wednesday: team workflows across strategy, product, and sales. Thursday: the complete production architecture, covering agents, ML pipelines, scaling patterns, and real-time alerting for 1,000+ competitors.
Key Assumptions
- Monitor 10-1,000 competitors across web, social, job boards, press releases
- Hourly scrapes for critical competitors, daily for tier-2, weekly for tier-3
- GDPR-compliant: public data only, respect robots.txt, rate limiting
- Multi-tenant SaaS: each customer has isolated data, configurable alert rules
- 99.5% uptime SLA, <5 min alert latency for critical signals
System Requirements
Functional
- Web scraping: competitor websites, pricing pages, product pages, blogs, press releases
- Data extraction: pricing tables, product features, job postings, funding announcements
- Change detection: diff analysis, semantic similarity, anomaly detection
- Alert generation: rule-based + AI-powered, routed to Slack/Email/Dashboard
- Insight synthesis: weekly summaries, trend analysis, strategic recommendations
- Search & query: natural language search over historical data
- Dashboard: real-time updates, competitor profiles, alert history, trend charts
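The change-detection requirement (diff analysis plus semantic similarity) can be sketched as follows. This is a minimal stand-in, not the production implementation: a real analyzer would compare embedding vectors, while `difflib` approximates similarity here.

```python
import difflib

def detect_change(old: str, new: str) -> dict:
    """Flag a change when two snapshots differ, and report a similarity
    score so downstream logic can rank severity. difflib's ratio stands
    in for embedding cosine similarity."""
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    diff = [line for line in difflib.unified_diff(
                old.splitlines(), new.splitlines(), lineterm="")
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]
    return {"changed": bool(diff), "similarity": round(ratio, 3), "diff": diff}

result = detect_change(
    "Pro plan: $49/mo\nEnterprise: contact us",
    "Pro plan: $59/mo\nEnterprise: contact us",
)
```

A one-character price change barely moves the similarity score, which is why the sketch flags on diff lines rather than on the ratio alone.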
Non-Functional (SLOs)
💰 Cost Targets: $5 per competitor per month, $0.02 per alert, $0.50 per insight
Agent Layer
planner
Decomposes monitoring tasks, prioritizes scraping targets, schedules work
🔧 Database query (competitor config), Redis queue (enqueue tasks)
⚡ Recovery: If DB unavailable: use cached config, If queue full: backpressure + delay
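The planner's "queue full: backpressure + delay" rule can be sketched with an in-memory queue standing in for Redis (the class and its interface are illustrative, not from the doc): instead of dropping work, a full queue rejects the enqueue so the planner backs off and retries.

```python
import collections

class TaskQueue:
    """In-memory stand-in for the Redis task queue. When the queue is
    full, enqueue() returns False instead of dropping the task, so the
    planner can delay and retry."""
    def __init__(self, max_depth: int = 1000):
        self._q = collections.deque()
        self.max_depth = max_depth

    def enqueue(self, task: str) -> bool:
        if len(self._q) >= self.max_depth:
            return False  # backpressure signal: caller backs off
        self._q.append(task)
        return True

q = TaskQueue(max_depth=2)
accepted = [q.enqueue(t) for t in ("scrape:acme", "scrape:globex", "scrape:initech")]
```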
scraper
Executes web scraping, handles anti-bot measures, stores raw HTML
🔧 Playwright (headless browser), BrightData proxy (rotation), S3 (store HTML)
⚡ Recovery: Retry 3x with exponential backoff, Switch proxy on 429/403, Fallback to cached version if all fail
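The scraper's recovery chain (retry with exponential backoff, proxy switch on 429/403, cached fallback) can be sketched like this. The proxy names and `fetch` callable are assumptions for testability; the real scraper drives Playwright through a BrightData pool.

```python
import itertools

PROXIES = itertools.cycle(["proxy-a", "proxy-b", "proxy-c"])  # stand-in pool
CACHE = {"https://acme.example/pricing": "<html>cached</html>"}

def scrape_with_retry(fetch, url, max_attempts=3):
    """Retry blocked requests (429/403) up to max_attempts, doubling the
    backoff delay and rotating the proxy each time; fall back to the
    cached snapshot if every attempt fails. `fetch` is injected so the
    policy can be exercised without a real browser."""
    delay = 1.0
    for _ in range(max_attempts):
        proxy = next(PROXIES)
        status, body = fetch(url, proxy)
        if status == 200:
            return {"source": "live", "body": body}
        if status in (429, 403):
            delay *= 2  # a real scraper would time.sleep(delay) here
            continue
        break  # non-retryable error: go straight to cache
    return {"source": "cache", "body": CACHE.get(url, "")}

responses = iter([(429, ""), (403, ""), (200, "<html>new</html>")])
result = scrape_with_retry(lambda url, proxy: next(responses),
                           "https://acme.example/pricing")
```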
analyzer
Extracts structured data, detects changes, generates insights
🔧 Claude API (extraction), Embedding service (semantic similarity), Diff algorithm (text comparison)
⚡ Recovery: If LLM fails: use rule-based extraction, If low confidence (<0.7): flag for human review
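The analyzer's fallback policy, LLM first, rule-based extraction on failure, human review below the 0.7 confidence threshold, might look like this. The regex fallback and fixed 0.5 confidence for rule-based output are illustrative assumptions.

```python
import re

def rule_extract(html: str) -> dict:
    """Naive regex fallback: pull the first dollar amount."""
    m = re.search(r"\$(\d+)", html)
    return {"price_usd": int(m.group(1))} if m else {}

def extract(html, llm_extract, min_confidence=0.7):
    """Prefer the LLM extractor; on failure fall back to rules with a
    fixed lower confidence, and flag anything below the threshold for
    human review."""
    try:
        data, confidence = llm_extract(html)
    except Exception:
        data, confidence = rule_extract(html), 0.5
    return {"data": data, "confidence": confidence,
            "needs_review": confidence < min_confidence}

def broken_llm(html):
    raise TimeoutError("LLM API down")

result = extract("Pro plan: $49/mo", broken_llm)
```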
alert
Routes alerts based on rules, prioritizes, deduplicates
🔧 Rule engine (match conditions), Slack API, SendGrid API
⚡ Recovery: If Slack fails: fallback to email, If both fail: queue for retry + dashboard notification
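The alert agent's channel fallback (Slack, then email, then queued retry with a dashboard notification) reduces to walking a priority-ordered chain; a sketch, with the channel callables injected for testability:

```python
def deliver_alert(alert, channels):
    """Walk the channel chain in priority order (Slack first, then
    email); if every channel raises, queue the alert for retry and
    surface it on the dashboard instead of dropping it."""
    for name, send in channels:
        try:
            send(alert)
            return {"delivered_via": name}
        except ConnectionError:
            continue
    return {"delivered_via": "dashboard", "queued_for_retry": True}

def slack_down(alert):
    raise ConnectionError("Slack API unreachable")

sent = []
outcome = deliver_alert({"type": "price_change", "competitor": "acme"},
                        [("slack", slack_down), ("email", sent.append)])
```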
evaluator
Validates extraction quality, detects hallucinations, flags low confidence
🔧 Ground truth DB (labeled examples), Consistency checker (logical rules)
⚡ Recovery: If ground truth unavailable: use heuristics, If quality <80%: trigger human review pipeline
guardrail
Enforces policy (robots.txt, rate limits), redacts PII, safety checks
🔧 robots.txt parser, PII detection service, Rate limiter
⚡ Recovery: If PII detection fails: block by default, If rate limit hit: queue + delay
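The guardrail's two key rules, honour robots.txt and fail closed when PII detection is down, can be sketched with the standard-library robots.txt parser. The agent name `ci-bot` and the sample rules are hypothetical.

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])  # normally fetched per site

def may_scrape(path: str, pii_scanner=None) -> bool:
    """Policy gate before any scrape: honour robots.txt, and block by
    default whenever the PII detection service is unavailable."""
    if not rp.can_fetch("ci-bot", path):
        return False
    if pii_scanner is None:
        return False  # PII detector down -> fail closed
    return True

decisions = [
    may_scrape("/pricing", pii_scanner=str),          # allowed
    may_scrape("/private/roadmap", pii_scanner=str),  # robots.txt disallows
    may_scrape("/pricing"),                           # PII service down -> blocked
]
```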
ML Layer
Feature Store
Update: Hourly for real-time features, daily for aggregates
- competitor_activity_score (hourly)
- pricing_volatility_7d
- feature_release_velocity
- hiring_momentum (jobs posted/week)
- sentiment_score (press/social)
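The doc only names the features above; as one plausible definition, `pricing_volatility_7d` could be the coefficient of variation over the trailing seven daily price points. The formula is an assumption, not the canonical feature spec.

```python
import statistics

def pricing_volatility_7d(daily_prices):
    """Coefficient of variation (population stdev / mean) over the
    trailing 7 daily price points; 0.0 means prices held steady."""
    window = daily_prices[-7:]
    mean = statistics.mean(window)
    return statistics.pstdev(window) / mean if mean else 0.0

stable = pricing_volatility_7d([49, 49, 49, 49, 49, 49, 49])
shifting = pricing_volatility_7d([49, 49, 59, 59, 49, 64, 59])
```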
Model Registry
Strategy: Semantic versioning, blue-green deployment
- extraction_llm
- change_classifier
- priority_scorer
- embedding_model
Observability
Metrics
- 📊 scrape_success_rate
- 📊 scrape_latency_p95_ms
- 📊 extraction_accuracy
- 📊 change_detection_rate
- 📊 alert_latency_ms
- 📊 llm_cost_per_scrape_usd
- 📊 queue_depth
- 📊 worker_utilization_percent
Dashboards
- 📈 ops_dashboard
- 📈 ml_dashboard
- 📈 cost_dashboard
- 📈 customer_health_dashboard
Traces
✅ Enabled
Deployment Variants
🚀 Startup
Infrastructure:
- AWS Lambda (scraper + analyzer)
- RDS PostgreSQL (single instance, gp3)
- Redis (ElastiCache single node)
- S3 (HTML storage)
- CloudWatch (logs + metrics)
- Anthropic API (Claude)
→ Serverless-first: pay-per-use, no idle costs
→ Single region (us-east-1)
→ Managed services: no ops overhead
→ Ship in 2 weeks
→ Cost: $200-500/mo for 10-50 competitors
🏢 Enterprise
Infrastructure:
- Kubernetes (EKS multi-region)
- Aurora PostgreSQL (multi-region, read replicas)
- Redis Cluster (sharded, multi-AZ)
- S3 (versioned, cross-region replication)
- Private VPC (isolated network)
- BYO KMS/HSM (customer-managed keys)
- SSO/SAML (Okta/Azure AD)
- Dedicated LLM endpoint (AWS Bedrock or self-hosted)
→ Multi-tenant with VPC isolation per customer
→ Data residency: EU/US/APAC regions
→ 99.99% SLA with multi-region failover
→ Audit logs: 7-year retention, tamper-proof
→ Custom scraping rules per customer
→ Dedicated support: Slack channel + on-call
📈 Migration: Phase 1 (0-3mo): Migrate to ECS/EKS, add read replicas. Phase 2 (3-6mo): Multi-region deployment, SSO. Phase 3 (6-12mo): VPC isolation, BYO KMS, dedicated endpoints.
Risks & Mitigations
⚠️ Competitor sites block scraping (anti-bot measures)
Severity: High. Mitigation: multi-layered: (1) respect robots.txt, (2) rotate proxies (BrightData), (3) use managed scraping (Firecrawl), (4) fall back to a search API (Google), (5) human fallback (manual check).
⚠️ LLM hallucinations (fake competitor data)
Severity: Medium. Mitigation: 4-layer validation: (1) confidence scores, (2) historical cross-reference, (3) logical consistency, (4) human review (2% sample). Measured 0.5% hallucination rate, all caught in validation.
⚠️ Cost overruns (LLM API costs)
Severity: Medium. Mitigation: cost guardrails: (1) max budget per tenant, (2) rate-limit scrapes, (3) cache embeddings to reduce API calls, (4) cheaper models for non-critical work (DeepSeek), (5) monthly cost reviews.
⚠️ Data privacy violations (scraping PII)
Severity: Low. Mitigation: Guardrail Agent: (1) pre-scrape policy check (robots.txt), (2) post-scrape PII scan (redact emails, phones), (3) audit logs (7-year retention), (4) fail-safe: block if PII detection fails.
⚠️ Scaling bottlenecks (DB/queue overload)
Severity: Medium. Mitigation: horizontal scaling: (1) read replicas (DB), (2) sharded Redis, (3) autoscaling workers (K8s HPA), (4) per-tenant rate limiting, (5) monthly load testing.
⚠️ Alert fatigue (too many low-value alerts)
Severity: High. Mitigation: smart prioritization: (1) ML-based priority scoring, (2) user feedback loop (thumbs up/down), (3) configurable alert rules, (4) daily digest (batched alerts), (5) weekly tuning based on CTR.
⚠️ Model drift (accuracy degrades over time)
Severity: Medium. Mitigation: continuous evaluation: (1) weekly eval on labeled data, (2) monitor confidence distribution, (3) track extraction success rate, (4) auto-alert if accuracy drops >5%, (5) quarterly model retraining.
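The ML-based priority scoring behind the alert-fatigue mitigation could be sketched as a linear scorer; the feature names and weights here are illustrative stand-ins for the learned priority_scorer model, with low scores routed to the daily digest instead of firing immediately.

```python
def priority_score(signal, weights=None):
    """Linear stand-in for the ML priority scorer: inputs normalized
    to [0, 1], weights learned from user feedback in production."""
    w = weights or {"change_magnitude": 0.5,
                    "competitor_tier": 0.3,
                    "historical_ctr": 0.2}
    return round(sum(w[k] * signal[k] for k in w), 3)

big_price_change = priority_score(
    {"change_magnitude": 0.9, "competitor_tier": 1.0, "historical_ctr": 0.8})
minor_blog_edit = priority_score(
    {"change_magnitude": 0.1, "competitor_tier": 0.3, "historical_ctr": 0.2})
```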
Evolution Roadmap
Phase 1: MVP (0-3 months)
Weeks 1-12
- → Launch with 10-50 competitors
- → Basic scraping + extraction + alerts
- → Single-tenant, single-region
- → Serverless architecture (Lambda + RDS)
Phase 2: Scale (3-6 months)
Months 4-6
- → Scale to 100-500 competitors
- → Multi-tenant SaaS
- → Advanced ML (RAG, hallucination detection)
- → Self-service onboarding
Phase 3: Enterprise (6-12 months)
Months 7-12
- → Scale to 1,000+ competitors
- → Multi-region deployment
- → Enterprise features (SSO, RBAC, audit)
- → 99.99% SLA
Complete Systems Architecture
9-layer architecture from presentation to observability
Request Flow - Hourly Scrape Cycle
Competitive Intelligence - Agent Orchestration (7 components)
Competitive Intelligence - External Integrations (10 components)
End-to-End Data Flow
Hourly scrape cycle: 13 seconds from trigger to alert
Scaling Tiers
Key Integrations
Web Scraping (Playwright + Firecrawl)
Google Search API (Competitor News)
Slack (Alert Delivery)
Crunchbase API (Funding Data)
Security & Compliance
Failure Modes & Recovery
| Failure | Fallback | Impact | SLA |
| --- | --- | --- | --- |
| Scraper blocked (429/403) | Retry 3x with exponential backoff → switch proxy → fall back to cached snapshot | Degraded freshness (stale data up to 24 hr) | 98% scrape success rate |
| LLM API down (Anthropic/OpenAI) | Switch to backup LLM (GPT → Claude or vice versa) → rule-based extraction → manual queue | Reduced accuracy (rule-based ~85% vs. LLM ~99%) | 99.5% uptime (multi-LLM redundancy) |
| Database unavailable | Read from replica → degrade to read-only mode → queue writes | No new data ingestion, alerts delayed | 99.9% uptime (multi-AZ RDS) |
| Extraction low confidence (<0.7) | Flag for human review → use previous snapshot → skip alert | Missed changes until human review (4-24 hr delay) | 95% extraction confidence |
| Alert delivery failure (Slack/Email down) | Retry 3x → fallback channel (email if Slack fails) → dashboard notification | Delayed alert (up to 5 min) | 99% alert delivery |
| PII detection service down | Block all processing (fail-safe) → queue for later | No new data until service recovers | 100% PII protection (safety first) |
| Web scraping service down (Firecrawl) | Switch to self-hosted Playwright → reduce scrape frequency | Slower scraping (8 s → 15 s/page), lower success rate | 95% uptime (multi-provider redundancy) |
Multi-Agent Collaboration Architecture
6 specialized agents orchestrated via message bus
┌──────────────────────────────────────────────────┐
│            Message Bus (Redis Streams)           │
└───────┬──────────┬──────────┬──────────┬─────────┘
        │          │          │          │
   ┌────▼───┐  ┌───▼───┐  ┌───▼────┐  ┌──▼──────┐
   │Planner │  │Scraper│  │Analyzer│  │  Alert  │
   │ Agent  │  │ Agent │  │ Agent  │  │  Agent  │
   └────┬───┘  └───┬───┘  └───┬────┘  └──┬──────┘
        │          │          │          │
        └──────────┴─────┬────┴──────────┘
                         │
             ┌───────────┴────────────┐
        ┌────▼─────┐            ┌─────▼─────┐
        │Evaluator │            │ Guardrail │
        │  Agent   │            │   Agent   │
        └──────────┘            └───────────┘
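The publish/subscribe wiring in the diagram can be sketched with an in-process bus; the class below is a minimal stand-in for Redis Streams (which additionally provides consumer groups, acknowledgements, and replay), with topic names chosen for illustration.

```python
import collections

class MessageBus:
    """Minimal in-process stand-in for Redis Streams: each topic is an
    append-only list, and subscribed handlers are called on publish."""
    def __init__(self):
        self.streams = collections.defaultdict(list)
        self.handlers = collections.defaultdict(list)

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, message):
        self.streams[topic].append(message)
        for handler in self.handlers[topic]:
            handler(message)

bus = MessageBus()
analyzer_inbox = []
bus.subscribe("scrape.done", analyzer_inbox.append)  # Analyzer consumes scrape results
bus.publish("scrape.done", {"url": "https://acme.example/pricing", "s3_key": "snap/123"})
```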