Sales Intelligence System Architecture: AI-Powered Pipeline

From prompts to production sales intelligence.

Monday: 3 core prompts for account enrichment, signal detection, and insight generation. Tuesday: automated code with CRM sync. Wednesday: team workflows for SDRs, AEs, and RevOps. Thursday: complete technical architecture with multi-agent orchestration, ML pipelines, and enterprise-grade scaling patterns.

Key Assumptions

• Monitor 10-1,000 target accounts initially, scaling to 10K+
• Hourly signal detection for tier-1 accounts, daily for tier-2/3
• GDPR/CCPA-compliant data handling with PII redaction
• Salesforce or HubSpot as primary CRM
• Budget: $500-5K/month depending on scale
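
The tiered refresh cadence in the assumptions can be sketched as a small scheduling check. This is a minimal illustration, not the system's actual scheduler: the function name `is_due` and the integer tier labels are hypothetical, and only the hourly/daily intervals come from the list above.

```python
from datetime import datetime, timedelta

# Intervals from the assumptions: hourly for tier-1, daily for tier-2/3.
# The integer tier keys are an illustrative convention.
REFRESH_INTERVAL = {
    1: timedelta(hours=1),
    2: timedelta(days=1),
    3: timedelta(days=1),
}

def is_due(tier: int, last_checked: datetime, now: datetime) -> bool:
    """Return True when an account's signals should be re-checked."""
    return now - last_checked >= REFRESH_INTERVAL[tier]

now = datetime(2024, 1, 1, 12, 0)
print(is_due(1, now - timedelta(minutes=90), now))  # tier-1 account, last checked 90 min ago
print(is_due(2, now - timedelta(hours=6), now))     # tier-2 account, last checked 6 h ago
```

A cron job or queue producer could call a check like this per account and enqueue only the accounts that are due.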

System Requirements

Functional

• Account enrichment from ZoomInfo, LinkedIn, Clearbit, news APIs
• Real-time signal detection (funding, hiring, tech stack changes)
• AI-generated insights with confidence scores and citations
• Bi-directional CRM sync (Salesforce, HubSpot)
• Role-based dashboards (SDR, AE, RevOps)
• Webhook notifications for high-priority signals
• Historical trend analysis and pattern recognition

Non-Functional (SLOs)

• Latency (p95): 3,000 ms
• Data freshness: 60 min
• Availability: 99.5%
• Enrichment accuracy: 95%
• Signal detection recall: 92%
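
One way to make these SLOs checkable is to encode them as data and compare measured values against them. The sketch below assumes hypothetical metric keys and a hypothetical `slo_violations` helper; only the target numbers come from the list above.

```python
# Targets copied from the SLO list. "max" means lower is better,
# "min" means higher is better. The key names are illustrative.
SLO_TARGETS = {
    "latency_p95_ms": ("max", 3000),
    "freshness_min": ("max", 60),
    "availability_pct": ("min", 99.5),
    "enrichment_accuracy_pct": ("min", 95),
    "signal_recall_pct": ("min", 92),
}

def slo_violations(measured: dict[str, float]) -> list[str]:
    """Return the names of SLOs that the measured values breach."""
    violations = []
    for name, (kind, target) in SLO_TARGETS.items():
        value = measured[name]
        breached = value > target if kind == "max" else value < target
        if breached:
            violations.append(name)
    return violations

sample = {
    "latency_p95_ms": 3400,
    "freshness_min": 45,
    "availability_pct": 99.7,
    "enrichment_accuracy_pct": 96,
    "signal_recall_pct": 90,
}
print(slo_violations(sample))
```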

💰 Cost Targets: $5 per account per month, $0.15 per enrichment, $0.08 per insight

Agent Layer

planner

Orchestrate enrichment workflow, select data sources, prioritize signals

🔧 account_lookup, source_selector, cost_estimator

⚡ Recovery: Fall back to the cached plan if planning fails; use the default source set if the selector is unavailable

enrichment_executor

Execute data fetching from ZoomInfo, LinkedIn, Clearbit, news APIs

🔧 zoominfo_api, linkedin_api, clearbit_api, news_api, cache_lookup

⚡ Recovery: Retry up to 3 times with exponential backoff; use cached data if the API still fails; skip a source if it takes longer than 5s
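
The executor's recovery policy (retry with backoff, then fall back to cache) can be sketched as below. This is a minimal sketch under stated assumptions: `fetch_with_fallback` is a hypothetical helper, "3x" is read as three retries after the first call, and the 0.5s base delay is illustrative. The 5s per-source timeout would normally be enforced by the HTTP client and is not shown.

```python
import time
from typing import Callable, Optional

def fetch_with_fallback(
    fetch: Callable[[], dict],
    cached: Optional[dict],
    retries: int = 3,            # "3x" interpreted as 3 retries (assumption)
    base_delay: float = 0.5,     # illustrative starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(); on failure retry with exponential backoff, then use the cache."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return cached  # may be None if nothing was cached

def flaky_source() -> dict:
    raise TimeoutError("source unavailable")

# Pass a no-op sleep so the demo does not actually wait.
print(fetch_with_fallback(flaky_source, {"name": "Acme (cached)"}, sleep=lambda s: None))
```

Injecting `sleep` keeps the function testable without real delays.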

signal_detector

Detect buying signals: funding, hiring, tech changes, executive moves

🔧 pattern_matcher, llm_classifier, trend_analyzer, priority_ranker

⚡ Recovery: Use the rule-based fallback if the LLM fails; return empty signals and raise an alert if all detection fails
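
A rule-based fallback for the signal detector might look like the following. The regular expressions, the signal names, and the `detect_signals` function are illustrative assumptions; a production rule set would be broader and tuned against labelled data.

```python
import re
from typing import Callable, Optional

# Illustrative keyword rules, one per signal type named in this section.
RULES = {
    "funding": re.compile(r"\b(raised|series [a-e]|funding round|seed round)\b", re.I),
    "hiring": re.compile(r"\b(hiring|job openings?|headcount)\b", re.I),
    "tech_change": re.compile(r"\b(migrat\w+ to|adopts?|switch\w* to)\b", re.I),
    "executive_move": re.compile(r"\b(appoints|names|hires)\b.*\b(ceo|cto|cfo|cro)\b", re.I),
}

def detect_signals(
    text: str,
    llm_classify: Optional[Callable[[str], list[str]]] = None,
) -> list[str]:
    """Try the LLM classifier first; fall back to keyword rules if it fails."""
    if llm_classify is not None:
        try:
            return llm_classify(text)
        except Exception:
            pass  # fall through to the rule-based path
    return sorted(name for name, pattern in RULES.items() if pattern.search(text))

def broken_llm(text: str) -> list[str]:
    raise RuntimeError("LLM unavailable")

print(detect_signals("Acme raised a $20M Series B and is hiring 40 engineers", broken_llm))
```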

insight_generator

Generate actionable sales insights with citations and next steps

🔧 llm_api, citation_extractor, action_recommender

⚡ Recovery: Use template-based insights if the LLM fails; as a last resort, return signals without insights

evaluator

Validate data quality, check confidence thresholds, flag low-quality outputs

🔧 schema_validator, confidence_checker, citation_verifier, hallucination_detector

⚡ Recovery: Flag for human review if validation fails; block sync to CRM if quality falls below the threshold
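
The evaluator's gate can be sketched as a small decision function. The `Insight` shape, the 0.7 threshold, and the outcome labels are assumptions for illustration; the source only says that failed validation goes to human review and low quality blocks the CRM sync.

```python
from dataclasses import dataclass, field

@dataclass
class Insight:
    text: str
    confidence: float                      # 0.0-1.0, assumed scale
    citations: list[str] = field(default_factory=list)

def evaluate(insight: Insight, min_confidence: float = 0.7) -> str:
    """Return 'human_review', 'blocked', or 'sync' for an insight."""
    if not insight.text.strip() or not insight.citations:
        return "human_review"   # validation failed: missing text or citations
    if insight.confidence < min_confidence:
        return "blocked"        # quality below threshold: do not sync to CRM
    return "sync"

print(evaluate(Insight("Acme closed a Series B.", 0.9, ["https://example.com/news"])))
print(evaluate(Insight("Acme may be expanding.", 0.4, ["https://example.com/blog"])))
print(evaluate(Insight("Acme is hiring.", 0.95)))
```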

guardrail

PII redaction, policy compliance, rate limiting, safety filters

🔧 pii_detector, policy_engine, rate_limiter, audit_logger

⚡ Recovery: Block the request if PII detection fails; return HTTP 429 if the rate limit is exceeded; alert the security team on a policy violation
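
A minimal sketch of the guardrail's fail-closed PII redaction follows. It uses two simple regular expressions from the standard library as a stand-in; the patterns, placeholder tokens, and function names are illustrative, and a dedicated detector such as the Presidio library listed in the tech stack would replace them in practice.

```python
import re
from typing import Callable, Optional

# Deliberately simple patterns for illustration; real PII detection needs more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def guard(text: str, detector: Callable[[str], str] = redact) -> Optional[str]:
    """Fail closed: if the detector itself errors, block the request (None)."""
    try:
        return detector(text)
    except Exception:
        return None

print(guard("Contact jane.doe@example.com or +1 (415) 555-0100"))
```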

ML Layer

Feature Store

Update: Hourly for tier-1, daily for tier-2/3

• account_engagement_score (0-100)
• signal_velocity (signals/week)
• enrichment_staleness_days
• tech_stack_similarity (0-1)
• buying_intent_score (0-100)
• historical_conversion_rate
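
Two of the features above are simple enough to derive directly from timestamps. The sketch below is an assumption about how they could be computed: the function signatures and the 4-week averaging window for `signal_velocity` are not specified in the source.

```python
from datetime import date, timedelta

def enrichment_staleness_days(last_enriched: date, today: date) -> int:
    """Days since the account was last enriched."""
    return (today - last_enriched).days

def signal_velocity(signal_dates: list[date], today: date, window_weeks: int = 4) -> float:
    """Average signals per week over a trailing window (window length is an assumption)."""
    cutoff = today - timedelta(weeks=window_weeks)
    recent = [d for d in signal_dates if cutoff < d <= today]
    return len(recent) / window_weeks

today = date(2024, 3, 1)
signals = [date(2024, 2, 28), date(2024, 2, 20), date(2024, 2, 10), date(2023, 12, 1)]
print(enrichment_staleness_days(date(2024, 2, 20), today))
print(signal_velocity(signals, today))
```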

Model Registry

Strategy: Semantic versioning with A/B testing

• signal_classifier
• priority_ranker
• insight_generator

Observability

Metrics

📊 enrichment_success_rate
📊 signal_detection_recall
📊 insight_relevance_score
📊 llm_latency_p95_ms
📊 api_error_rate
📊 crm_sync_lag_seconds
📊 cost_per_account_usd
📊 cache_hit_rate

Dashboards

📈 ops_dashboard
📈 ml_dashboard
📈 sales_kpi_dashboard
📈 cost_dashboard

Traces

✅ Enabled

Deployment Variants

🚀 Startup

Infrastructure:

• Vercel/Netlify for frontend + API routes
• Supabase (Postgres + Auth + Storage)
• Upstash Redis (serverless cache)
• Direct LLM API calls (OpenAI/Anthropic)
• No Kubernetes, no Kafka

→ Deploy in 1 day

→ Cost: $200-500/mo

→ Scales to 1K accounts

→ Manual monitoring with Vercel Analytics

🏢 Enterprise

Infrastructure:

• Kubernetes (EKS/GKE) with auto-scaling
• Kafka + Schema Registry for event streaming
• Aurora Global Database (multi-region)
• Private VPC with VPN/Direct Connect
• SSO/SAML (Okta/Azure AD)
• BYO KMS/HSM for encryption keys
• Multi-LLM routing (OpenAI + Anthropic + self-hosted)
• Dedicated support + SLA

→ Multi-region deployment (US + EU)

→ Cost: $12K+/mo

→ Scales to 100K+ accounts

→ 99.9% SLA with auto-failover

→ SOC2 Type II + GDPR compliant

📈 Migration: Start with startup stack. At 1K accounts, migrate to queue-based. At 5K accounts, add agent orchestration. At 10K accounts, move to Kubernetes + multi-region. Total migration time: 6-9 months.

Risks & Mitigations

⚠️ LinkedIn scraping detected, account banned

Medium

✓ Mitigation: Use official Sales Navigator API where possible. Rotate proxy IPs. Implement rate limiting (10 req/min). Have fallback to ZoomInfo for employee data. Budget for multiple LinkedIn accounts.

⚠️ LLM API costs spiral out of control

High

✓ Mitigation: Set per-account cost caps ($0.50 max). Use cheaper models for low-priority accounts. Cache aggressively (24hr TTL). Monitor cost per enrichment daily. Alert if > $0.20/account.
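
The cost cap and alert threshold in this mitigation can be sketched as a small tracker. The class name, method name, and return labels are hypothetical; the $0.50 cap and $0.20 alert level come from the text, while the period the cap applies to is not stated and is left to the caller.

```python
from collections import defaultdict
from decimal import Decimal

class CostTracker:
    """Tracks LLM spend per account and enforces a hard cap."""

    def __init__(self, cap: Decimal = Decimal("0.50"), alert_at: Decimal = Decimal("0.20")):
        self.cap = cap
        self.alert_at = alert_at
        self.spent: dict[str, Decimal] = defaultdict(Decimal)  # starts at 0 per account

    def try_spend(self, account_id: str, cost: Decimal) -> str:
        """Return 'blocked' if the cap would be exceeded, else 'alert' or 'ok'."""
        if self.spent[account_id] + cost > self.cap:
            return "blocked"
        self.spent[account_id] += cost
        return "alert" if self.spent[account_id] > self.alert_at else "ok"

tracker = CostTracker()
results = [tracker.try_spend("acme", Decimal("0.15")) for _ in range(4)]
print(results)
```

Decimal is used instead of float so that repeated small charges add up exactly.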

⚠️ Signal detection false positives annoy sales team

Medium

✓ Mitigation: Require 2+ signals for high-priority alerts. Show confidence scores. Allow feedback (thumbs up/down). Retrain monthly on feedback data. Target < 10% false positive rate.
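
The "2+ signals" rule can be expressed as a short priority function. Counting distinct signal types and ignoring signals below a 0.6 confidence floor are both assumptions added for illustration; the source only requires two or more signals.

```python
def alert_priority(signals: list[dict], min_signals: int = 2, min_confidence: float = 0.6) -> str:
    """Return 'high' only when enough distinct, confident signal types are present."""
    confident_types = {s["type"] for s in signals if s["confidence"] >= min_confidence}
    return "high" if len(confident_types) >= min_signals else "normal"

print(alert_priority([
    {"type": "funding", "confidence": 0.9},
    {"type": "hiring", "confidence": 0.8},
]))
print(alert_priority([
    {"type": "funding", "confidence": 0.9},
    {"type": "hiring", "confidence": 0.3},
]))
```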

⚠️ CRM sync conflicts overwrite manual updates

Medium

✓ Mitigation: Use last-modified timestamp for conflict resolution. Never overwrite manually edited fields. Show diff before sync. Allow rollback within 24 hours. Log all changes to audit trail.
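
The conflict rules in this mitigation can be sketched as a function that produces a diff before anything is written. This is a simplified illustration: `plan_sync` is a hypothetical helper, timestamps are compared at record level rather than per field, and how manually edited fields are tracked is left as an assumption.

```python
from datetime import datetime

def plan_sync(
    crm_record: dict,
    enrichment: dict,
    manually_edited: set[str],
    crm_modified_at: datetime,
    enriched_at: datetime,
) -> dict:
    """Return {field: (old, new)} for changes that are safe to apply."""
    if enriched_at <= crm_modified_at:
        return {}  # the CRM copy is newer, so keep it untouched
    diff = {}
    for field_name, new_value in enrichment.items():
        if field_name in manually_edited:
            continue  # never overwrite a manually edited field
        old_value = crm_record.get(field_name)
        if old_value != new_value:
            diff[field_name] = (old_value, new_value)
    return diff

diff = plan_sync(
    crm_record={"employees": 120, "industry": "SaaS"},
    enrichment={"employees": 150, "industry": "Fintech"},
    manually_edited={"industry"},
    crm_modified_at=datetime(2024, 1, 1),
    enriched_at=datetime(2024, 1, 2),
)
print(diff)
```

The returned diff can be shown to the user before the sync and written to the audit trail afterwards, which also gives the rollback step the old values it needs.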

⚠️ GDPR violation from storing EU customer data in US

Low

✓ Mitigation: Deploy EU region (eu-west-1) for EU customers. Data residency checks in API gateway. No cross-region data transfer. Annual GDPR audit. DPA with all vendors.
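
A data residency check at the API gateway could look roughly like this. The function names and the `us-east-1` default region are assumptions (the source names only `eu-west-1`), and the country set covers EU member states only; whether EEA countries or the UK should be routed the same way is a legal decision outside this sketch.

```python
# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})
EU_REGION = "eu-west-1"        # named in the mitigation above
DEFAULT_REGION = "us-east-1"   # assumption: the source does not name the US region

def home_region(country_code: str) -> str:
    """Region where a customer's data must be stored and processed."""
    return EU_REGION if country_code.upper() in EU_MEMBER_STATES else DEFAULT_REGION

def check_residency(country_code: str, serving_region: str) -> None:
    """Raise if a request would be served outside the customer's home region."""
    expected = home_region(country_code)
    if serving_region != expected:
        raise PermissionError(
            f"data for {country_code.upper()} must stay in {expected}, not {serving_region}"
        )

print(home_region("de"))
print(home_region("US"))
```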

⚠️ Key engineer leaves, system becomes unmaintainable

Medium

✓ Mitigation: Document architecture (this doc!). Use standard frameworks (LangGraph, not custom). Automated tests (80%+ coverage). Pair programming on critical paths. Hire 2+ engineers familiar with stack.

⚠️ Enrichment sources change APIs, break integrations

High

✓ Mitigation: Version all API clients. Monitor for breaking changes (webhooks). Test against sandbox environments. Have fallback sources. Budget 1 day/month for API maintenance.

Evolution Roadmap

Phase 1: MVP (0-3 months)

Weeks 1-12

→ Launch with 10-100 accounts
→ Prove signal detection accuracy (> 90%)
→ Get 5 sales team advocates
→ Achieve < $5/account cost

Phase 2: Scale (3-6 months)

Weeks 13-24

→ Scale to 1,000 accounts
→ Automate CRM sync (Salesforce API)
→ Add insight generation with LLM
→ Reduce cost to < $3/account

Phase 3: Enterprise (6-12 months)

Weeks 25-52

→ Scale to 10,000+ accounts
→ Multi-region deployment (US + EU)
→ 99.9% SLA with auto-failover
→ SOC2 Type II certified

Complete Systems Architecture

9-layer architecture from presentation to security

• Presentation: Sales Dashboard (React), Mobile App (React Native), Slack/Teams Bots
• API Gateway: Load Balancer (ALB/Cloud LB), Rate Limiter (Redis), Auth Service (Auth0/Cognito), API Gateway (Kong/AWS API Gateway)
• Agent Layer: Planner Agent, Enrichment Executor, Signal Detector, Insight Generator, Evaluator Agent, Guardrail Agent
• ML Layer: Feature Store (Feast/Tecton), Model Registry (MLflow), Inference Service, Evaluation Pipeline, Prompt Store
• Integration: CRM Adapter (Salesforce/HubSpot), Data Enrichment APIs, News/Social APIs, Webhook Manager
• Data: PostgreSQL (accounts, signals), Redis (cache, queue), S3/GCS (raw data, logs), Vector DB (embeddings)
• External: Salesforce API, ZoomInfo API, LinkedIn Sales Navigator, News APIs (AlphaVantage, NewsAPI), LLM APIs (OpenAI, Anthropic)
• Observability: Metrics (Prometheus/Datadog), Logs (CloudWatch/Loki), Traces (Jaeger/Honeycomb), Dashboards (Grafana)
• Security: KMS/HSM (secrets), WAF, PII Redaction Service, Audit Logger, RBAC Service

Request Flow - Account Enrichment

[Diagram: Sales Intelligence - Hub Orchestration. 7 components; connection types: HTTP, REST, gRPC, Event, Stream, WebSocket]

[Diagram: Sales Intelligence - Feedback & Refinement Network. 7 components; connection types: HTTP, REST, gRPC, Event, Stream, WebSocket]

Data Flow - End-to-End

From SDR request to CRM sync in under 8 seconds

Step	Time	Action	Output
SDR	0s	Clicks 'Enrich Account'	account_id
API Gateway	50ms	Authenticates, rate limits	JWT token
Guardrail Agent	200ms	Policy check, PII scan	Sanitized request
Planner Agent	100ms	Selects data sources, creates plan	Execution plan
Enrichment Executor	2.5s	Fetches from ZoomInfo, LinkedIn, news	Raw data JSON
Signal Detector	1.5s	Analyzes for buying signals	Signals array
Insight Generator	2s	Generates insights + citations	Insight text
Evaluator Agent	300ms	Validates quality, confidence	Quality score
CRM Sync	800ms	Updates Salesforce record	SFDC object
Webhook	200ms	Notifies Slack if high-priority	Slack message
SDR	100ms	Sees enriched account + insights	UI update
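
Summing the per-stage timings listed above gives a quick check on the "under 8 seconds" claim and shows where the time goes. The sketch assumes the stages run strictly one after another, which the flow implies but does not state; the dictionary keys are just labels.

```python
# Per-stage timings from the data flow, in milliseconds.
STAGE_MS = {
    "API Gateway": 50,
    "Guardrail Agent": 200,
    "Planner Agent": 100,
    "Enrichment Executor": 2500,
    "Signal Detector": 1500,
    "Insight Generator": 2000,
    "Evaluator Agent": 300,
    "CRM Sync": 800,
    "Webhook": 200,
    "SDR UI update": 100,
}

total_ms = sum(STAGE_MS.values())
slowest = max(STAGE_MS, key=STAGE_MS.get)
heavy_ms = sum(STAGE_MS[s] for s in ("Enrichment Executor", "Signal Detector", "Insight Generator"))

print(total_ms)                            # 7750
print(slowest)                             # Enrichment Executor
print(round(100 * heavy_ms / total_ms))    # 77
```

Under this assumption, enrichment, signal detection, and insight generation account for roughly three quarters of the end-to-end time, so those three stages are the natural targets for caching or parallel fetching.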

Scaling Patterns

10-100 accounts: Serverless Monolith

Architecture:

• Next.js API routes
• Vercel/Netlify hosting
• Supabase (Postgres + Auth)
• Upstash Redis (cache)
• Direct API calls to enrichment sources

→ Cost: $200/mo

→ Latency: 5-8s

100-1,000 accounts: Queue-Based Workers

Architecture:

• API server (Node/Python)
• Redis queue (BullMQ/Celery)
• Worker pool (3-5 workers)
• PostgreSQL (managed)
• S3/GCS for raw data

→ Cost: $800/mo

→ Latency: 3-5s

1,000-10,000 accounts: Multi-Agent Orchestration

Architecture:

• Load balancer
• Agent framework (LangGraph/CrewAI)
• SQS/Kafka message bus
• Lambda/Cloud Run functions
• RDS Multi-AZ
• Vector DB (Pinecone/Weaviate)

→ Cost: $3,500/mo

→ Latency: 2-4s

10,000+ accounts: Enterprise Multi-Region

Architecture:

• Kubernetes (EKS/GKE)
• Kafka + Schema Registry
• Multi-LLM routing
• Aurora Global Database
• CDN (CloudFront/Fastly)
• Private VPC, SSO/SAML

→ Cost: $12K+/mo

→ Latency: 1-3s
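
The volume thresholds above map naturally onto a lookup. In this sketch the helper name `pattern_for` is hypothetical, and because the listed ranges share their boundary values (100, 1,000, 10,000), assigning each boundary to the smaller tier is an assumption.

```python
import bisect

# Upper bounds of the first three tiers, taken from the volumes listed above.
THRESHOLDS = [100, 1_000, 10_000]
PATTERNS = [
    "Serverless Monolith",
    "Queue-Based Workers",
    "Multi-Agent Orchestration",
    "Enterprise Multi-Region",
]

def pattern_for(accounts: int) -> str:
    """Pick the scaling pattern for a given number of monitored accounts."""
    return PATTERNS[bisect.bisect_left(THRESHOLDS, accounts)]

for n in (50, 100, 101, 5_000, 25_000):
    print(n, pattern_for(n))
```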

Key Integrations

Salesforce CRM

Protocol: REST API (SOAP for legacy)

• Enrich account data
• Map to SFDC Account/Opportunity objects
• Upsert via Bulk API (batch) or REST (real-time)
• Handle conflicts with last-modified timestamp

ZoomInfo

Protocol: REST API v2

• Search by domain or company name
• Fetch company profile + contacts
• Parse technographics, firmographics
• Cache for 24 hours
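
The 24-hour cache can be illustrated with a small in-memory TTL cache. The `TTLCache` class is a hypothetical stand-in: in the stack described here, Redis key expiry would presumably play this role, and the injectable clock exists only to make the expiry behaviour easy to test.

```python
import time
from typing import Any, Callable, Optional

class TTLCache:
    """In-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

fake_now = [0.0]
cache = TTLCache(clock=lambda: fake_now[0])
cache.set("acme.com", {"employees": 120})
fake_now[0] = 23 * 3600
print(cache.get("acme.com"))   # still cached after 23 hours
fake_now[0] = 24 * 3600
print(cache.get("acme.com"))   # expired at 24 hours
```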

LinkedIn Sales Navigator

Protocol: Unofficial API (scraping with Playwright)

• Search for company employees
• Scrape profile data (title, tenure, posts)
• Detect hiring signals from job changes
• Respect rate limits (10 req/min)

News APIs

Protocol: REST (NewsAPI, AlphaVantage)

• Query by company name + keywords
• Filter by date, relevance
• Extract funding, partnership, product launch signals
• Deduplicate articles
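
Article deduplication can be sketched by comparing normalized titles. Matching on titles alone is an assumption made for brevity; syndicated stories with rewritten headlines would need fuzzier matching (for example on URL or content similarity), which is not shown.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def dedupe_articles(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each normalized title."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        key = normalize_title(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique

articles = [
    {"title": "Acme Raises $20M Series B", "source": "wire-a"},
    {"title": "Acme raises $20M Series B!", "source": "wire-b"},
    {"title": "Acme opens Berlin office", "source": "wire-a"},
]
print([a["source"] for a in dedupe_articles(articles)])
```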

Security & Compliance

Failure Modes & Recovery

Failure	Fallback	Impact	SLA
ZoomInfo API down	Use Clearbit + LinkedIn as backup sources	Slightly lower data quality, 90% coverage maintained	99.5%
LLM API timeout	Use cached insights from similar accounts, or template-based insights	Degraded insight quality, but not blocked	99.0%
Salesforce sync fails	Queue for retry (3 attempts over 1 hour)	Delayed sync, eventual consistency	99.5%
Signal detection low confidence	Flag for human review, show raw data to SDR	Quality maintained, manual effort required	100% (safety first)
Database connection lost	Read from replica, queue writes	Read-only mode for 5-10 min	99.9%

Advanced ML Patterns

Production ML engineering beyond basic LLM calls

Topics: RAG vs Fine-Tuning, Hallucination Detection, Evaluation Framework, Dataset Curation, Agentic RAG

Tech Stack Summary

• Frontend: Next.js 14 (App Router), React, TailwindCSS, shadcn/ui
• Backend: Node.js (Express/Fastify) or Python (FastAPI)
• LLMs: OpenAI GPT-4, Anthropic Claude 3.5 Sonnet, DeepSeek (self-hosted fallback)
• Orchestration: LangGraph or CrewAI
• Database: PostgreSQL (Aurora/RDS), Redis (ElastiCache/Upstash)
• Vector DB: Pinecone or Weaviate
• Message Queue: Redis (BullMQ) for startup, Kafka for enterprise
• Compute: Vercel/Netlify (startup), Lambda/Cloud Run (scale), EKS/GKE (enterprise)
• Monitoring: Datadog or Grafana + Prometheus
• Security: Auth0/Cognito (auth), AWS KMS (secrets), Presidio (PII detection)
• CRM Integration: jsforce (Salesforce), HubSpot Node SDK
• Data Enrichment: ZoomInfo API, Clearbit API, Playwright (LinkedIn scraping)

