
Feedback Analysis System Architecture πŸ—οΈ

From 1,000 to 100,000 feedback items/month with AI-powered theme extraction

August 28, 2025
πŸ“Š Product Β· πŸ—οΈ Architecture Β· πŸ€– Multi-Agent Β· πŸ“ˆ Scalable

From prompts to production feedback intelligence.

Monday: 3 core prompts (extraction, theme clustering, prioritization). Tuesday: automated pipeline code. Wednesday: team workflows (PM, Eng, CS). Thursday: complete technical architecture with multi-agent orchestration, NLP engine, and production scaling patterns.

Key Assumptions

  β€’ Feedback volume: 1K-100K items/month across channels (support tickets, surveys, reviews, sales calls)
  β€’ Data sources: Intercom, Zendesk, Typeform, G2, App Store, Salesforce notes
  β€’ Compliance: SOC2 Type II, GDPR (PII redaction), no PHI/PCI
  β€’ Deployment: Cloud-native (AWS/GCP/Azure), multi-region for enterprise
  β€’ Team size: 1-2 engineers for startup, 5-10 for enterprise

System Requirements

Functional

  • Ingest feedback from 6+ sources (API polling, webhooks, CSV uploads)
  • Extract structured data (sentiment, topic, feature request, bug, customer segment)
  • Cluster similar feedback into themes (unsupervised + LLM-assisted)
  • Prioritize themes by impact score (volume Γ— sentiment Γ— customer tier)
  • Generate actionable insights (weekly summaries, trend detection)
  • Route high-priority items to Linear/Jira with context
  • Support multi-language feedback (auto-translate to English for analysis)

Non-Functional (SLOs)

  β€’ Latency (p95): 5,000 ms
  β€’ Freshness: 15 min
  β€’ Availability: 99.5%
  β€’ Accuracy: 95%
  β€’ Theme recall: 90%

πŸ’° Cost Targets: $0.02 per feedback item Β· $0.50 per theme cluster Β· $500/mo infrastructure (startup) Β· $5,000/mo infrastructure (enterprise)

Agent Layer

planner (L4)

Orchestrate the feedback processing pipeline; route tasks to specialized agents.

πŸ”§ Tools: task_decomposer, agent_selector, priority_router

⚑ Recovery: retry with exponential backoff (3x); fall back to manual queue if all retries fail; log failure context for debugging
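A minimal sketch of this retry-then-fallback policy; the delay schedule, logger setup, and the manual-queue hook are illustrative assumptions rather than documented implementation details:

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("planner")

def send_to_manual_queue(payload) -> None:
    """Hypothetical fallback hook: hand the work item to humans."""
    logger.error("all retries failed; queued for manual review: %r", payload)

def with_backoff(task: Callable[[], T], retries: int = 3, base_delay: float = 1.0):
    """Run task, retrying with exponential backoff; fall back after 3 failures."""
    for attempt in range(retries):
        try:
            return task()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt + 1, retries, exc)
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s
    send_to_manual_queue(task)
    return None
```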

executor (L3)

Execute the primary feedback analysis workflow (extraction β†’ clustering β†’ prioritization).

πŸ”§ Tools: extraction_agent, theme_agent, prioritization_agent, linear_adapter

⚑ Recovery: partial batch processing (skip failed items, continue with the rest); retry failed items individually; mark items for human review if extraction confidence < 0.7

evaluator (L3)

Validate output quality, detect anomalies, trigger reprocessing if needed.

πŸ”§ Tools: accuracy_checker, anomaly_detector, drift_monitor

⚑ Recovery: flag low-confidence items for human review; trigger reprocessing with an alternative model if quality < threshold; alert the on-call engineer if a systemic quality drop is detected

guardrail (L4)

Enforce safety policies, redact PII, block toxic content, ensure compliance.

πŸ”§ Tools: pii_detector (AWS Comprehend or spaCy), toxicity_classifier, policy_engine

⚑ Recovery: block processing if PII detection fails (fail-safe); quarantine items with policy violations; alert the security team on repeated violations
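A sketch of the spaCy-based redaction path named above; the model name, the entity labels treated as PII, and the email regex are assumptions (production would also use AWS Comprehend, per the tool list):

```python
import re
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; any English pipeline with NER works
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PII_LABELS = {"PERSON", "GPE", "ORG"}  # entity types treated as PII in this sketch

def redact_pii(text: str) -> str:
    """Replace detected PII spans with placeholders.

    Any exception propagates to the caller, which blocks processing,
    matching the fail-safe recovery policy above.
    """
    text = EMAIL_RE.sub("[EMAIL]", text)
    doc = nlp(text)
    # Replace from the end of the string so earlier character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.label_ in PII_LABELS:
            text = text[:ent.start_char] + f"[{ent.label_}]" + text[ent.end_char:]
    return text
```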

extraction (L2)

Extract structured data from raw feedback (sentiment, topic, feature request, bug).

πŸ”§ Tools: openai_gpt4, anthropic_claude, sentiment_classifier

⚑ Recovery: retry with an alternative model if the primary fails; use a rule-based fallback for simple cases; flag for human review if confidence < 0.7
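A hedged sketch of the extraction call using OpenAI's JSON mode; the model name, prompt wording, and output schema are assumptions, while the 0.7 review threshold comes from the recovery policy above:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Return JSON with keys: sentiment (positive|neutral|negative), topic, "
    "feature_request (string or null), bug (string or null), confidence (0-1).\n"
    "Feedback: {text}"
)

def extract(text: str) -> dict:
    """Single extraction call; low-confidence results are flagged for review."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the stack list just says "openai_gpt4"
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    data = json.loads(resp.choices[0].message.content)
    data["needs_review"] = float(data.get("confidence", 0)) < 0.7
    return data
```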

theme (L3)

Cluster similar feedback into themes using embeddings + HDBSCAN.

πŸ”§ Tools: embedding_service, hdbscan_clusterer, theme_labeler (LLM-based)

⚑ Recovery: fall back to simple keyword-based clustering if the embedding service fails; manual theme assignment for outliers; recluster with adjusted parameters if quality metrics degrade
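A sketch of the clustering step with the hdbscan library; the min_cluster_size default and the normalize-then-euclidean trick (hdbscan has no native cosine metric) are implementation assumptions:

```python
import hdbscan
import numpy as np

def cluster_themes(embeddings: np.ndarray, min_cluster_size: int = 5) -> np.ndarray:
    """Cluster feedback embeddings into themes; label -1 marks outliers."""
    # Normalize so euclidean distance on unit vectors tracks cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size, metric="euclidean")
    return clusterer.fit_predict(unit)

# Outliers (label -1) would go to the manual theme-assignment fallback above.
```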

prioritization (L3)

Score themes by impact (volume Γ— sentiment Γ— customer tier) and urgency.

πŸ”§ Tools: impact_scorer, urgency_calculator, linear_api

⚑ Recovery: use default weights if business rules are unavailable; manual prioritization for high-value customers; alert PM if a critical theme is detected
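One way to read the volume Γ— sentiment Γ— customer-tier formula as code; the tier weights, volume cap, and sentiment range are assumed defaults of the kind the recovery note mentions:

```python
TIER_WEIGHTS = {"enterprise": 3.0, "growth": 2.0, "self_serve": 1.0}  # assumed defaults

def impact_score(volume: int, avg_sentiment: float, tier: str,
                 max_volume: int = 500) -> float:
    """Score a theme on a 0-100 scale: volume x sentiment x customer tier."""
    volume_factor = min(volume / max_volume, 1.0)   # saturate at max_volume
    pain_factor = (1.0 - avg_sentiment) / 2.0       # sentiment in [-1, 1]; negative hurts more
    tier_factor = TIER_WEIGHTS.get(tier, 1.0) / max(TIER_WEIGHTS.values())
    return round(100 * volume_factor * pain_factor * tier_factor, 1)

# impact_score(300, -0.6, "enterprise") -> 100 * 0.6 * 0.8 * 1.0 = 48.0
```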

ML Layer

Feature Store

Update: Real-time for sentiment/recency, daily batch for customer data

  β€’ feedback_sentiment (real-time)
  β€’ customer_tier (batch)
  β€’ customer_mrr (batch)
  β€’ feedback_volume_7d (batch)
  β€’ theme_recency (real-time)
  β€’ customer_churn_risk (batch)

Model Registry

Strategy: Semantic versioning (major.minor.patch), Git-based lineage

  β€’ sentiment_classifier
  β€’ topic_extractor
  β€’ theme_labeler

Observability

Metrics

  • πŸ“Š feedback_ingestion_rate
  • πŸ“Š extraction_latency_p95_ms
  • πŸ“Š extraction_accuracy_percent
  • πŸ“Š theme_clustering_time_ms
  • πŸ“Š theme_count_total
  • πŸ“Š priority_score_distribution
  • πŸ“Š linear_issue_creation_rate
  • πŸ“Š llm_api_error_rate
  • πŸ“Š llm_cost_per_item_usd
  • πŸ“Š queue_depth
  • πŸ“Š worker_utilization_percent

Dashboards

  • πŸ“ˆ ops_dashboard
  • πŸ“ˆ ml_dashboard
  • πŸ“ˆ cost_dashboard
  • πŸ“ˆ quality_dashboard

Traces

βœ… Enabled

Deployment Variants

πŸš€ Startup

Infrastructure:

  β€’ AWS Lambda (serverless)
  β€’ RDS PostgreSQL (single instance)
  β€’ Pinecone (managed vector DB)
  β€’ OpenAI API (GPT-4 + embeddings)
  β€’ S3 (raw feedback storage)
  β€’ CloudWatch (basic monitoring)

β†’ Deploy in 1 week

β†’ No Kubernetes complexity

β†’ Pay-per-use pricing

β†’ Single region (us-east-1)

β†’ Manual scaling (Lambda auto-scales)

β†’ Cost: $200-500/mo for 1K-10K items

🏒 Enterprise

Infrastructure:

  β€’ EKS (Kubernetes for agent orchestration)
  β€’ Aurora PostgreSQL Global Database (multi-region)
  β€’ Weaviate (self-hosted vector DB in VPC)
  β€’ Multi-LLM (OpenAI + Anthropic + local Llama)
  β€’ S3 + Glacier (compliance archival)
  β€’ Datadog (full observability)
  β€’ VPC with private subnets
  β€’ AWS KMS (customer-managed keys)
  β€’ Multi-region deployment (US + EU)

β†’ Data residency (GDPR compliance)

β†’ BYO encryption keys

β†’ SSO/SAML integration

β†’ 99.9% uptime SLA

β†’ Dedicated support

β†’ Cost: $5K-10K/mo for 50K-100K items

πŸ“ˆ Migration: Start with startup architecture, migrate to EKS when volume exceeds 20K items/month. Add multi-region when EU customers require data residency. Transition to self-hosted vector DB when Pinecone costs exceed $1K/mo.

Risks & Mitigations

⚠️ LLM API cost explosion (100K items/mo = $2K+/mo) Β· Risk: High

βœ“ Mitigation: Implement cost guardrails (max $5K/mo), use smaller models for simple tasks (GPT-3.5 for sentiment), cache embeddings, batch processing to reduce API calls by 40%
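A minimal sketch of such a cost guardrail, assuming Redis (already in the stack as the cache layer) holds the monthly counter; the key format and cap handling are illustrative:

```python
import datetime
import redis

r = redis.Redis()
MONTHLY_CAP_USD = 5_000.0  # hard cap from the mitigation above

def record_llm_cost(cost_usd: float) -> None:
    """Accumulate LLM spend per calendar month and enforce the hard cap."""
    key = f"llm_cost:{datetime.date.today():%Y-%m}"
    total = float(r.incrbyfloat(key, cost_usd))
    if total >= MONTHLY_CAP_USD:
        raise RuntimeError(
            f"monthly LLM budget exhausted (${total:.2f}); "
            "route remaining items to the batch/manual queue"
        )
```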

⚠️ Theme quality degradation over time (model drift) Β· Risk: Medium

βœ“ Mitigation: Weekly PM review of 50 random themes, automated drift detection (embedding distribution shift), monthly model retraining with new data, A/B testing before deployment

⚠️ PII leakage (customer data exposed in logs/LLM prompts) Β· Risk: Low

βœ“ Mitigation: Guardrail Agent blocks processing if PII detection fails, no PII in application logs, encrypted storage (AES-256), audit trail for all data access, SOC2 compliance audit

⚠️ Integration failures (Intercom/Linear API changes) Β· Risk: Medium

βœ“ Mitigation: Version pinning for APIs, adapter pattern for easy swapping, automated integration tests (daily), fallback to manual queue if API unavailable, monitoring for API deprecation notices

⚠️ False positives in prioritization (low-value themes ranked high) Β· Risk: Medium

βœ“ Mitigation: PM review of top 10 themes weekly, feedback loop to adjust scoring weights, A/B test new scoring algorithms, manual override capability, track Linear issue resolution rate

⚠️ Vendor lock-in (Pinecone, OpenAI) Β· Risk: High

βœ“ Mitigation: Abstract integrations behind adapters, support multiple LLM providers (OpenAI + Anthropic + local), plan for self-hosted vector DB (Weaviate) for enterprise, export data regularly

⚠️ Scalability bottleneck at 50K+ items/month Β· Risk: Medium

βœ“ Mitigation: Horizontal scaling with ECS/EKS, read replicas for PostgreSQL, caching layer (Redis), batch processing for non-urgent items, auto-scaling based on queue depth

Evolution Roadmap

Phase 1: MVP (0-3 months, weeks 1-12)
  β†’ Process 1K-5K feedback items/month
  β†’ Integrate Intercom + Zendesk
  β†’ Basic theme extraction (keyword-based + LLM)
  β†’ Manual prioritization with AI suggestions

Phase 2: Scale (3-6 months, weeks 13-24)
  β†’ Scale to 10K-20K items/month
  β†’ Add Linear/Jira integration
  β†’ Automated prioritization (no manual review)
  β†’ Multi-language support (auto-translate)

Phase 3: Enterprise (6-12 months, weeks 25-52)
  β†’ Scale to 50K-100K items/month
  β†’ Multi-region deployment (US + EU)
  β†’ SSO/SAML integration
  β†’ Advanced ML (active learning, drift detection)
  β†’ 99.9% uptime SLA

Complete Systems Architecture

9-layer architecture from ingestion to insights

Presentation: Admin Dashboard (React) Β· Insight Reports (PDF/Email) Β· API Client SDKs
API Gateway: Load Balancer (ALB/Cloud LB) Β· Rate Limiter (Redis) Β· Auth Middleware (JWT/OAuth)
Agent Layer: Planner Agent Β· Executor Agent Β· Evaluator Agent Β· Guardrail Agent Β· Extraction Agent Β· Theme Agent Β· Prioritization Agent
ML Layer: Feature Store (Feast/Tecton) Β· Model Registry (MLflow) Β· Embedding Service (OpenAI/Cohere) Β· Clustering Engine (HDBSCAN) Β· Evaluation Pipeline
Integration: Intercom Adapter Β· Zendesk Adapter Β· Linear/Jira Adapter Β· Webhook Receiver Β· Translation API (DeepL/Google)
Data: PostgreSQL (metadata) Β· Vector DB (Pinecone/Weaviate) Β· S3/GCS (raw feedback) Β· Redis (cache)
External: OpenAI API Β· Anthropic API Β· Intercom API Β· Zendesk API Β· Linear API
Observability: CloudWatch/Datadog Β· Sentry (errors) Β· Grafana (dashboards) Β· MLflow (model metrics)
Security: AWS KMS (secrets) Β· IAM/RBAC Β· Audit Logger Β· PII Redactor

Sequence Diagram - Feedback Processing Flow

Intercom β†’ API Gateway: POST /webhook (new ticket)
API Gateway β†’ Planner Agent: route(ticket_data)
Planner Agent β†’ Extraction Agent: extract(text)
Extraction Agent β†’ Guardrail Agent: validate(extracted_json)
Guardrail Agent β†’ Theme Agent: cluster(embeddings)
Theme Agent β†’ Prioritization Agent: score(theme_id, metadata)
Prioritization Agent β†’ Linear API: decision(priority=high) β†’ POST /issue (auto-create)
Linear API β†’ Prioritization Agent: 200 OK

Feedback Analysis System - Hub Architecture

[Diagram: hub-and-spoke topology with 7 components. A Pipeline Orchestrator hub connects over RPC to six spokes, each with 4 capabilities: Workflow Executor, Quality Validator, Safety Guardian, NLP Extraction Agent, Theme Clustering Agent, and Prioritization Engine. RPC calls cover raw feedback validation, pipeline execution, structured extraction, theme clustering, scoring/ranking, and final output validation; events carry extraction results, theme clusters, prioritized themes, quality metrics and flags, and sanitized feedback back through the hub.]

Feedback Analysis System - Feedback Loops & Refinement

[Diagram: refinement loops across 6 components. The Safety Guardian streams sanitized feedback to the NLP Extraction Agent; the Quality Validator checks extraction results, cluster assignments, and scores, sending back "reprocess low-confidence", "refine clusters", and scoring corrections. A Feedback Loop Monitor consumes quality metrics and performance data, pushing model updates to extraction, clustering tuning to the Theme Clustering Agent, and scoring adjustments to the Prioritization Engine.]

Data Flow - Feedback to Action

From Intercom ticket to Linear issue in 15 minutes

1. Intercom (0s): new support ticket created β†’ raw text + customer metadata
2. Webhook Receiver (50ms): receives webhook, queues for processing β†’ ticket ID + payload
3. Planner Agent (100ms): generates execution plan β†’ agent sequence + routing decision
4. Guardrail Agent (200ms): redacts PII, checks toxicity β†’ sanitized text
5. Extraction Agent (2,500ms): extracts structured data (sentiment, topic, etc.) β†’ JSON with confidence scores
6. Embedding Service (300ms): generates vector embedding β†’ 1536-dim vector
7. Theme Agent (800ms): clusters into existing/new theme β†’ theme ID + similarity score
8. Prioritization Agent (300ms): calculates impact score β†’ priority score (0-100)
9. Evaluator Agent (500ms): validates output quality β†’ quality report + confidence
10. Linear Adapter (500ms): creates Linear issue (if priority > 80) β†’ issue ID + URL
11. Notification Service (200ms): alerts PM via Slack β†’ theme summary + Linear link
12. Database (100ms): persists all data (feedback, theme, score) β†’ complete audit trail
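Condensed into code, the flow above is a chain of stage functions with the priority > 80 gate from step 10. The sketch below injects each stage as a callable, since the concrete implementations vary by tier; all names and signatures are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class FeedbackPipeline:
    redact: Callable[[str], str]                 # steps 3-4: guardrail
    extract: Callable[[str], dict]               # step 5: extraction
    embed: Callable[[str], list[float]]          # step 6: embedding service
    assign_theme: Callable[[list[float]], str]   # step 7: theme agent
    score: Callable[[str, dict], float]          # step 8: prioritization
    create_issue: Callable[[str, dict], dict]    # step 10: Linear adapter
    persist: Callable[..., None]                 # step 12: audit trail

    def run(self, ticket: dict[str, Any]) -> dict[str, Any]:
        clean = self.redact(ticket["text"])
        data = self.extract(clean)
        theme = self.assign_theme(self.embed(clean))
        priority = self.score(theme, ticket["customer"])
        if priority > 80:                        # auto-create gate from step 10
            self.create_issue(theme, data)
        self.persist(ticket, data, theme, priority)
        return {"theme": theme, "priority": priority}
```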

Scaling Patterns

Volume: 0-1,000 items/month Β· Pattern: Serverless Monolith Β· Cost: $200/mo Β· 5-8s per item
  β€’ AWS Lambda (single function)
  β€’ OpenAI API (GPT-4)
  β€’ PostgreSQL (RDS)
  β€’ S3 (raw feedback storage)

Volume: 1K-10K items/month Β· Pattern: Queue + Workers Β· Cost: $500/mo Β· 3-5s per item (worker sketch after this section)
  β€’ API Gateway (REST)
  β€’ SQS (message queue)
  β€’ Lambda workers (parallel processing)
  β€’ RDS PostgreSQL
  β€’ Redis (cache)
  β€’ Pinecone (vector DB)

Volume: 10K-50K items/month Β· Pattern: Multi-Agent Orchestration Β· Cost: $2,000/mo Β· 2-4s per item
  β€’ Load Balancer (ALB)
  β€’ ECS Fargate (agent containers)
  β€’ Step Functions (orchestration)
  β€’ Aurora PostgreSQL (read replicas)
  β€’ ElastiCache Redis
  β€’ Pinecone (vector DB)
  β€’ CloudWatch (observability)

Volume: 50K-100K+ items/month Β· Pattern: Enterprise Multi-Region Β· Cost: $8,000+/mo Β· 1-3s per item
  β€’ Global Load Balancer
  β€’ EKS (Kubernetes)
  β€’ Kafka (event streaming)
  β€’ Multi-LLM failover (OpenAI + Anthropic + local models)
  β€’ Aurora Global Database
  β€’ Weaviate (self-hosted vector DB)
  β€’ Datadog (full observability)
  β€’ Multi-region deployment (US, EU)
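For the Queue + Workers tier, a hedged sketch of the Lambda worker consuming SQS batches; the partial-batch response shape is AWS's documented format (it requires ReportBatchItemFailures on the event source mapping), while process_feedback stands in for the pipeline entry point:

```python
import json

def process_feedback(item: dict) -> None:
    """Stub for the pipeline entry point sketched earlier."""
    ...

def handler(event, context):
    """SQS-triggered Lambda: each record carries one queued feedback item."""
    failures = []
    for record in event["Records"]:
        try:
            process_feedback(json.loads(record["body"]))
        except Exception:
            # Report only the failed message IDs so SQS retries just those.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```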

Key Integrations

Intercom (Support Tickets)

Protocol: REST API + Webhooks
Webhook: New ticket created β†’ Queue
API: Fetch ticket details + conversation history
API: Update ticket with AI-generated tags

Zendesk (Support Tickets)

Protocol: REST API + Webhooks
Webhook: Ticket status change β†’ Queue
API: Fetch ticket + comments
API: Add internal note with theme analysis

Linear (Issue Tracking)

Protocol: GraphQL API
POST: Create issue with theme summary
POST: Add comment with related feedback links
GET: Fetch issue status for reporting
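A sketch of the issue-creation call against Linear's GraphQL endpoint; the mutation shape follows Linear's public issueCreate API, while the environment variable, team ID handling, and error handling are assumptions:

```python
import os
import requests

LINEAR_URL = "https://api.linear.app/graphql"
ISSUE_CREATE = """
mutation CreateIssue($teamId: String!, $title: String!, $description: String) {
  issueCreate(input: {teamId: $teamId, title: $title, description: $description}) {
    success
    issue { id url }
  }
}
"""

def create_linear_issue(team_id: str, title: str, description: str) -> dict:
    """Create an issue carrying the theme summary; returns {id, url}."""
    resp = requests.post(
        LINEAR_URL,
        json={"query": ISSUE_CREATE,
              "variables": {"teamId": team_id, "title": title,
                            "description": description}},
        headers={"Authorization": os.environ["LINEAR_API_KEY"]},  # personal API key
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]["issueCreate"]["issue"]
```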

Jira (Issue Tracking)

Protocol: REST API
POST: Create issue in Product backlog
POST: Link related tickets
GET: Fetch issue metadata for analytics

OpenAI (Embeddings + GPT-4)

Protocol: REST API
POST /embeddings: Generate 1536-dim vectors
POST /chat/completions: Extract structured data

Anthropic (Claude for extraction)

Protocol: REST API
POST /messages: Extract sentiment + topics with Claude


Failure Modes & Recovery

| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| OpenAI API down | Failover to Anthropic Claude β†’ local Llama model β†’ manual queue | Latency +2s, accuracy -3%, no data loss | 99.5% |
| Extraction confidence < 0.7 | Flag for human review, do not cluster | Quality maintained, throughput reduced | 99.9% |
| Theme clustering fails (HDBSCAN error) | Fall back to keyword-based clustering | Lower-quality themes, manual review needed | 99.0% |
| Linear API timeout | Retry 3x with exponential backoff β†’ queue for manual creation | Delayed issue creation (up to 1h) | 99.5% |
| PII detection service fails | Block all processing (fail-safe mode) | No feedback processed until fixed | 100% |
| Vector DB (Pinecone) unavailable | Queue embeddings for later clustering | No real-time theme assignment; batch processing later | 99.0% |
| PostgreSQL read-replica lag > 5 min | Read from primary (higher load) | Increased primary DB load, potential slowdown | 99.9% |
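The first row of the table is the classic provider-failover pattern. A minimal two-provider sketch (the local Llama hop and the manual queue are omitted); both model names are assumptions:

```python
import anthropic
from openai import OpenAI

openai_client = OpenAI()
claude = anthropic.Anthropic()

def complete_with_failover(prompt: str) -> str:
    """Try OpenAI first; on any error, fail over to Claude."""
    try:
        resp = openai_client.chat.completions.create(
            model="gpt-4o",  # assumed model name
            messages=[{"role": "user", "content": prompt}],
            timeout=30,
        )
        return resp.choices[0].message.content
    except Exception:
        msg = claude.messages.create(
            model="claude-3-5-sonnet-latest",  # assumed model name
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text
```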

Multi-Agent Architecture

How specialized agents collaborate autonomously

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Planner   β”‚ ← Orchestrates all agents
β”‚    Agent    β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
   β”Œβ”€β”€β”€β”΄β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚        β”‚          β”‚          β”‚          β”‚
β”Œβ”€β”€β–Όβ”€β”€β”€β” β”Œβ”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β” β”Œβ”€β–Όβ”€β”€β”€β”€β”
β”‚Guard β”‚ β”‚Extractβ”‚ β”‚ Theme  β”‚ β”‚Priorityβ”‚ β”‚Eval  β”‚
β”‚rail  β”‚ β”‚ Agent β”‚ β”‚ Agent  β”‚ β”‚ Agent  β”‚ β”‚Agent β”‚
β””β”€β”€β”¬β”€β”€β”€β”˜ β””β”€β”€β”¬β”€β”€β”€β”˜ β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”¬β”€β”€β”€β”˜
   β”‚        β”‚         β”‚          β”‚         β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
          β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β”
          β”‚Executorβ”‚
          β”‚ Agent  β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
          β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β”
          β”‚ Linear β”‚
          β”‚  API   β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Agent Collaboration Flow

1. Planner Agent: receives feedback item, generates execution plan (guardrail β†’ extract β†’ theme β†’ priority)
2. Guardrail Agent: redacts PII, checks toxicity β†’ returns sanitized text or blocks processing
3. Extraction Agent: extracts sentiment, topic, feature request, bug β†’ returns JSON with confidence scores
4. Theme Agent: generates embedding, clusters into theme β†’ returns theme ID + similarity score
5. Prioritization Agent: calculates impact score (volume Γ— sentiment Γ— customer tier) β†’ returns priority (0-100)
6. Evaluator Agent: validates output quality, checks confidence scores β†’ flags for human review if needed
7. Executor Agent: if priority > 80, creates Linear issue; if below, stores for batch analysis
8. Planner Agent: logs execution, updates dashboard, triggers notifications

Reactive Agent
Extraction Agent: responds to input, returns output.
Autonomy: Low Β· Stateless

Reflexive Agent
Guardrail Agent: uses rules + context (PII patterns, toxicity thresholds).
Autonomy: Medium Β· Reads policy DB

Deliberative Agent
Theme Agent: plans clustering strategy based on existing themes.
Autonomy: High Β· Stateful (reads theme history)

Orchestrator Agent
Planner Agent: makes routing decisions, handles loops, coordinates all agents.
Autonomy: Highest Β· Full state management

Levels of Autonomy

L1 (Tool): human calls, agent responds β†’ Monday's prompts (manual copy-paste)
L2 (Chained Tools): sequential execution, no branching β†’ Tuesday's code (extract β†’ cluster β†’ prioritize)
L3 (Agent): makes decisions, can loop, handles failures β†’ Planner Agent (routes based on priority, retries on failure)
L4 (Multi-Agent): agents collaborate autonomously, self-healing β†’ this system (agents coordinate, fail over, self-optimize)

Advanced ML/AI Patterns

Production ML engineering beyond basic API calls

RAG vs Fine-Tuning

Product terminology changes frequently (new features, rebranding). RAG allows daily updates without retraining. Fine-tuning would require monthly retraining at $2K+ per cycle.
βœ… RAG (chosen)
Cost: $100/mo (vector DB)
Update cadence: daily (add new product docs)

❌ Fine-tuning
Cost: $2K/mo (retraining)
Update cadence: monthly

Implementation: vector DB (Pinecone) holding product docs, feature specs, and past themes, retrieved during theme labeling for context-aware naming.
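A retrieval sketch with the Pinecone client, matching the 1536-dim vectors used throughout the system; the index name, metadata field, and use of the embeddings dimensions parameter are assumptions:

```python
import os
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("product-docs")  # assumed index of docs, specs, past themes

def retrieve_context(feedback_text: str, k: int = 5) -> list[str]:
    """Embed the feedback and fetch the k nearest product-doc chunks."""
    vec = oai.embeddings.create(
        model="text-embedding-3-large",
        dimensions=1536,  # shorten to match the system's 1536-dim vectors
        input=feedback_text,
    ).data[0].embedding
    hits = index.query(vector=vec, top_k=k, include_metadata=True)
    return [m["metadata"]["text"] for m in hits["matches"]]
```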

Hallucination Detection

LLMs hallucinate feature requests that don't exist, or invent customer quotes
L1: Confidence scores (< 0.7 = flag for review)
L2: Cross-reference with product roadmap DB
L3: Fact-check customer quotes against original text
L4: Human review queue for low-confidence items

Hallucination rate: 1.2% detected; 0.1% escaped to production (caught by PM review).
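Layer L3 (quote fact-checking) can be as simple as a fuzzy containment test; the windowing scheme and 0.85 threshold below are assumptions:

```python
import difflib

def quote_supported(quote: str, source_text: str, threshold: float = 0.85) -> bool:
    """Check that an extracted customer quote actually appears in the source.

    An exact substring passes immediately; otherwise slide a quote-sized
    window over the source and accept the best fuzzy match above threshold.
    """
    quote, source = quote.strip().lower(), source_text.lower()
    if quote in source:
        return True
    window, step = len(quote), max(1, len(quote) // 4)
    best = 0.0
    for start in range(0, max(1, len(source) - window + 1), step):
        chunk = source[start:start + window]
        best = max(best, difflib.SequenceMatcher(None, quote, chunk).ratio())
    return best >= threshold
```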

Evaluation Framework

  β€’ Extraction accuracy: 96.3% (target: 95%+)
  β€’ Theme recall: 91.7% (target: 90%+)
  β€’ Theme precision: 88.2% (target: 85%+)
  β€’ Prioritization accuracy: 83.5% (target: 80%+)
  β€’ PM satisfaction score: 4.2/5 (target: 4.0/5)

Testing: shadow mode with 500 feedback items processed in parallel (AI vs. manual), plus weekly PM review of 50 random themes.

Dataset Curation

1. Collect: 5K feedback items (historical data from Intercom/Zendesk)
2. Clean: 4.2K usable (spam, duplicates, and non-English removed)
3. Label: 4.2K labeled ($8.4K)
4. Augment: +800 synthetic (GPT-4-generated edge cases: rare features, multi-language)

β†’ 5K high-quality examples; inter-rater agreement (Cohen's kappa): 0.88

Agentic RAG

The Theme Agent iteratively retrieves context based on its own reasoning.
Example: feedback mentions "dark mode" β†’ RAG retrieves past dark-mode requests β†’ agent reasons "need UI context" β†’ RAG retrieves UI specs β†’ theme labeled "UI: Dark Mode Implementation" with full context.
πŸ’‘ Not one-shot retrieval: the agent decides what additional context it needs, leading to more accurate theme labeling.
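The loop itself is small; a sketch with injected retrieve/ask_llm callables and an assumed LABEL:/NEED: reply convention:

```python
def label_theme(feedback: str, retrieve, ask_llm, max_rounds: int = 3) -> str:
    """Iterative retrieval: keep fetching context until the agent can label.

    retrieve(query) -> list[str] and ask_llm(prompt) -> str are injected;
    the LLM replies 'LABEL: <theme>' when confident, else 'NEED: <query>'.
    """
    context = retrieve(feedback)
    for _ in range(max_rounds):
        prompt = ("Feedback: " + feedback + "\nContext:\n" + "\n".join(context) +
                  "\nReply 'LABEL: <theme>' if confident, else 'NEED: <query>'.")
        reply = ask_llm(prompt).strip()
        if reply.startswith("LABEL:"):
            return reply.removeprefix("LABEL:").strip()
        context += retrieve(reply.removeprefix("NEED:").strip())
    return "needs-human-label"  # give up after max_rounds
```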

Active Learning Loop

Items the Evaluator flags for human review (extraction confidence < 0.7) feed a labeling queue; the reviewed labels are folded into the training set for the monthly retraining cycle noted under Risks & Mitigations.

Tech Stack Summary

LLMs
OpenAI GPT-4, Anthropic Claude, local Llama (fallback)
Embeddings
OpenAI text-embedding-3-large (1536-dim)
Vector DB
Pinecone (startup), Weaviate (enterprise)
Orchestration
LangGraph, AWS Step Functions, or custom Python
Database
PostgreSQL (RDS/Aurora)
Queue
AWS SQS (startup), Kafka (enterprise)
Compute
AWS Lambda (startup), ECS Fargate (scale), EKS (enterprise)
Monitoring
CloudWatch (startup), Datadog (enterprise)
Security
AWS KMS, IAM, Secrets Manager, Comprehend (PII)
ML Ops
MLflow (model registry), Feast (feature store)
πŸ—οΈ

Need a Custom Feedback System?

We'll design and build your feedback analysis pipeline - from architecture to production deployment.