
Feedback Analysis System Architecture πŸ—οΈ

From 1,000 to 100,000 feedback items/month with AI-powered theme extraction

August 28, 2025
πŸ“Š Product Β· πŸ—οΈ Architecture Β· πŸ€– Multi-Agent Β· πŸ“ˆ Scalable

From prompts to production feedback intelligence.

Monday: 3 core prompts (extraction, theme clustering, prioritization). Tuesday: automated pipeline code. Wednesday: team workflows (PM, Eng, CS). Thursday: complete technical architecture with multi-agent orchestration, NLP engine, and production scaling patterns.

Key Assumptions

  β€’ Feedback volume: 1K-100K items/month across channels (support tickets, surveys, reviews, sales calls)
  β€’ Data sources: Intercom, Zendesk, Typeform, G2, App Store, Salesforce notes
  β€’ Compliance: SOC2 Type II, GDPR (PII redaction), no PHI/PCI
  β€’ Deployment: Cloud-native (AWS/GCP/Azure), multi-region for enterprise
  β€’ Team size: 1-2 engineers for startup, 5-10 for enterprise

System Requirements

Functional

  • Ingest feedback from 6+ sources (API polling, webhooks, CSV uploads)
  • Extract structured data (sentiment, topic, feature request, bug, customer segment)
  • Cluster similar feedback into themes (unsupervised + LLM-assisted)
  • Prioritize themes by impact score (volume Γ— sentiment Γ— customer tier)
  • Generate actionable insights (weekly summaries, trend detection)
  • Route high-priority items to Linear/Jira with context
  • Support multi-language feedback (auto-translate to English for analysis)

Non-Functional (SLOs)

  β€’ Latency (p95): 5,000 ms
  β€’ Freshness: 15 min
  β€’ Availability: 99.5%
  β€’ Accuracy: 95%
  β€’ Theme recall: 90%

πŸ’° Cost Targets: $0.02 per feedback item Β· $0.50 per theme cluster Β· $500/mo infrastructure (startup) Β· $5,000/mo infrastructure (enterprise)

Agent Layer

planner (L4)

Orchestrate the feedback processing pipeline; route tasks to specialized agents.

πŸ”§ Tools: task_decomposer, agent_selector, priority_router

⚑ Recovery: retry with exponential backoff (3x); fall back to manual queue if all retries fail; log failure context for debugging
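A minimal sketch of this retry-then-fallback policy; the delay schedule, logger setup, and the manual-queue hook are illustrative assumptions rather than documented implementation details:

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("planner")

def send_to_manual_queue(payload) -> None:
    """Hypothetical fallback hook: hand the work item to humans."""
    logger.error("all retries failed; queued for manual review: %r", payload)

def with_backoff(task: Callable[[], T], retries: int = 3, base_delay: float = 1.0):
    """Run task, retrying with exponential backoff; fall back after 3 failures."""
    for attempt in range(retries):
        try:
            return task()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt + 1, retries, exc)
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s
    send_to_manual_queue(task)
    return None
```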

executor (L3)

Execute the primary feedback analysis workflow (extraction β†’ clustering β†’ prioritization).

πŸ”§ Tools: extraction_agent, theme_agent, prioritization_agent, linear_adapter

⚑ Recovery: partial batch processing (skip failed items, continue with the rest); retry failed items individually; mark items for human review if extraction confidence < 0.7

evaluator (L3)

Validate output quality, detect anomalies, trigger reprocessing if needed.

πŸ”§ Tools: accuracy_checker, anomaly_detector, drift_monitor

⚑ Recovery: flag low-confidence items for human review; trigger reprocessing with an alternative model if quality < threshold; alert the on-call engineer if a systemic quality drop is detected

guardrail (L4)

Enforce safety policies, redact PII, block toxic content, ensure compliance.

πŸ”§ Tools: pii_detector (AWS Comprehend or spaCy), toxicity_classifier, policy_engine

⚑ Recovery: block processing if PII detection fails (fail-safe); quarantine items with policy violations; alert the security team on repeated violations
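A sketch of the spaCy-based redaction path named above; the model name, the entity labels treated as PII, and the email regex are assumptions (production would also use AWS Comprehend, per the tool list):

```python
import re
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; any English pipeline with NER works
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PII_LABELS = {"PERSON", "GPE", "ORG"}  # entity types treated as PII in this sketch

def redact_pii(text: str) -> str:
    """Replace detected PII spans with placeholders.

    Any exception propagates to the caller, which blocks processing,
    matching the fail-safe recovery policy above.
    """
    text = EMAIL_RE.sub("[EMAIL]", text)
    doc = nlp(text)
    # Replace from the end of the string so earlier character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.label_ in PII_LABELS:
            text = text[:ent.start_char] + f"[{ent.label_}]" + text[ent.end_char:]
    return text
```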

extraction (L2)

Extract structured data from raw feedback (sentiment, topic, feature request, bug).

πŸ”§ Tools: openai_gpt4, anthropic_claude, sentiment_classifier

⚑ Recovery: retry with an alternative model if the primary fails; use a rule-based fallback for simple cases; flag for human review if confidence < 0.7
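A hedged sketch of the extraction call using OpenAI's JSON mode; the model name, prompt wording, and output schema are assumptions, while the 0.7 review threshold comes from the recovery policy above:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Return JSON with keys: sentiment (positive|neutral|negative), topic, "
    "feature_request (string or null), bug (string or null), confidence (0-1).\n"
    "Feedback: {text}"
)

def extract(text: str) -> dict:
    """Single extraction call; low-confidence results are flagged for review."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the stack list just says "openai_gpt4"
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    data = json.loads(resp.choices[0].message.content)
    data["needs_review"] = float(data.get("confidence", 0)) < 0.7
    return data
```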

theme (L3)

Cluster similar feedback into themes using embeddings + HDBSCAN.

πŸ”§ Tools: embedding_service, hdbscan_clusterer, theme_labeler (LLM-based)

⚑ Recovery: fall back to simple keyword-based clustering if the embedding service fails; manual theme assignment for outliers; recluster with adjusted parameters if quality metrics degrade
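A sketch of the clustering step with the hdbscan library; the min_cluster_size default and the normalize-then-euclidean trick (hdbscan has no native cosine metric) are implementation assumptions:

```python
import hdbscan
import numpy as np

def cluster_themes(embeddings: np.ndarray, min_cluster_size: int = 5) -> np.ndarray:
    """Cluster feedback embeddings into themes; label -1 marks outliers."""
    # Normalize so euclidean distance on unit vectors tracks cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size, metric="euclidean")
    return clusterer.fit_predict(unit)

# Outliers (label -1) would go to the manual theme-assignment fallback above.
```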

prioritization (L3)

Score themes by impact (volume Γ— sentiment Γ— customer tier) and urgency.

πŸ”§ Tools: impact_scorer, urgency_calculator, linear_api

⚑ Recovery: use default weights if business rules are unavailable; manual prioritization for high-value customers; alert PM if a critical theme is detected
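One way to read the volume Γ— sentiment Γ— customer-tier formula as code; the tier weights, volume cap, and sentiment range are assumed defaults of the kind the recovery note mentions:

```python
TIER_WEIGHTS = {"enterprise": 3.0, "growth": 2.0, "self_serve": 1.0}  # assumed defaults

def impact_score(volume: int, avg_sentiment: float, tier: str,
                 max_volume: int = 500) -> float:
    """Score a theme on a 0-100 scale: volume x sentiment x customer tier."""
    volume_factor = min(volume / max_volume, 1.0)   # saturate at max_volume
    pain_factor = (1.0 - avg_sentiment) / 2.0       # sentiment in [-1, 1]; negative hurts more
    tier_factor = TIER_WEIGHTS.get(tier, 1.0) / max(TIER_WEIGHTS.values())
    return round(100 * volume_factor * pain_factor * tier_factor, 1)

# impact_score(300, -0.6, "enterprise") -> 100 * 0.6 * 0.8 * 1.0 = 48.0
```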

ML Layer

Feature Store

Update: Real-time for sentiment/recency, daily batch for customer data

  β€’ feedback_sentiment (real-time)
  β€’ customer_tier (batch)
  β€’ customer_mrr (batch)
  β€’ feedback_volume_7d (batch)
  β€’ theme_recency (real-time)
  β€’ customer_churn_risk (batch)

Model Registry

Strategy: Semantic versioning (major.minor.patch), Git-based lineage

  β€’ sentiment_classifier
  β€’ topic_extractor
  β€’ theme_labeler

Observability

Metrics

  • πŸ“Š feedback_ingestion_rate
  • πŸ“Š extraction_latency_p95_ms
  • πŸ“Š extraction_accuracy_percent
  • πŸ“Š theme_clustering_time_ms
  • πŸ“Š theme_count_total
  • πŸ“Š priority_score_distribution
  • πŸ“Š linear_issue_creation_rate
  • πŸ“Š llm_api_error_rate
  • πŸ“Š llm_cost_per_item_usd
  • πŸ“Š queue_depth
  • πŸ“Š worker_utilization_percent

Dashboards

  • πŸ“ˆ ops_dashboard
  • πŸ“ˆ ml_dashboard
  • πŸ“ˆ cost_dashboard
  • πŸ“ˆ quality_dashboard

Traces

βœ… Enabled

Deployment Variants

πŸš€ Startup

Infrastructure:

  β€’ AWS Lambda (serverless)
  β€’ RDS PostgreSQL (single instance)
  β€’ Pinecone (managed vector DB)
  β€’ OpenAI API (GPT-4 + embeddings)
  β€’ S3 (raw feedback storage)
  β€’ CloudWatch (basic monitoring)

β†’ Deploy in 1 week

β†’ No Kubernetes complexity

β†’ Pay-per-use pricing

β†’ Single region (us-east-1)

β†’ Manual scaling (Lambda auto-scales)

β†’ Cost: $200-500/mo for 1K-10K items

🏒 Enterprise

Infrastructure:

  β€’ EKS (Kubernetes for agent orchestration)
  β€’ Aurora PostgreSQL Global Database (multi-region)
  β€’ Weaviate (self-hosted vector DB in VPC)
  β€’ Multi-LLM (OpenAI + Anthropic + local Llama)
  β€’ S3 + Glacier (compliance archival)
  β€’ Datadog (full observability)
  β€’ VPC with private subnets
  β€’ AWS KMS (customer-managed keys)
  β€’ Multi-region deployment (US + EU)

β†’ Data residency (GDPR compliance)

β†’ BYO encryption keys

β†’ SSO/SAML integration

β†’ 99.9% uptime SLA

β†’ Dedicated support

β†’ Cost: $5K-10K/mo for 50K-100K items

πŸ“ˆ Migration: Start with startup architecture, migrate to EKS when volume exceeds 20K items/month. Add multi-region when EU customers require data residency. Transition to self-hosted vector DB when Pinecone costs exceed $1K/mo.

Risks & Mitigations

⚠️ LLM API cost explosion (100K items/mo = $2K+/mo) Β· Risk: High

βœ“ Mitigation: Implement cost guardrails (max $5K/mo), use smaller models for simple tasks (GPT-3.5 for sentiment), cache embeddings, batch processing to reduce API calls by 40%
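A minimal sketch of such a cost guardrail, assuming Redis (already in the stack as the cache layer) holds the monthly counter; the key format and cap handling are illustrative:

```python
import datetime
import redis

r = redis.Redis()
MONTHLY_CAP_USD = 5_000.0  # hard cap from the mitigation above

def record_llm_cost(cost_usd: float) -> None:
    """Accumulate LLM spend per calendar month and enforce the hard cap."""
    key = f"llm_cost:{datetime.date.today():%Y-%m}"
    total = float(r.incrbyfloat(key, cost_usd))
    if total >= MONTHLY_CAP_USD:
        raise RuntimeError(
            f"monthly LLM budget exhausted (${total:.2f}); "
            "route remaining items to the batch/manual queue"
        )
```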

⚠️ Theme quality degradation over time (model drift) Β· Risk: Medium

βœ“ Mitigation: Weekly PM review of 50 random themes, automated drift detection (embedding distribution shift), monthly model retraining with new data, A/B testing before deployment

⚠️ PII leakage (customer data exposed in logs/LLM prompts) Β· Risk: Low

βœ“ Mitigation: Guardrail Agent blocks processing if PII detection fails, no PII in application logs, encrypted storage (AES-256), audit trail for all data access, SOC2 compliance audit

⚠️ Integration failures (Intercom/Linear API changes) Β· Risk: Medium

βœ“ Mitigation: Version pinning for APIs, adapter pattern for easy swapping, automated integration tests (daily), fallback to manual queue if API unavailable, monitoring for API deprecation notices

⚠️ False positives in prioritization (low-value themes ranked high) Β· Risk: Medium

βœ“ Mitigation: PM review of top 10 themes weekly, feedback loop to adjust scoring weights, A/B test new scoring algorithms, manual override capability, track Linear issue resolution rate

⚠️ Vendor lock-in (Pinecone, OpenAI) Β· Risk: High

βœ“ Mitigation: Abstract integrations behind adapters, support multiple LLM providers (OpenAI + Anthropic + local), plan for self-hosted vector DB (Weaviate) for enterprise, export data regularly

⚠️ Scalability bottleneck at 50K+ items/month Β· Risk: Medium

βœ“ Mitigation: Horizontal scaling with ECS/EKS, read replicas for PostgreSQL, caching layer (Redis), batch processing for non-urgent items, auto-scaling based on queue depth

Evolution Roadmap

Phase 1: MVP (0-3 months, weeks 1-12)
  β†’ Process 1K-5K feedback items/month
  β†’ Integrate Intercom + Zendesk
  β†’ Basic theme extraction (keyword-based + LLM)
  β†’ Manual prioritization with AI suggestions

Phase 2: Scale (3-6 months, weeks 13-24)
  β†’ Scale to 10K-20K items/month
  β†’ Add Linear/Jira integration
  β†’ Automated prioritization (no manual review)
  β†’ Multi-language support (auto-translate)

Phase 3: Enterprise (6-12 months, weeks 25-52)
  β†’ Scale to 50K-100K items/month
  β†’ Multi-region deployment (US + EU)
  β†’ SSO/SAML integration
  β†’ Advanced ML (active learning, drift detection)
  β†’ 99.9% uptime SLA

Complete Systems Architecture

9-layer architecture from ingestion to insights

Presentation: Admin Dashboard (React) Β· Insight Reports (PDF/Email) Β· API Client SDKs
API Gateway: Load Balancer (ALB/Cloud LB) Β· Rate Limiter (Redis) Β· Auth Middleware (JWT/OAuth)
Agent Layer: Planner Agent Β· Executor Agent Β· Evaluator Agent Β· Guardrail Agent Β· Extraction Agent Β· Theme Agent Β· Prioritization Agent
ML Layer: Feature Store (Feast/Tecton) Β· Model Registry (MLflow) Β· Embedding Service (OpenAI/Cohere) Β· Clustering Engine (HDBSCAN) Β· Evaluation Pipeline
Integration: Intercom Adapter Β· Zendesk Adapter Β· Linear/Jira Adapter Β· Webhook Receiver Β· Translation API (DeepL/Google)
Data: PostgreSQL (metadata) Β· Vector DB (Pinecone/Weaviate) Β· S3/GCS (raw feedback) Β· Redis (cache)
External: OpenAI API Β· Anthropic API Β· Intercom API Β· Zendesk API Β· Linear API
Observability: CloudWatch/Datadog Β· Sentry (errors) Β· Grafana (dashboards) Β· MLflow (model metrics)
Security: AWS KMS (secrets) Β· IAM/RBAC Β· Audit Logger Β· PII Redactor

Sequence Diagram - Feedback Processing Flow

Intercom β†’ API Gateway: POST /webhook (new ticket)
API Gateway β†’ Planner Agent: route(ticket_data)
Planner Agent β†’ Extraction Agent: extract(text)
Extraction Agent β†’ Guardrail Agent: validate(extracted_json)
Guardrail Agent β†’ Theme Agent: cluster(embeddings)
Theme Agent β†’ Prioritization Agent: score(theme_id, metadata)
Prioritization Agent β†’ Linear API: decision(priority=high) β†’ POST /issue (auto-create)
Linear API β†’ Prioritization Agent: 200 OK

Feedback Analysis System - Hub Architecture

[Diagram: hub-and-spoke topology with 7 components. A Pipeline Orchestrator hub connects over RPC to six spokes, each with 4 capabilities: Workflow Executor, Quality Validator, Safety Guardian, NLP Extraction Agent, Theme Clustering Agent, and Prioritization Engine. RPC calls cover raw feedback validation, pipeline execution, structured extraction, theme clustering, scoring/ranking, and final output validation; events carry extraction results, theme clusters, prioritized themes, quality metrics and flags, and sanitized feedback back through the hub.]

Feedback Analysis System - Feedback Loops & Refinement

[Diagram: refinement loops across 6 components. The Safety Guardian streams sanitized feedback to the NLP Extraction Agent; the Quality Validator checks extraction results, cluster assignments, and scores, sending back "reprocess low-confidence", "refine clusters", and scoring corrections. A Feedback Loop Monitor consumes quality metrics and performance data, pushing model updates to extraction, clustering tuning to the Theme Clustering Agent, and scoring adjustments to the Prioritization Engine.]

Data Flow - Feedback to Action

From Intercom ticket to Linear issue in 15 minutes

1. Intercom (0s): new support ticket created β†’ raw text + customer metadata
2. Webhook Receiver (50ms): receives webhook, queues for processing β†’ ticket ID + payload
3. Planner Agent (100ms): generates execution plan β†’ agent sequence + routing decision
4. Guardrail Agent (200ms): redacts PII, checks toxicity β†’ sanitized text
5. Extraction Agent (2,500ms): extracts structured data (sentiment, topic, etc.) β†’ JSON with confidence scores
6. Embedding Service (300ms): generates vector embedding β†’ 1536-dim vector
7. Theme Agent (800ms): clusters into existing/new theme β†’ theme ID + similarity score
8. Prioritization Agent (300ms): calculates impact score β†’ priority score (0-100)
9. Evaluator Agent (500ms): validates output quality β†’ quality report + confidence
10. Linear Adapter (500ms): creates Linear issue (if priority > 80) β†’ issue ID + URL
11. Notification Service (200ms): alerts PM via Slack β†’ theme summary + Linear link
12. Database (100ms): persists all data (feedback, theme, score) β†’ complete audit trail
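Condensed into code, the flow above is a chain of stage functions with the priority > 80 gate from step 10. The sketch below injects each stage as a callable, since the concrete implementations vary by tier; all names and signatures are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class FeedbackPipeline:
    redact: Callable[[str], str]                 # steps 3-4: guardrail
    extract: Callable[[str], dict]               # step 5: extraction
    embed: Callable[[str], list[float]]          # step 6: embedding service
    assign_theme: Callable[[list[float]], str]   # step 7: theme agent
    score: Callable[[str, dict], float]          # step 8: prioritization
    create_issue: Callable[[str, dict], dict]    # step 10: Linear adapter
    persist: Callable[..., None]                 # step 12: audit trail

    def run(self, ticket: dict[str, Any]) -> dict[str, Any]:
        clean = self.redact(ticket["text"])
        data = self.extract(clean)
        theme = self.assign_theme(self.embed(clean))
        priority = self.score(theme, ticket["customer"])
        if priority > 80:                        # auto-create gate from step 10
            self.create_issue(theme, data)
        self.persist(ticket, data, theme, priority)
        return {"theme": theme, "priority": priority}
```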

Scaling Patterns

Volume: 0-1,000 items/month Β· Pattern: Serverless Monolith Β· Cost: $200/mo Β· 5-8s per item
  β€’ AWS Lambda (single function)
  β€’ OpenAI API (GPT-4)
  β€’ PostgreSQL (RDS)
  β€’ S3 (raw feedback storage)

Volume: 1K-10K items/month Β· Pattern: Queue + Workers Β· Cost: $500/mo Β· 3-5s per item (worker sketch after this section)
  β€’ API Gateway (REST)
  β€’ SQS (message queue)
  β€’ Lambda workers (parallel processing)
  β€’ RDS PostgreSQL
  β€’ Redis (cache)
  β€’ Pinecone (vector DB)

Volume: 10K-50K items/month Β· Pattern: Multi-Agent Orchestration Β· Cost: $2,000/mo Β· 2-4s per item
  β€’ Load Balancer (ALB)
  β€’ ECS Fargate (agent containers)
  β€’ Step Functions (orchestration)
  β€’ Aurora PostgreSQL (read replicas)
  β€’ ElastiCache Redis
  β€’ Pinecone (vector DB)
  β€’ CloudWatch (observability)

Volume: 50K-100K+ items/month Β· Pattern: Enterprise Multi-Region Β· Cost: $8,000+/mo Β· 1-3s per item
  β€’ Global Load Balancer
  β€’ EKS (Kubernetes)
  β€’ Kafka (event streaming)
  β€’ Multi-LLM failover (OpenAI + Anthropic + local models)
  β€’ Aurora Global Database
  β€’ Weaviate (self-hosted vector DB)
  β€’ Datadog (full observability)
  β€’ Multi-region deployment (US, EU)
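For the Queue + Workers tier, a hedged sketch of the Lambda worker consuming SQS batches; the partial-batch response shape is AWS's documented format (it requires ReportBatchItemFailures on the event source mapping), while process_feedback stands in for the pipeline entry point:

```python
import json

def process_feedback(item: dict) -> None:
    """Stub for the pipeline entry point sketched earlier."""
    ...

def handler(event, context):
    """SQS-triggered Lambda: each record carries one queued feedback item."""
    failures = []
    for record in event["Records"]:
        try:
            process_feedback(json.loads(record["body"]))
        except Exception:
            # Report only the failed message IDs so SQS retries just those.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```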

Key Integrations

Intercom (Support Tickets)

Protocol: REST API + Webhooks
Webhook: New ticket created β†’ Queue
API: Fetch ticket details + conversation history
API: Update ticket with AI-generated tags

Zendesk (Support Tickets)

Protocol: REST API + Webhooks
Webhook: Ticket status change β†’ Queue
API: Fetch ticket + comments
API: Add internal note with theme analysis

Linear (Issue Tracking)

Protocol: GraphQL API
POST: Create issue with theme summary
POST: Add comment with related feedback links
GET: Fetch issue status for reporting
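A sketch of the issue-creation call against Linear's GraphQL endpoint; the mutation shape follows Linear's public issueCreate API, while the environment variable, team ID handling, and error handling are assumptions:

```python
import os
import requests

LINEAR_URL = "https://api.linear.app/graphql"
ISSUE_CREATE = """
mutation CreateIssue($teamId: String!, $title: String!, $description: String) {
  issueCreate(input: {teamId: $teamId, title: $title, description: $description}) {
    success
    issue { id url }
  }
}
"""

def create_linear_issue(team_id: str, title: str, description: str) -> dict:
    """Create an issue carrying the theme summary; returns {id, url}."""
    resp = requests.post(
        LINEAR_URL,
        json={"query": ISSUE_CREATE,
              "variables": {"teamId": team_id, "title": title,
                            "description": description}},
        headers={"Authorization": os.environ["LINEAR_API_KEY"]},  # personal API key
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]["issueCreate"]["issue"]
```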

Jira (Issue Tracking)

Protocol: REST API
POST: Create issue in Product backlog
POST: Link related tickets
GET: Fetch issue metadata for analytics

OpenAI (Embeddings + GPT-4)

Protocol: REST API
POST /embeddings: Generate 1536-dim vectors
POST /chat/completions: Extract structured data

Anthropic (Claude for extraction)

Protocol: REST API
POST /messages: Extract sentiment + topics with Claude


Failure Modes & Recovery

| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| OpenAI API down | Failover to Anthropic Claude β†’ local Llama model β†’ manual queue | Latency +2s, accuracy -3%, no data loss | 99.5% |
| Extraction confidence < 0.7 | Flag for human review, do not cluster | Quality maintained, throughput reduced | 99.9% |
| Theme clustering fails (HDBSCAN error) | Fall back to keyword-based clustering | Lower-quality themes, manual review needed | 99.0% |
| Linear API timeout | Retry 3x with exponential backoff β†’ queue for manual creation | Delayed issue creation (up to 1h) | 99.5% |
| PII detection service fails | Block all processing (fail-safe mode) | No feedback processed until fixed | 100% |
| Vector DB (Pinecone) unavailable | Queue embeddings for later clustering | No real-time theme assignment; batch processing later | 99.0% |
| PostgreSQL read-replica lag > 5 min | Read from primary (higher load) | Increased primary DB load, potential slowdown | 99.9% |
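The first row of the table is the classic provider-failover pattern. A minimal two-provider sketch (the local Llama hop and the manual queue are omitted); both model names are assumptions:

```python
import anthropic
from openai import OpenAI

openai_client = OpenAI()
claude = anthropic.Anthropic()

def complete_with_failover(prompt: str) -> str:
    """Try OpenAI first; on any error, fail over to Claude."""
    try:
        resp = openai_client.chat.completions.create(
            model="gpt-4o",  # assumed model name
            messages=[{"role": "user", "content": prompt}],
            timeout=30,
        )
        return resp.choices[0].message.content
    except Exception:
        msg = claude.messages.create(
            model="claude-3-5-sonnet-latest",  # assumed model name
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text
```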

Multi-Agent Architecture

How specialized agents collaborate autonomously

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Planner   β”‚ ← Orchestrates all agents
β”‚    Agent    β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
   β”Œβ”€β”€β”€β”΄β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚        β”‚          β”‚          β”‚          β”‚
β”Œβ”€β”€β–Όβ”€β”€β”€β” β”Œβ”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β” β”Œβ”€β–Όβ”€β”€β”€β”€β”
β”‚Guard β”‚ β”‚Extractβ”‚ β”‚ Theme  β”‚ β”‚Priorityβ”‚ β”‚Eval  β”‚
β”‚rail  β”‚ β”‚ Agent β”‚ β”‚ Agent  β”‚ β”‚ Agent  β”‚ β”‚Agent β”‚
β””β”€β”€β”¬β”€β”€β”€β”˜ β””β”€β”€β”¬β”€β”€β”€β”˜ β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”¬β”€β”€β”€β”˜
   β”‚        β”‚         β”‚          β”‚         β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
          β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β”
          β”‚Executorβ”‚
          β”‚ Agent  β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
          β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β”
          β”‚ Linear β”‚
          β”‚  API   β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Agent Collaboration Flow

1. Planner Agent: receives feedback item, generates execution plan (guardrail β†’ extract β†’ theme β†’ priority)
2. Guardrail Agent: redacts PII, checks toxicity β†’ returns sanitized text or blocks processing
3. Extraction Agent: extracts sentiment, topic, feature request, bug β†’ returns JSON with confidence scores
4. Theme Agent: generates embedding, clusters into theme β†’ returns theme ID + similarity score
5. Prioritization Agent: calculates impact score (volume Γ— sentiment Γ— customer tier) β†’ returns priority (0-100)
6. Evaluator Agent: validates output quality, checks confidence scores β†’ flags for human review if needed
7. Executor Agent: if priority > 80, creates Linear issue; if below, stores for batch analysis
8. Planner Agent: logs execution, updates dashboard, triggers notifications

Reactive Agent
Extraction Agent: responds to input, returns output.
Autonomy: Low Β· Stateless

Reflexive Agent
Guardrail Agent: uses rules + context (PII patterns, toxicity thresholds).
Autonomy: Medium Β· Reads policy DB

Deliberative Agent
Theme Agent: plans clustering strategy based on existing themes.
Autonomy: High Β· Stateful (reads theme history)

Orchestrator Agent
Planner Agent: makes routing decisions, handles loops, coordinates all agents.
Autonomy: Highest Β· Full state management

Levels of Autonomy

L1 (Tool): human calls, agent responds β†’ Monday's prompts (manual copy-paste)
L2 (Chained Tools): sequential execution, no branching β†’ Tuesday's code (extract β†’ cluster β†’ prioritize)
L3 (Agent): makes decisions, can loop, handles failures β†’ Planner Agent (routes based on priority, retries on failure)
L4 (Multi-Agent): agents collaborate autonomously, self-healing β†’ this system (agents coordinate, fail over, self-optimize)

Advanced ML/AI Patterns

Production ML engineering beyond basic API calls

RAG vs Fine-Tuning

Product terminology changes frequently (new features, rebranding). RAG allows daily updates without retraining. Fine-tuning would require monthly retraining at $2K+ per cycle.
βœ… RAG (chosen)
Cost: $100/mo (vector DB)
Update cadence: daily (add new product docs)

❌ Fine-tuning
Cost: $2K/mo (retraining)
Update cadence: monthly

Implementation: vector DB (Pinecone) holding product docs, feature specs, and past themes, retrieved during theme labeling for context-aware naming.
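A retrieval sketch with the Pinecone client, matching the 1536-dim vectors used throughout the system; the index name, metadata field, and use of the embeddings dimensions parameter are assumptions:

```python
import os
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("product-docs")  # assumed index of docs, specs, past themes

def retrieve_context(feedback_text: str, k: int = 5) -> list[str]:
    """Embed the feedback and fetch the k nearest product-doc chunks."""
    vec = oai.embeddings.create(
        model="text-embedding-3-large",
        dimensions=1536,  # shorten to match the system's 1536-dim vectors
        input=feedback_text,
    ).data[0].embedding
    hits = index.query(vector=vec, top_k=k, include_metadata=True)
    return [m["metadata"]["text"] for m in hits["matches"]]
```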

Hallucination Detection

LLMs hallucinate feature requests that don't exist, or invent customer quotes
L1: Confidence scores (< 0.7 = flag for review)
L2: Cross-reference with product roadmap DB
L3: Fact-check customer quotes against original text
L4: Human review queue for low-confidence items

Hallucination rate: 1.2% detected; 0.1% escaped to production (caught by PM review).
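Layer L3 (quote fact-checking) can be as simple as a fuzzy containment test; the windowing scheme and 0.85 threshold below are assumptions:

```python
import difflib

def quote_supported(quote: str, source_text: str, threshold: float = 0.85) -> bool:
    """Check that an extracted customer quote actually appears in the source.

    An exact substring passes immediately; otherwise slide a quote-sized
    window over the source and accept the best fuzzy match above threshold.
    """
    quote, source = quote.strip().lower(), source_text.lower()
    if quote in source:
        return True
    window, step = len(quote), max(1, len(quote) // 4)
    best = 0.0
    for start in range(0, max(1, len(source) - window + 1), step):
        chunk = source[start:start + window]
        best = max(best, difflib.SequenceMatcher(None, quote, chunk).ratio())
    return best >= threshold
```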

Evaluation Framework

  β€’ Extraction accuracy: 96.3% (target: 95%+)
  β€’ Theme recall: 91.7% (target: 90%+)
  β€’ Theme precision: 88.2% (target: 85%+)
  β€’ Prioritization accuracy: 83.5% (target: 80%+)
  β€’ PM satisfaction score: 4.2/5 (target: 4.0/5)

Testing: shadow mode with 500 feedback items processed in parallel (AI vs. manual), plus weekly PM review of 50 random themes.

Dataset Curation

1. Collect: 5K feedback items (historical data from Intercom/Zendesk)
2. Clean: 4.2K usable (spam, duplicates, and non-English removed)
3. Label: 4.2K labeled ($8.4K)
4. Augment: +800 synthetic (GPT-4-generated edge cases: rare features, multi-language)

β†’ 5K high-quality examples; inter-rater agreement (Cohen's kappa): 0.88

Agentic RAG

The Theme Agent iteratively retrieves context based on its own reasoning.
Example: feedback mentions "dark mode" β†’ RAG retrieves past dark-mode requests β†’ agent reasons "need UI context" β†’ RAG retrieves UI specs β†’ theme labeled "UI: Dark Mode Implementation" with full context.
πŸ’‘ Not one-shot retrieval: the agent decides what additional context it needs, leading to more accurate theme labeling.
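The loop itself is small; a sketch with injected retrieve/ask_llm callables and an assumed LABEL:/NEED: reply convention:

```python
def label_theme(feedback: str, retrieve, ask_llm, max_rounds: int = 3) -> str:
    """Iterative retrieval: keep fetching context until the agent can label.

    retrieve(query) -> list[str] and ask_llm(prompt) -> str are injected;
    the LLM replies 'LABEL: <theme>' when confident, else 'NEED: <query>'.
    """
    context = retrieve(feedback)
    for _ in range(max_rounds):
        prompt = ("Feedback: " + feedback + "\nContext:\n" + "\n".join(context) +
                  "\nReply 'LABEL: <theme>' if confident, else 'NEED: <query>'.")
        reply = ask_llm(prompt).strip()
        if reply.startswith("LABEL:"):
            return reply.removeprefix("LABEL:").strip()
        context += retrieve(reply.removeprefix("NEED:").strip())
    return "needs-human-label"  # give up after max_rounds
```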

Active Learning Loop

Items the Evaluator flags for human review (extraction confidence < 0.7) feed a labeling queue; the reviewed labels are folded into the training set for the monthly retraining cycle noted under Risks & Mitigations.

Tech Stack Summary

LLMs
OpenAI GPT-4, Anthropic Claude, local Llama (fallback)
Embeddings
OpenAI text-embedding-3-large (1536-dim)
Vector DB
Pinecone (startup), Weaviate (enterprise)
Orchestration
LangGraph, AWS Step Functions, or custom Python
Database
PostgreSQL (RDS/Aurora)
Queue
AWS SQS (startup), Kafka (enterprise)
Compute
AWS Lambda (startup), ECS Fargate (scale), EKS (enterprise)
Monitoring
CloudWatch (startup), Datadog (enterprise)
Security
AWS KMS, IAM, Secrets Manager, Comprehend (PII)
ML Ops
MLflow (model registry), Feast (feature store)
πŸ—οΈ

Need a Custom Feedback System?

We'll design and build your feedback analysis pipeline - from architecture to production deployment.