Building Reactive ML Systems That Scale with Events
Event-driven architecture represents a fundamental shift in how we design ML systems, moving from synchronous request-response patterns to reactive, loosely-coupled systems that can handle millions of inference requests without breaking a sweat. In this chapter, you'll master AWS EventBridge and SQS to build ML pipelines that automatically scale, gracefully handle failures, and process predictions asynchronously across distributed systems.
Key Insight
Events Are the Native Language of Modern ML Systems
Traditional ML architectures treat inference as a synchronous operation—request comes in, model processes, response goes out. But real-world ML systems operate in a world of events: a user uploads an image, a transaction occurs, a sensor reading arrives, a document is modified.
73%
Reduction in ML inference latency variance
Organizations that migrated from synchronous to event-driven ML architectures reported 73% reduction in latency variance (p99 vs p50).
Synchronous vs Event-Driven ML Architecture
Synchronous ML
Client waits for inference completion, blocking the user experience
Direct coupling between request rate and model endpoint capacity
Failures cascade immediately to users with no retry mechanism
Scaling requires over-provisioning for peak load, wasting 60% or more of provisioned capacity during normal traffic
Event-Driven ML
Immediate acknowledgment with async result delivery via webhooks or callbacks
Queue-based buffering absorbs traffic spikes without endpoint overload
Built-in retry with exponential backoff and dead letter queues for poison messages
Scale to actual demand with queue depth-based auto-scaling policies
Route
Define clear event routing rules using EventBridge patterns. Determine which events trigger which model pipelines.
Enqueue
Buffer events in SQS queues with appropriate visibility timeouts and retention periods. Size your queues for the worst-case backlog you need to absorb.
Acknowledge
Implement proper acknowledgment patterns—only delete messages after successful inference completion.
Compensate
Design dead letter queues and compensation logic for failed inferences. Every DLQ message should trigger an alert and a documented recovery path.
EventBridge Rule for ML Model Routing (TypeScript)
import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as sqs from 'aws-cdk-lib/aws-sqs';

// Dead letter queue for inference requests that repeatedly fail
const dlq = new sqs.Queue(this, 'InferenceDLQ', { retentionPeriod: cdk.Duration.days(14) });

// Create queues for different model tiers
const standardInferenceQueue = new sqs.Queue(this, 'StandardInferenceQueue', {
  visibilityTimeout: cdk.Duration.seconds(120),
  retentionPeriod: cdk.Duration.days(7),
  deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
});

// Route matching inference events to the standard-tier queue
// (the pattern fields are illustrative; adapt them to your event schema)
new events.Rule(this, 'StandardInferenceRule', {
  eventPattern: { source: ['ml.inference'], detailType: ['InferenceRequested'], detail: { tier: ['standard'] } },
  targets: [new targets.SqsQueue(standardInferenceQueue)],
});
Event Ordering Is Not Guaranteed in EventBridge
EventBridge delivers events with at-least-once semantics but does not guarantee ordering. If your ML pipeline requires sequential processing (e.g., processing video frames in order), you must implement ordering at the consumer level using sequence numbers in your event payload.
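Where order matters, one approach (a minimal sketch using assumed payload fields like streamId and sequenceNumber, which are not AWS attributes) is to buffer out-of-order events per stream and release them only when the next expected sequence number arrives:

// Minimal sketch: reorder events per stream using a sequence number carried in the payload.
// streamId and sequenceNumber are illustrative field names, not AWS-defined attributes.
interface OrderedEvent {
  streamId: string;        // e.g. a video ID or session ID
  sequenceNumber: number;  // monotonically increasing per stream
  payload: unknown;
}

class SequenceBuffer {
  private nextSeq = new Map<string, number>();
  private pending = new Map<string, Map<number, OrderedEvent>>();

  // Returns the events that are now safe to process, in order.
  accept(event: OrderedEvent): OrderedEvent[] {
    const expected = this.nextSeq.get(event.streamId) ?? 0;
    const buffered = this.pending.get(event.streamId) ?? new Map<number, OrderedEvent>();
    buffered.set(event.sequenceNumber, event);
    this.pending.set(event.streamId, buffered);

    const ready: OrderedEvent[] = [];
    let seq = expected;
    while (buffered.has(seq)) {
      ready.push(buffered.get(seq)!);
      buffered.delete(seq);
      seq += 1;
    }
    this.nextSeq.set(event.streamId, seq);
    return ready;
  }
}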
Key Insight
SQS as the Shock Absorber for ML Inference
SQS queues serve as the critical buffer between unpredictable event arrival rates and your carefully tuned ML inference capacity. Without this buffer, a sudden spike in requests—say, a viral social media post triggering 100x normal image classification requests—would either overwhelm your model endpoint or require massive over-provisioning.
Event-Driven ML Inference Pipeline Architecture
Event flow: Event Sources → EventBridge (route & filter) → SQS Queues (buffer & retry) → Lambda (process & infer)
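To sketch how that buffer feeds compute, the CDK snippet below (reusing the standardInferenceQueue from the earlier example; the function asset path, runtime, and limits are assumptions) attaches a Lambda consumer with bounded batch size and concurrency so a traffic spike drains at a controlled rate:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

// Placeholder consumer; in practice this handler calls your model endpoint.
const inferenceFn = new lambda.Function(this, 'InferenceConsumer', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda/inference'),
  timeout: cdk.Duration.seconds(60),
});

// Pull from the buffer in small batches and cap concurrency so the
// model endpoint never absorbs the full spike at once.
inferenceFn.addEventSource(new SqsEventSource(standardInferenceQueue, {
  batchSize: 10,
  maxBatchingWindow: cdk.Duration.seconds(5),
  maxConcurrency: 20,              // cap on concurrent Lambda invocations for this source
  reportBatchItemFailures: true,   // retry only the records that failed
}));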
Anti-Pattern: The Unbounded Fan-Out Explosion
❌ Problem
Systems that work fine in development collapse in production when real traffic hits.
✓ Solution
Implement fan-out budgets: explicitly limit how many targets can receive each event type.
Implementing Your First Event-Driven Inference Pipeline
1. Define Your Event Schema
2. Create Your Event Bus and Rules
3. Configure SQS Queues with Appropriate Settings
4. Build the Lambda Consumer (a minimal consumer sketch follows this list)
5. Implement Result Delivery
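To make step 4 concrete, here is a minimal consumer sketch (runInference is a hypothetical helper standing in for your model client) that processes a batch of SQS records and reports partial batch failures so only the failed records are retried:

import { SQSBatchResponse, SQSEvent } from 'aws-lambda';

// Hypothetical helper that calls your model endpoint; replace with your own client.
async function runInference(input: unknown): Promise<void> {
  // ... call your inference service here
}

export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const batchItemFailures: { itemIdentifier: string }[] = [];

  for (const record of event.Records) {
    try {
      const request = JSON.parse(record.body);
      await runInference(request);
    } catch (err) {
      // Report only this record as failed; successful records are deleted by Lambda.
      console.error('inference failed', { messageId: record.messageId, err });
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures };
};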
Event-Driven ML Architecture Readiness Checklist
Notion
AI-Powered Search Through Event-Driven Embedding Generation
Reduced embedding generation costs by 70% through intelligent debouncing and prioritization.
Use EventBridge Archives for ML Model Retraining
Enable EventBridge archive on your ML event bus with a 90-day retention period. When you deploy a new model version, you can replay historical events through the new model to validate performance before switching production traffic.
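A minimal CDK sketch of that setup, assuming a dedicated bus named ml-events and an ml.inference event source (both placeholders):

import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';

const mlBus = new events.EventBus(this, 'MlEventBus', { eventBusName: 'ml-events' });

// Archive matched inference events for 90 days so a new model version can be
// validated by replaying real traffic before it takes production load.
mlBus.archive('MlEventArchive', {
  archiveName: 'ml-inference-events',
  eventPattern: { source: ['ml.inference'] },
  retention: cdk.Duration.days(90),
});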
Key Insight
The Hidden Cost of Synchronous Fallback Patterns
Many teams implement event-driven ML but keep a synchronous API 'just in case' for urgent requests. This seemingly reasonable fallback creates architectural debt that undermines the entire system.
Practice Exercise
Build an Event-Driven Image Classification Pipeline
45 min
Essential Resources for Event-Driven ML on AWS
AWS EventBridge User Guide - Content Filtering (article)
Building Event-Driven Architectures on AWS - O'Reilly (book)
AWS re:Invent 2023: Event-Driven ML at Scale (video)
Serverless Land - Event-Driven Patterns (tool)
Framework
Event-Driven ML Architecture Maturity Model
Level 1: Request-Response
All ML inference happens synchronously within API request cycles. This works for simple use cases but ties user-facing latency and availability directly to the model endpoint.
Level 2: Basic Async
Long-running predictions are offloaded to background workers using simple queues. Clients poll for results.
Level 3: Event-Sourced
All ML operations emit events that capture intent and results. Systems can replay events to rebuild state or evaluate a new model against historical traffic.
Level 4: Reactive Orchestration
EventBridge rules automatically route events to appropriate handlers based on content. Step Functions orchestrate multi-step inference workflows.
Anthropic
Building Claude's Async Inference Pipeline
Reduced client-perceived latency by 73% through streaming checkpoints.
SQS Standard vs FIFO for ML Workloads
SQS Standard Queues
Nearly unlimited throughput—handles millions of messages per second
At-least-once delivery means duplicate handling logic is required
Best-effort ordering works well for independent predictions where sequence doesn't matter
Lower cost at $0.40 per million requests, ideal for high-volume inference workloads
SQS FIFO Queues
Limited to 3,000 messages/second with batching, or 300/second without
The 'Claim Check' Pattern Solves Large Payload Problems
EventBridge has a 256KB event size limit, but ML inference requests often include large prompts, images, or document embeddings that exceed this. The claim check pattern solves this elegantly: store the large payload in S3, then include only the S3 reference in your event.
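A sketch of the claim check flow with the AWS SDK v3; the bucket name, bus name, and event fields are placeholders, not a prescribed schema:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
import { randomUUID } from 'node:crypto';

const s3 = new S3Client({});
const eventBridge = new EventBridgeClient({});

// Store the large payload in S3, then publish only a small reference event.
export async function publishInferenceRequest(payload: Buffer): Promise<void> {
  const key = `inference-requests/${randomUUID()}.bin`;

  await s3.send(new PutObjectCommand({
    Bucket: 'ml-claim-check-payloads',   // placeholder bucket name
    Key: key,
    Body: payload,
  }));

  await eventBridge.send(new PutEventsCommand({
    Entries: [{
      EventBusName: 'ml-events',         // placeholder bus name
      Source: 'ml.inference',
      DetailType: 'InferenceRequested',
      Detail: JSON.stringify({
        payloadLocation: { bucket: 'ml-claim-check-payloads', key },
        contentLength: payload.length,
      }),
    }],
  }));
}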
Anti-Pattern: The Synchronous Fan-Out Trap
❌ Problem
This creates a fragile chain where any downstream failure causes the entire operation to fail.
✓ Solution
Use EventBridge for true asynchronous fan-out. Publish a single 'ModelDeployed' event and let each downstream consumer subscribe and react independently.
Implementing Reliable Async Inference with SQS
1. Design Your Message Schema
2. Configure Queue Settings for ML Workloads
3. Implement Dead Letter Queue Strategy
4. Build Idempotent Inference Handlers (see the sketch after this list)
5. Implement Graceful Scaling
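One way to implement step 4 (a sketch, not the only approach) is a conditional write to DynamoDB keyed on the request_id, so a redelivered message fails the condition and can be skipped; the table name and attribute names are assumptions:

import { DynamoDBClient, PutItemCommand, ConditionalCheckFailedException } from '@aws-sdk/client-dynamodb';

const dynamo = new DynamoDBClient({});

// Returns true if this request_id has not been processed before.
// The conditional write acts as a claim: a duplicate delivery fails the condition.
export async function claimRequest(requestId: string): Promise<boolean> {
  try {
    await dynamo.send(new PutItemCommand({
      TableName: 'inference-idempotency',   // placeholder table name
      Item: {
        request_id: { S: requestId },
        claimed_at: { N: `${Date.now()}` },
      },
      ConditionExpression: 'attribute_not_exists(request_id)',
    }));
    return true;
  } catch (err) {
    if (err instanceof ConditionalCheckFailedException) {
      return false;   // already processed (or in flight), safe to skip
    }
    throw err;
  }
}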
Stripe
Event-Driven Fraud Detection at Scale
Achieved 99.999% availability for fraud scoring and reduced the false positive rate.
Event-Driven ML Architecture Security Checklist
847ms
Average latency reduction when moving from synchronous to event-driven ML inference
Organizations that migrated from synchronous API-based ML inference to event-driven architectures saw dramatic latency improvements.
Fan-Out Pattern for Multi-Model Ensemble Inference
API Gateway → EventBridge → [parallel fan-out] → per-model queues (Model A Queue, Model B Queue, ...)
EventBridge Archive Costs Can Explode
EventBridge Archive stores all matched events for replay capability, but costs $0.10 per GB stored per month. A high-volume ML system generating 1KB events at 1000/second accumulates 2.6TB monthly—$260/month just for archive storage.
Framework
The REACT Framework for ML Event Design
R - Request Context
Every event must include complete request context: unique request_id, timestamp, source service, and user or session identifiers.
E - Event Versioning
Include explicit schema_version in every event. Use semantic versioning (major.minor.patch) where major versions signal breaking changes consumers must opt into.
A - Actionable Payload
Events should contain everything needed to process them without additional lookups. Include model identifiers, parameters, and payload references (an example event shape follows this framework).
C - Completeness Indicators
Include fields that indicate event completeness: is_final for streaming scenarios, sequence_number for ordered streams.
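Pulling those fields together, an illustrative event detail shape might look like this (field names are examples, not a required schema):

// Illustrative event detail following the framework above.
interface InferenceRequestedEvent {
  // R - Request Context
  request_id: string;        // UUID generated at the system entry point
  correlation_id: string;    // propagated across all downstream events
  timestamp: string;         // ISO-8601
  source_service: string;

  // E - Event Versioning
  schema_version: string;    // e.g. "2.1.0"; a major bump signals a breaking change

  // A - Actionable Payload
  model_id: string;
  payload_location: { bucket: string; key: string };  // claim check reference

  // C - Completeness Indicators
  sequence_number?: number;  // for ordered streams
  is_final?: boolean;        // for streaming or chunked inputs
}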
Key Insight
Message Deduplication Windows Are Shorter Than You Think
SQS FIFO queues deduplicate messages only on the producer side and only within a 5-minute window. Redelivery caused by an expired visibility timeout is never deduplicated: if your ML inference outruns the visibility timeout, the same message is delivered again, so your handler must tolerate duplicates (or extend the timeout while work is in progress).
Practice Exercise
Build a Multi-Priority ML Inference Queue System
45 min
Vercel
Edge-to-Cloud Event Pipeline for AI Features
Reduced global p95 latency from 2.3 seconds to 340ms for AI-powered features.
Use EventBridge Pipes for Direct SQS-to-Lambda Integration
EventBridge Pipes, launched in late 2022, provides a simpler alternative to Lambda event source mappings for SQS. Pipes support filtering, enrichment, and transformation without intermediate Lambda functions.
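A hedged CDK sketch using the low-level CfnPipe construct, reusing the queue and consumer function from the earlier snippets; the filter pattern assumes the message body is JSON with a tier field:

import * as iam from 'aws-cdk-lib/aws-iam';
import * as pipes from 'aws-cdk-lib/aws-pipes';

// Role the pipe assumes to read from the queue and invoke the function.
const pipeRole = new iam.Role(this, 'InferencePipeRole', {
  assumedBy: new iam.ServicePrincipal('pipes.amazonaws.com'),
});
standardInferenceQueue.grantConsumeMessages(pipeRole);
inferenceFn.grantInvoke(pipeRole);

// SQS source -> filter -> Lambda target, with no glue Lambda in between.
new pipes.CfnPipe(this, 'InferencePipe', {
  roleArn: pipeRole.roleArn,
  source: standardInferenceQueue.queueArn,
  target: inferenceFn.functionArn,
  sourceParameters: {
    sqsQueueParameters: { batchSize: 10 },
    filterCriteria: {
      filters: [{ pattern: JSON.stringify({ body: { tier: ['standard'] } }) }],
    },
  },
});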
EventBridge vs SNS for ML Event Fan-Out
Amazon EventBridge
Content-based filtering with complex patterns—route based on any field in the event payload, not just attributes
Schema registry provides event documentation and validation, making it easier for teams to integrate safely
Archive and replay enables debugging production issues by re-running historical events
Input transformers reshape events before delivery, reducing boilerplate in consumers
Amazon SNS
Simple attribute-based filtering only—limited to exact matches and basic operators on message attributes
No built-in schema management—teams must coordinate event formats manually
No native replay capability—must implement custom archiving and replay
Anti-Pattern: Events Without Correlation IDs
❌ Problem
Mean time to resolution (MTTR) increases dramatically—what should take minutes to diagnose stretches into hours when related events can't be stitched together across services.
✓ Solution
Generate UUID correlation IDs at system entry points (API Gateway, S3 triggers, and other ingress points) and propagate them through every downstream event and log line.
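A small sketch of that propagation (the helper name is hypothetical): generate the ID once at the entry point, then copy it into every event published downstream so logs, traces, and DLQ messages can be joined on it.

import { randomUUID } from 'node:crypto';

// Hypothetical helper: attach a correlation ID at the entry point,
// or propagate the one already present on an upstream event.
export function withCorrelationId<T extends object>(
  detail: T,
  upstreamCorrelationId?: string,
): T & { correlation_id: string } {
  return {
    ...detail,
    correlation_id: upstreamCorrelationId ?? randomUUID(),
  };
}

// Usage: every downstream publish reuses the same ID.
const detail = withCorrelationId({ model_id: 'image-classifier-v3' });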
Framework
Event-Driven ML Maturity Model
Level 1: Reactive
Basic event-driven processing with manual scaling and monitoring. Events trigger Lambda functions directly.
Level 2: Resilient
Automated failure handling with retry policies, circuit breakers, and DLQ processors. Correlation IDs connect events across services.
Level 3: Observable
Comprehensive monitoring with custom metrics, distributed tracing, and business-level dashboards. Teams can trace an individual prediction end to end.
Level 4: Adaptive
Systems automatically adjust to load patterns. Auto-scaling responds to queue depth and latency metrics.
Event-Driven ML Debugging Checklist
Duolingo
Real-Time Lesson Adaptation with Event-Driven ML
Duolingo reduced question-to-question latency from 350ms to 150ms while improving lesson adaptation quality.
Event Ordering Guarantees in ML Pipelines
Standard SQS queues don't guarantee message ordering, which can cause issues when event sequence matters for ML predictions. If your model depends on event order (e.g., user session sequences), use SQS FIFO queues with MessageGroupId set to your ordering key (user_id, session_id).
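A minimal SDK v3 sketch of publishing to such a FIFO queue; the queue URL is a placeholder:

import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});

// Messages sharing a MessageGroupId are delivered in order; different
// groups (here, different users) are still processed in parallel.
export async function enqueueSessionEvent(userId: string, eventId: string, payload: object) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/session-events.fifo', // placeholder
    MessageBody: JSON.stringify(payload),
    MessageGroupId: userId,            // ordering key
    MessageDeduplicationId: eventId,   // or enable content-based deduplication on the queue
  }));
}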
94%
Reduction in ML pipeline incidents after implementing comprehensive event-driven error handling
Shopify's ML platform team tracked incidents before and after implementing dead letter queues, circuit breakers, and automated retry policies across their recommendation systems.
Practice Exercise
Implement Event-Driven Model A/B Testing
90 min
Cost Optimization for Event-Driven ML
EventBridge charges per event ($1 per million), while SQS charges per request ($0.40 per million). For high-volume ML pipelines, publish directly to SQS when you don't need EventBridge's routing capabilities.
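As a rough worked example at those list prices: 100 million events per month cost about $100 to publish through EventBridge versus about $40 to send directly to SQS, and at 1 billion events the gap widens to roughly $1,000 versus $400, before counting consumer-side receive and delete requests.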
Chapter Complete!
Event-driven architectures decouple ML components, enabling independent scaling and graceful failure handling
Dead letter queues and idempotent handlers are non-negotiable for production reliability
Fan-out patterns enable parallel model inference and result aggregation