Building Reactive ML Systems That Scale with Events
Event-driven architecture represents a fundamental shift in how we design ML systems, moving from synchronous request-response patterns to reactive, loosely-coupled systems that can handle millions of inference requests without breaking a sweat. In this chapter, you'll master AWS EventBridge and SQS to build ML pipelines that automatically scale, gracefully handle failures, and process predictions asynchronously across distributed systems.
Key Insight
Events Are the Native Language of Modern ML Systems
Traditional ML architectures treat inference as a synchronous operation—request comes in, model processes, response goes out. But real-world ML systems operate in a world of events: a user uploads an image, a transaction occurs, a sensor reading arrives, a document is modified.
73%
Reduction in ML inference latency variance
Organizations that migrated from synchronous to event-driven ML architectures reported 73% reduction in latency variance (p99 vs p50).
Synchronous vs Event-Driven ML Architecture
Synchronous ML
Client waits for inference completion, blocking the user experience
Direct coupling between request rate and model endpoint capacity
Failures cascade immediately to users with no retry mechanism
Scaling requires over-provisioning for peak load, wasting 60% or more of provisioned capacity during normal traffic
Event-Driven ML
Immediate acknowledgment with async result delivery via webhooks or callbacks
Queue-based buffering absorbs traffic spikes without endpoint overload
Built-in retry with exponential backoff and dead letter queues for poison messages
Scale to actual demand with queue depth-based auto-scaling policies
Route
Define clear event routing rules using EventBridge patterns. Determine which events trigger which model pipelines.
Enqueue
Buffer events in SQS queues with appropriate visibility timeouts and retention periods. Size your queues for the worst-case backlog you need to absorb.
Acknowledge
Implement proper acknowledgment patterns—only delete messages after successful inference completion.
Compensate
Design dead letter queues and compensation logic for failed inferences. Every DLQ message should trigger an alert and a documented recovery path.
EventBridge Rule for ML Model Routing (TypeScript)
import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as sqs from 'aws-cdk-lib/aws-sqs';

// Dead letter queue for inference requests that repeatedly fail
const dlq = new sqs.Queue(this, 'InferenceDLQ', { retentionPeriod: cdk.Duration.days(14) });

// Create queues for different model tiers
const standardInferenceQueue = new sqs.Queue(this, 'StandardInferenceQueue', {
  visibilityTimeout: cdk.Duration.seconds(120),
  retentionPeriod: cdk.Duration.days(7),
  deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
});

// Route matching inference events to the standard-tier queue
// (the pattern fields are illustrative; adapt them to your event schema)
new events.Rule(this, 'StandardInferenceRule', {
  eventPattern: { source: ['ml.inference'], detailType: ['InferenceRequested'], detail: { tier: ['standard'] } },
  targets: [new targets.SqsQueue(standardInferenceQueue)],
});
Event Ordering Is Not Guaranteed in EventBridge
EventBridge delivers events with at-least-once semantics but does not guarantee ordering. If your ML pipeline requires sequential processing (e.g., processing video frames in order), you must implement ordering at the consumer level using sequence numbers in your event payload.
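Where order matters, one approach (a minimal sketch using assumed payload fields like streamId and sequenceNumber, which are not AWS attributes) is to buffer out-of-order events per stream and release them only when the next expected sequence number arrives:

// Minimal sketch: reorder events per stream using a sequence number carried in the payload.
// streamId and sequenceNumber are illustrative field names, not AWS-defined attributes.
interface OrderedEvent {
  streamId: string;        // e.g. a video ID or session ID
  sequenceNumber: number;  // monotonically increasing per stream
  payload: unknown;
}

class SequenceBuffer {
  private nextSeq = new Map<string, number>();
  private pending = new Map<string, Map<number, OrderedEvent>>();

  // Returns the events that are now safe to process, in order.
  accept(event: OrderedEvent): OrderedEvent[] {
    const expected = this.nextSeq.get(event.streamId) ?? 0;
    const buffered = this.pending.get(event.streamId) ?? new Map<number, OrderedEvent>();
    buffered.set(event.sequenceNumber, event);
    this.pending.set(event.streamId, buffered);

    const ready: OrderedEvent[] = [];
    let seq = expected;
    while (buffered.has(seq)) {
      ready.push(buffered.get(seq)!);
      buffered.delete(seq);
      seq += 1;
    }
    this.nextSeq.set(event.streamId, seq);
    return ready;
  }
}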
Key Insight
SQS as the Shock Absorber for ML Inference
SQS queues serve as the critical buffer between unpredictable event arrival rates and your carefully tuned ML inference capacity. Without this buffer, a sudden spike in requests—say, a viral social media post triggering 100x normal image classification requests—would either overwhelm your model endpoint or require massive over-provisioning.
Event-Driven ML Inference Pipeline Architecture
Event flow: Event Sources → EventBridge (route & filter) → SQS Queues (buffer & retry) → Lambda (process & infer)
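To sketch how that buffer feeds compute, the CDK snippet below (reusing the standardInferenceQueue from the earlier example; the function asset path, runtime, and limits are assumptions) attaches a Lambda consumer with bounded batch size and concurrency so a traffic spike drains at a controlled rate:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

// Placeholder consumer; in practice this handler calls your model endpoint.
const inferenceFn = new lambda.Function(this, 'InferenceConsumer', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda/inference'),
  timeout: cdk.Duration.seconds(60),
});

// Pull from the buffer in small batches and cap concurrency so the
// model endpoint never absorbs the full spike at once.
inferenceFn.addEventSource(new SqsEventSource(standardInferenceQueue, {
  batchSize: 10,
  maxBatchingWindow: cdk.Duration.seconds(5),
  maxConcurrency: 20,              // cap on concurrent Lambda invocations for this source
  reportBatchItemFailures: true,   // retry only the records that failed
}));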
Anti-Pattern: The Unbounded Fan-Out Explosion
❌ Problem
Systems that work fine in development collapse in production when real traffic hits.
✓ Solution
Implement fan-out budgets: explicitly limit how many targets can receive each event type.
Implementing Your First Event-Driven Inference Pipeline
1. Define Your Event Schema
2. Create Your Event Bus and Rules
3. Configure SQS Queues with Appropriate Settings
4. Build the Lambda Consumer (a minimal consumer sketch follows this list)
5. Implement Result Delivery
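To make step 4 concrete, here is a minimal consumer sketch (runInference is a hypothetical helper standing in for your model client) that processes a batch of SQS records and reports partial batch failures so only the failed records are retried:

import { SQSBatchResponse, SQSEvent } from 'aws-lambda';

// Hypothetical helper that calls your model endpoint; replace with your own client.
async function runInference(input: unknown): Promise<void> {
  // ... call your inference service here
}

export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const batchItemFailures: { itemIdentifier: string }[] = [];

  for (const record of event.Records) {
    try {
      const request = JSON.parse(record.body);
      await runInference(request);
    } catch (err) {
      // Report only this record as failed; successful records are deleted by Lambda.
      console.error('inference failed', { messageId: record.messageId, err });
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures };
};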
Event-Driven ML Architecture Readiness Checklist
Notion
AI-Powered Search Through Event-Driven Embedding Generation
Reduced embedding generation costs by 70% through intelligent debouncing and prioritization.
Use EventBridge Archives for ML Model Retraining
Enable EventBridge archive on your ML event bus with a 90-day retention period. When you deploy a new model version, you can replay historical events through the new model to validate performance before switching production traffic.
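A minimal CDK sketch of that setup, assuming a dedicated bus named ml-events and an ml.inference event source (both placeholders):

import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';

const mlBus = new events.EventBus(this, 'MlEventBus', { eventBusName: 'ml-events' });

// Archive matched inference events for 90 days so a new model version can be
// validated by replaying real traffic before it takes production load.
mlBus.archive('MlEventArchive', {
  archiveName: 'ml-inference-events',
  eventPattern: { source: ['ml.inference'] },
  retention: cdk.Duration.days(90),
});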
Key Insight
The Hidden Cost of Synchronous Fallback Patterns
Many teams implement event-driven ML but keep a synchronous API 'just in case' for urgent requests. This seemingly reasonable fallback creates architectural debt that undermines the entire system.
Practice Exercise
Build an Event-Driven Image Classification Pipeline
45 min
Essential Resources for Event-Driven ML on AWS
AWS EventBridge User Guide - Content Filtering (article)
Building Event-Driven Architectures on AWS - O'Reilly (book)
AWS re:Invent 2023: Event-Driven ML at Scale (video)
Serverless Land - Event-Driven Patterns (tool)
Framework
Event-Driven ML Architecture Maturity Model
Level 1: Request-Response
All ML inference happens synchronously within API request cycles. This works for simple use cases but ties user-facing latency and availability directly to the model endpoint.
Level 2: Basic Async
Long-running predictions are offloaded to background workers using simple queues. Clients poll for results.
Level 3: Event-Sourced
All ML operations emit events that capture intent and results. Systems can replay events to rebuild state or evaluate a new model against historical traffic.
Level 4: Reactive Orchestration
EventBridge rules automatically route events to appropriate handlers based on content. Step Functions orchestrate multi-step inference workflows.
Anthropic
Building Claude's Async Inference Pipeline
Reduced client-perceived latency by 73% through streaming checkpoints.
SQS Standard vs FIFO for ML Workloads
SQS Standard Queues
Nearly unlimited throughput—handles millions of messages per second
At-least-once delivery means duplicate handling logic is required
Best-effort ordering works well for independent predictions where sequence doesn't matter
Lower cost at $0.40 per million requests, ideal for high-volume inference workloads
SQS FIFO Queues
Limited to 3,000 messages/second with batching, or 300/second without
The 'Claim Check' Pattern Solves Large Payload Problems
EventBridge has a 256KB event size limit, but ML inference requests often include large prompts, images, or document embeddings that exceed this. The claim check pattern solves this elegantly: store the large payload in S3, then include only the S3 reference in your event.
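A sketch of the claim check flow with the AWS SDK v3; the bucket name, bus name, and event fields are placeholders, not a prescribed schema:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
import { randomUUID } from 'node:crypto';

const s3 = new S3Client({});
const eventBridge = new EventBridgeClient({});

// Store the large payload in S3, then publish only a small reference event.
export async function publishInferenceRequest(payload: Buffer): Promise<void> {
  const key = `inference-requests/${randomUUID()}.bin`;

  await s3.send(new PutObjectCommand({
    Bucket: 'ml-claim-check-payloads',   // placeholder bucket name
    Key: key,
    Body: payload,
  }));

  await eventBridge.send(new PutEventsCommand({
    Entries: [{
      EventBusName: 'ml-events',         // placeholder bus name
      Source: 'ml.inference',
      DetailType: 'InferenceRequested',
      Detail: JSON.stringify({
        payloadLocation: { bucket: 'ml-claim-check-payloads', key },
        contentLength: payload.length,
      }),
    }],
  }));
}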
Anti-Pattern: The Synchronous Fan-Out Trap
❌ Problem
This creates a fragile chain where any downstream failure causes the entire operation to fail.
✓ Solution
Use EventBridge for true asynchronous fan-out. Publish a single 'ModelDeployed' event and let each downstream consumer subscribe and react independently.
Implementing Reliable Async Inference with SQS
1. Design Your Message Schema
2. Configure Queue Settings for ML Workloads
3. Implement Dead Letter Queue Strategy
4. Build Idempotent Inference Handlers (see the sketch after this list)
5. Implement Graceful Scaling
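One way to implement step 4 (a sketch, not the only approach) is a conditional write to DynamoDB keyed on the request_id, so a redelivered message fails the condition and can be skipped; the table name and attribute names are assumptions:

import { DynamoDBClient, PutItemCommand, ConditionalCheckFailedException } from '@aws-sdk/client-dynamodb';

const dynamo = new DynamoDBClient({});

// Returns true if this request_id has not been processed before.
// The conditional write acts as a claim: a duplicate delivery fails the condition.
export async function claimRequest(requestId: string): Promise<boolean> {
  try {
    await dynamo.send(new PutItemCommand({
      TableName: 'inference-idempotency',   // placeholder table name
      Item: {
        request_id: { S: requestId },
        claimed_at: { N: `${Date.now()}` },
      },
      ConditionExpression: 'attribute_not_exists(request_id)',
    }));
    return true;
  } catch (err) {
    if (err instanceof ConditionalCheckFailedException) {
      return false;   // already processed (or in flight), safe to skip
    }
    throw err;
  }
}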
Stripe
Event-Driven Fraud Detection at Scale
Achieved 99.999% availability for fraud scoring and reduced the false positive rate.
Event-Driven ML Architecture Security Checklist
847ms
Average latency reduction when moving from synchronous to event-driven ML inference
Organizations that migrated from synchronous API-based ML inference to event-driven architectures saw dramatic latency improvements.
Fan-Out Pattern for Multi-Model Ensemble Inference
API Gateway → EventBridge → [parallel fan-out] → per-model queues (Model A Queue, Model B Queue, ...)
EventBridge Archive Costs Can Explode
EventBridge Archive stores all matched events for replay capability, but costs $0.10 per GB stored per month. A high-volume ML system generating 1KB events at 1000/second accumulates 2.6TB monthly—$260/month just for archive storage.
Framework
The REACT Framework for ML Event Design
R - Request Context
Every event must include complete request context: unique request_id, timestamp, source service, and user or session identifiers.
E - Event Versioning
Include explicit schema_version in every event. Use semantic versioning (major.minor.patch) where major versions signal breaking changes consumers must opt into.
A - Actionable Payload
Events should contain everything needed to process them without additional lookups. Include model identifiers, parameters, and payload references (an example event shape follows this framework).
C - Completeness Indicators
Include fields that indicate event completeness: is_final for streaming scenarios, sequence_number for ordered streams.
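Pulling those fields together, an illustrative event detail shape might look like this (field names are examples, not a required schema):

// Illustrative event detail following the framework above.
interface InferenceRequestedEvent {
  // R - Request Context
  request_id: string;        // UUID generated at the system entry point
  correlation_id: string;    // propagated across all downstream events
  timestamp: string;         // ISO-8601
  source_service: string;

  // E - Event Versioning
  schema_version: string;    // e.g. "2.1.0"; a major bump signals a breaking change

  // A - Actionable Payload
  model_id: string;
  payload_location: { bucket: string; key: string };  // claim check reference

  // C - Completeness Indicators
  sequence_number?: number;  // for ordered streams
  is_final?: boolean;        // for streaming or chunked inputs
}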
Key Insight
Message Deduplication Windows Are Shorter Than You Think
SQS FIFO queues deduplicate messages only on the producer side and only within a 5-minute window. Redelivery caused by an expired visibility timeout is never deduplicated: if your ML inference outruns the visibility timeout, the same message is delivered again, so your handler must tolerate duplicates (or extend the timeout while work is in progress).
Practice Exercise
Build a Multi-Priority ML Inference Queue System
45 min
Vercel
Edge-to-Cloud Event Pipeline for AI Features
Reduced global p95 latency from 2.3 seconds to 340ms for AI-powered features.
Use EventBridge Pipes for Direct SQS-to-Lambda Integration
EventBridge Pipes, launched in late 2022, provides a simpler alternative to Lambda event source mappings for SQS. Pipes support filtering, enrichment, and transformation without intermediate Lambda functions.
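A hedged CDK sketch using the low-level CfnPipe construct, reusing the queue and consumer function from the earlier snippets; the filter pattern assumes the message body is JSON with a tier field:

import * as iam from 'aws-cdk-lib/aws-iam';
import * as pipes from 'aws-cdk-lib/aws-pipes';

// Role the pipe assumes to read from the queue and invoke the function.
const pipeRole = new iam.Role(this, 'InferencePipeRole', {
  assumedBy: new iam.ServicePrincipal('pipes.amazonaws.com'),
});
standardInferenceQueue.grantConsumeMessages(pipeRole);
inferenceFn.grantInvoke(pipeRole);

// SQS source -> filter -> Lambda target, with no glue Lambda in between.
new pipes.CfnPipe(this, 'InferencePipe', {
  roleArn: pipeRole.roleArn,
  source: standardInferenceQueue.queueArn,
  target: inferenceFn.functionArn,
  sourceParameters: {
    sqsQueueParameters: { batchSize: 10 },
    filterCriteria: {
      filters: [{ pattern: JSON.stringify({ body: { tier: ['standard'] } }) }],
    },
  },
});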
EventBridge vs SNS for ML Event Fan-Out
Amazon EventBridge
Content-based filtering with complex patterns—route based on any field in the event payload, not just attributes
Schema registry provides event documentation and validation, making it easier for teams to integrate safely
Archive and replay enables debugging production issues by re-running historical events
Input transformers reshape events before delivery, reducing boilerplate in consumers
Amazon SNS
Simple attribute-based filtering only—limited to exact matches and basic operators on message attributes
No built-in schema management—teams must coordinate event formats manually
No native replay capability—must implement custom archiving and replay
Anti-Pattern: Events Without Correlation IDs
❌ Problem
Mean time to resolution (MTTR) increases dramatically—what should take minutes to diagnose stretches into hours when related events can't be stitched together across services.
✓ Solution
Generate UUID correlation IDs at system entry points (API Gateway, S3 triggers, and other ingress points) and propagate them through every downstream event and log line.
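A small sketch of that propagation (the helper name is hypothetical): generate the ID once at the entry point, then copy it into every event published downstream so logs, traces, and DLQ messages can be joined on it.

import { randomUUID } from 'node:crypto';

// Hypothetical helper: attach a correlation ID at the entry point,
// or propagate the one already present on an upstream event.
export function withCorrelationId<T extends object>(
  detail: T,
  upstreamCorrelationId?: string,
): T & { correlation_id: string } {
  return {
    ...detail,
    correlation_id: upstreamCorrelationId ?? randomUUID(),
  };
}

// Usage: every downstream publish reuses the same ID.
const detail = withCorrelationId({ model_id: 'image-classifier-v3' });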
Framework
Event-Driven ML Maturity Model
Level 1: Reactive
Basic event-driven processing with manual scaling and monitoring. Events trigger Lambda functions directly.
Level 2: Resilient
Automated failure handling with retry policies, circuit breakers, and DLQ processors. Correlation IDs connect events across services.
Level 3: Observable
Comprehensive monitoring with custom metrics, distributed tracing, and business-level dashboards. Teams can trace an individual prediction end to end.
Level 4: Adaptive
Systems automatically adjust to load patterns. Auto-scaling responds to queue depth and latency metrics.
Event-Driven ML Debugging Checklist
Duolingo
Real-Time Lesson Adaptation with Event-Driven ML
Duolingo reduced question-to-question latency from 350ms to 150ms while improving lesson adaptation quality.
Event Ordering Guarantees in ML Pipelines
Standard SQS queues don't guarantee message ordering, which can cause issues when event sequence matters for ML predictions. If your model depends on event order (e.g., user session sequences), use SQS FIFO queues with MessageGroupId set to your ordering key (user_id, session_id).
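A minimal SDK v3 sketch of publishing to such a FIFO queue; the queue URL is a placeholder:

import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});

// Messages sharing a MessageGroupId are delivered in order; different
// groups (here, different users) are still processed in parallel.
export async function enqueueSessionEvent(userId: string, eventId: string, payload: object) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/session-events.fifo', // placeholder
    MessageBody: JSON.stringify(payload),
    MessageGroupId: userId,            // ordering key
    MessageDeduplicationId: eventId,   // or enable content-based deduplication on the queue
  }));
}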
94%
Reduction in ML pipeline incidents after implementing comprehensive event-driven error handling
Shopify's ML platform team tracked incidents before and after implementing dead letter queues, circuit breakers, and automated retry policies across their recommendation systems.
Practice Exercise
Implement Event-Driven Model A/B Testing
90 min
Cost Optimization for Event-Driven ML
EventBridge charges per event ($1 per million), while SQS charges per request ($0.40 per million). For high-volume ML pipelines, publish directly to SQS when you don't need EventBridge's routing capabilities.
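As a rough worked example at those list prices: 100 million events per month cost about $100 to publish through EventBridge versus about $40 to send directly to SQS, and at 1 billion events the gap widens to roughly $1,000 versus $400, before counting consumer-side receive and delete requests.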
Chapter Complete!
Event-driven architectures decouple ML components, enabling independent scaling and graceful failure handling
Dead letter queues and idempotent handlers are non-negotiable for production reliability
Fan-out patterns enable parallel model inference and result aggregation