Serverless Feature Engineering: The Foundation of Production ML Systems
Feature engineering represents the most critical yet often underestimated component of machine learning systems, consuming up to 80% of data scientists' time in traditional workflows. In serverless architectures, feature engineering transforms from a manual, error-prone process into an automated, scalable pipeline that can process millions of feature computations per second.
80%
Time spent on feature engineering in ML projects
Data scientists spend the vast majority of their time not on model development, but on feature engineering and data preparation.
Key Insight
Features Are the Language Your Models Speak
Raw data is meaningless to machine learning models—features are the translated, structured representations that enable learning. A user's raw click history becomes a 'session_engagement_score'; a series of transactions becomes 'spending_velocity_7d'.
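To make this concrete, here is a minimal sketch of that translation — the scoring formula and field names are illustrative, not a standard definition:

```python
from datetime import datetime

def session_engagement_score(clicks: list) -> float:
    """Collapse a raw click history into one engagement feature.

    Each click is a dict like {"ts": datetime, "page": str}; the score
    (clicks per active minute) is an illustrative definition only.
    """
    if len(clicks) < 2:
        return 0.0
    times = sorted(c["ts"] for c in clicks)
    duration_min = (times[-1] - times[0]).total_seconds() / 60
    return len(clicks) / max(duration_min, 1.0)

clicks = [
    {"ts": datetime(2024, 1, 1, 12, 0), "page": "home"},
    {"ts": datetime(2024, 1, 1, 12, 5), "page": "product"},
    {"ts": datetime(2024, 1, 1, 12, 10), "page": "cart"},
]
print(session_engagement_score(clicks))  # 3 clicks over 10 minutes -> 0.3
```

The model never sees the raw click dicts — only the derived score.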
Traditional Approach
Manual scaling requires capacity planning weeks in advance f...
Feature computation tightly coupled to training jobs, causin...
Training-serving skew from different code paths for batch an...
Feature computation tightly coupled to training jobs, causin...
Training-serving skew from different code paths for batch an...
Serverless Approach
Lambda functions scale to zero, paying only for actual compu...
Automatic scaling handles any traffic pattern from 1 to 1 mi...
Feature computation decoupled into independent, deployable m...
Single feature definition serves both training and inference...
Framework
The Feature Engineering Pyramid
Raw Features (Base Layer)
Direct extractions from source data requiring minimal transformation. Examples include user_id, time...
Derived Features (Second Layer)
Single-source transformations like age_from_birthdate, price_bucket, or normalized_text. These requi...
Aggregate Features (Third Layer)
Time-windowed computations like purchase_count_30d, average_session_duration, or rolling_revenue. Th...
Cross-Entity Features (Fourth Layer)
Features combining multiple entities like user_vs_cohort_spending or product_popularity_in_category....
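The four layers can be sketched as one function per layer — all names and formulas below are illustrative examples of each tier, not prescribed definitions:

```python
from datetime import date

# Layer 1 - raw: direct extraction from the source record
def raw_features(event: dict) -> dict:
    return {"user_id": event["user_id"], "timestamp": event["timestamp"]}

# Layer 2 - derived: single-source transformation
def age_from_birthdate(birthdate: date, today: date) -> int:
    return today.year - birthdate.year - (
        (today.month, today.day) < (birthdate.month, birthdate.day))

# Layer 3 - aggregate: time-windowed computation
def purchase_count_30d(purchase_dates: list, today: date) -> int:
    return sum(1 for d in purchase_dates if 0 <= (today - d).days <= 30)

# Layer 4 - cross-entity: one entity relative to another
def user_vs_cohort_spending(user_total: float, cohort_avg: float) -> float:
    return user_total / cohort_avg if cohort_avg else 0.0
```

Each layer builds only on the layers beneath it, which keeps dependencies acyclic and testable.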
DoorDash
Building a Real-Time Feature Platform for Delivery Time Prediction
Delivery time prediction accuracy improved by 23%, customer complaints about lat...
The Training-Serving Skew Problem
Training-serving skew occurs when features computed during training differ from those computed during inference, causing silent model degradation. In one documented case, a fintech company's fraud model accuracy dropped from 94% to 67% in production due to skew in timestamp handling.
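The remedy is a single feature definition shared by both paths. As a sketch (module layout and function name are assumptions), a timestamp feature like the one that bit the fintech team can live in one pure function imported by both the batch job and the inference Lambda:

```python
from datetime import datetime, timezone

def hour_of_day_utc(ts: datetime) -> int:
    """Shared timestamp feature used by BOTH training and inference.

    Normalizing to UTC here prevents the classic skew bug where the
    batch path uses UTC while the serving path uses local time.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    return ts.astimezone(timezone.utc).hour

# Both code paths call the same function, so they cannot drift apart.
train_value = hour_of_day_utc(datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc))
serve_value = hour_of_day_utc(datetime(2024, 3, 1, 23, 30))  # naive input
assert train_value == serve_value == 23
```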
Basic Feature Store Integration with Lambda (Python)
import boto3
import json
from datetime import datetime

sagemaker_runtime = boto3.client('sagemaker-featurestore-runtime')

def lambda_handler(event, context):
    user_id = event['user_id']
    # Compute features
    features = compute_user_features(user_id)
    features['event_time'] = datetime.utcnow().isoformat()
    # Write to the online store (feature group name is illustrative)
    sagemaker_runtime.put_record(
        FeatureGroupName='user-features',
        Record=[{'FeatureName': k, 'ValueAsString': str(v)}
                for k, v in features.items()]
    )
    return {'statusCode': 200, 'body': json.dumps({'user_id': user_id})}
Key Insight
Real-Time vs. Batch Features: A Strategic Decision
Not all features need real-time computation—in fact, computing everything in real-time is a common and expensive mistake. Features fall into three categories: static features (user demographics) that change rarely and should be cached indefinitely, slowly-changing features (purchase_count_30d) that can be computed hourly or daily in batch, and real-time features (cart_total, session_duration) that must be computed on every request.
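A sketch of that three-way split as a serving-time routing decision — the catalog contents and helper names are assumptions for illustration:

```python
from enum import Enum

class Freshness(Enum):
    STATIC = "static"          # cache indefinitely
    SLOW = "slow"              # batch-computed hourly or daily
    REAL_TIME = "real_time"    # compute on every request

# Illustrative catalog mapping feature names to their category
CATALOG = {
    "user_country": Freshness.STATIC,
    "purchase_count_30d": Freshness.SLOW,
    "cart_total": Freshness.REAL_TIME,
}

def serving_strategy(feature: str) -> str:
    """Decide where a feature is read from at inference time."""
    freshness = CATALOG[feature]
    if freshness is Freshness.STATIC:
        return "read from long-lived cache"
    if freshness is Freshness.SLOW:
        return "read from feature store (batch-populated)"
    return "compute in the request path"

print(serving_strategy("cart_total"))  # compute in the request path
```

Routing through an explicit catalog like this makes the cost of accidentally marking everything real-time visible in code review.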
Serverless Feature Pipeline Architecture (diagram): Data Sources (DynamoDB, ...) → EventBridge triggers → Lambda feature computation → SageMaker Feature Store
Feature Engineering Readiness Assessment
Anti-Pattern: The Monolithic Feature Function
❌ Problem
One e-commerce company's monolithic feature function grew to compute 89 features...
✓ Solution
Implement the single-responsibility principle for features: one Lambda function ...
Setting Up Your First Serverless Feature Pipeline
1. Create the SageMaker Feature Group
2. Implement the Feature Computation Lambda
3. Configure Event-Driven Triggers
4. Build the Feature Retrieval Lambda
5. Set Up Monitoring and Alerting
Lambda Memory Optimization for Feature Computation
Lambda allocates CPU proportionally to memory—a 1,769MB function gets one full vCPU. For compute-intensive feature calculations (embeddings, complex aggregations), increasing memory from 256MB to 1,769MB often reduces execution time by 4-6x. Because billing is per GB-second, total cost stays roughly flat (it breaks even when the speedup matches the ~7x memory increase) while latency drops dramatically.
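This is worth checking with Lambda's pricing model, where compute cost is allocated memory (GB) × billed duration × the per-GB-second rate. The default rate below is the commonly cited x86 price; verify current pricing for your region:

```python
def lambda_cost(memory_mb: int, duration_s: float,
                price_per_gb_s: float = 0.0000166667) -> float:
    """Back-of-envelope Lambda compute cost in USD.

    Default rate is the commonly cited x86 per-GB-second price;
    check current regional pricing before relying on it.
    """
    return (memory_mb / 1024) * duration_s * price_per_gb_s

# A CPU-bound feature computation: 6s at 256MB vs ~6x faster at 1,769MB
small = lambda_cost(256, 6.0)
large = lambda_cost(1769, 1.0)
print(f"256MB: ${small:.7f}  1769MB: ${large:.7f}")
# Latency drops ~6x; cost is roughly flat, because memory and speedup
# scale together (exact break-even when speedup equals the memory ratio).
```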
Instacart
Migrating 500+ Features to Serverless Architecture
After 14 months, all 500+ features ran on serverless infrastructure. Monthly cos...
Key Insight
Feature Versioning Is Model Versioning
Every model is implicitly tied to specific feature definitions—change a feature, and you've changed the model's behavior even without retraining. This coupling means feature versioning must be as rigorous as model versioning.
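One lightweight way to make that coupling explicit — the naming scheme here is an assumption, not a SageMaker requirement — is to embed a version in the feature group name and pin it in the model's metadata:

```python
FEATURE_VERSION = "v3"

def feature_group_name(entity: str, version: str = FEATURE_VERSION) -> str:
    """Versioned feature group name, e.g. 'user-features-v3'."""
    return f"{entity}-features-{version}"

# Model metadata pins the exact feature definitions it was trained on,
# so serving can detect (and refuse) a mismatched feature version.
model_metadata = {"model_id": "churn-2024-03",
                  "features": feature_group_name("user")}

def check_compatibility(metadata: dict, serving_group: str) -> bool:
    return metadata["features"] == serving_group

assert check_compatibility(model_metadata, "user-features-v3")
assert not check_compatibility(model_metadata, "user-features-v4")
```

Bumping `FEATURE_VERSION` when a definition changes forces a deliberate decision about which models move to the new definition.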
Practice Exercise
Build a User Engagement Feature Pipeline
45 min
Essential Resources for Serverless Feature Engineering
Amazon SageMaker Feature Store Developer Guide
article
Feature Engineering for Machine Learning (O'Reilly)
book
Feast: Open Source Feature Store
tool
Uber's Michelangelo Feature Store Architecture
article
Framework
Feature Engineering Maturity Model
Level 1: Script-Based Features
Features computed in Jupyter notebooks or standalone scripts. No reusability, no versioning, and sig...
Level 2: Pipeline-Orchestrated Features
Features computed through orchestrated pipelines using Step Functions or Airflow. Basic scheduling a...
Level 3: Centralized Feature Repository
Shared feature definitions stored in a central repository with documentation. Teams can discover and...
Level 4: Unified Feature Store
Full feature store implementation with Amazon SageMaker Feature Store or similar. Unified offline an...
Batch Features vs Real-Time Features: Architecture Decisions
Batch Features
Computed on schedule (hourly, daily, weekly) using Step Func...
Stored in SageMaker Feature Store offline store backed by S3...
Cost-effective for features that don't change frequently lik...
Can leverage complex aggregations across large datasets with...
Real-Time Features
Computed on-demand during inference with sub-100ms latency r...
Stored in SageMaker Feature Store online store backed by Ela...
Essential for time-sensitive features like current session b...
Limited to simple transformations due to latency constraints
DoorDash
Building a Real-Time Feature Platform for Delivery Time Prediction
Order placement latency dropped from 1.2 seconds to 340ms, timeout errors decrea...
Implementing a Feature Computation Lambda with Caching (Python)
import json
import boto3
import hashlib
from datetime import datetime, timedelta
from decimal import Decimal

dynamodb = boto3.resource('dynamodb')
feature_cache = dynamodb.Table('feature-cache')
feature_store = boto3.client('sagemaker-featurestore-runtime')

def compute_user_features(user_id: str, raw_data: dict) -> dict:
    """Pure function for feature computation - no side effects"""
    orders = raw_data.get('orders', [])
    return {
        'user_id': user_id,
        'order_count_30d': len(orders),
        'avg_order_value': (sum(o['total'] for o in orders) / len(orders)
                            if orders else 0.0),
    }
Key Insight
Feature Versioning Is Model Versioning
Every ML model is implicitly coupled to specific feature definitions. When you change how a feature is computed, you've effectively created a new model even if the model weights haven't changed.
Anti-Pattern: The Feature Explosion Problem
❌ Problem
Pinterest discovered that 73% of their features contributed less than 0.1% to mo...
✓ Solution
Implement feature importance tracking from day one. Use SHAP values or permutati...
Building a Production Feature Pipeline with Step Functions
1. Define Feature Specifications
2. Implement Feature Computation Lambdas
3. Create the Step Functions Orchestration
4. Configure Feature Store Integration
5. Implement Data Quality Checks
Instacart
Scaling Feature Computation for Real-Time Inventory Predictions
Feature computation time dropped from 4 hours to 12 minutes, costs decreased by ...
Point-in-Time Correctness Is Non-Negotiable
When generating training data, you must use only features that would have been available at prediction time. Using future information (even accidentally) creates data leakage that inflates training metrics but destroys production performance.
Framework
Feature Freshness Classification System
Static Features (Refresh: Never/Rarely)
Features that never change or change extremely rarely. Examples: user signup date, product category,...
Slow-Moving Features (Refresh: Daily)
Features that change but not in ways that affect short-term predictions. Examples: user lifetime val...
Semi-Dynamic Features (Refresh: Hourly)
Features that change throughout the day but don't require instant updates. Examples: daily order cou...
Near-Real-Time Features (Refresh: Minutes)
Features requiring frequent updates but tolerating slight staleness. Examples: rolling 15-minute con...
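The four tiers above map naturally onto refresh schedules. The rate expressions below follow EventBridge's schedule syntax; the tier-to-rate mapping itself is a judgment call, not a fixed rule:

```python
# Refresh cadence per freshness tier, expressed as EventBridge
# schedule expressions (rate syntax).
REFRESH_SCHEDULE = {
    "static": None,                     # write once at signup / catalog load
    "slow_moving": "rate(1 day)",
    "semi_dynamic": "rate(1 hour)",
    "near_real_time": "rate(5 minutes)",
}

def schedule_for(tier: str):
    """Return the EventBridge schedule for a freshness tier (None = no refresh)."""
    return REFRESH_SCHEDULE[tier]

print(schedule_for("semi_dynamic"))  # rate(1 hour)
```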
67%
of ML pipeline time spent on feature engineering
Feature engineering dominates ML development time, yet most organizations lack dedicated feature infrastructure.
Feature Store Implementation Checklist
Serverless Feature Pipeline Architecture (diagram): Raw Data Sources (S3, ...) → EventBridge triggers → Step Functions orchestration → Lambda feature computation
Spotify
Feature Engineering for Real-Time Music Recommendations
Total feature retrieval latency averages 22ms p50 and 45ms p99, enabling real-ti...
Use Feature Store Record Identifiers Wisely
SageMaker Feature Store requires a record identifier that uniquely identifies each entity. Choose identifiers that are stable and meaningful—user_id rather than session_id for user features.
Practice Exercise
Build a Feature Freshness Monitoring System
45 min
SageMaker Feature Store vs Custom DynamoDB Feature Store
SageMaker Feature Store
Managed service with automatic online/offline synchronizatio...
Built-in point-in-time query support for training data gener...
Native integration with SageMaker training and inference
Automatic schema management and validation
Custom DynamoDB Solution
Full control over data model, indexing, and caching strategi...
Requires manual implementation of temporal queries and versi...
Works with any ML framework or inference system
Complete flexibility in schema evolution and data types
Key Insight
Feature Computation Should Be Idempotent and Deterministic
Every feature computation function should produce identical outputs given identical inputs, regardless of when or how many times it runs. This property enables safe retries, backfills, and parallel execution.
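A quick property check makes this concrete: pass "now" in as an explicit `as_of` parameter instead of reading the clock inside the function, and repeated runs hash to the same value. The feature and helper names are illustrative:

```python
import hashlib
import json
from datetime import datetime, timedelta

def spending_velocity_7d(transactions: list, as_of: datetime) -> float:
    """Deterministic: 'now' is an explicit parameter, never read inside."""
    window_start = as_of - timedelta(days=7)
    return sum(t["amount"] for t in transactions
               if window_start <= t["ts"] < as_of)

def feature_hash(value: float) -> str:
    return hashlib.sha256(json.dumps(value).encode()).hexdigest()

txns = [{"ts": datetime(2024, 3, 1), "amount": 40.0},
        {"ts": datetime(2024, 2, 1), "amount": 99.0}]  # outside the window
as_of = datetime(2024, 3, 5)

# Running twice (a retry or a backfill) produces byte-identical output.
assert feature_hash(spending_velocity_7d(txns, as_of)) == \
       feature_hash(spending_velocity_7d(txns, as_of))
assert spending_velocity_7d(txns, as_of) == 40.0
```

Had the function called `datetime.now()` internally, a backfill run months later would silently compute different values for the same training rows.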
Point-in-Time Correct Feature Retrieval for Training (Python)
import boto3
from datetime import datetime, timedelta
import pandas as pd

def get_training_features(
    entity_ids: list,
    label_timestamps: dict,  # {entity_id: timestamp when label was observed}
    feature_group_name: str,
    lookback_buffer: timedelta = timedelta(hours=1)
) -> pd.DataFrame:
    """
    Retrieve features as they existed at label time, preventing data leakage.

    Only records with event_time <= label_timestamp - lookback_buffer are kept.
    """
    # query_offline_store is assumed to return the feature group's offline
    # (S3/Athena) records as a DataFrame with 'entity_id' and 'event_time'.
    records = query_offline_store(feature_group_name, entity_ids)
    cutoffs = pd.Series({e: label_timestamps[e] - lookback_buffer
                         for e in entity_ids}, name='cutoff')
    records = records.join(cutoffs, on='entity_id')
    eligible = records[records['event_time'] <= records['cutoff']]
    # Keep only the most recent eligible record per entity
    return eligible.sort_values('event_time').groupby('entity_id').tail(1)
Anti-Pattern: The Monolithic Feature Lambda
❌ Problem
A bug in one feature computation breaks all features. The function frequently ti...
✓ Solution
Decompose into focused feature Lambdas organized by domain and freshness require...
import boto3
import json
from datetime import datetime, timedelta
from decimal import Decimal
import hashlib

featurestore = boto3.client('sagemaker-featurestore-runtime')
dynamodb = boto3.resource('dynamodb')
redis_client = None  # Initialize with ElastiCache endpoint

class FeatureComputer:
    def __init__(self, user_id: str, session_id: str):
        self.user_id = user_id
        self.session_id = session_id
Practice Exercise
Implement Feature Monitoring and Drift Detection
60 min
Production Feature Pipeline Launch Checklist
Anti-Pattern: The Monolithic Feature Function
❌ Problem
Monolithic feature functions typically exhibit 500ms+ latency due to sequential ...
✓ Solution
Decompose feature computation into focused, single-purpose Lambda functions orga...
Feature Serving with Fallbacks and Caching (Python)
import boto3
import asyncio
from typing import Dict, List, Optional
from dataclasses import dataclass
import json
import time

@dataclass
class FeatureConfig:
    feature_group: str
    timeout_ms: int
    fallback_value: any

def get_features_with_fallback(client, record_id: str,
                               config: FeatureConfig) -> Dict:
    """Fetch features from the online store, returning fallbacks on error."""
    try:
        response = client.get_record(
            FeatureGroupName=config.feature_group,
            RecordIdentifierValueAsString=record_id,
        )
        return {f['FeatureName']: f['ValueAsString']
                for f in response.get('Record', [])}
    except Exception:
        return {'fallback': config.fallback_value}
Anti-Pattern: The Feature Leakage Trap
❌ Problem
Models trained with leaked features show dramatically degraded performance in pr...
✓ Solution
Implement strict point-in-time feature computation that explicitly excludes any ...
Practice Exercise
Build Point-in-Time Correct Training Data Generator
75 min
Anti-Pattern: The Unbounded Feature Cardinality Problem
❌ Problem
Models with unbounded categorical features can grow to gigabytes in size, making...
✓ Solution
Implement cardinality management strategies: use embedding layers for high-cardi...
Essential Feature Engineering Resources
AWS SageMaker Feature Store Developer Guide
article
Feast: Open Source Feature Store
tool
Feature Engineering for Machine Learning by Alice Zheng
book
import boto3
import pandas as pd
from scipy import stats
import json
from datetime import datetime, timedelta
from typing import Dict, List
class FeatureValidator:
def __init__(self, feature_group_name: str):
self.feature_group_name = feature_group_name
self.athena = boto3.client('athena')
self.cloudwatch = boto3.client('cloudwatch')
Feature Engineering Security and Compliance Checklist
Framework
Feature Engineering Maturity Model
Level 1: Ad-Hoc Features
Features are computed in notebooks or scripts with no standardization. Each model has its own featur...
Level 2: Centralized Feature Computation
Features are computed by shared pipelines and stored in a basic feature store. Feature definitions a...
Level 3: Feature Platform
A self-service feature platform enables teams to define, compute, and serve features. Feature versio...
Level 4: Automated Feature Operations
Automated quality monitoring, drift detection, and alerting for all features. Point-in-time correct ...
Practice Exercise
Design a Multi-Model Feature Sharing Architecture
45 min
Feature Store Cost Optimization Strategy
Feature store costs can grow rapidly without proper management. Implement these optimizations: use DynamoDB on-demand capacity for variable workloads, configure S3 Intelligent-Tiering for offline stores, batch feature writes to reduce API calls, and regularly audit unused features for deletion.
Start with Feature Serving, Not Storage
When building your first feature pipeline, focus on the serving path first. Implement a simple Lambda that serves features from DynamoDB with proper fallbacks and monitoring.
Chapter Complete!
Serverless feature engineering combines Lambda for real-time...
Feature freshness requirements should drive architecture dec...
Point-in-time correctness is essential for training data gen...
Feature versioning and governance become critical at scale—i...
Next: Begin by auditing your current feature computation approach against the maturity model presented in this chapter