Mastering Multi-Turn Conversations: The Heart of Intelligent LLM Applications
Multi-turn conversations represent the fundamental difference between a simple chatbot and a truly intelligent AI assistant. While single-turn interactions are stateless and isolated, multi-turn conversations require sophisticated context management that maintains coherence across dozens or even hundreds of exchanges.
67% of users abandon AI assistants that forget context mid-conversation
This statistic reveals the critical importance of context management in user retention.
Key Insight
The Context Window Is Not Your Conversation Limit
A common misconception is that a 128K token context window means you can have 128K tokens of conversation. In reality, you need to reserve significant portions for system prompts (typically 2-5K tokens), tool definitions (1-3K tokens), retrieved context (5-20K tokens), and response generation headroom (2-4K tokens).
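A quick sketch of that budgeting arithmetic; the reserved figures below are midpoints of the ranges above, purely illustrative:

```python
# Illustrative token budgeting for a 128K-context model.
# Reserved amounts are midpoints of the ranges cited above.
CONTEXT_WINDOW = 128_000

RESERVED = {
    "system_prompt": 3_500,       # 2-5K
    "tool_definitions": 2_000,    # 1-3K
    "retrieved_context": 12_500,  # 5-20K
    "response_headroom": 3_000,   # 2-4K
}

def conversation_budget(window: int = CONTEXT_WINDOW) -> int:
    """Tokens actually available for conversation history."""
    return window - sum(RESERVED.values())

print(conversation_budget())  # noticeably less than the full 128K
```

Even with conservative reservations, roughly a sixth of the nominal window is spoken for before the first user message arrives.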
Anatomy of a Context Window in Multi-Turn Conversations
System Prompt (2-5K tokens)
Tool Definitions (1-3K tokens)
Retrieved Memory (5-20K tokens)
Conversation History (remaining budget)
Framework
The CRISP Framework for Conversation Context
Current Intent
Always preserve the user's most recent goal and the immediate context needed to fulfill it. This typically spans the last one to three exchanges.
Relevant History
Identify and preserve exchanges that directly relate to the current topic. Use semantic similarity to surface them.
Identity & Preferences
Extract and persist user-specific information like name, communication style preferences, and technical expertise.
Summarized Background
Compress older conversation segments into summaries that preserve key decisions, conclusions, and commitments.
Notion
Building Notion AI's Conversation Memory System
Context confusion dropped from 34% to 7%, and user sessions grew 45% in length.
Naive vs. Intelligent Context Management
Naive Approach
Keep all messages until the context window fills, then truncate the oldest
Treat all messages as equally important regardless of content
No summarization—either full message or nothing
Lose important early context as conversation grows
Intelligent Approach
Proactively compress and prioritize based on relevance and recency
Score messages by importance: decisions > facts > discussion
Progressive summarization preserves meaning in fewer tokens
Important context persists regardless of when it was mentioned
The 'Lost in the Middle' Problem
Research from Stanford and Anthropic demonstrates that LLMs struggle to retrieve information from the middle of long contexts. In tests with 128K token contexts, models retrieved information from the first 10K and last 10K tokens with 90%+ accuracy, but accuracy dropped to 60-70% for middle sections.
Key Insight
Conversation State Is More Than Message History
Effective multi-turn systems maintain explicit state beyond raw messages. This includes: the current task or goal (what is the user trying to accomplish?), entities mentioned (people, projects, files, dates), decisions made (choices the user confirmed), assumptions established (things the user agreed to or corrected), and emotional context (is the user frustrated, exploring, or in a hurry?).
Implementing a Basic Conversation State Tracker
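A minimal version of such a tracker, sketched here in Python; the field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ConversationState:
    """Explicit state tracked alongside raw message history."""
    current_goal: Optional[str] = None                      # what the user is trying to do
    entities: Dict[str, str] = field(default_factory=dict)  # name -> description
    decisions: List[str] = field(default_factory=list)      # choices the user confirmed
    assumptions: List[str] = field(default_factory=list)    # things agreed to or corrected
    emotional_context: str = "neutral"                      # frustrated, exploring, hurried...

    def record_decision(self, decision: str) -> None:
        # Superseding earlier decisions on the same subject is left to
        # callers; here we simply append in order.
        self.decisions.append(decision)

state = ConversationState(current_goal="choose a database")
state.entities["Sarah"] = "marketing lead"
state.record_decision("Use PostgreSQL for the primary DB")
```

The point is that goals, entities, decisions, and tone live in structured fields the assembly pipeline can query, rather than being re-derived from raw transcripts on every turn.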
❌ Problem
Teams that rely on raw message history alone report API costs 3-5x higher than necessary, along with rising user complaints.
✓ Solution
Implement proactive context management from day one. Even a simple sliding window is a major improvement over unbounded history.
Implementing Progressive Summarization
1. Define Summarization Triggers
2. Segment Conversation by Topic
3. Extract Key Information First
4. Generate Tiered Summaries
5. Preserve Verbatim Anchors
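Step 1 can start as a simple threshold check. A sketch with illustrative defaults, where a rough 4-characters-per-token estimate stands in for a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A production system would use the model's actual tokenizer.
    return max(1, len(text) // 4)

def should_summarize(messages: list[str],
                     token_budget: int = 8_000,
                     max_verbatim_turns: int = 10) -> bool:
    """Trigger summarization when history outgrows its token budget
    or exceeds a fixed number of verbatim turns."""
    total = sum(estimate_tokens(m) for m in messages)
    return total > token_budget or len(messages) > max_verbatim_turns
```

Both thresholds are tunable; the important property is that the trigger fires before the context window forces a lossy truncation.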
Key Insight
Summarization Is Lossy—Design for Graceful Degradation
Every summarization loses information. The question isn't whether you'll lose details, but which details you can afford to lose.
Intercom
Fin AI's Conversation Continuity Architecture
Customer satisfaction scores improved 23% for multi-session conversations, and resolution rates rose as well.
Use Structured Summaries Over Prose
When summarizing conversation segments, structured formats (JSON, markdown with headers, bullet points) compress better and retrieve more reliably than prose paragraphs. A structured summary like '## Decisions: - Use PostgreSQL for primary DB - Deploy to AWS us-east-1' is more token-efficient and less ambiguous than 'The team discussed database options and decided to go with PostgreSQL.'
Key Insight
The Recency Bias Is Your Friend—Use It Strategically
LLMs naturally weight recent context more heavily than older context, even within the same prompt. Rather than fighting this bias, design your context structure to leverage it.
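Rather than fighting the bias, assemble context so the material you most need attended to sits at the edges. A sketch of the ordering only (token budgeting omitted):

```python
def assemble_context(system_prompt: str,
                     background_summaries: list[str],
                     relevant_history: list[str],
                     current_query: str) -> str:
    """Order context to exploit edge attention: instructions first,
    the live query last, and the most forgettable material (older
    summaries) in the middle, where retrieval accuracy is weakest."""
    parts = [system_prompt]        # start: high attention
    parts += background_summaries  # middle: lowest attention
    parts += relevant_history      # late: rising attention
    parts.append(current_query)    # end: highest attention
    return "\n\n".join(parts)
```

This mirrors the 'lost in the middle' findings above: whatever lands mid-context should be material you can afford to have under-weighted.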
Practice Exercise
Build a Conversation Summarizer
45 min
Framework
The Memory Hierarchy Model
Working Memory (Registers)
The current turn's context: user's latest message, immediate prior exchange, active task state. Always in context, never compressed.
Session Memory (L1 Cache)
Recent conversation history, last 5-10 exchanges verbatim. Fast to access (already in context), moderate token cost.
Conversation Memory (L2 Cache)
Summarized history from the current conversation. Compressed representations of older exchanges. Retrieved when relevant to the current turn.
User Memory (RAM)
Persistent facts about the user across all conversations: preferences, expertise level, past projects. Loaded selectively when relevant.
Privacy Implications of Conversation Memory
Persistent conversation memory raises significant privacy considerations. Users may share sensitive information expecting it to be forgotten, then be surprised when it surfaces weeks later.
Framework
The STEAM Memory Architecture
Short-term Buffer
The immediate conversation window holding the last 3-5 exchanges verbatim. This provides full context for the most recent exchanges.
Topic Summaries
Compressed representations of conversation segments grouped by topic. Each summary captures key decisions and conclusions.
Entity Memory
Structured storage of named entities, their attributes, and relationships mentioned across conversations.
Action History
A log of all actions taken, tools called, and their outcomes throughout the conversation. Critical for avoiding repeated work and for debugging.
Extractive Summarization
Selects and preserves exact phrases from the original messages
Faster to compute (can use embedding similarity), typically no LLM call required
Preserves original tone and intent without risk of misinterpretation
Works well for factual, technical conversations where exact wording matters
Abstractive Summarization
Generates new text that captures meaning, enabling much higher compression
Requires an LLM call for each summarization, adding 200-500ms latency
Can synthesize information across multiple messages into coherent summaries
Better for conversational, exploratory discussions where the gist matters more than exact wording
Notion
Building Notion AI's Conversation Memory System
Notion AI achieved 89% accuracy in resolving ambiguous references without asking users for clarification.
Anti-Pattern: The Infinite Context Illusion
❌ Problem
Applications using naive full-context approaches see response quality degrade steadily as conversations grow.
✓ Solution
Implement intelligent context curation that places the most relevant information at the beginning and end of the prompt, where retrieval accuracy is highest.
Building a Production-Ready Summarization Pipeline
1. Define Summarization Triggers
2. Segment by Topic or Intent
3. Extract Key Elements First
4. Generate Hierarchical Summaries
5. Validate Summary Quality
Key Insight
The Forgetting Curve Applies to AI Conversations Too
Just as humans forget information over time following Ebbinghaus's forgetting curve, AI conversation systems should implement intentional forgetting. Not all information deserves permanent memory—temporary preferences ('I'm in a hurry today'), superseded decisions ('actually, let's use Python instead'), and exploratory tangents should decay over time.
State Management Across Distributed Systems
In production deployments, conversation state must be synchronized across multiple server instances. Use Redis or similar in-memory stores for active session state with TTLs matching session timeout.
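The same pattern sketched with an in-process stand-in so the TTL semantics are visible; in production the store would be Redis (e.g. SETEX with a TTL matching session timeout), not a Python dict:

```python
import time
from typing import Any, Optional

class SessionStore:
    """In-memory stand-in for a Redis-style store with per-key TTLs.
    In production, replace with redis-py and SETEX/EXPIRE so any
    server instance can pick up an active session."""
    def __init__(self) -> None:
        self._data: dict[str, tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float = 1800) -> None:
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry, as Redis does on access
            return None
        return value

store = SessionStore()
store.set("session:abc", {"phase": "decision_making"}, ttl_seconds=0.05)
```

Keying by session ID means any instance behind the load balancer can serve the next turn without re-deriving state.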
Framework
The Context Relevance Scoring Model (CRSM)
Recency Score (0-1)
Exponential decay based on message age. Recent messages score near 1.0, with scores halving every N turns.
Semantic Similarity (0-1)
Cosine similarity between the current query embedding and each historical message embedding. Identifies topically related history regardless of age.
Information Density (0-1)
Measures how much unique, non-redundant information a message contains. Calculated by comparing message content against information already present elsewhere in the context.
Reference Weight (0-1)
Increases when a message is referenced by later messages (explicitly or through coreference). Messages that later turns build on score higher.
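The four component scores might combine as a weighted sum. A sketch; the weights and half-life below are illustrative assumptions, not part of the model as stated:

```python
def recency_score(age_turns: int, half_life: int = 10) -> float:
    """Exponential decay: the score halves every `half_life` turns."""
    return 0.5 ** (age_turns / half_life)

def relevance(age_turns: int, similarity: float,
              density: float, reference_weight: float,
              weights: tuple = (0.3, 0.4, 0.15, 0.15)) -> float:
    """Weighted combination of the four CRSM signals. Each input is
    assumed pre-normalized to [0, 1]; weights are illustrative."""
    w_r, w_s, w_d, w_ref = weights
    return (w_r * recency_score(age_turns)
            + w_s * similarity
            + w_d * density
            + w_ref * reference_weight)
```

Messages below a relevance threshold become candidates for summarization or eviction; the threshold itself is something to tune per application.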
Linear
Linear's Approach to Cross-Session Issue Context
Linear's AI features achieved 94% accuracy on questions requiring historical context.
67% of conversation context is redundant or low-value
Analysis of 50,000 multi-turn conversations revealed that two-thirds of messages could be summarized or removed without impacting response quality.
Multi-Turn Context Assembly Pipeline
User Message → Query Analysis → Parallel Retrieval → Relevance Scoring → …
Practice Exercise
Build a Conversation Summarizer with Quality Validation
45 min
Use Structured Output for Memory Updates
When using LLMs to extract information for memory storage, always use structured output (JSON mode or function calling) rather than free-form text. This ensures consistent formatting for entities, preferences, and facts, making downstream retrieval and comparison reliable.
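The consuming side matters as much as the prompt: validate the model's JSON against an expected shape before writing it to memory. A sketch with an illustrative payload shape:

```python
import json

# Illustrative shape for a memory-update payload returned by the model.
REQUIRED_FIELDS = {"entities": list, "preferences": list, "facts": list}

def parse_memory_update(raw: str) -> dict:
    """Parse and validate a structured memory update. Raises ValueError
    on malformed output instead of silently storing garbage."""
    data = json.loads(raw)
    for field_name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field_name), expected_type):
            raise ValueError(f"missing or malformed field: {field_name}")
    return data

update = parse_memory_update(
    '{"entities": [{"name": "Sarah", "role": "marketing"}],'
    ' "preferences": ["concise answers"], "facts": []}'
)
```

Rejecting malformed updates at this boundary keeps retrieval and comparison reliable downstream, since every stored record is guaranteed to have the same shape.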
Key Insight
The Compounding Value of Entity Memory
Entity memory—structured storage of people, projects, concepts, and their relationships—provides compounding returns over conversation length. In early turns, entity memory adds modest value by enabling coreference resolution ('she' → 'Sarah from marketing'). By later turns, it underpins cross-session continuity and disambiguation that summaries alone cannot provide.
Implementing a Conversation State Machine (Python)
from enum import Enum
from dataclasses import dataclass
from typing import Optional, List, Dict
class ConversationPhase(Enum):
    GREETING = "greeting"
    PROBLEM_DISCOVERY = "problem_discovery"
    SOLUTION_EXPLORATION = "solution_exploration"
    DECISION_MAKING = "decision_making"
    ACTION_EXECUTION = "action_execution"
    WRAP_UP = "wrap_up"
Anti-Pattern: Treating All Messages as Equally Important
❌ Problem
Systems treating all messages equally show 35% lower retrieval precision when surfacing past context.
✓ Solution
Implement message importance classification before storage. Use simple heuristics first (decisions, corrections, questions), then add model-based scoring if needed.
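The heuristic tier of that classifier can be plain pattern matching. A sketch; the cue lists are illustrative starting points, not a complete taxonomy:

```python
import re

# Illustrative cue patterns; decisions outrank corrections outrank questions.
DECISION_CUES = re.compile(r"\b(let's go with|we'll use|decided|confirm(ed)?)\b", re.I)
CORRECTION_CUES = re.compile(r"\b(actually|instead|correction|I meant)\b", re.I)
QUESTION_CUES = re.compile(r"\?")

def importance(message: str) -> int:
    """Score 0-3: higher scores mean keep verbatim for longer."""
    if DECISION_CUES.search(message):
        return 3
    if CORRECTION_CUES.search(message):
        return 2
    if QUESTION_CUES.search(message):
        return 1
    return 0
```

Heuristics like these are cheap enough to run on every message; a model-based classifier can then be reserved for the ambiguous middle band.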
Progressive Summarization Implementation (Python)
from dataclasses import dataclass, field
from typing import List, Optional
from datetime import datetime
import hashlib
@dataclass
class Message:
    role: str
    content: str
    timestamp: datetime
    metadata: dict = field(default_factory=dict)
Practice Exercise
Implement Sliding Window with Semantic Anchors
60 min
Semantic Anchor Detection (Python)
import re
from typing import List, Tuple
from dataclasses import dataclass
@dataclass
class AnchorScore:
    message_idx: int
    total_score: float
    entity_score: float
    decision_score: float
    reference_score: float
    is_anchor: bool
Anti-Pattern: The Infinite History Append
❌ Problem
One e-commerce company saw their average API cost per conversation climb steadily as histories grew without bound.
✓ Solution
Implement proactive context management from day one. Set hard token budgets with automatic compression once they are exceeded.
Anti-Pattern: Stateless Session Handling
❌ Problem
A healthcare chatbot using this pattern experienced 3-5 second latencies on every request.
✓ Solution
Maintain explicit session state that persists between requests. Extract and store key facts once, then reuse them on subsequent turns.
Anti-Pattern: One-Size-Fits-All Context Windows
❌ Problem
An enterprise support platform found that 60% of conversations resolved within 3 turns, yet every request paid for a maximal context window.
✓ Solution
Implement adaptive context management that scales with conversation complexity. Start with a small window and expand retrieval depth only as the conversation grows.
Practice Exercise
Build a Memory-Augmented Conversation System
90 min
Three-Tier Memory Architecture (Python)
from typing import List, Dict, Optional
from dataclasses import dataclass
import numpy as np
@dataclass
class WorkingMemory:
    current_topic: str
    user_intent: str
    pending_actions: List[str]
    key_entities: Dict[str, str]
    confidence_scores: Dict[str, float]
Essential Multi-Turn Conversation Resources
LangChain Memory Documentation (article)
MemGPT: Towards LLMs as Operating Systems (article)
Building LLM Applications with Memory - DeepLearning.AI (video)
Redis Vector Search for Conversation Memory (tool)
Practice Exercise
Stress Test Your Context Management
30 min
Debug Context Issues with Transparency Logging
When users report the AI 'forgetting' things, the issue is almost always in context assembly, not the model. Implement detailed logging that captures: exact context sent to the model, summarization inputs and outputs, memory retrieval queries and results, and token counts at each stage.
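A sketch of such a transparency log at one assembly stage; the logger name and record fields are illustrative:

```python
import json
import logging
from typing import List, Optional

logger = logging.getLogger("context_assembly")

def log_context_snapshot(stage: str, *, token_count: int,
                         retrieval_query: str = "",
                         retrieved_ids: Optional[List[str]] = None) -> dict:
    """Emit one structured record per assembly stage so that
    'the AI forgot' reports can be traced to a concrete step."""
    record = {
        "stage": stage,
        "token_count": token_count,
        "retrieval_query": retrieval_query,
        "retrieved_ids": retrieved_ids or [],
    }
    logger.info(json.dumps(record))
    return record

snap = log_context_snapshot("post_retrieval", token_count=9421,
                            retrieval_query="database decision",
                            retrieved_ids=["msg_12", "sum_3"])
```

With one record per stage, a forgotten fact becomes a grep: either it never entered the retrieved IDs, or it was dropped between two logged token counts.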
Framework
CLEAR Context Quality Framework
Completeness
Does the context contain all information needed to respond appropriately? Audit by identifying questions the model could not answer from the context it was given.
Latency
How quickly can context be assembled and retrieved? Measure P50 and P99 latencies for context assembly on every turn.
Efficiency
What percentage of context tokens are actually useful? Calculate the signal-to-noise ratio by comparing tokens the response actually drew on against total tokens supplied.
Accuracy
Is the context factually correct and up-to-date? Audit for summarization errors, stale state, and contradictions between memory tiers.
Plan for Session Recovery from Day One
Production systems will experience crashes, timeouts, and network failures mid-conversation. Design your state management to survive these failures gracefully.
67% of conversation failures traced to context issues
When conversations fail (users abandon, request human handoff, or express frustration), the root cause is most often context-related: missing information, stale state, or irrelevant retrieval.
Synchronous vs Asynchronous Context Updates
Synchronous Updates
Context updated before response generation
Guaranteed consistency - response always uses latest state
Adds latency to every turn (50-200ms typical)
Simpler debugging - clear cause and effect
Asynchronous Updates
Context updated after response sent
Eventually consistent: the response may use slightly stale state
Minimal added latency to user-facing path
Complex debugging - timing-dependent behavior
Beware of Summarization Drift
When summaries are recursively summarized (summarizing summaries), information gradually drifts from the original meaning. After 3-4 levels of summarization, critical details may be lost or distorted.
Notion
Building AI Memory for Workspace Context
The memory architecture enabled Notion AI to provide contextually relevant assistance across the workspace.
The three-tier memory architecture (short-term sliding window, summarized conversation memory, persistent long-term memory) balances fidelity against token cost.
Summarization is a critical capability but introduces risks: information loss, drift under recursive summarization, and silent distortion of key details.
State management requires careful attention to consistency, persistence, and recovery, especially across distributed instances.
Next: Start by auditing your current conversation handling: measure context utilization, identify where information is lost, and benchmark latency at each stage.