
Agentic RAG


Agentic RAG: When Retrieval Becomes Reasoning

Traditional RAG systems follow a rigid pattern: retrieve documents, stuff them into context, generate a response. But what happens when the first retrieval misses the mark, when multiple sources need synthesis, or when the query requires decomposition into sub-questions? Agentic RAG represents a paradigm shift where your retrieval system gains the ability to reason about its own performance, iterate on failed retrievals, and orchestrate complex multi-step information gathering.

Key Insight

The Fundamental Limitation of Single-Shot RAG

Single-shot RAG assumes that one retrieval operation will surface all relevant information. That assumption fails on 40-60% of complex queries, according to internal benchmarks from companies like Anthropic and Cohere. When a user asks 'How did our Q3 revenue compare to competitors and what drove the difference?', a single retrieval cannot simultaneously fetch internal financial data, competitor reports, and market analysis, because each lives in a different source and matches a different part of the question.

Traditional RAG vs. Agentic RAG Architecture

Traditional RAG

Single retrieval operation per query with fixed top-k result...

No evaluation of retrieval quality before generation

Query passes through unchanged regardless of complexity

Fails silently when relevant documents aren't retrieved

Agentic RAG

Multiple retrieval iterations with dynamic result counts

Built-in relevance grading before proceeding to generation

Query decomposition and reformulation based on initial resul...

Explicit failure detection with corrective retrieval strateg...

67%

Relative improvement in answer accuracy with iterative retrieval

When RAG systems were given the ability to perform up to 3 retrieval iterations with query reformulation, answer accuracy on complex multi-hop questions improved from 34% to 57%, a gain of 23 percentage points (about 67% relative to the 34% baseline).

Framework

The REACT-RAG Loop

Reason

The agent analyzes the query and current context to determine what information is missing. This incl...

Retrieve

Execute the planned retrieval action, which might involve querying a vector store, calling an API, p...

Evaluate

Grade the retrieved documents for relevance and completeness. This step determines whether the agent...

Act/Generate

Either generate the final response if sufficient information has been gathered, or loop back to the ...
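The four steps above can be sketched as a small loop. This is a minimal illustration rather than the course's implementation: `retrieve`, `is_sufficient`, `reformulate`, and `generate` are hypothetical callables standing in for a vector store, a relevance grader, and LLM calls, and the toy corpus exists only so the sketch runs on its own.

```python
from typing import Callable, List


def agentic_rag_loop(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    reformulate: Callable[[str, List[str]], str],
    generate: Callable[[str, List[str]], str],
    max_iterations: int = 3,
) -> str:
    """Reason -> Retrieve -> Evaluate -> Act/Generate with a hard iteration cap."""
    context: List[str] = []
    current_query = query
    for _ in range(max_iterations):
        for doc in retrieve(current_query):          # Retrieve
            if doc not in context:                   # skip documents already collected
                context.append(doc)
        if is_sufficient(query, context):            # Evaluate
            break
        current_query = reformulate(query, context)  # Reason: decide what is still missing
    return generate(query, context)                  # Act/Generate


# Toy stand-ins (assumptions) so the sketch runs without external services.
CORPUS = {
    "q3 revenue": ["Internal Q3 revenue report."],
    "q3 revenue competitors": ["Competitor Q3 revenue report."],
}


def retrieve(q: str) -> List[str]:
    return CORPUS.get(q, [])


def is_sufficient(q: str, ctx: List[str]) -> bool:
    return len(ctx) >= 2


def reformulate(q: str, ctx: List[str]) -> str:
    return q + " competitors"


def generate(q: str, ctx: List[str]) -> str:
    return " ".join(ctx)


answer = agentic_rag_loop("q3 revenue", retrieve, is_sufficient, reformulate, generate)
```

In this toy run the first retrieval returns only the internal report, the sufficiency check fails, and the reformulated query brings in the competitor report on the second iteration.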

Notion

Building an Agentic Q&A System for Workspace Search

User satisfaction on complex queries improved to 73%, a 78% relative improvement...

The Cost of Agentic RAG

Each iteration in an agentic RAG loop incurs LLM inference costs for reasoning, retrieval costs for vector search or API calls, and latency overhead. A query that takes 3 iterations costs roughly 3x a single-shot query.
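The "roughly 3x" estimate can be made concrete with a back-of-the-envelope calculation. The sketch below assumes every iteration costs the same and that iterations run sequentially; the dollar and latency figures are illustrative placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IterationCost:
    llm_usd: float        # LLM inference for reasoning and grading in one iteration
    retrieval_usd: float  # vector search or API calls in one iteration
    latency_s: float      # wall-clock time of one iteration


def estimate(cost: IterationCost, iterations: int) -> Tuple[float, float]:
    """Return (total_usd, total_latency_s) for a query that runs `iterations` rounds."""
    total_usd = iterations * (cost.llm_usd + cost.retrieval_usd)
    total_latency_s = iterations * cost.latency_s
    return total_usd, total_latency_s


# Illustrative numbers only.
per_iteration = IterationCost(llm_usd=0.004, retrieval_usd=0.001, latency_s=1.5)
single = estimate(per_iteration, 1)
triple = estimate(per_iteration, 3)
```

Under these assumptions `triple` is exactly three times `single` in both cost and latency; real systems often land above that once grading and reformulation calls are counted separately.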

Agentic RAG Control Flow

User Query

Query Analysis & Dec...

Retrieval Execution

Relevance Grading

Basic Agentic RAG Loop Implementation (Python)

from typing import List, Tuple
from dataclasses import dataclass

@dataclass
class RetrievalResult:
    documents: List[str]
    relevance_scores: List[float]
    source: str

class AgenticRAG:
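    def __init__(self, llm, retriever, max_iterations=3):
        self.llm = llm  # model used for reasoning, grading, and generation
        self.retriever = retriever  # object that returns documents for a query
        self.max_iterations = max_iterations  # hard cap on retrieval rounds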

Key Insight

Relevance Grading is the Critical Decision Point

The relevance grading step determines whether your agentic RAG system adds value or just adds latency. Without effective grading, the agent either accepts irrelevant documents (degrading answer quality) or rejects relevant ones (triggering unnecessary iterations).
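A grading step can be sketched as a function that splits retrieved documents into accepted and rejected sets. The scorer below is a deliberately crude token-overlap heuristic used only so the example runs; in practice it would be replaced by an LLM grader or a cross-encoder, and both the `grade_documents` name and the 0.5 threshold are assumptions.

```python
from typing import Callable, List, Tuple


def token_overlap(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query tokens that also appear in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def grade_documents(
    query: str,
    docs: List[str],
    score_fn: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,  # assumed cut-off; calibrate on your own data
) -> Tuple[List[str], List[str]]:
    """Return (relevant, rejected) documents according to score_fn and threshold."""
    relevant: List[str] = []
    rejected: List[str] = []
    for doc in docs:
        if score_fn(query, doc) >= threshold:
            relevant.append(doc)
        else:
            rejected.append(doc)
    return relevant, rejected


docs = ["refund policy for annual plans", "office holiday schedule"]
relevant, rejected = grade_documents("annual plans refund", docs)
```

Setting the threshold too low reproduces the first failure mode described above (irrelevant documents accepted); setting it too high reproduces the second (unnecessary iterations).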

Anti-Pattern: The Infinite Loop Trap

❌ Problem

Users experience extreme latency (30+ seconds for simple queries), costs spiral ...

✓ Solution

Implement a three-part termination strategy: (1) hard maximum iteration limit (3...

Implementing Your First Agentic RAG System

Audit Your Current RAG Failures

Design Your Relevance Grading Prompt

Implement the Sufficiency Check

Build Query Reformulation Logic

Assemble the Loop with Safeguards

Stripe

Corrective RAG for Developer Documentation Search

Search success rate (measured by users not immediately searching again) improved...

Start with Corrective RAG Before Full Agentic

Full agentic RAG with query decomposition and multi-source orchestration is complex to build and debug. Start with Corrective RAG—a simpler pattern that only triggers additional retrieval when the first attempt fails quality checks.
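As a rough sketch of the pattern, Corrective RAG can be as small as one quality check and one retry. The names `corrective_retrieve`, `retrieve`, and `reformulate` are hypothetical, and the quality check used here (best similarity score at or above 0.7) is an assumed stand-in for whatever check your system uses.

```python
from typing import Callable, List, Tuple

Doc = Tuple[str, float]  # (text, similarity score)


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], List[Doc]],
    reformulate: Callable[[str], str],
    min_score: float = 0.7,  # assumed quality threshold
) -> Tuple[List[Doc], bool]:
    """Retrieve once; retry with a reformulated query only if the first attempt fails.

    Returns the documents and a flag saying whether correction was triggered.
    """
    docs = retrieve(query)
    if docs and max(score for _, score in docs) >= min_score:
        return docs, False  # first attempt passed the check
    return retrieve(reformulate(query)), True


# Toy index (assumption) so the sketch runs standalone.
INDEX = {
    "rate limits": [("API rate limit reference", 0.91)],
    "throttling": [("Team offsite agenda", 0.32)],
    "throttling rate limits": [("API rate limit reference", 0.78)],
}


def retrieve(q: str) -> List[Doc]:
    return INDEX.get(q, [])


def reformulate(q: str) -> str:
    return q + " rate limits"


good_docs, good_corrected = corrective_retrieve("rate limits", retrieve, reformulate)
fixed_docs, fixed_corrected = corrective_retrieve("throttling", retrieve, reformulate)
```

Queries that pass the first check pay no extra cost, which is what makes this a low-risk first step before a full agentic loop.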

Agentic RAG Readiness Assessment

Key Insight

The Query Reformulation Paradox

Effective query reformulation requires understanding why the original query failed—but you often can't know why it failed without seeing what it retrieved. This creates a chicken-and-egg problem that teams solve in different ways.

Framework

The Retrieval Confidence Hierarchy

High Confidence (>0.85)

Top retrieved documents have high semantic similarity and pass LLM relevance grading. Proceed direct...

Medium Confidence (0.65-0.85)

Some relevant documents retrieved but potential gaps identified. Trigger one reformulation attempt w...

Low Confidence (0.45-0.65)

Retrieved documents are marginally relevant or miss key aspects of the query. Trigger query decompos...

Very Low Confidence (<0.45)

Retrieved documents are largely irrelevant. Before iterating, verify the information exists in your ...
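The four bands above map naturally onto a small dispatch function. The thresholds come from the hierarchy itself; the `NextStep` names are hypothetical labels for the actions each band describes, and assigning the shared boundary values (0.85, 0.65, 0.45) to a particular band is an assumption, since the ranges as stated overlap at their edges.

```python
from enum import Enum


class NextStep(Enum):
    GENERATE = "generate"                  # high confidence: proceed to generation
    REFORMULATE_ONCE = "reformulate_once"  # medium: one reformulation attempt
    DECOMPOSE = "decompose"                # low: query decomposition
    VERIFY_COVERAGE = "verify_coverage"    # very low: check the content exists at all


def next_step(confidence: float) -> NextStep:
    """Map a retrieval confidence score in [0, 1] to the next action."""
    if confidence > 0.85:
        return NextStep.GENERATE
    if confidence >= 0.65:
        return NextStep.REFORMULATE_ONCE
    if confidence >= 0.45:
        return NextStep.DECOMPOSE
    return NextStep.VERIFY_COVERAGE


steps = [next_step(c) for c in (0.9, 0.7, 0.5, 0.2)]
```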

Practice Exercise

Build a Corrective RAG Prototype

90 min

Beware of Context Pollution

Each retrieval iteration adds documents to your context. Without careful management, you accumulate marginally relevant documents that dilute the signal from highly relevant ones.
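One way to keep context from growing unchecked is to merge each new batch into a bounded, deduplicated, score-ranked set. The sketch below is one possible policy, not a prescribed one: `merge_context`, the `max_docs` cap, and the `min_score` floor are all assumptions.

```python
from typing import Dict, List, Tuple

ScoredDoc = Tuple[str, float]  # (text, relevance score)


def merge_context(
    existing: List[ScoredDoc],
    new: List[ScoredDoc],
    max_docs: int = 5,       # assumed cap on documents kept in context
    min_score: float = 0.5,  # assumed floor below which documents are dropped
) -> List[ScoredDoc]:
    """Merge two batches, dedupe by text (keeping the best score), and keep the top max_docs."""
    best: Dict[str, float] = {}
    for text, score in existing + new:
        if score >= min_score and score > best.get(text, float("-inf")):
            best[text] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_docs]


existing = [("a", 0.9), ("b", 0.55)]
new = [("b", 0.6), ("c", 0.3), ("d", 0.8)]
context = merge_context(existing, new, max_docs=2)
```

Here "c" is dropped for falling under the floor, "b" keeps its higher score but is cut by the cap, and only the two strongest documents reach the generator.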

Framework

The ReAct Pattern for RAG Agents

Thought Generation

The agent explicitly states what it's thinking about the current situation. This includes analyzing ...

Action Selection

Based on the thought, the agent selects an appropriate action from available tools. Actions might in...

Observation Processing

After executing an action, the agent observes and interprets the results. This includes evaluating r...

Iteration Control

The framework includes mechanisms to decide when to continue iterating versus when to synthesize a f...

Implementing a ReAct RAG Agent (Python)

from typing import List, Dict, Optional
from dataclasses import dataclass
from enum import Enum

class ActionType(Enum):
    SEARCH_DOCS = "search_docs"
    SEARCH_CODE = "search_code"
    QUERY_DATABASE = "query_database"
    SYNTHESIZE = "synthesize"

@dataclass
class AgentState:

Standard RAG vs. Corrective RAG

Standard RAG

Single retrieval pass with fixed k documents

Trusts all retrieved documents equally regardless of relevan...

No mechanism to detect or recover from retrieval failures

Query failures result in hallucinated or irrelevant answers

Corrective RAG

Multiple retrieval attempts with dynamic document counts

Grades each document for relevance before inclusion

Automatic correction through query reformulation or source e...

Graceful degradation with explicit uncertainty when retrieva...

Notion

Building Self-Correcting Knowledge Retrieval

Answer accuracy improved from 71% to 89% on their internal benchmark of complex ...

Implementing Iterative Retrieval

Design Your Iteration Triggers

Build the Query Reformulation Engine

Implement Progressive Context Building

Create the Sufficiency Evaluator

Add Iteration Limits and Circuit Breakers

Key Insight

Tool-Augmented RAG Transforms Static Knowledge into Dynamic Intelligence

The most powerful RAG systems don't just retrieve documents—they orchestrate tools to gather, validate, and synthesize information dynamically. Tool-augmented RAG gives your agent capabilities beyond text retrieval: executing code to validate technical claims, querying databases for real-time data, calling APIs for current information, and running calculations to verify numerical assertions.

Tool-Augmented RAG with Multiple Data Sources (TypeScript)

interface Tool {
  name: string;
  description: string;
  parameters: JSONSchema;
  execute: (params: any) => Promise<ToolResult>;
}

interface ToolResult {
  success: boolean;
  data: any;
  confidence: number;
  source: string;
}

Anti-Pattern: The Monolithic Retriever

❌ Problem

Systems using monolithic retrieval typically see 40-50% lower accuracy on comple...

✓ Solution

Implement query classification as the first step in your RAG pipeline. Categoriz...

Framework

Multi-Source RAG Architecture

Source Registry

A central catalog of all available knowledge sources with metadata about each: data type (structured...

Query Router

Analyzes incoming queries to determine which sources are most likely to contain relevant information...

Source Adapters

Standardized interfaces that normalize results from heterogeneous sources into a common format. Each...

Result Fusion

Combines results from multiple sources into a unified, deduplicated context. Handles conflicting inf...
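The four components can be sketched together in a few dozen lines. Everything here is illustrative: `MultiSourceRetriever`, the keyword-based routing, and the two toy adapters are assumptions, and a production router would more likely use an LLM or a trained classifier than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Result:
    text: str
    score: float
    source: str


# A source adapter normalizes one backend's output into a list of Result objects.
Adapter = Callable[[str], List[Result]]


class MultiSourceRetriever:
    def __init__(self) -> None:
        # Source registry: name -> (routing keywords, adapter)
        self._registry: Dict[str, Tuple[Set[str], Adapter]] = {}

    def register(self, name: str, keywords: Set[str], adapter: Adapter) -> None:
        self._registry[name] = (keywords, adapter)

    def route(self, query: str) -> List[str]:
        """Query router: pick sources whose keywords appear in the query, else all sources."""
        tokens = set(query.lower().split())
        matched = [name for name, (kw, _) in self._registry.items() if kw & tokens]
        return matched or list(self._registry)

    def search(self, query: str) -> List[Result]:
        """Result fusion: dedupe by normalized text, keep the best score, rank descending."""
        best: Dict[str, Result] = {}
        for name in self.route(query):
            _, adapter = self._registry[name]
            for result in adapter(query):
                key = result.text.strip().lower()
                if key not in best or result.score > best[key].score:
                    best[key] = result
        return sorted(best.values(), key=lambda r: r.score, reverse=True)


# Toy adapters (assumptions) returning canned results.
def docs_adapter(query: str) -> List[Result]:
    return [Result("Reset your password from the settings page", 0.82, "docs")]


def tickets_adapter(query: str) -> List[Result]:
    return [
        Result("reset your password from the settings page", 0.64, "tickets"),
        Result("Customer could not log in after reset", 0.71, "tickets"),
    ]


retriever = MultiSourceRetriever()
retriever.register("docs", {"password", "settings"}, docs_adapter)
retriever.register("tickets", {"login", "password"}, tickets_adapter)
results = retriever.search("password reset")
```

The duplicate passage returned by both sources appears once in `results`, attributed to the source that scored it higher.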

Stripe

Multi-Source RAG for Developer Support

First-contact resolution rate improved from 45% to 72% after implementing multi-...

3.2x

Improvement in answer accuracy when using agentic RAG vs. single-shot retrieval

This benchmark tested 500 complex questions requiring multi-hop reasoning across a corpus of 100,000 documents.

RAG Agent Safety and Control Checklist

Corrective RAG Decision Flow

User Query

Initial Retrieval

Relevance Grading

[If Low Score] Query...

Key Insight

Self-Reflection Enables RAG Agents to Catch Their Own Mistakes

The most reliable RAG agents implement explicit self-reflection steps where they evaluate their own outputs before returning them to users. This isn't just about checking for hallucinations—it's about verifying that the response actually addresses the user's question, that cited sources support the claims made, and that the answer is complete.
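A small part of this reflection step can be done without an LLM at all. The sketch below assumes answers cite sources with bracketed numbers such as `[1]`, which is a convention chosen for illustration; it checks only that every citation points at a retrieved source, not that the source actually supports the claim, which still needs a model-based or human check.

```python
import re
from typing import Dict, List


def check_citations(answer: str, sources: Dict[int, str]) -> Dict[str, List[int]]:
    """Compare citation markers like [2] in the answer against the retrieved sources."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return {
        "missing": [n for n in cited if n not in sources],         # cited but never retrieved
        "unused": [n for n in sorted(sources) if n not in cited],  # retrieved but never cited
    }


sources = {1: "Pricing page", 2: "Changelog"}
report = check_citations("Plan A costs more [1]. Plan B is cheaper [3].", sources)
```

A non-empty `missing` list is a cheap signal that the draft should be regenerated or sent through a stricter verification pass before it reaches the user.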

Practice Exercise

Build a Corrective RAG Pipeline

90 min

Beware of Correction Loops

Corrective RAG can enter infinite loops if the correction strategy doesn't actually improve retrieval. If query reformulation consistently produces queries that retrieve the same irrelevant documents, the system will keep correcting without progress.
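One way to detect a stalled correction loop is to stop as soon as an iteration returns no documents that were not already seen. This is a sketch under assumptions: documents are identified by string IDs, and `made_progress` and `retrieve_with_circuit_breaker` are hypothetical names.

```python
from typing import Callable, List, Set, Tuple


def made_progress(seen_ids: Set[str], new_ids: List[str], min_new: int = 1) -> bool:
    """True if the latest retrieval surfaced at least min_new unseen documents."""
    return len(set(new_ids) - seen_ids) >= min_new


def retrieve_with_circuit_breaker(
    queries: List[str],
    retrieve: Callable[[str], List[str]],
    max_iterations: int = 3,
) -> Tuple[List[str], str]:
    """Run successive (reformulated) queries, stopping early when nothing new comes back."""
    seen: Set[str] = set()
    for i, query in enumerate(queries[:max_iterations], start=1):
        ids = retrieve(query)
        if not made_progress(seen, ids):
            return sorted(seen), f"stopped: no new documents at iteration {i}"
        seen.update(ids)
    return sorted(seen), "completed"


# Toy index (assumption): the second query is a reformulation that finds nothing new.
INDEX = {"q1": ["d1", "d2"], "q2": ["d2", "d1"], "q3": ["d9"]}


def retrieve(q: str) -> List[str]:
    return INDEX.get(q, [])


doc_ids, status = retrieve_with_circuit_breaker(["q1", "q2", "q3"], retrieve)
```

The loop exits at the second iteration even though the iteration budget is not exhausted, because the reformulated query only re-retrieved documents that were already in hand.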

Perplexity

Multi-Step Reasoning with Source Verification

Perplexity's approach achieves 94% factual accuracy on their internal benchmark,...

Key Insight

Retrieval Feedback Loops Create Continuously Improving RAG Systems

The most sophisticated RAG systems implement feedback loops that use agent behavior and user signals to continuously improve retrieval quality. When an agent repeatedly reformulates a query before finding relevant documents, that's a signal that your knowledge base might be missing content or your embeddings aren't capturing the right semantics.

Use Lightweight Models for Agent Scaffolding

Not every LLM call in an agentic RAG system needs your most powerful model. Use lightweight models (GPT-3.5, Claude Haiku, or fine-tuned small models) for scaffolding tasks like relevance grading, query classification, and tool selection.
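In code this can be as simple as a lookup that routes scaffolding tasks to a cheaper model. The model identifiers below are placeholders, not real model names, and `pick_model` is a hypothetical helper.

```python
# Scaffolding tasks named in the text above.
SCAFFOLDING_TASKS = {"relevance_grading", "query_classification", "tool_selection"}


def pick_model(
    task: str,
    small_model: str = "small-model",  # placeholder identifier
    large_model: str = "large-model",  # placeholder identifier
) -> str:
    """Route scaffolding tasks to the lightweight model and everything else to the large one."""
    return small_model if task in SCAFFOLDING_TASKS else large_model


grading_model = pick_model("relevance_grading")
answer_model = pick_model("final_answer")
```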

Practice Exercise

Build a Self-Correcting RAG Agent

90 min

Corrective RAG Implementation with Self-Healing (Python)

from typing import List, Dict, Tuple
from dataclasses import dataclass
from enum import Enum
import asyncio

class RetrievalQuality(Enum):
    EXCELLENT = "excellent"  # >0.8 relevance
    ACCEPTABLE = "acceptable"  # 0.5-0.8 relevance
    POOR = "poor"  # <0.5 relevance
    FAILED = "failed"  # No results

@dataclass

Practice Exercise

Implement Multi-Source RAG with Source Arbitration

120 min

Tool-Augmented RAG Agent with Dynamic Tool Selection (Python)

from typing import List, Dict, Any, Callable
from abc import ABC, abstractmethod
import json

class RAGTool(ABC):
    """Base class for RAG tools."""
    name: str
    description: str
    
    @abstractmethod
    async def execute(self, **kwargs) -> Dict[str, Any]:
        pass

Agentic RAG Production Readiness Checklist

Anti-Pattern: The Infinite Loop Agent

❌ Problem

Runaway costs as queries burn through token budgets. User experience degrades as...

✓ Solution

Implement strict iteration limits with exponential backoff on retry delays. Add ...

Anti-Pattern: The Over-Correcting Agent

❌ Problem

Latency increases 2-3x as unnecessary correction loops execute. Costs rise from ...

✓ Solution

Calibrate quality thresholds using real query distributions from your users. Mea...

Anti-Pattern: The Black Box Agent

❌ Problem

Debugging production issues becomes guesswork. You can't identify which componen...

✓ Solution

Log every agent decision with full context: what was considered, what was chosen...
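A minimal version of such decision logging, using only the standard library, might look like the sketch below. The record fields (`step`, `considered`, `chosen`, `reason`) are an assumed schema based on the visible part of the recommendation, and `log_decision` is a hypothetical helper.

```python
import json
import logging
import time
from typing import Any, Dict, List

logger = logging.getLogger("agentic_rag")


def log_decision(
    trace: List[Dict[str, Any]],
    step: str,
    considered: List[str],
    chosen: str,
    reason: str,
) -> Dict[str, Any]:
    """Append one structured decision record to the trace and emit it as a JSON log line."""
    record = {
        "ts": time.time(),
        "step": step,
        "considered": considered,
        "chosen": chosen,
        "reason": reason,
    }
    trace.append(record)
    logger.info(json.dumps(record))
    return record


trace: List[Dict[str, Any]] = []
log_decision(
    trace,
    step="grading",
    considered=["generate", "reformulate"],
    chosen="reformulate",
    reason="top relevance score below threshold",
)
```

Keeping the per-query `trace` alongside the log output makes it possible to replay a single query's decisions when debugging.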

Practice Exercise

Build an Evaluation Suite for Agentic RAG

180 min

Essential Resources for Agentic RAG Development

LangGraph Documentation and Tutorials

tool

Corrective RAG Paper (Yan et al., 2024)

article

RAGAS Evaluation Framework

tool

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al., 2023)

article

Comprehensive Agentic RAG Evaluation Framework (Python)

from dataclasses import dataclass
from typing import List, Dict, Optional
from enum import Enum
import asyncio
import json

class QueryType(Enum):
    SIMPLE_LOOKUP = "simple_lookup"
    MULTI_HOP = "multi_hop"
    REQUIRES_CORRECTION = "requires_correction"
    MULTI_SOURCE = "multi_source"
    UNANSWERABLE = "unanswerable"

Start Simple, Add Agency Incrementally

The most successful agentic RAG deployments start with basic RAG and add agency only where data proves it's needed. Measure your baseline single-pass RAG performance first.

Framework

The ITERATE Framework for Agentic RAG Development

Instrument Everything

Before adding any agency, instrument your baseline RAG with comprehensive logging. Track retrieval s...

Test with Diverse Queries

Build evaluation datasets that cover your full query distribution including edge cases. Include simp...

Evaluate Component-wise

Measure each agentic component independently: retrieval quality, correction effectiveness, tool sele...

Refine Incrementally

Add agency one capability at a time. Start with retrieval correction, measure impact, then add tool ...

67%

of production RAG failures are retrieval failures that agentic correction can address

This statistic underscores why agentic RAG focuses heavily on retrieval correction.

Quick Win: Add Retrieval Confidence to Every Response

Even without full agentic capabilities, adding a retrieval confidence score to every RAG response provides immediate value. Calculate the average similarity score of the retrieved documents, and if it falls below a threshold (typically 0.7), append a disclaimer such as 'This answer is based on limited relevant information.' Users appreciate the transparency, and you have built the foundation for future correction logic.
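The quick win described above fits in a few lines. This is a minimal sketch: `with_confidence` is a hypothetical helper, and treating an empty result set as zero confidence is an assumption.

```python
from statistics import mean
from typing import List, Tuple

DISCLAIMER = "This answer is based on limited relevant information."


def with_confidence(
    answer: str,
    similarity_scores: List[float],
    threshold: float = 0.7,  # the typical threshold mentioned above
) -> Tuple[str, float]:
    """Attach a retrieval confidence score and append a disclaimer when it is low."""
    confidence = mean(similarity_scores) if similarity_scores else 0.0
    if confidence < threshold:
        answer = f"{answer}\n\n{DISCLAIMER}"
    return answer, confidence


strong_answer, strong_conf = with_confidence("Refunds take 5 days.", [0.9, 0.8])
weak_answer, weak_conf = with_confidence("Refunds take 5 days.", [0.5, 0.6])
```

The returned confidence value is the same signal a later correction loop would branch on, so logging it now also produces the data needed to calibrate thresholds.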

Chapter Complete!

Agentic RAG transforms passive retrieval into active reasoni...

Corrective RAG implements a three-stage pipeline: retrieve, ...

Tool-augmented RAG extends agent capabilities beyond vector ...

Multi-source RAG requires careful attention to source arbitr...

Next: Begin by instrumenting your current RAG system to measure retrieval quality on every query.
