We stand at a pivotal moment in software engineering where AI systems are evolving from passive responders to active participants in complex workflows. AI agents represent a fundamental shift from the chatbots and simple automation we've built for the past decade—they can reason about problems, break down complex tasks, use tools to interact with external systems, and maintain context across extended interactions.
78% of enterprise AI projects fail to move from prototype to production
The primary reason for this failure rate isn't technical capability—it's architectural misunderstanding.
Key Insight
An AI Agent is a System, Not a Model
The most common misconception among engineers new to AI agents is conflating the agent with the underlying language model. An AI agent is an orchestration system that uses an LLM as its reasoning engine, combined with tools, memory, and a control loop that enables autonomous operation.
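To make that system view concrete, here is a minimal sketch; the structure and names are illustrative, not any particular framework's API:

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Illustrative only: the model is just one component of the system."""
    llm: Callable[[list], dict]        # reasoning engine: messages -> decision
    tools: dict                        # tool name -> callable
    memory: list = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 10) -> str:
        self.memory.append({"role": "user", "content": goal})
        for _ in range(max_steps):                      # the control loop
            decision = self.llm(self.memory)
            if decision["type"] != "tool_call":
                return decision["content"]              # final answer
            result = self.tools[decision["tool"]](**decision["args"])
            self.memory.append({"role": "tool", "content": str(result)})
        return "Stopped: step budget exhausted."

The point of the sketch: remove the loop, tools, or memory and you are back to a chatbot, no matter how capable the model is.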
Chatbot vs. Agent: The Fundamental Differences
Traditional Chatbot
Single turn or simple multi-turn conversations with no persistent state
Responds to user input but cannot initiate actions independently
Limited to information retrieval and text generation
Stateless or simple session-based memory that resets between conversations
AI Agent
Goal-oriented operation that persists across multiple interactions
Autonomously executes multi-step plans using external tools and APIs
Can read, write, and modify external systems and databases
Maintains long-term memory and learns from past interactions
Notion
Evolution from Q&A Bot to Autonomous Agent
Notion reported a 340% increase in AI feature engagement after the agent launch.
The Agent Loop: Continuous Cycle of Autonomous Operation
PERCEIVE (receive goals and inputs) → REASON (analyze the situation) → PLAN (determine the next action) → ACT (execute tool calls) → back to PERCEIVE
Key Insight
The Agent Loop is Where Reliability Lives or Dies
The agent loop—the continuous cycle of reasoning, acting, and observing—is the most critical architectural component and the source of most production failures. Unlike a chatbot that processes a request and returns a response, an agent might loop dozens of times to complete a complex task.
Framework
The OODA Loop for AI Agents
Observe
The agent gathers information from its environment—user input, tool outputs, memory retrieval, and system state.
Orient
The agent contextualizes observations using its training, instructions, and memory. This is where the LLM's reasoning capabilities come into play.
Decide
Based on orientation, the agent selects the next action from available options. This might be calling a tool, asking the user for clarification, or concluding the task.
Act
The agent executes the chosen action, typically by calling a tool. On AWS, this triggers Lambda functions behind the agent's action groups.
Agents Amplify Both Capability and Risk
Every capability you give an agent is also a potential failure mode. An agent that can send emails can spam your customers.
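As a concrete illustration, a capability can ship with its own hard limit. The sketch below (tool and limit names are hypothetical) wraps an email tool so that even a runaway loop cannot spam customers:

import time
from collections import deque

class RateLimitedEmailTool:
    """Hypothetical guard: cap how often the agent may send email."""
    def __init__(self, send_fn, max_per_hour: int = 5):
        self.send_fn = send_fn              # the real email-sending function
        self.sent = deque()                 # timestamps of recent sends
        self.max_per_hour = max_per_hour

    def __call__(self, to: str, subject: str, body: str) -> str:
        now = time.time()
        while self.sent and now - self.sent[0] > 3600:
            self.sent.popleft()             # forget sends older than an hour
        if len(self.sent) >= self.max_per_hour:
            return "Refused: email rate limit reached. Do not retry."
        self.sent.append(now)
        return self.send_fn(to=to, subject=subject, body=body)

Returning a refusal string, rather than raising, lets the agent observe the limit and adapt its plan instead of crashing the loop.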
Workflows vs. Agents: Choosing the Right Paradigm
Deterministic Workflow (Step Functions)
Predefined execution paths with explicit branching logic
Highly predictable cost and execution time
Easy to test, debug, and audit for compliance
Best for well-understood processes with clear rules
AI Agent (Bedrock + Orchestration)
Dynamic execution paths determined at runtime by reasoning
Variable cost and time based on task complexity
Harder to test exhaustively due to non-deterministic behavior
Best for ambiguous tasks requiring judgment and adaptation
Anti-Pattern: The 'Agent Everything' Trap
❌ Problem
Teams waste months building agent infrastructure for problems that don't require autonomous reasoning.
✓ Solution
Apply the 'minimum autonomy principle': use the simplest architecture that solves the problem.
Key Insight
Tool Calling is the Bridge Between Reasoning and Reality
Tool calling—also known as function calling—is the mechanism that transforms an LLM from a text generator into an agent capable of affecting the real world. When you define tools for an agent, you're essentially teaching it what actions are possible and how to invoke them.
Anatomy of a Well-Defined Tool for Bedrock Agents (JSON)
{
  "name": "search_customer_orders",
  "description": "Search for customer orders by various criteria. Use this tool when the user asks about order status, order history, or needs to find a specific order. Returns up to 10 matching orders sorted by date descending. Requires at least one search parameter.",
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_email": {
        "type": "string",
        "description": "Customer's email address for exact match"
      },
      "order_id": {
        "type": "string",
        "description": "Unique order identifier for exact match"
      }
    }
  }
}
Stripe
Building Reliable Financial Agents with Strict Tool Boundaries
Stripe reduced customer support resolution time by 67%.
Start with Read-Only Tools
When building your first agent, implement only read-only tools initially. An agent that can search, retrieve, and analyze data but cannot modify anything is inherently safe to experiment with.
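One way to enforce this, sketched with hypothetical tool names: tag each tool with whether it mutates external state, and register only the read-only subset for your first deployment.

def search_customer_orders(customer_email: str) -> list:   # stub for the sketch
    return []

def issue_refund(order_id: str) -> str:                    # stub for the sketch
    return "refunded"

# Each entry records (callable, mutates_state)
TOOLS: dict = {
    "search_customer_orders": (search_customer_orders, False),  # read-only
    "issue_refund": (issue_refund, True),                       # mutating: hold back at first
}

def read_only_tools() -> dict:
    """Only tools that cannot modify anything; safe for a first agent."""
    return {name: fn for name, (fn, mutates) in TOOLS.items() if not mutates}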
Key Insight
Planning and Reasoning: The Agent's Inner Monologue
Modern AI agents employ various planning strategies that determine how they approach complex tasks. ReAct (Reasoning and Acting) interleaves thinking and action—the agent reasons about what to do, takes an action, observes the result, then reasons again.
Implementing ReAct Pattern on AWS
1. Initialize Agent Context
2. Reasoning Step
3. Action Validation
4. Tool Execution
5. Observation Recording
A minimal prompt-level sketch of the pattern follows this list.
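One lightweight way to implement the reasoning step is a few-shot trace in the system prompt that teaches the model the Thought/Action/Observation format. The trace below is invented for illustration and reuses the hypothetical search_customer_orders tool:

# A ReAct-style few-shot trace (contents invented) included in the system
# prompt so the model learns to interleave reasoning with tool calls.
REACT_EXAMPLE = """\
Thought: The user wants the status of their latest order; I should search by email first.
Action: search_customer_orders({"customer_email": "jane@example.com"})
Observation: [{"order_id": "A-1042", "status": "shipped"}]
Thought: Order A-1042 has shipped. I have enough information to answer.
Final Answer: Your most recent order (A-1042) has shipped.
"""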
Agent Readiness Assessment: Is an Agent Right for Your Use Case?
The Context Window is Your Agent's Working Memory Limit
Every piece of information your agent needs to reason about must fit in the context window—typically 100K-200K tokens for modern models. Long-running agents accumulate conversation history, tool outputs, and observations that can exhaust this limit.
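A minimal mitigation, assuming character count as a rough stand-in for tokens (use a real tokenizer in production): keep the original goal, and drop the oldest intermediate turns once the history exceeds a budget.

MAX_CONTEXT_CHARS = 60_000   # rough proxy for a token budget

def trim_history(messages: list) -> list:
    """Preserve the first (goal) message; drop oldest middle turns until under budget."""
    trimmed = list(messages)
    total = sum(len(str(m["content"])) for m in trimmed)
    while total > MAX_CONTEXT_CHARS and len(trimmed) > 2:
        removed = trimmed.pop(1)             # index 0 holds the original goal
        total -= len(str(removed["content"]))
    return trimmed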
Deterministic Workflows vs. Autonomous Agents
Deterministic Workflows
Fixed execution path defined at design time—every possible branch is enumerated in advance
Predictable behavior makes testing straightforward with 100% path coverage achievable
Scales linearly with complexity—10x more cases means 10x more branches to define
Fails gracefully with predefined error handlers but cannot adapt to novel situations
Autonomous Agents
Dynamic execution path determined at runtime based on context
Probabilistic behavior requires statistical testing and output validation
Handles complexity through reasoning—novel cases don't require new code
Can recover from unexpected situations by reasoning about alternatives
Notion
Building Notion AI with Hybrid Agent Architecture
Notion AI reached 1 million users within 3 weeks of launch, with 73% weekly retention.
Anti-Pattern: The God Tool Anti-Pattern
❌ Problem
God tools lead to unpredictable agent behavior, difficult debugging, and security risks.
✓ Solution
Design tools following the Unix philosophy: each tool should do one thing well.
Implementing Your First Agent Loop in Python
1. Define Your Tool Schema
2. Build the Tool Executor
3. Implement the Core Loop
4. Add State Management
5. Implement Graceful Termination
Key Insight
Planning Is What Separates Agents from Autocomplete
The most sophisticated chatbots are still fundamentally reactive—they respond to the immediate input without considering multi-step strategies. Agents, by contrast, engage in planning: decomposing complex goals into subtasks, ordering those subtasks logically, and adapting the plan as new information emerges.
Basic Agent Loop Implementation (Python)
import anthropic

def run_agent(user_message: str, tools: list, max_iterations: int = 10) -> str:
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=4096,
            tools=tools, messages=messages)
        if response.stop_reason != "tool_use":     # no tool requested: done
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        # execute_tool is your own dispatcher mapping tool names to functions
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": str(execute_tool(b.name, b.input))}
            for b in response.content if b.type == "tool_use"]})
    return "Agent stopped: maximum iterations reached."
Token Costs Compound in Agent Loops
Each iteration of an agent loop includes the full conversation history in the API call. A 10-iteration agent loop doesn't cost 10x a single call—it costs roughly 55x (1+2+3+...+10) due to the growing context.
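A back-of-envelope check of that figure, assuming the context grows by roughly one fixed-size turn per iteration:

# Iteration i resends about i units of accumulated context, so a 10-iteration
# loop costs the sum 1 + 2 + ... + 10 relative to a single one-shot call.
cost_ratio = sum(range(1, 11))
print(cost_ratio)   # 55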
67% of agent failures occur in the first 3 tool calls
This statistic reveals that most agent failures aren't deep reasoning errors—they're setup problems.
Framework
The ReAct Framework: Reasoning + Acting
Thought
Before each action, the agent generates explicit reasoning about what it knows, what it needs to find out, and which tool could provide it.
Action
The agent selects and executes a tool with specific parameters. In ReAct, actions are always grounded in the reasoning that precedes them.
Observation
Tool results are formatted as observations that the agent incorporates into its reasoning. The observation feeds directly into the next thought.
Reflection
After observations, the agent reflects on whether the result was expected, what it learned, and how its plan should change.
Stripe
How Stripe Built Their Support Agent with Constrained Autonomy
The agent now handles 42% of incoming support tickets end-to-end, with a 91% customer satisfaction rating.
The Agent Decision Tree
User Request → Intent clear? If no, request clarification. If yes → Within capabilities? → …
Use Streaming for Better User Experience
Agent loops can take 30+ seconds for complex tasks. Without feedback, users assume the system is broken.
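A minimal streaming sketch using the Anthropic Python SDK's streaming helper; a full agent would also emit a status event before each tool call:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize my open orders"}],
) as stream:
    for text in stream.text_stream:          # tokens arrive as they are generated
        print(text, end="", flush=True)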
Practice Exercise
Build a Research Agent
45 min
Key Insight
The 'Goldilocks Zone' for Agent Autonomy
Research from Microsoft and Stanford reveals an optimal autonomy level for most business applications: agents should have full autonomy over information gathering and analysis, but require confirmation for actions with real-world consequences. This isn't just about safety—it's about user trust.
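A sketch of that confirmation boundary (all names hypothetical): reads execute freely, while consequential writes pause for a human decision.

CONSEQUENTIAL = {"send_email", "issue_refund", "delete_record"}

def execute_tool(name: str, args: dict) -> str:
    """Stub dispatcher for the sketch; route to real tool functions here."""
    return f"{name} executed with {args}"

def execute_with_confirmation(name: str, args: dict) -> str:
    if name in CONSEQUENTIAL:
        answer = input(f"Agent wants to run {name}({args}). Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return "Rejected by user; choose another approach or stop."
    return execute_tool(name, args)

In production the input() prompt would be replaced by an approval queue or ticket, but the control-flow shape is the same.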
Single-Agent vs. Multi-Agent Architectures
Single-Agent
One model instance handles the entire task from start to finish
Simpler to implement, debug, and monitor in production
Context is preserved naturally throughout the conversation
Limited by single context window size (typically 128K-200K tokens)
Multi-Agent
Multiple specialized agents collaborate, each with focused context and responsibilities
Complex orchestration logic required, harder to debug failures
Context must be explicitly passed between agents, risking information loss
Can handle arbitrarily complex tasks by distributing work across agents
Anti-Pattern: The Infinite Context Trap
❌ Problem
Anthropic's research shows that Claude's accuracy on retrieval tasks drops by 23% when the context is overloaded with irrelevant information.
✓ Solution
Use dynamic context injection: start with a minimal system prompt covering core behavior, then inject task-relevant context only when it is needed.
Essential Reading for Agent Development
ReAct: Synergizing Reasoning and Acting in Language Models (article)
Anthropic's Tool Use Documentation (article)
LangChain's Agent Documentation (article)
Building LLM Applications for Production (article)
THIS WEEK'S JOURNEY
Putting Agent Fundamentals into Practice
Understanding AI agent concepts intellectually is only the first step—true mastery comes through hands-on implementation and deliberate practice. This section provides concrete exercises, real code examples, and practical checklists that will transform theoretical knowledge into production skills.
# Define tools for AWS Bedrock Agents using OpenAPI schema
import json

# Tool definition following Bedrock's action group format;
# the operation under "/events" is an illustrative completion
calendar_tool_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Calendar Management API", "version": "1.0.0"},
    "paths": {
        "/events": {
            "get": {
                "summary": "List calendar events in a date range",
                "operationId": "listEvents",
                "parameters": [{"name": "start_date", "in": "query", "required": True,
                                "schema": {"type": "string", "format": "date"}}],
                "responses": {"200": {"description": "Matching events"}},
            }
        }
    }
}
print(json.dumps(calendar_tool_schema, indent=2))   # render the schema for upload
Anti-Pattern: The 'God Tool' Anti-Pattern
❌ Problem
God tools lead to dramatically higher error rates (often 40-60% failure on complex tasks).
✓ Solution
Create focused, single-purpose tools with clear names and simple parameter schemas.
Anti-Pattern: The 'No Guardrails' Anti-Pattern
❌ Problem
Without guardrails, a single bad interaction can generate thousands of dollars in API costs.
✓ Solution
Implement defense in depth from day one. Set hard limits on iterations (10-25 is typical).
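A hedged sketch of such hard limits (the specific numbers are illustrative): charge a shared budget once per loop iteration and abort the moment any limit is crossed.

import time

MAX_ITERATIONS = 15          # illustrative limits; tune per workload
MAX_TOKENS_TOTAL = 200_000
MAX_WALL_SECONDS = 120

class BudgetExceeded(Exception):
    pass

class LoopBudget:
    def __init__(self):
        self.start = time.time()
        self.iterations = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        """Call once per iteration; raises when any hard limit is crossed."""
        self.iterations += 1
        self.tokens += tokens_used
        if self.iterations > MAX_ITERATIONS:
            raise BudgetExceeded("iteration limit")
        if self.tokens > MAX_TOKENS_TOTAL:
            raise BudgetExceeded("token budget")
        if time.time() - self.start > MAX_WALL_SECONDS:
            raise BudgetExceeded("wall-clock timeout")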
Anti-Pattern: The 'Prompt and Pray' Anti-Pattern
❌ Problem
Prompt entropy leads to unpredictable agent behavior. Instructions added to fix one failure often break another.
✓ Solution
Treat prompts as code with proper engineering practices. Version control all prompts.
Practice Exercise
Design a Tool Schema for a Real Use Case
25 min
Framework
The TRACE Framework for Agent Debugging
Trigger
Identify exactly what triggered the failure. Was it a specific user input, a particular tool response, or accumulated state in the conversation?
Reasoning
Examine the LLM's reasoning at each step. Did it correctly understand the goal? Did it choose appropriate tools?
Actions
Audit all tool calls made during the session. Were they called in the right order? Did they receive the correct parameters?
Context
Evaluate the context available to the LLM at each decision point. Was relevant information missing?
Start with Verbose Logging, Optimize Later
In early development, log everything: full prompts, complete responses, all tool inputs and outputs, timing data, and token counts. This feels excessive but pays enormous dividends when debugging.
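A sketch of one-record-per-iteration structured logging (field names are illustrative); JSON lines are easy to grep locally and straightforward to ship to CloudWatch later:

import json
import logging
import time
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent")

def log_step(iteration: int, prompt_tokens: int, completion_tokens: int,
             tool_name: Optional[str], tool_input: Optional[dict],
             latency_s: float) -> None:
    """Emit one structured record per agent-loop iteration."""
    logger.info(json.dumps({
        "iteration": iteration,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "tool_name": tool_name,
        "tool_input": tool_input,
        "latency_s": round(latency_s, 3),
        "ts": time.time(),
    }))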
Essential Resources for Agent Development
AWS Bedrock Agents Documentation (article)
Anthropic's Tool Use Guide (article)
LangChain Agent Documentation (article)
Building LLM Applications for Production, by Chip Huyen (article)
Practice Exercise
Build an Agent Evaluation Suite
60 min
Human-in-the-Loop is Not Optional for Production
Every production agent needs human oversight mechanisms. This doesn't mean humans approve every action—it means humans can intervene when needed, review decisions after the fact, and override agent behavior.
Agent Testing Approaches
Unit Testing (Tools)
Test individual tools in isolation with mocked inputs
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExpectedToolCall:
    tool_name: str
    required_params: dict                   # Params that must match exactly
    optional_params: Optional[dict] = None  # Params that may or may not be present

@dataclass
class AgentTestCase:
    """Illustrative test-case shape."""
    name: str                               # Human-readable test identifier
    user_message: str                       # Input sent to the agent under test
    expected_tool_calls: List[ExpectedToolCall] = field(default_factory=list)
    forbidden_tools: List[str] = field(default_factory=list)  # Must never be called
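A usage sketch (contents hypothetical) tying this to the search_customer_orders tool defined earlier: the agent should look the order up by email and must never touch refunds.

case = AgentTestCase(
    name="order_status_lookup",
    user_message="Where is my order? My email is jane@example.com",
    expected_tool_calls=[ExpectedToolCall(
        tool_name="search_customer_orders",
        required_params={"customer_email": "jane@example.com"},
    )],
    forbidden_tools=["issue_refund"],
)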
67% of agent failures are caused by tool-related issues
Analysis of production agent failures shows that most issues stem from tool design, not LLM reasoning.
Beware of Evaluation Overfitting
As you build evaluation suites, resist the temptation to optimize solely for your test cases. Agents that score perfectly on fixed test sets often fail on real user queries that differ slightly from test patterns.
Pre-Production Agent Launch Checklist
Chapter Complete!
AI agents differ fundamentally from chatbots through their autonomy: goal-directed planning, tool use, and persistent memory
The agent loop (perceive → reason → act → observe) is the core architectural component and the source of most production failures
Tool design is often more important than prompt engineering—focused, well-described tools prevent more failures than clever prompts
Production agents require comprehensive guardrails including iteration limits, cost caps, and human oversight
Next: Begin by implementing a minimal agent loop using the code examples provided, focusing on clean separation between orchestration and tools