AI Agent Architecture: A Hands-On Workshop

If you’ve been hearing the phrase “AI agents” everywhere and wondering what’s actually happening under the hood — you’re in the right place. This workshop breaks down AI agent architecture piece by piece, gives you real code to run, and leaves you with a mental model that’ll stick.

By the end, you’ll understand how agents perceive their environment, remember context, plan actions, and execute — and you’ll have built a minimal working agent from scratch.

Let’s get into it.


What Is an AI Agent, Really?

Before we touch code, let’s nail the definition.

An AI agent is a system that perceives inputs, reasons about them, and takes actions to accomplish a goal — often in a loop, repeatedly, until the task is done or a stopping condition is met.

That’s it. The word “agent” just means something that acts on behalf of something else.

What makes modern AI agents interesting is the LLM at the center. Instead of hand-coding every decision rule, the language model reasons dynamically. You tell it what tools it has, what the goal is, and let it figure out how to get there.

Here’s the core mental model you need:

Perception → Memory → Planning → Action → (repeat)

This is the agent loop. Everything else — memory systems, tool calling, multi-agent orchestration — is just elaboration on this loop.
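The loop above can be sketched in a few lines of Python. Everything here (the `plan` and `act` functions, the stopping rule) is a toy stand-in for illustration, not a real API:

```python
def agent_loop(goal, max_steps=5):
    """Minimal sketch of the Perception -> Memory -> Planning -> Action loop."""
    memory = []          # in-context memory: a running list of observations
    observation = goal   # the initial perception is the user's goal
    for _ in range(max_steps):
        memory.append(observation)   # Memory: record what was perceived
        action = plan(memory)        # Planning: decide the next action
        if action == "done":         # stopping condition
            break
        observation = act(action)    # Action: execute and observe the result
    return memory

# Toy stand-ins so the skeleton runs: plan counts steps, act echoes.
def plan(memory):
    return "done" if len(memory) >= 3 else f"step-{len(memory)}"

def act(action):
    return f"result of {action}"

print(agent_loop("summarize a file"))
```

In a real agent, `plan` is an LLM call and `act` is a tool invocation, but the control flow is exactly this shape.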


The Four Core Components of AI Agent Architecture

1. Perception: How the Agent Takes In the World

Perception is everything the agent receives as input. This includes:

  • User messages — the immediate task or question
  • Tool results — outputs from functions the agent called
  • Environment state — current context, like file contents or API responses
  • System prompts — static instructions that shape behavior

In code, perception is simply what you stuff into the messages array before you call the LLM.

Example:

messages = [
    {"role": "system", "content": "You are a helpful coding assistant with access to a Python REPL."},
    {"role": "user", "content": "Write a script that reads a CSV and prints the column names."},
]

The agent “sees” all of this as its current world state. Good architecture means carefully controlling what the agent perceives — give it too much noise and performance degrades; give it too little and it can’t do the job.

2. Memory: How the Agent Remembers

Memory is one of the most underrated parts of agent design. There are four types:

  • In-context — lives in the message window; good for short-term working memory
  • External (vector DB) — lives in a database like Pinecone or Chroma; good for long-term semantic recall
  • Episodic — lives in a structured log of past events; good for learning from previous runs
  • Semantic — lives in curated facts about the world; good for domain knowledge lookup

For most beginners, in-context memory is all you need. You maintain a running messages list and append to it each turn.

Simple in-context memory:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(user_input: str, history: list) -> str:
    history.append({"role": "user", "content": user_input})

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=history
    )

    assistant_message = response.content[0].text
    history.append({"role": "assistant", "content": assistant_message})

    return assistant_message

The moment the history list fills up the context window, you have a memory problem. That’s when you graduate to external memory — but don’t worry about that yet.
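One simple stopgap before reaching for external memory is a sliding-window trim that drops the oldest turns once the history gets long. This is a rough sketch; the `max_messages` threshold is arbitrary, and a production version would count tokens rather than messages:

```python
def trim_history(history, max_messages=20):
    """Keep only the most recent turns, dropping the oldest first.
    A real implementation would count tokens and preserve any
    system-critical messages; this just caps the list length."""
    if len(history) <= max_messages:
        return history
    # Drop from the front (oldest) and keep the tail (most recent)
    return history[-max_messages:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(30)]
trimmed = trim_history(history)
print(len(trimmed))              # 20
print(trimmed[0]["content"])     # msg 10
```

The tradeoff is obvious: the agent forgets the dropped turns entirely. That's precisely the gap external memory fills.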

3. Planning: How the Agent Decides What to Do

Planning is where the LLM earns its keep. Given the current perception and memory, the model figures out what to do next.

Two planning patterns dominate:

ReAct (Reason + Act): The model alternates between “thinking” (writing out its reasoning) and “acting” (calling a tool). Think of it as the agent talking through its problem before picking up a wrench.

Thought: The user wants to know the weather in Paris. I should call the weather API.
Action: get_weather(city="Paris")
Observation: {"temperature": 14, "condition": "cloudy"}
Thought: I have the weather data. I can now answer.
Answer: It's 14°C and cloudy in Paris right now.

Plan-and-Execute: The model first creates a multi-step plan, then executes each step. Better for complex tasks but more brittle.

Most modern frameworks (LangGraph, AutoGen, Claude’s tool use) implement a form of ReAct. Understanding it helps you debug when your agent gets stuck in loops or skips steps.
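To make the contrast concrete, here is a toy illustration of the plan-and-execute shape. Both `make_plan` and `execute_step` are hard-coded stubs standing in for LLM and tool calls:

```python
def make_plan(task):
    """Stub planner: a real agent would ask the LLM for these steps."""
    return ["search for background", "extract key facts", "write summary"]

def execute_step(step):
    """Stub executor: a real agent would call a tool or the LLM here."""
    return f"completed: {step}"

def plan_and_execute(task):
    plan = make_plan(task)   # one upfront planning call
    results = []
    for step in plan:        # execute steps in fixed order; this is the
        results.append(execute_step(step))  # brittleness: an early step's
    return results           # output can't change the later steps

print(plan_and_execute("research AI agents"))
```

The brittleness lives in that fixed `for` loop: unlike ReAct, the plan never adapts to what each step actually returns.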

4. Action: How the Agent Does Things

Actions are how the agent affects the world. They come in two flavors:

Tool calls — invoking functions with defined inputs and outputs:
– Search the web
– Read/write files
– Call an API
– Run code
– Query a database

Responses — generating text back to the user. Sometimes the right action is just saying something.

In the Claude API, you define tools like this:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name"
                }
            },
            "required": ["city"]
        }
    }
]

The model decides whether to call a tool and what arguments to pass. Your code handles actually running the tool and returning the result.
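The "your code handles actually running the tool" step is typically a small dispatch table mapping tool names to Python functions. A sketch, with `get_weather` as a stub implementation:

```python
def get_weather(city: str) -> dict:
    """Stub implementation; a real version would call a weather API."""
    return {"temperature": 14, "condition": "cloudy", "city": city}

# Map tool names (as declared in the tools list) to handler functions
TOOL_HANDLERS = {
    "get_weather": get_weather,
}

def dispatch_tool(name: str, tool_input: dict):
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Return an error the model can read, rather than crashing the loop
        return {"error": f"unknown tool: {name}"}
    return handler(**tool_input)

print(dispatch_tool("get_weather", {"city": "Paris"}))
```

Note the unknown-tool branch returns an error string instead of raising: feeding the error back to the model lets it recover, which is usually better than killing the loop.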


Workshop Exercise: Build a Minimal Research Agent

Now let’s put it all together. We’ll build a simple research agent that can search the web and synthesize an answer.

Prerequisites

pip install anthropic

You’ll need an Anthropic API key in your environment:

export ANTHROPIC_API_KEY="your-key-here"

Step 1: Define Your Tools

import anthropic
import json

client = anthropic.Anthropic()

# Simulate a web search (replace with real search API in production)
def search_web(query: str) -> str:
    """Simulated search results for workshop purposes."""
    results = {
        "harness engineering": "Harness engineering is the discipline of building reliable infrastructure for AI agents, including monitoring, circuit breakers, and orchestration patterns.",
        "AI agent architecture": "AI agents typically consist of perception, memory, planning, and action components organized in a loop.",
        "LangGraph": "LangGraph is a framework for building stateful, multi-actor applications with LLMs using a graph-based control flow.",
    }
    for key in results:
        if key.lower() in query.lower():
            return results[key]
    return f"No results found for: {query}"

tools = [
    {
        "name": "search_web",
        "description": "Search the web for information on a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    }
]

Step 2: Build the Agent Loop

def run_research_agent(question: str) -> str:
    messages = [
        {"role": "user", "content": question}
    ]

    system_prompt = """You are a research assistant. Use the search_web tool to
    find information before answering. Always search first, then synthesize
    what you find into a clear, accurate response."""

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=system_prompt,
            tools=tools,
            messages=messages
        )

        # Check if the agent wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool calls
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  → Searching: {block.input['query']}")
                    result = search_web(block.input["query"])
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # Add assistant response and tool results to history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

        elif response.stop_reason == "end_turn":
            # Agent is done — extract final text response
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            break
        else:
            break

    return "Agent finished without a response."

Step 3: Run It

if __name__ == "__main__":
    question = "What is harness engineering and why does it matter for AI agents?"
    print(f"Question: {question}\n")
    print("Agent thinking...\n")
    answer = run_research_agent(question)
    print(f"\nAnswer:\n{answer}")

Expected output:

Question: What is harness engineering and why does it matter for AI agents?

Agent thinking...

  → Searching: harness engineering AI agents

Answer:
Harness engineering is the discipline of building reliable infrastructure
for AI agents — covering monitoring, circuit breakers, and orchestration
patterns. It matters because agents that work in demos often fail in
production without this reliability layer...

You just built a functioning AI agent. It perceives a question, plans to search, takes an action, and synthesizes a response. That’s the full loop.


Common Beginner Mistakes (And How to Avoid Them)

Mistake 1: Ignoring the Stop Reason

Always check stop_reason. If it’s tool_use, you must handle tool calls before looping. If you skip this, the agent will stall or return an empty response.

Mistake 2: Infinite Loops

Add a max iterations guard:

MAX_ITERATIONS = 10

for iteration in range(MAX_ITERATIONS):
    response = client.messages.create(...)
    if response.stop_reason == "end_turn":
        break
    # otherwise handle tool_use and append results, as in the main loop
else:
    print("Warning: Agent hit max iterations")

Mistake 3: Vague Tool Descriptions

The model decides whether to call a tool based entirely on the description field. Be specific. “Search the web” is worse than “Search the web for factual information about a specific topic.”
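Here is what that difference might look like side by side (both descriptions are hypothetical examples for the same tool):

```python
# Vague: the model has to guess when this tool applies
vague = {
    "name": "search_web",
    "description": "Search the web",
}

# Specific: states what the tool returns and when to use it
specific = {
    "name": "search_web",
    "description": (
        "Search the web for factual information about a specific topic. "
        "Use this before answering questions about current events, "
        "statistics, or anything you are not certain about. "
        "Returns a short text snippet of results."
    ),
}
```

The schema is identical in both cases; only the description changed, and that alone can be the difference between the model calling the tool reliably and ignoring it.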

Mistake 4: Stuffing Too Much in the System Prompt

Keep your system prompt focused. Long, sprawling instructions confuse the model. One role, a few constraints, that’s it.


What Comes Next: Leveling Up Your Architecture

Once you’re comfortable with the basic loop, here’s the progression:

Add persistent memory — Use a vector database to store and retrieve past interactions. Libraries like ChromaDB make this straightforward.
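To see the shape of what a vector store adds, here is a conceptual sketch that uses naive keyword overlap as a stand-in for the embedding similarity a library like ChromaDB would give you. Everything here is illustrative:

```python
class SimpleMemory:
    """Toy external memory: stores past facts and retrieves the most
    relevant one by keyword overlap. A vector DB replaces the overlap
    score with embedding similarity, which also matches paraphrases."""

    def __init__(self):
        self.entries = []

    def store(self, text: str):
        self.entries.append(text)

    def retrieve(self, query: str):
        query_words = set(query.lower().split())
        best, best_score = None, 0
        for entry in self.entries:
            # Count shared words between the query and the stored entry
            score = len(query_words & set(entry.lower().split()))
            if score > best_score:
                best, best_score = entry, score
        return best

memory = SimpleMemory()
memory.store("User prefers answers in bullet points")
memory.store("User is building a research agent in Python")
print(memory.retrieve("what is the user building?"))
```

The retrieved entry gets injected into the next prompt, which is how "persistent memory" actually reaches the model: retrieval plus perception, nothing more exotic.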

Add structured output — Instead of parsing free text, use tool calling to force the model to return structured data. More reliable for downstream processing.

Multi-agent patterns — Decompose complex tasks across specialized agents. One agent plans, another executes, another reviews. This is where LangGraph and similar frameworks shine.

Observability — Add logging to every tool call and LLM invocation. You cannot debug what you cannot see. This is literally what harness engineering is about: wrapping your agents with the infrastructure that makes them production-worthy.
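A minimal version of that observability layer is a decorator that logs every tool call with its arguments, result, and duration. The names here are illustrative, but the pattern is standard Python:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent")

def logged_tool(func):
    """Wrap a tool function so every invocation is recorded with timing."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            logger.info(
                "tool=%s args=%s kwargs=%s result=%r elapsed=%.3fs",
                func.__name__, args, kwargs, result,
                time.perf_counter() - start,
            )
            return result
        except Exception:
            logger.exception("tool=%s failed", func.__name__)
            raise
    return wrapper

@logged_tool
def search_web(query: str) -> str:
    return f"results for: {query}"

print(search_web("AI agent architecture"))
```

Wrap every tool this way and failures stop being mysteries: you can see exactly which call was made, with what input, and what came back.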

Pro tip: The jump from “demo agent” to “production agent” is mostly an infrastructure problem, not a model problem. Reliability, retries, timeouts, logging, fallbacks — that’s the harness.


Your Workshop Checklist

Before you move on, make sure you can answer these:

  • [ ] What are the four components of an AI agent?
  • [ ] What is the agent loop and why does it repeat?
  • [ ] What’s the difference between in-context and external memory?
  • [ ] What does stop_reason == "tool_use" mean in the Claude API?
  • [ ] Why do tool descriptions matter so much?

If you can answer all five, you have a solid foundation.


Keep Building

This workshop gave you the skeleton. The flesh comes from building real things. Pick a small, concrete use case — a personal research assistant, a code reviewer, a data pipeline checker — and build it using the pattern you just learned.

The best way to deepen your understanding of agent architecture is to hit the weird edge cases: the infinite loop, the tool that returns garbage, the model that ignores your instructions. Those failures teach you more than any tutorial.

Ready to go deeper? Check out our tutorials on agent memory systems and multi-agent orchestration with LangGraph — both designed for engineers just getting started with harness engineering.

And if you want to follow along as we build this discipline in public, subscribe to the newsletter below. New tutorials drop every two weeks.


Kai Renner is a senior AI/ML engineering leader with a PhD in Computer Engineering and 10+ years building production AI systems. He writes about making AI agents reliable, observable, and production-ready at harnessengineering.academy.
