If you’ve been hearing the phrase “AI agents” everywhere and wondering what’s actually happening under the hood — you’re in the right place. This workshop breaks down AI agent architecture piece by piece, gives you real code to run, and leaves you with a mental model that’ll stick.
By the end, you’ll understand how agents perceive their environment, remember context, plan actions, and execute — and you’ll have built a minimal working agent from scratch.
Let’s get into it.
## What Is an AI Agent, Really?
Before we touch code, let’s nail the definition.
An AI agent is a system that perceives inputs, reasons about them, and takes actions to accomplish a goal — often in a loop, repeatedly, until the task is done or a stopping condition is met.
That’s it. The word “agent” just means something that acts on behalf of something else.
What makes modern AI agents interesting is the LLM at the center. Instead of hand-coding every decision rule, the language model reasons dynamically. You tell it what tools it has, what the goal is, and let it figure out how to get there.
Here’s the core mental model you need:
Perception → Memory → Planning → Action → (repeat)
This is the agent loop. Everything else — memory systems, tool calling, multi-agent orchestration — is just elaboration on this loop.
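The loop above can be sketched in a few lines of Python. This is a hedged, minimal skeleton, not any framework's real API: `llm` stands in for a model call, `tools` is a plain dict of callables, and the decision format is invented for illustration.

```python
def agent_loop(goal, llm, tools, max_steps=10):
    """Perceive -> remember -> plan -> act, repeated until done."""
    memory = [{"role": "user", "content": goal}]              # perception + memory
    for _ in range(max_steps):
        decision = llm(memory)                                # planning
        if decision["type"] == "final_answer":
            return decision["content"]                        # terminal action
        result = tools[decision["tool"]](**decision["args"])  # tool action
        memory.append({"role": "tool", "content": result})    # new perception
    return "Stopped: max steps reached"
```

Everything in the rest of this workshop is a concrete version of this skeleton.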
## The Four Core Components of AI Agent Architecture

### 1. Perception: How the Agent Takes In the World
Perception is everything the agent receives as input. This includes:
- User messages — the immediate task or question
- Tool results — outputs from functions the agent called
- Environment state — current context, like file contents or API responses
- System prompts — static instructions that shape behavior
In code, perception is simply what you stuff into the `messages` array before you call the LLM.
Example:
```python
messages = [
    {"role": "system", "content": "You are a helpful coding assistant with access to a Python REPL."},
    {"role": "user", "content": "Write a script that reads a CSV and prints the column names."},
]
```
The agent “sees” all of this as its current world state. Good architecture means carefully controlling what the agent perceives — give it too much noise and performance degrades; give it too little and it can’t do the job.
### 2. Memory: How the Agent Remembers
Memory is one of the most underrated parts of agent design. There are four types:
| Memory Type | Where It Lives | What It’s Good For |
|---|---|---|
| In-context | The message window | Short-term working memory |
| External (vector DB) | Database like Pinecone or Chroma | Long-term semantic recall |
| Episodic | Structured log of past events | Learning from previous runs |
| Semantic | Curated facts about the world | Domain knowledge lookup |
For most beginners, in-context memory is all you need. You maintain a running messages list and append to it each turn.
Simple in-context memory:
```python
def run_agent(user_input: str, history: list) -> str:
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=history
    )
    assistant_message = response.content[0].text
    history.append({"role": "assistant", "content": assistant_message})
    return assistant_message
```
The moment the history list fills up the context window, you have a memory problem. That’s when you graduate to external memory — but don’t worry about that yet.
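A common stopgap before a real external memory store is a sliding window that keeps only the most recent turns. This is a minimal sketch assuming a plain list of role/content dicts, with the (optional) system message pinned so it never gets trimmed:

```python
def trim_history(history, max_messages=20):
    """Keep the system message (if any) plus the most recent turns."""
    if len(history) <= max_messages:
        return history
    head = [m for m in history if m["role"] == "system"][:1]  # pin system prompt
    tail = history[-(max_messages - len(head)):]              # newest turns win
    return head + tail
```

The trade-off: anything outside the window is forgotten completely, which is exactly the problem external memory solves.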
### 3. Planning: How the Agent Decides What to Do
Planning is where the LLM earns its keep. Given the current perception and memory, the model figures out what to do next.
Two planning patterns dominate:
ReAct (Reason + Act): The model alternates between “thinking” (writing out its reasoning) and “acting” (calling a tool). Think of it as the agent talking through its problem before picking up a wrench.
```
Thought: The user wants to know the weather in Paris. I should call the weather API.
Action: get_weather(city="Paris")
Observation: {"temperature": 14, "condition": "cloudy"}
Thought: I have the weather data. I can now answer.
Answer: It's 14°C and cloudy in Paris right now.
```
Plan-and-Execute: The model first creates a multi-step plan, then executes each step. Better for complex tasks but more brittle.
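The Plan-and-Execute shape can be sketched in a few lines. Here `planner` and `executor` are hypothetical stand-ins for model calls: one call produces the whole step list up front, then each step runs in order with access to earlier results.

```python
def plan_and_execute(task, planner, executor):
    """Plan once up front, then execute each step in order."""
    plan = planner(task)        # e.g. ["search for X", "summarize findings"]
    results = []
    for step in plan:
        results.append(executor(step, results))  # each step sees prior results
    return results[-1]
```

The brittleness comes from that single planning call: if step 2 turns out to be wrong, there's no built-in chance to revise the plan, which is why ReAct's interleaved thinking often recovers better.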
Most modern frameworks (LangGraph, AutoGen, Claude’s tool use) implement a form of ReAct. Understanding it helps you debug when your agent gets stuck in loops or skips steps.
### 4. Action: How the Agent Does Things
Actions are how the agent affects the world. They come in two flavors:
Tool calls — invoking functions with defined inputs and outputs:
- Search the web
- Read/write files
- Call an API
- Run code
- Query a database
Responses — generating text back to the user. Sometimes the right action is just saying something.
In the Claude API, you define tools like this:
```python
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name"
                }
            },
            "required": ["city"]
        }
    }
]
```
The model decides whether to call a tool and what arguments to pass. Your code handles actually running the tool and returning the result.
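The "your code runs the tool" half is usually a dispatch table mapping tool names to functions. A minimal sketch — `get_weather` here is a hypothetical stub, not a real API — with errors returned as text so the model can see and recover from them:

```python
def get_weather(city: str) -> str:
    """Stand-in stub for a real weather API call."""
    return f"14C and cloudy in {city}"

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool(name: str, arguments: dict) -> str:
    """Run the tool the model requested, returning failures as text."""
    if name not in TOOL_REGISTRY:
        return f"Unknown tool: {name}"
    try:
        return TOOL_REGISTRY[name](**arguments)
    except TypeError as exc:
        # Bad arguments from the model: report back instead of crashing
        return f"Tool error: {exc}"
```

Returning errors as strings rather than raising is a deliberate choice: the model reads the error in the next turn and can retry with corrected arguments.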
## Workshop Exercise: Build a Minimal Research Agent
Now let’s put it all together. We’ll build a simple research agent that can search the web and synthesize an answer.
### Prerequisites

```shell
pip install anthropic
```

You'll need an Anthropic API key in your environment:

```shell
export ANTHROPIC_API_KEY="your-key-here"
```
### Step 1: Define Your Tools

```python
import anthropic
import json

client = anthropic.Anthropic()

# Simulate a web search (replace with real search API in production)
def search_web(query: str) -> str:
    """Simulated search results for workshop purposes."""
    results = {
        "harness engineering": "Harness engineering is the discipline of building reliable infrastructure for AI agents, including monitoring, circuit breakers, and orchestration patterns.",
        "AI agent architecture": "AI agents typically consist of perception, memory, planning, and action components organized in a loop.",
        "LangGraph": "LangGraph is a framework for building stateful, multi-actor applications with LLMs using a graph-based control flow.",
    }
    for key in results:
        if key.lower() in query.lower():
            return results[key]
    return f"No results found for: {query}"

tools = [
    {
        "name": "search_web",
        "description": "Search the web for information on a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    }
]
```
### Step 2: Build the Agent Loop

```python
def run_research_agent(question: str) -> str:
    messages = [
        {"role": "user", "content": question}
    ]
    system_prompt = """You are a research assistant. Use the search_web tool to
find information before answering. Always search first, then synthesize
what you find into a clear, accurate response."""

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=system_prompt,
            tools=tools,
            messages=messages
        )

        # Check if the agent wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool calls
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  → Searching: {block.input['query']}")
                    result = search_web(block.input["query"])
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # Add assistant response and tool results to history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        elif response.stop_reason == "end_turn":
            # Agent is done — extract final text response
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            break
        else:
            break

    return "Agent finished without a response."
```
### Step 3: Run It

```python
if __name__ == "__main__":
    question = "What is harness engineering and why does it matter for AI agents?"
    print(f"Question: {question}\n")
    print("Agent thinking...\n")
    answer = run_research_agent(question)
    print(f"\nAnswer:\n{answer}")
```
Expected output:

```
Question: What is harness engineering and why does it matter for AI agents?

Agent thinking...

  → Searching: harness engineering AI agents

Answer:
Harness engineering is the discipline of building reliable infrastructure
for AI agents — covering monitoring, circuit breakers, and orchestration
patterns. It matters because agents that work in demos often fail in
production without this reliability layer...
```
You just built a functioning AI agent. It perceives a question, plans to search, takes an action, and synthesizes a response. That’s the full loop.
## Common Beginner Mistakes (And How to Avoid Them)
### Mistake 1: Ignoring the Stop Reason

Always check `stop_reason`. If it's `tool_use`, you must handle the tool calls before looping. If you skip this, the agent will stall or return an empty response.
### Mistake 2: Infinite Loops

An agent that keeps calling tools can loop forever. Add a max-iterations guard:

```python
MAX_ITERATIONS = 10

for iteration in range(MAX_ITERATIONS):
    response = client.messages.create(...)
    if response.stop_reason == "end_turn":
        break
else:
    # for/else: this branch runs only if the loop never hit break
    print("Warning: Agent hit max iterations")
```
### Mistake 3: Vague Tool Descriptions

The model decides whether to call a tool based entirely on the `description` field. Be specific. "Search the web" is worse than "Search the web for factual information about a specific topic."
### Mistake 4: Stuffing Too Much in the System Prompt

Keep your system prompt focused. Long, sprawling instructions confuse the model. One role and a few constraints: that's it.
## What Comes Next: Leveling Up Your Architecture
Once you’re comfortable with the basic loop, here’s the progression:
Add persistent memory — Use a vector database to store and retrieve past interactions. Libraries like ChromaDB make this straightforward.
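The retrieval idea behind vector memory can be sketched without any library: store documents, score them against a query, return the best matches. This toy version uses bag-of-words cosine similarity purely for illustration; a real system would store embeddings in a vector DB like ChromaDB instead.

```python
from collections import Counter
import math

class NaiveMemory:
    """Toy long-term memory: bag-of-words cosine similarity.
    A real system would use embeddings in a vector database."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        # Store the raw text alongside its term counts
        self.docs.append((text, Counter(text.lower().split())))

    def search(self, query, k=1):
        q = Counter(query.lower().split())

        def cosine(vec):
            dot = sum(q[t] * vec[t] for t in q)
            norm = (math.sqrt(sum(v * v for v in q.values()))
                    * math.sqrt(sum(v * v for v in vec.values())))
            return dot / norm if norm else 0.0

        ranked = sorted(self.docs, key=lambda d: cosine(d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Swap the word-count vectors for real embeddings and you have the core of semantic recall: the agent queries this store each turn and injects the top hits into its context.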
Add structured output — Instead of parsing free text, use tool calling to force the model to return structured data. More reliable for downstream processing.
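For instance, a hypothetical `record_answer` tool (same schema shape as the tool definitions earlier in this workshop) forces the model to emit named fields instead of free text:

```python
# Hypothetical tool whose only job is to shape the model's output
record_answer_tool = {
    "name": "record_answer",
    "description": "Record the final answer as structured fields",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string", "description": "One-paragraph answer"},
            "sources": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Queries or URLs the answer is based on"
            },
            "confidence": {"type": "number", "description": "0.0 to 1.0"}
        },
        "required": ["summary", "sources"]
    }
}
```

Your code then reads the tool call's arguments as a dict instead of parsing prose, which is far more reliable for downstream processing.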
Multi-agent patterns — Decompose complex tasks across specialized agents. One agent plans, another executes, another reviews. This is where LangGraph and similar frameworks shine.
Observability — Add logging to every tool call and LLM invocation. You cannot debug what you cannot see. This is literally what harness engineering is about: wrapping your agents with the infrastructure that makes them production-worthy.
Pro tip: The jump from “demo agent” to “production agent” is mostly an infrastructure problem, not a model problem. Reliability, retries, timeouts, logging, fallbacks — that’s the harness.
## Your Workshop Checklist
Before you move on, make sure you can answer these:
- [ ] What are the four components of an AI agent?
- [ ] What is the agent loop and why does it repeat?
- [ ] What’s the difference between in-context and external memory?
- [ ] What does `stop_reason == "tool_use"` mean in the Claude API?
- [ ] Why do tool descriptions matter so much?
If you can answer all five, you have a solid foundation.
## Keep Building
This workshop gave you the skeleton. The flesh comes from building real things. Pick a small, concrete use case — a personal research assistant, a code reviewer, a data pipeline checker — and build it using the pattern you just learned.
The best way to deepen your understanding of agent architecture is to hit the weird edge cases: the infinite loop, the tool that returns garbage, the model that ignores your instructions. Those failures teach you more than any tutorial.
Ready to go deeper? Check out our tutorials on agent memory systems and multi-agent orchestration with LangGraph — both designed for engineers just getting started with harness engineering.
And if you want to follow along as we build this discipline in public, subscribe to the newsletter below. New tutorials drop every two weeks.
Kai Renner is a senior AI/ML engineering leader with a PhD in Computer Engineering and 10+ years building production AI systems. He writes about making AI agents reliable, observable, and production-ready at harnessengineering.academy.