There’s a moment in every engineer’s journey when a chatbot stops feeling impressive and starts feeling limiting. You want something that doesn’t just answer — you want something that acts. An AI that can use tools, make decisions in a loop, and carry out a task end-to-end without you holding its hand.
That’s what agentic AI is. And you’re five minutes away from building one.
This tutorial is the first stop in the agentic AI course track here on harnessengineering.academy. By the end, you’ll have a working Python agent that calls a real tool, processes the result, and returns a final answer — all in under 60 lines of code. More importantly, you’ll understand why each piece exists, so you can extend it into something production-worthy.
What Is an Agentic AI System (and Why It Matters Now)
Before we touch code, let’s be precise about what we’re building — because “AI agent” gets used to describe everything from a glorified chatbot to a fully autonomous software engineer.
A chatbot takes your input, runs it through a language model, and returns text. One turn. Done. It has no memory of prior turns unless you explicitly pass them in, and it can’t take any action in the world.
An AI agent is different in three fundamental ways:
- Tools — The model can invoke external functions: search the web, call an API, read a file, run code.
- Memory — The agent maintains context across multiple steps in a task, not just a single exchange.
- Autonomy — The agent decides when to use a tool, which tool to use, and when it’s done — all without you scripting each step.
This isn’t a future-facing concept. According to Gartner’s 2024 AI Hype Cycle, over 70% of enterprises plan to deploy AI agents in production workflows within two years. The global AI agents market is projected to grow from roughly $5 billion in 2024 to over $47 billion by 2030 — a compounding growth rate near 45%. Developers who internalize agentic patterns report building automation prototypes 3 to 5 times faster than with traditional scripting approaches.
Learning to build agents now isn’t getting ahead of the curve — it’s catching up to where the industry already is.
Who This Tutorial Is For
You need to know basic Python — functions, loops, dictionaries. You don’t need to know anything about LLMs, frameworks, or AI beyond “I’ve used ChatGPT.” If you can run a Python script in a terminal, you’re ready.
What You’ll Need Before You Start
Prerequisites
- Python 3.9 or later — Check with `python3 --version`.
- A terminal — Any shell works: bash, zsh, PowerShell on Windows.
- An Anthropic API key — Sign up at console.anthropic.com. The free tier is sufficient for this tutorial.
Install the SDK
```bash
pip install anthropic
```
That’s the only dependency for this tutorial. No LangChain, no AutoGen, no orchestration framework yet — we’re building the bare-metal version first so you understand what those frameworks are abstracting.
Project Folder Structure
Create a folder and a single Python file:
```
my-first-agent/
└── agent.py
```
That’s it. Everything lives in agent.py for now. When your agent grows, you’ll naturally split it — but start minimal.
Step 1 — Define Your Agent’s Goal and Tool
Every agent operates on a loop. The simplest version has four stages:
Perceive → Decide → Act → Observe
- Perceive — The agent receives a task (your message).
- Decide — The model determines whether to use a tool or answer directly.
- Act — If a tool is needed, the agent calls it.
- Observe — The agent feeds the tool result back into the model and loops.
The loop terminates when the model decides it has enough information to answer — signaled by a stop_reason of "end_turn" instead of "tool_use".
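Concretely, the four stages map onto a loop shaped like this. This is a pseudocode sketch, not runnable code; `model` and `tools` are placeholders for the real API calls we wire up in Step 2:

```
# Pseudocode: the perceive-decide-act-observe loop
def agent_loop(task):
    history = [task]                      # Perceive: the task enters context
    while True:
        decision = model(history, tools)  # Decide: answer directly or call a tool?
        if decision.is_final_answer:      # i.e. stop_reason == "end_turn"
            return decision.text
        result = tools[decision.tool_name](decision.tool_input)  # Act
        history.append(result)            # Observe: feed the result back and loop
```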
Choosing Your First Tool
For this tutorial, we’ll build a simple calculator tool. It’s perfect for a first agent because:
- No external API key needed
- Deterministic output — easy to verify correctness
- Forces the model to use a tool rather than guess at arithmetic
Here’s the tool defined as a Python function:
```python
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression and return the result."""
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"
```
Gotcha — Never use `eval()` in production. For this tutorial it’s fine because we control the input. In any real system, use a safe math parser like `simpleeval` or `asteval`. We’re keeping it simple here to focus on the agent loop, not input sanitization.
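If you want to drop `eval()` without adding a dependency, the standard library’s `ast` module can do the job. Here is a minimal safe arithmetic evaluator — my own sketch, not part of the tutorial’s code path — that walks the parsed expression tree and permits only basic numeric operations:

```python
import ast
import operator

# Whitelist of permitted AST operators. Anything else raises ValueError.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression without eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")
    return _eval(ast.parse(expression, mode="eval"))
```

Function calls, attribute access, and names are all rejected, so `__import__` tricks fail with a `ValueError` instead of executing.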
Writing the System Prompt
The system prompt defines your agent’s role and instructs it on when to use tools. Keep it short and direct:
```python
SYSTEM_PROMPT = """You are a helpful assistant with access to a calculator tool.
When a question requires arithmetic, use the calculator tool rather than guessing.
Always show your reasoning before calling a tool."""
```
Step 2 — Wire Up the Tool-Calling Loop
Here’s the complete agent in one file. Read it through once before we walk it line by line:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = """You are a helpful assistant with access to a calculator tool.
When a question requires arithmetic, use the calculator tool rather than guessing.
Always show your reasoning before calling a tool."""

TOOLS = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Input must be a valid Python arithmetic expression.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A Python arithmetic expression, e.g. '(15 * 4) + 7'"
                }
            },
            "required": ["expression"]
        }
    }
]

def calculate(expression: str) -> str:
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages,
        )

        # Append the assistant response to the message history
        messages.append({"role": "assistant", "content": response.content})

        # If the model is done, return the final text
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text

        # Otherwise, process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_input = block.input
                result = calculate(tool_input["expression"])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        # Feed results back into the conversation
        messages.append({"role": "user", "content": tool_results})

if __name__ == "__main__":
    answer = run_agent("If I have 3 groups of 17 items, and then add 42 more, how many do I have total?")
    print(answer)
```
Annotated Walkthrough
Imports and client: Import anthropic and construct the client, which reads your ANTHROPIC_API_KEY from the environment.
TOOLS list: This is how you describe your tool to the model. The input_schema follows JSON Schema format. The model reads this and knows exactly what arguments to pass when it decides to call calculate. This schema is the bridge between the model’s intent and your Python function.
run_agent function: This is the core loop. Notice it uses while True — the loop only breaks when stop_reason == "end_turn". Every iteration either gets the final answer or executes a tool call and loops again.
messages.append after each response: This is how the agent maintains memory within a task. Every exchange — user message, assistant response, tool results — gets added to the running conversation. The model always sees the full history.
tool_results with tool_use_id: When you feed tool output back to the model, you must match it to the original tool call using tool_use_id. Miss this and the model loses track of which result corresponds to which request.
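To make the pairing concrete, here is the shape of the two messages involved. The id value below is made up for illustration; real ids are generated by the API:

```python
# Hypothetical message pair: the tool_result must echo the tool_use block's id.
assistant_turn = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_abc123",            # generated by the API
        "name": "calculate",
        "input": {"expression": "(3 * 17) + 42"},
    }],
}

user_turn = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_abc123",   # must match the id above exactly
        "content": "93",
    }],
}
```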
Common Beginner Mistakes
Gotcha 1 — Missing loop termination. If you don’t check `stop_reason == "end_turn"`, your agent loops forever even after the model has its answer. Always gate your loop on the stop reason.

Gotcha 2 — Not passing tool results back. A frequent mistake is calling the tool but forgetting to append the `tool_result` message before the next model call. Without it, the model doesn’t know what the tool returned and will often hallucinate an answer or call the tool again.

Gotcha 3 — Forgetting to append the assistant message. You must add `response.content` to `messages` as an `assistant`-role message before appending tool results as a `user`-role message. The order matters.
Step 3 — Run It and See It Think
Set your API key and run the script:
```bash
export ANTHROPIC_API_KEY="your-key-here"
python3 agent.py
```
You should see something like:
```
Let me calculate that for you.

3 groups of 17 items = 51 items, plus 42 more = 93 items total.

You have **93 items** in total.
```
Behind the scenes, here’s what happened:
- Your message was sent to Claude with the tool definition attached.
- Claude’s response came back with `stop_reason: "tool_use"` and a `tool_use` block containing `{"expression": "(3 * 17) + 42"}`.
- Your Python code called `calculate("(3 * 17) + 42")`, got `"93"`, and sent it back.
- Claude received the result, composed a final answer, and returned `stop_reason: "end_turn"`.
- Your loop extracted the text and printed it.
Adding a Second Tool
Want to add a second capability without rewriting anything? Just define the tool function and append its schema to TOOLS:
```python
from datetime import date

def get_current_date() -> str:
    return date.today().isoformat()
```

Add its schema to the `TOOLS` list:

```python
{
    "name": "get_current_date",
    "description": "Returns today's date in ISO format (YYYY-MM-DD).",
    "input_schema": {
        "type": "object",
        "properties": {},
        "required": []
    }
}
```
Then in your tool dispatch, add a conditional:
```python
if block.name == "calculate":
    result = calculate(tool_input["expression"])
elif block.name == "get_current_date":
    result = get_current_date()
```
The model will now use whichever tool fits the task. You extended the agent’s capabilities without touching the loop logic.
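As the tool count grows, the if/elif chain gets unwieldy. One common refactor — a design sketch, not the tutorial’s code — is a dispatch table mapping tool names to handlers, so adding a tool never touches the loop body:

```python
from datetime import date

def calculate(expression: str) -> str:
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"

def get_current_date() -> str:
    return date.today().isoformat()

# Map tool names to handlers; each handler adapts the tool's input dict
# to the underlying function's signature.
TOOL_HANDLERS = {
    "calculate": lambda inp: calculate(inp["expression"]),
    "get_current_date": lambda inp: get_current_date(),
}

def dispatch(name: str, tool_input: dict) -> str:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Return an error string rather than raising, so the model can recover
        return f"Error: unknown tool '{name}'"
    return handler(tool_input)
```

In the loop you would then call `dispatch(block.name, block.input)` instead of the if/elif chain.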
From 5-Minute Prototype to Production-Ready Harness
Congratulations — you’ve built a working AI agent. But let’s be honest about what you haven’t built yet.
What Breaks When You Scale
The prototype above is fragile in predictable ways:
- No retries. If the API call fails due to a network timeout or rate limit, your agent crashes. In production, you need exponential backoff and retry logic around every model call.
- No observability. You can’t see how long tool calls take, how many tokens you’re burning, or which steps failed. Without logging and tracing, debugging production agents is guesswork.
- No cost controls. The `while True` loop will run until the model says stop. A misbehaving agent or a pathological input could loop dozens of times before terminating, billing tokens on every iteration.
- No error isolation. If a tool handler raises an unhandled exception, the whole agent crashes rather than returning a graceful tool error. Our `calculate()` catches its own errors, but any new tool you add won’t unless you remember to guard it.
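To give a flavor of the retry piece, here is a generic exponential-backoff wrapper. This is a sketch under simplifying assumptions: in a real harness you would catch only transient failures (rate limits, timeouts, 5xx responses), not every exception.

```python
import random
import time

def with_retries(fn, max_attempts=4, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on a retryable failure, back off exponentially with jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delays of base, 2*base, 4*base, ... plus jitter so that many
            # clients retrying at once don't hammer the API in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

You would wrap each model call as `with_retries(lambda: client.messages.create(...))`, narrowing `retryable` to the transient error types of your SDK.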
The Harness Engineering Mindset
This is where harness engineering comes in — the discipline of wrapping AI agents in the infrastructure that makes them reliable, observable, and safe to operate in production. A harness isn’t the agent itself; it’s everything around the agent that makes it trustworthy:
- Retry and circuit-breaker policies
- Token budget management
- Structured logging and tracing
- Input/output validation
- Timeout enforcement
Think of it the same way you think about writing a web server. You wouldn’t ship a Flask app with no error handling, no logging, and no request timeouts. Your AI agent deserves the same engineering rigor. For a deep dive into what a production harness looks like, see our harness engineering fundamentals guide.
Next Steps
Here’s the honest progression from where you are now:
- Add error handling — Wrap `client.messages.create` in a try/except, catch `anthropic.APIError`, and implement basic retry logic.
- Add memory — Explore conversation persistence so your agent can pick up a task where it left off, not just within a single run.
- Add observability — Log every tool call with its inputs, outputs, and latency. You’ll thank yourself the first time something goes wrong at 2 AM.
- Explore multi-agent patterns — Some tasks are too complex for a single agent. Breaking work into specialized sub-agents with a coordinator is one of the most powerful patterns in agentic AI.
Your Agentic AI Learning Path: What’s Next
You’ve completed the foundation. Here’s the recommended progression to go from this five-minute prototype to engineering production-grade agentic systems.
Recommended Learning Progression
Level 1 (You are here): Single-Agent Fundamentals
– Tool calling and the perceive-decide-act-observe loop
– Prompt engineering for agents
– Basic error handling
Level 2: Reliable Single Agents
– Retry and fallback patterns
– Memory architectures (in-context, external vector store, key-value)
– Observability and cost tracking
– Structured output and output validation
Level 3: Multi-Agent Systems
– Orchestrator-worker patterns
– Parallelization and fan-out
– Inter-agent communication and state passing
– Harness patterns for coordinated agents
Level 4: Production Harness Engineering
– Deployment and scaling
– Safety and guardrails
– Evaluation and regression testing for agentic systems
– Incident response for autonomous AI
Resources on harnessengineering.academy
The full agentic AI course track builds directly on what you’ve started here:
- Tool-Calling Deep Dive — Goes beyond the basics into parallel tool calls, streaming, and error recovery.
- Memory Architectures for AI Agents — A practical guide to giving your agent durable memory.
- Multi-Agent Patterns — How to coordinate multiple agents to tackle complex tasks.
- Harness Engineering Fundamentals — The broader discipline that makes agents production-ready.
Practice Projects to Cement the Skills
The fastest way to internalize agentic patterns is to build something you actually want to use. Here are three projects at increasing difficulty:
- Research assistant — An agent with a web search tool that compiles a structured summary on any topic. Beginner-friendly, no external storage needed.
- File organizer — An agent that reads a messy directory, categorizes files by type and date, and moves them into organized subfolders. Introduces file system tools and multi-step planning.
- Code reviewer — An agent that reads a Python file, identifies potential bugs and style issues using static analysis tools, and writes a structured review. Introduces chained tool calls and structured output.
Start Building. Keep Learning.
You now know the core loop behind every AI agent ever built — perceive, decide, act, observe. The prototype you built today uses the same fundamental architecture as the agents running in enterprise workflows right now. The difference is the harness around them.
The harness is what this entire academy is about.
Ready to go deeper? Enroll in the full Agentic AI Course Track on harnessengineering.academy — a structured, hands-on curriculum that takes you from this first agent all the way to production-grade harness engineering. Every lesson includes working code, real-world examples, and the engineering judgment that separates prototype builders from production engineers.
The agents running the next generation of software are being built by people who started exactly where you are right now.
Written by Kai Renner — Senior AI/ML Engineering Leader and founder of harnessengineering.academy. Kai has spent a decade building reliable AI systems in production and writes to make agentic engineering accessible to every developer.