You have read about ReAct, Plan-and-Execute, multi-agent orchestration, routing, reflection, and a dozen other agent design patterns. Every framework tutorial presents its pattern as the obvious choice. Every architecture blog makes the selected approach look inevitable. And you still do not know which pattern to use for the system you need to build next week.
This is pattern paralysis, and it affects most developers entering the agent space. The fix is not learning more patterns. It is understanding which pattern fits which problem, and more importantly, knowing when a simpler pattern is the right answer.
Anthropic’s own guidance makes this clear: “Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short.” The company that builds Claude recommends simplicity first. That should tell you something about the state of complex agent architectures.
This guide covers eight agent design patterns organized as a progression from simple to advanced. For each pattern, you will learn what it does, when to use it, and where it breaks. At the end, a decision framework helps you choose the right pattern for your specific project. No framework allegiance. No hype. Practical guidance for developers building real systems.
What Are Agent Design Patterns and Why They Matter
An agent design pattern is a reusable architectural blueprint for building AI systems that can reason, use tools, and take actions. Patterns solve the problem of designing agent behavior from scratch every time by providing tested structures for common problems.
Before diving into specific patterns, you need to understand a critical distinction that Anthropic draws between two types of agentic systems.
Workflows are systems where LLMs and tools are orchestrated through predefined code paths. You, the developer, decide the execution order. The LLM fills in the content at each step, but the control flow is deterministic. Think of a pipeline: step 1 always leads to step 2.
Agents are systems where the LLM dynamically directs its own processes and tool usage. The model decides what to do next based on the current state. The control flow is non-deterministic. Think of a problem solver: the next step depends on what the previous step revealed.
This distinction matters because most “agent” applications are better served by workflows. A customer support system that follows a known troubleshooting tree does not need a reasoning agent. It needs a workflow with LLM-powered steps. True agents are warranted when you cannot predict the execution path in advance, when the problem is genuinely open-ended.
Understanding harness engineering and how it relates to these patterns helps you see where each one fits in the broader infrastructure picture.
The Pattern Progression: Simple to Advanced
The biggest mistake developers make with agent design patterns is treating them as a menu of equal options. They are not. They are a progression, and the right approach is to start at the lowest level that solves your problem.
Level 1: Single-Agent Foundations — Augmented LLM, Prompt Chaining, Routing. These handle 70-80% of use cases and should be your default starting point.
Level 2: Reasoning Agents — ReAct, Plan-and-Execute, Reflection. For tasks that require dynamic decision-making or iterative improvement. Use these when Level 1 patterns cannot handle the task complexity.
Level 3: Multi-Agent Systems — Orchestrator-Workers, Sequential Pipeline, Parallel Fan-Out. For tasks requiring multiple specialized capabilities. The complexity cost is real; multi-agent systems consume roughly 15 times the tokens of single-agent systems.
Level 4: Advanced Compositions — Evaluator-Optimizer, Hierarchical Decomposition, Composite Patterns. For complex production systems that combine multiple patterns. This is where most real-world systems end up, but you should arrive here through proven need, not ambition.
Each level up the progression adds latency, cost, and debugging complexity. Climb only when you have measured evidence that the current level is insufficient.
| Level | Patterns | Best For | Token Cost | Complexity |
|---|---|---|---|---|
| 1 | Augmented LLM, Chaining, Routing | Predictable tasks | Low (1x) | Low |
| 2 | ReAct, Plan-Execute, Reflection | Dynamic reasoning | Medium (2-5x) | Medium |
| 3 | Orchestrator, Pipeline, Parallel | Multi-specialization | High (10-15x) | High |
| 4 | Evaluator-Optimizer, Composite | Complex production | Very High (15x+) | Very High |
Level 1 Patterns: Single-Agent Foundations
These patterns form the foundation of every agent system. Even if your final architecture is more complex, you will use these patterns as building blocks.
Augmented LLM
The simplest agent design pattern: a language model enhanced with retrieval, tools, and memory. This is not a “real agent” in the academic sense, but it solves an enormous range of practical problems.
How it works: The LLM receives a user query along with retrieved context (from a vector database or search), access to defined tools (APIs, databases, calculators), and optionally, memory of prior interactions. The model generates a single response using these augmentations.
When to use: Start here. If your task can be solved with one LLM call plus context and tool access, this pattern is sufficient. Examples include question-answering with domain knowledge, single-step data lookups, and content generation with factual grounding.
Where this breaks: Tasks that require multiple sequential decisions, iterative refinement, or branching logic based on intermediate results. When you find yourself wanting to call the LLM repeatedly in a loop to handle a single user request, you have outgrown this pattern.
Anthropic’s team shared a revealing detail about building their SWE-bench coding agent: they spent more time optimizing their tools than their overall prompt. Tool quality matters more than prompt cleverness, even at Level 1.
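The augmented-LLM flow can be sketched in a few lines. Everything here is a stand-in: `call_model`, `retrieve`, and the tool registry are hypothetical names, not a real client API.

```python
# Minimal augmented-LLM sketch: one model call enriched with
# retrieved context and a tool list. call_model() and retrieve()
# are stand-ins for a real LLM client and vector store.

def retrieve(query: str) -> list[str]:
    # Stand-in for a vector-database lookup.
    return [f"doc relevant to: {query}"]

TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call; echoes for demonstration.
    return f"ANSWER based on -> {prompt[:60]}"

def augmented_llm(query: str) -> str:
    context = "\n".join(retrieve(query))
    tool_list = ", ".join(TOOLS)
    prompt = (
        f"Context:\n{context}\n\n"
        f"Available tools: {tool_list}\n\n"
        f"Question: {query}"
    )
    return call_model(prompt)  # single call: no loop, no planning
```

The defining property is the single call: context and tools augment the prompt, but there is no loop deciding what to do next.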
Prompt Chaining
A sequence of LLM calls where each step’s output feeds into the next step’s input, with optional validation gates between steps.
How it works: You decompose a complex task into a series of simpler sub-tasks. Step 1 generates an outline. Step 2 expands each section. Step 3 edits for consistency. Between each step, programmatic checks validate the output before passing it forward. If a check fails, the step retries or the chain halts.
When to use: When your task has a predictable multi-step structure but each step benefits from LLM generation. Document processing pipelines, structured content creation, and data transformation workflows all fit this pattern well.
Where this breaks: When the number or order of steps cannot be determined in advance. If you need the system to decide what to do next based on what it discovered in the previous step, you need a reasoning agent, not a chain.
Routing
An initial classification step that directs the input to one of several specialized handlers.
How it works: The first LLM call classifies the user’s intent or the input’s type. Based on the classification, the system routes to a specialized prompt, tool set, or sub-workflow optimized for that category. Each route can use different system prompts, different tools, or even different models.
When to use: When your system handles multiple distinct task types. A support agent that handles billing questions differently from technical issues. A coding assistant that routes bug reports to one workflow and feature requests to another.
Where this breaks: When categories overlap significantly or when the routing decision itself requires multi-step reasoning. LLM-based routing achieves 95% or higher accuracy on well-defined categories, but the remaining 5% are genuinely ambiguous requests that may need clarification rather than forced routing.
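A routing entry point is a classifier plus a handler table. In this sketch `classify()` stands in for an LLM classification call returning one label from a closed set:

```python
# Routing sketch: a classification step picks a specialized
# handler; unknown labels fall back to clarification.

ROUTES = {
    "billing":   lambda q: f"[billing handler] {q}",
    "technical": lambda q: f"[technical handler] {q}",
}

def classify(query: str) -> str:
    # Stand-in: a real router would be an LLM call constrained
    # to return exactly one known label.
    return "billing" if "invoice" in query.lower() else "technical"

def route(query: str) -> str:
    handler = ROUTES.get(classify(query))
    if handler is None:
        # Ambiguous request: ask rather than force a route.
        return "Could you clarify what you need help with?"
    return handler(query)
```

Each handler can use its own system prompt, tool set, or model; the fallback branch handles the genuinely ambiguous remainder.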
Level 2 Patterns: Reasoning Agents
When your task requires dynamic decision-making, these patterns give the agent the ability to reason about its next action based on what it has observed so far.
ReAct (Reason and Act)
The most widely known agent design pattern. The agent operates in an iterative loop: observe the current state, reason about what to do next, take an action (usually a tool call), observe the result, and repeat until the task is complete.
How it works: The agent receives a task and enters a cycle. At each step, it generates a “thought” (reasoning about the current state and what to do next), then an “action” (a specific tool call or response). The action produces an “observation” (the tool’s result), and the cycle continues. The agent decides when to stop and produce a final answer.
When to use: Tasks where the path to the solution is not known in advance. Research tasks where the next search depends on what the previous search found. Debugging sessions where the fix depends on what the diagnostic revealed. Exploratory analysis where each finding shapes the next investigation.
Where this breaks: ReAct consumes additional tokens at every reasoning cycle, increasing both latency and cost. The sequential nature means each step waits for the previous one. Error propagation is a risk, since a bad reasoning step early in the chain can send the agent down a wrong path that compounds through subsequent steps. For tasks with predictable structure, ReAct is overkill.
Plan-and-Execute
The agent generates a complete plan upfront, then executes each step sequentially. If a step fails or reveals new information, the agent can optionally re-plan.
How it works: The planning phase uses one LLM call to decompose the task into an ordered list of concrete steps. The execution phase works through each step, using tools and generating outputs. A re-planning trigger fires if execution deviates significantly from expectations.
When to use: Tasks where the overall structure is predictable but the details require LLM generation. Multi-step research projects, complex data analysis, code generation across multiple files, and structured report creation.
Where this breaks: Dynamic environments where the plan becomes obsolete quickly. If each step’s outcome fundamentally changes what the next step should be, Plan-and-Execute spends tokens on plans it will discard. In these cases, ReAct’s step-by-step adaptation outperforms.
ReAct vs Plan-and-Execute: the decision is simpler than most articles make it. If the path to the solution is mostly predictable, use Plan-and-Execute; it uses fewer tokens on multi-step tasks and runs faster. If the path is genuinely unknown and each step depends on what the previous step discovered, use ReAct. A hybrid approach works for many real systems: Plan-and-Execute for the high-level strategy, ReAct for individual steps that require exploration.
Reflection and Self-Critique
The agent reviews its own output before finalizing it, generating feedback and revising iteratively.
How it works: After generating an initial output, a separate LLM call evaluates the output against defined criteria (accuracy, completeness, style, technical correctness). The evaluation produces specific feedback. A third call revises the output based on the feedback. This cycle can repeat.
When to use: Quality-critical outputs where the cost of revision is lower than the cost of a bad first draft. Code generation, technical documentation, and any task where automated evaluation criteria exist.
Where this breaks: Diminishing returns. The first revision cycle captures most improvements. The second catches a few more. By the third cycle, the agent often introduces new issues while fixing old ones. Production systems typically cap reflection at 2 cycles.
Level 3 Patterns: Multi-Agent Systems
Multi-agent design patterns distribute work across multiple specialized agents. They are powerful for complex problems but carry significant overhead.
Before reaching for multi-agent patterns, internalize this: data from production deployments shows that multi-agent systems consume roughly 15 times the tokens of single-agent approaches. Each agent in the chain typically receives the full conversation history from prior agents. Most production systems that use multi-agent architectures limit themselves to two levels of hierarchy, not because of a design constraint, but because deeper hierarchies add cost and latency without proportional quality improvements.
Orchestrator-Workers
A central orchestrator agent receives the task, decomposes it, and delegates sub-tasks to specialized worker agents. The orchestrator synthesizes the results.
How it works: The orchestrator analyzes the incoming task and decides which workers to invoke. Each worker has a specialized system prompt, tool set, and focus area. Workers execute independently and return results to the orchestrator, which combines them into a final output.
When to use: Tasks that clearly decompose into independent sub-tasks requiring different specializations. A PR review system that needs style checking, security auditing, and performance analysis in parallel. A research system that needs data gathering, analysis, and report writing as distinct capabilities.
Where this breaks: When sub-tasks are not truly independent. If Worker B needs the output of Worker A before it can start, you have a sequential dependency disguised as parallel work. The orchestrator also becomes a single point of failure and a latency bottleneck, since every request passes through it.
Sequential Pipeline
Agents arranged in a fixed order, like an assembly line. Each agent processes the output of the previous one.
How it works: Agent 1 processes the raw input and passes structured output to Agent 2. Agent 2 transforms or enriches the data and passes it to Agent 3. The pipeline is deterministic: the order is fixed and every input passes through every agent.
When to use: Document processing workflows where each stage adds a specific transformation. Content pipelines (generate, fact-check, edit, format). Data analysis chains where each agent handles a different analysis type.
Where this breaks: Inflexibility. You cannot skip unnecessary steps. If Agent 3 does not need to modify the output, it still processes it, burning tokens for no value. Errors compound: a mistake in Agent 1 propagates through the entire chain.
Parallel Fan-Out and Gather
Multiple agents work simultaneously on different aspects of the same task. A synthesis step combines their outputs.
How it works: A dispatcher sends the task (or different aspects of it) to multiple agents simultaneously. Each agent works independently. A gathering step collects all results and synthesizes them into a coherent output.
When to use: When the task has multiple independent dimensions that benefit from specialized handling. Market analysis that needs financial, competitive, and technical perspectives simultaneously. Quality assurance that runs style, accuracy, and compliance checks in parallel.
Where this breaks: Latency depends on the slowest agent. Synthesis is the hard part: combining outputs from multiple agents into a coherent, non-contradictory result often requires its own LLM call and can introduce new errors. Cost is higher than sequential approaches because all agents run regardless of whether their output is needed.
Level 4 Patterns: Advanced Compositions
In production, pure patterns are rare. Real systems combine patterns based on the specific requirements of each component.
Evaluator-Optimizer
One agent generates content while another evaluates it. The evaluation feeds back to the generator for iterative improvement.
When to use: Code generation with automated testing (the evaluator runs tests), content creation with quality standards, and any domain where evaluation criteria can be automated. This is the pattern behind most AI coding assistants.
Where this breaks: Requires reliable, automated evaluation. If the evaluator itself produces inconsistent judgments, the optimization loop can oscillate rather than converge.
Composite Patterns
This is the real-world default. A production system might use routing at the entry point, orchestrator-workers for complex tasks, Plan-and-Execute within each worker, and reflection before returning results. Google’s documentation calls this the composite pattern and acknowledges it as the most common production architecture.
The key to successful composition: each component should be independently testable. If you cannot test your routing logic without spinning up the entire multi-agent system, your architecture is too coupled.
The Pattern Decision Framework
When choosing agent design patterns for a new project, follow this decision tree.
Question 1: Can the task be solved with a single LLM call plus tools?
Yes: Use Augmented LLM. Stop here. Do not add complexity.
Question 2: Is the multi-step execution path predictable?
Yes: Use Prompt Chaining (2-5 steps) or Plan-and-Execute (5+ steps with possible re-planning).
No: Continue.
Question 3: Does the task handle multiple distinct input types?
Yes: Add Routing at the entry point. Each route can use a different downstream pattern.
Question 4: Does the task require exploratory reasoning?
Yes: Use ReAct. The agent decides what to do next based on observations.
No: Revisit Plan-and-Execute with stricter step definitions.
Question 5: Does the task require multiple distinct specializations?
No: Stay with single-agent patterns. A single agent with multiple tools often outperforms multiple agents.
Yes: Use Orchestrator-Workers if sub-tasks are independent; Sequential Pipeline if they depend on each other.
Question 6: Is output quality critical enough to justify additional cost?
Yes: Add Reflection/Self-Critique as a final step before returning results.
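The six questions above can be encoded as a small decision function. The boolean parameter names and pattern strings are illustrative, not a formal API:

```python
# The decision framework as code: answer the six questions as
# booleans, get back the suggested pattern(s).

def choose_patterns(single_call_ok: bool, path_predictable: bool,
                    multiple_input_types: bool, exploratory: bool,
                    multiple_specializations: bool,
                    independent_subtasks: bool,
                    quality_critical: bool) -> list[str]:
    if single_call_ok:                       # Question 1
        return ["Augmented LLM"]
    patterns = []
    if multiple_input_types:                 # Question 3
        patterns.append("Routing")
    if path_predictable:                     # Question 2
        patterns.append("Prompt Chaining / Plan-and-Execute")
    elif exploratory:                        # Question 4
        patterns.append("ReAct")
    else:
        patterns.append("Plan-and-Execute (stricter steps)")
    if multiple_specializations:             # Question 5
        patterns.append("Orchestrator-Workers" if independent_subtasks
                        else "Sequential Pipeline")
    if quality_critical:                     # Question 6
        patterns.append("Reflection")
    return patterns
```

Note the early return on Question 1: if a single augmented call works, every later question is moot, which is the whole point of the framework.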
The guiding principle: start with the simplest pattern that could work, instrument it thoroughly, and let production data guide your evolution to more complex patterns. The agent that ships beats the perfect architecture that never deploys.
Common Mistakes and How to Avoid Them
Starting with multi-agent when single-agent works. The most common mistake. Developers reach for multi-agent orchestration because it feels architecturally sophisticated. But a single well-prompted agent with good tools often outperforms a multi-agent system on the same task, at a fraction of the cost. Start at Level 1. Prove you need Level 3.
Not defining explicit completion criteria. Agents that do not know when to stop keep generating, asking follow-up questions, or offering unsolicited help after the task is done. Production data shows that adding explicit completion markers (the agent outputs a structured signal when the task is finished) reduced conversation turns from 18 to 8-12 and improved task completion rates from 73% to 94%.
Ignoring context contamination. In systems that handle multiple tasks per session, context from one task bleeds into the next. A customer service agent that discussed billing in the previous turn brings billing context into a technical support question. Isolate context by task, not by conversation. Each new task should receive only task-relevant information.
Choosing patterns based on demos. A pattern that looks impressive in a demo with curated inputs may fail with real user data. Evaluate patterns against your actual use cases, edge cases, and failure modes, not conference presentations.
Using frameworks before understanding patterns. Frameworks are valuable, but they add abstraction layers that make debugging harder. Anthropic recommends starting with raw LLM API calls: “many patterns can be implemented in a few lines of code.” Understand the pattern first. Add the framework when you need its specific capabilities (state persistence, built-in tracing, managed checkpoints).
Frequently Asked Questions
Which agent design pattern should I start with?
Start with the Augmented LLM pattern: a language model enhanced with retrieval, tools, and memory. This handles the majority of practical use cases with the lowest complexity. Only move to more complex patterns when you have measured evidence that the Augmented LLM approach is insufficient for your specific task.
When should I use ReAct vs Plan-and-Execute?
Use Plan-and-Execute when the overall task structure is predictable and steps can be planned in advance. It uses fewer tokens and runs faster. Use ReAct when the path to the solution is genuinely unknown and each step depends on discoveries from the previous step. For many systems, a hybrid approach works: Plan-and-Execute for the high-level strategy, ReAct for individual steps that require exploration.
How many agents do I actually need for my project?
Probably fewer than you think. Most production systems use one to three agents. Data from production deployments shows that systems with more than two levels of hierarchy rarely justify the additional complexity. Each agent multiplies token consumption, latency, and debugging surface. Start with one agent and add more only when you can quantify what the additional agent buys you.
Should I use a framework or build agent patterns from scratch?
Start with raw API calls to understand the pattern. Anthropic’s recommendation: “many patterns can be implemented in a few lines of code.” Once you need specific capabilities, such as managed state persistence, distributed tracing, or built-in checkpoint-resume, frameworks like LangGraph, CrewAI, or the Claude Agent SDK earn their complexity. The framework should solve a measured problem, not a hypothetical one.
What is the most common agent design pattern in production?
The composite pattern. Real production systems combine routing at the entry point, single-agent processing for simple tasks, and orchestrator-workers for complex ones, often with reflection as a quality gate. Pure single-pattern architectures exist mainly in tutorials. Production systems evolve toward pattern combinations guided by operational data.
Building Better Agents Starts with Better Pattern Choices
Agent design patterns are not a competition where more complex wins. They are engineering tools, and the right tool depends on the job. The progression from Augmented LLM through multi-agent orchestration exists because problems vary in complexity, and your architecture should match your problem, not exceed it.
Three things to do before your next agent project:
1. Map your task to the decision framework. Walk through the six questions above with your specific use case. Most developers discover their task fits a Level 1 or Level 2 pattern.

2. Build the simplest version first. Implement the pattern the decision framework suggests. Instrument it with task completion rate, token consumption, and latency metrics. Run it against real inputs.

3. Let data drive the next step. If the simple pattern hits a wall, the metrics will show you exactly where. That targeted evidence tells you which Level 3 or Level 4 pattern addresses the specific gap, rather than guessing at architecture upfront.
The developers building the most reliable agents are not the ones using the most complex patterns. They are the ones who chose the right pattern for their specific problem and invested their engineering effort in the harness layer: tool quality, observability, cost controls, and evaluation pipelines.
For tutorials, career guides, and hands-on learning paths in harness engineering, subscribe to the harnessengineering.academy newsletter.