The most common mistake in multi-agent system design is building a multi-agent system in the first place. Most agent tasks don’t need multiple agents. A single agent with well-chosen tools handles the majority of use cases more simply, more cheaply, and with fewer failure modes.
But there are tasks where a single agent genuinely isn’t enough: when the task spans distinct domains that require specialized knowledge, when the context window can’t hold everything a single agent needs, when parallel execution matters for latency, or when different steps require fundamentally different capabilities.
This guide covers the four multi-agent design patterns, when each one is the right choice, and the engineering decisions that determine whether your multi-agent system works reliably or fails expensively.
Before you build a multi-agent system
Start with a single agent. Add tools before you add agents. Graduate to multi-agent patterns only when you hit clear limits.
Three signals that you’ve hit those limits:
Tool overload. Your single agent has 15+ tools and consistently picks the wrong one. When the model’s tool selection degrades because it has too many options, splitting tools across specialized agents improves accuracy.
Context overflow. The task requires more information than fits in one context window. Each domain needs extensive context (system prompts, few-shot examples, domain knowledge) that competes for space. Splitting into specialized agents gives each one a focused context window.
Parallelization needs. Steps that could run simultaneously are running sequentially because a single agent processes one step at a time. Multiple agents can execute independent subtasks in parallel, reducing total latency.
If none of these signals apply, stay with a single agent. The complexity cost of multi-agent coordination is real: more failure points, harder debugging, higher token consumption, and more infrastructure to maintain.
Pattern 1: Subagents (centralized orchestration)
The subagent pattern uses a supervisor agent that receives the user’s request, decomposes it into subtasks, delegates each subtask to a specialized subagent, collects results, and synthesizes a final response.
How it works
The supervisor agent treats each subagent as a tool. It calls the research subagent with a research query, the analysis subagent with data to analyze, and the writing subagent with content to produce. Each subagent operates independently with its own system prompt, tools, and context window. Results flow back to the supervisor for synthesis.
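As a minimal sketch of this flow, with stub functions standing in for real model calls (the subagent names and payloads are illustrative, not a specific framework's API):

```python
import concurrent.futures

# Stub subagents; in a real system each wraps its own model call,
# system prompt, and tool set.
def research_subagent(query: str) -> str:
    return f"[research] findings for: {query}"

def analysis_subagent(data: str) -> str:
    return f"[analysis] results for: {data}"

SUBAGENTS = {"research": research_subagent, "analysis": analysis_subagent}

def supervisor(subtasks: list[tuple[str, str]]) -> str:
    """Dispatch (subagent_name, payload) pairs in parallel, then synthesize."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(SUBAGENTS[name], payload)
                   for name, payload in subtasks]
        results = [f.result() for f in futures]
    # A real supervisor would make one more model call here to synthesize;
    # this sketch simply joins the results.
    return "\n".join(results)
```

The key structural point is that each subagent is invoked like a tool: the supervisor owns decomposition and synthesis, and the subagents never talk to each other directly.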
When to use subagents
This pattern works best when the task naturally decomposes into independent subtasks that can run in parallel. A research task that needs information from three different domains (academic papers, industry reports, and social media) is a natural fit: three subagents search in parallel, and the supervisor synthesizes their findings.
Trade-offs
Pros:
– Parallel execution across independent subtasks
– Each subagent has a focused, clean context window
– Subagents can use different models optimized for their task
– Easy to add new capabilities by adding new subagents
Cons:
– Every subtask result passes through the supervisor, adding an extra model call
– The supervisor becomes a single point of failure
– Debugging requires tracing through multiple agent contexts
– Token consumption scales roughly linearly with the number of subagents
Implementation considerations
The supervisor’s system prompt is critical. It needs clear instructions about which subagent handles which type of task, how to decompose complex requests, and when to synthesize results versus when to ask follow-up questions. A poorly prompted supervisor will either delegate everything to one subagent or over-decompose simple tasks into unnecessary subtasks.
Verify subagent outputs before the supervisor synthesizes them. A subagent that returns an error or low-quality result should trigger a retry or fallback, not get silently included in the final synthesis.
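One way to sketch that verification gate, assuming a caller-supplied quality check (the check shown in the test is a trivial non-empty check; real criteria would be task-specific):

```python
def run_with_verification(subagent, task, is_acceptable, max_retries=2,
                          fallback="[subagent unavailable]"):
    """Retry a subagent whose output fails a quality check; return a
    fallback after max_retries instead of passing a bad result downstream."""
    for _ in range(max_retries + 1):
        try:
            result = subagent(task)
        except Exception:
            continue  # treat exceptions like failed outputs and retry
        if is_acceptable(result):
            return result
    return fallback
```

The fallback value lets the supervisor's synthesis step see explicitly that a subtask failed, rather than silently weaving a bad result into the final answer.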
Pattern 2: Handoffs (sequential state transitions)
The handoff pattern passes control between specialized agents in sequence. Each agent handles one phase of the workflow, then hands off to the next agent with the accumulated state.
How it works
Agent A handles the initial intake and information gathering. When it has enough information, it hands off to Agent B for analysis. Agent B completes its analysis and hands off to Agent C for output generation. Each handoff transfers the conversation state and any intermediate results.
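A minimal sketch of that chain, using the support-escalation example below and stub agents in place of real model calls (the state fields and agent names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class HandoffState:
    """Accumulated state transferred at each handoff."""
    messages: list[str] = field(default_factory=list)
    facts: dict[str, str] = field(default_factory=dict)

# Stub phase agents; each real one would be a model call with its own prompt.
def intake_agent(state: HandoffState) -> HandoffState:
    state.facts["issue"] = "user cannot log in"
    return state

def diagnostic_agent(state: HandoffState) -> HandoffState:
    state.facts["cause"] = "expired session token"
    return state

def resolution_agent(state: HandoffState) -> str:
    return (f"Issue '{state.facts['issue']}' resolved: "
            f"cleared the {state.facts['cause']}")

def run_handoff_chain(state: HandoffState) -> str:
    state = intake_agent(state)      # phase A: intake
    state = diagnostic_agent(state)  # phase B: analysis
    return resolution_agent(state)   # phase C: output
```

Making the transferred state an explicit typed object, rather than an implicit shared conversation, is what keeps context from silently leaking between phases.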
When to use handoffs
This pattern fits workflows with natural phases where each phase requires different expertise. Customer support escalation is the classic example: a triage agent gathers initial information, a diagnostic agent investigates the issue, and a resolution agent provides the solution. Each agent specializes in its phase.
Trade-offs
Pros:
– Clean separation of concerns between workflow phases
– Each agent can maintain conversational context within its phase
– Natural for workflows that humans already process in stages
– Easy to add or modify individual phases without affecting others
Cons:
– Sequential execution means no parallelization; total latency is the sum of all phases
– State transfer between agents can lose context if not carefully managed
– Debugging requires understanding the full handoff chain
– Not suitable for tasks where phases aren’t clearly defined
Implementation considerations
The handoff protocol matters more than the individual agents. Define exactly what information transfers between agents, in what format, and what constitutes a valid handoff. An agent that hands off prematurely (before gathering sufficient information) creates cascading failures in downstream agents.
Include a “return” capability. Sometimes a downstream agent discovers that the upstream agent missed something. The ability to hand control back with a specific request (“I need the user’s account number, which wasn’t collected during intake”) prevents dead ends.
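One way to sketch that return capability: a downstream agent can return either its normal result or a handback request, and the orchestrator reroutes on the latter (the types and the account-number scenario are illustrative):

```python
from typing import NamedTuple, Union

class Handback(NamedTuple):
    to_phase: str  # which upstream agent should regain control
    request: str   # what specifically is missing

def diagnostic_agent(state: dict) -> Union[dict, Handback]:
    if "account_number" not in state:
        # Missing upstream information: hand control back with a
        # specific request instead of guessing or dead-ending.
        return Handback("intake", "I need the user's account number, "
                                  "which wasn't collected during intake")
    return {**state, "diagnosis": "billing sync error"}
```

The orchestrator then checks `isinstance(result, Handback)` after each phase and reruns the named upstream agent with the request attached.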
Pattern 3: Router (parallel dispatch and synthesis)
The router pattern classifies the incoming request, dispatches it to the appropriate specialized agent (or multiple agents in parallel), and synthesizes the results.
How it works
A lightweight router (often just a classifier, not a full agent) examines the request and determines which specialist agents should handle it. For a customer support system, the router might classify the request as billing, technical, or account management and route accordingly. For complex requests that span categories, the router dispatches to multiple specialists in parallel and merges their responses.
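As a sketch, with a keyword classifier standing in for a real router (which would more likely be an embedding classifier or a small LLM call; categories, keywords, and specialist outputs are illustrative):

```python
KEYWORDS = {
    "billing": ("invoice", "charge", "refund"),
    "technical": ("error", "crash", "bug"),
}

SPECIALISTS = {
    "billing": lambda req: "billing team answer",
    "technical": lambda req: "technical team answer",
    "account": lambda req: "account team answer",
}

def classify(request: str) -> list[str]:
    text = request.lower()
    matched = [cat for cat, words in KEYWORDS.items()
               if any(w in text for w in words)]
    return matched or ["account"]  # default category

def route(request: str) -> dict[str, str]:
    """Dispatch to every matching specialist; multi-category requests
    fan out to several specialists whose answers must then be merged."""
    return {cat: SPECIALISTS[cat](request) for cat in classify(request)}
```

When `route` returns more than one answer, a merging step (priority, concatenation, or a synthesis agent) decides what the user actually sees.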
When to use routers
This pattern excels when requests fall into distinct categories with specialized handlers. Enterprise knowledge bases that span departments (HR policies, IT documentation, legal guidelines) are a natural fit: the router identifies the relevant department and dispatches to the specialist who has that department’s knowledge loaded.
Trade-offs
Pros:
– Low latency for single-category requests (one classification step + one specialist)
– Parallel dispatch for multi-category requests
– Specialists can be independently developed and maintained
– The router can be a simple, fast classifier rather than a full LLM
Cons:
– Misclassification sends the request to the wrong specialist
– Multi-category requests require result merging, which can produce inconsistent outputs
– Stateless design means each request starts fresh (no conversation memory without additional infrastructure)
– Adding new categories requires updating the router’s classification logic
Implementation considerations
The router’s classification accuracy determines system quality. A router that sends 10% of billing questions to the technical support agent will frustrate users. Invest in the router’s classification logic proportionally to the cost of misrouting.
For multi-category requests, define a merging strategy. Does one specialist’s answer take priority? Are answers concatenated? Does a synthesis agent review all specialist responses and produce a unified answer? The merging strategy affects both quality and cost.
Pattern 4: Supervisor with evaluation (hierarchical orchestration)
The supervisor with evaluation pattern extends the basic subagent pattern by adding quality evaluation after each subagent completes its work. The supervisor doesn’t just delegate and collect; it evaluates, requests revisions, and only accepts results that meet quality criteria.
How it works
The supervisor decomposes the task and delegates to subagents. When a subagent returns results, the supervisor (or a dedicated evaluator agent) checks the quality against defined criteria. If the result doesn’t meet standards, the supervisor provides feedback and requests a revision. This create-evaluate-revise loop continues until the result passes or a retry limit is reached.
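The create-evaluate-revise loop can be sketched as a small control function, with `generate` and `evaluate` supplied by the caller (both are stand-ins for model calls here):

```python
def generate_with_evaluation(generate, evaluate, task, max_revisions=3):
    """Create-evaluate-revise loop: regenerate with the evaluator's
    feedback until the draft is accepted or the revision limit is hit."""
    feedback = None
    draft = None
    for _ in range(max_revisions + 1):
        draft = generate(task, feedback)     # feedback is None on first pass
        accepted, feedback = evaluate(draft)
        if accepted:
            return draft, True
    # Limit reached: return the last attempt flagged for escalation
    # or a quality warning rather than looping forever.
    return draft, False
```

Returning an explicit accepted/rejected flag alongside the draft is what lets the caller implement the escalation path described below.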
When to use supervisor with evaluation
This pattern is essential when output quality varies and the cost of low-quality output is high. Content generation pipelines, code generation systems, and research synthesis all benefit from evaluation loops. The evaluation agent catches hallucinations, logical errors, and quality issues that would otherwise reach the user.
Trade-offs
Pros:
– Significantly higher output quality due to evaluation loops
– Catches errors that single-pass generation misses
– The evaluator can use different criteria for different output types
– Quality improves without changing the generation agent’s prompts
Cons:
– Higher latency from evaluation and revision cycles
– Higher token consumption (2-4x per evaluated output)
– The evaluator needs clear, specific quality criteria to be effective
– Risk of infinite revision loops if the evaluator’s standards are unreachable
Implementation considerations
Set a maximum revision count (typically 2-3 revisions). If the subagent can’t produce acceptable output after the maximum revisions, escalate to a human or return the best attempt with a quality warning. Unbounded revision loops are a common failure mode.
The evaluator’s criteria must be specific and measurable. “Is this good?” is not a useful evaluation prompt. “Does this summary accurately reflect the source material? Does it address all three questions from the original request? Is it between 200 and 300 words?” produces consistent, useful evaluations.
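Those measurable criteria can even be partly mechanical. A sketch, where simple keyword matching stands in for the "addresses each question" check (a production evaluator would use an LLM judge for faithfulness to the source material):

```python
def evaluate_summary(summary: str, questions: list[str],
                     min_words: int = 200, max_words: int = 300):
    """Check a summary against specific, measurable criteria and return
    (accepted, problems) so failures double as revision feedback."""
    problems = []
    n = len(summary.split())
    if not min_words <= n <= max_words:
        problems.append(f"length is {n} words; expected {min_words}-{max_words}")
    for q in questions:
        if q.lower() not in summary.lower():
            problems.append(f"does not address: {q!r}")
    return (not problems, problems)
```

The problems list feeds directly back into the revision request, which is far more actionable for the generating agent than a bare pass/fail.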
Decision framework: Choosing the right pattern
| Your situation | Best pattern | Why |
|---|---|---|
| Independent subtasks that can run in parallel | Subagents | Parallel execution, focused context windows |
| Workflow with distinct sequential phases | Handoffs | Clean phase separation, natural state progression |
| Requests that fall into distinct categories | Router | Fast classification, specialist knowledge isolation |
| Quality-critical output that needs review | Supervisor + evaluation | Built-in quality assurance through revision loops |
| Tasks spanning domains AND requiring quality | Subagents + evaluation | Combines parallel execution with quality gates |
Most production systems combine patterns. A common architecture uses a router at the top level to classify requests, subagents for parallel research within each category, and evaluation for quality-critical outputs. Don’t feel constrained to a single pattern.
The cost of multi-agent systems
Token consumption in multi-agent systems is substantially higher than in single-agent systems. A task that a single agent completes in 5,000 tokens can consume 50,000-75,000 tokens when a multi-agent system handles it.
The multipliers come from three sources:
Context duplication. Each agent needs its own system prompt and relevant context, even if some of that context overlaps with other agents.
Coordination overhead. Every delegation, result collection, and synthesis step is an additional model call with its own token consumption.
Evaluation loops. If you add quality evaluation, each revision cycle doubles the generation cost for that component.
Budget for 10-15x token consumption compared to a single-agent baseline when planning multi-agent deployments. Monitor actual consumption closely in production, because cost overruns from multi-agent systems are one of the most common reasons for project cancellation.
Debugging multi-agent systems
Multi-agent systems are harder to debug than single-agent systems because failures can originate in any agent, in the communication between agents, or in the orchestration logic itself.
Essential debugging practices
Trace every inter-agent communication. Log every message passed between agents with timestamps, sender, receiver, and content. When the final output is wrong, the trace shows you where the error was introduced.
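A sketch of such a trace, using an in-memory list as the sink (production systems would write to a structured log store instead; the field names are illustrative):

```python
import time

TRACE: list[dict] = []

def log_message(sender: str, receiver: str, content: str) -> None:
    """Record one inter-agent message with timestamp, sender, receiver,
    and content, so a wrong final output can be traced to its source."""
    TRACE.append({
        "ts": time.time(),
        "sender": sender,
        "receiver": receiver,
        "content": content,
    })
```

Wrapping every delegation and result-return in a call like this costs almost nothing and is usually the difference between a ten-minute and a ten-hour debugging session.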
Test agents in isolation first. Before testing the full system, verify each agent produces correct output for its specific task type. An agent that fails in isolation will definitely fail in orchestration.
Add circuit breakers. If an agent fails repeatedly (3+ consecutive failures), stop routing to it and use a fallback rather than letting failures cascade through the system.
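A per-agent circuit breaker can be as small as the following sketch (the threshold and interface are illustrative):

```python
class CircuitBreaker:
    """Stop routing to an agent after `threshold` consecutive failures."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._consecutive_failures: dict[str, int] = {}

    def record(self, agent: str, success: bool) -> None:
        # A success resets the count; a failure increments it.
        if success:
            self._consecutive_failures[agent] = 0
        else:
            self._consecutive_failures[agent] = (
                self._consecutive_failures.get(agent, 0) + 1)

    def is_open(self, agent: str) -> bool:
        return self._consecutive_failures.get(agent, 0) >= self.threshold
```

The orchestrator checks `is_open(agent)` before each dispatch and routes to a fallback when the breaker has tripped.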
Monitor per-agent metrics. Track success rate, latency, and token consumption per agent, not just system-wide. An agent with degrading performance will drag down the entire system if you’re not monitoring at the agent level.
For the foundational single-agent patterns that multi-agent systems build on, read our agent design patterns guide. For how memory works across agents, see our agent memory patterns guide.
Frequently asked questions
How many agents should a multi-agent system have?
Start with the minimum that addresses your specific bottleneck. Two or three specialized agents plus a supervisor is a common starting point. Systems with more than 8-10 agents become difficult to debug and expensive to run. If you need that many specializations, consider whether some can be implemented as tools rather than agents.
Can different agents in the system use different models?
Yes, and this is one of the key advantages of multi-agent systems. Use a capable, expensive model (Claude, GPT-4) for complex reasoning tasks and a smaller, cheaper model for classification, extraction, or simple tool use. This optimization can reduce total cost by 50-70% compared to using the most capable model everywhere.
How do I handle failures in multi-agent systems?
Implement three layers of failure handling: retry at the agent level (retry the failed tool call or model call), fallback at the pattern level (route to an alternative agent if the primary fails), and graceful degradation at the system level (return partial results if some agents fail). Never let a single agent failure crash the entire system.
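The system-level layer (graceful degradation) can be sketched as follows, with stub agents standing in for real ones:

```python
def gather_with_degradation(agents: dict, task: str):
    """Collect results from the agents that succeed and report which
    ones failed, instead of letting one failure crash the whole run."""
    results, failures = {}, {}
    for name, agent in agents.items():
        try:
            results[name] = agent(task)
        except Exception as exc:
            failures[name] = str(exc)
    return results, failures
```

The caller can then decide whether the partial results are enough to answer the user, and surface the failed components explicitly rather than silently.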
What is the biggest risk in multi-agent systems?
Uncontrolled cost. Multi-agent systems consume 10-15x more tokens than single-agent systems, and a stuck loop in any agent can burn through budgets quickly. Implement per-agent and system-wide token budgets, step limits, and time limits before deploying to production.
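A sketch of such a guard, enforcing token, step, and wall-clock limits in one place (the default limits are illustrative, not recommendations):

```python
import time

class BudgetGuard:
    """Raise before a stuck loop can burn through the run's budget."""
    def __init__(self, max_tokens: int = 150_000, max_steps: int = 50,
                 max_seconds: float = 300.0):
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.tokens = 0
        self.steps = 0
        self.started = time.monotonic()

    def charge(self, tokens: int) -> None:
        """Call once per model call with its token count."""
        self.tokens += tokens
        self.steps += 1
        if self.tokens > self.max_tokens:
            raise RuntimeError(f"token budget exceeded: {self.tokens}")
        if self.steps > self.max_steps:
            raise RuntimeError(f"step limit exceeded: {self.steps}")
        if time.monotonic() - self.started > self.max_seconds:
            raise RuntimeError("time limit exceeded")
```

Running one guard per agent plus one system-wide guard gives both of the budget scopes mentioned above.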
For the complete picture on building production-ready harness infrastructure around multi-agent systems, read our harness engineering introduction. For verification techniques specific to multi-agent outputs, see our agent verification guide.
Subscribe to the newsletter for weekly tutorials on agent architecture, design patterns, and production engineering.