Running a single AI agent is manageable. Running five agents that need to share context, hand off tasks, recover from failures, and stay within budget? That’s a different challenge entirely — and it’s exactly what multi-agent orchestration is designed to solve.
This workshop uses OpenClaw, a lightweight Python orchestration framework built specifically for teams learning to coordinate multiple agents without drowning in infrastructure complexity. By the end of this guide, you’ll have a working multi-agent pipeline that handles real-world research and summarization tasks, complete with role assignment, task routing, shared memory, and graceful error handling.
No prior orchestration experience required. You’ll need Python 3.11+ and familiarity with basic agent concepts (LLM calls, tool use). Let’s build something real.
Why Multi-Agent Orchestration Matters
Before you write a single line of code, it’s worth understanding why you’d split work across multiple agents instead of giving one agent a big prompt.
Single agents hit predictable limits:
- Context window exhaustion — Long tasks overflow the context budget
- Skill dilution — One agent trying to research, write, and fact-check simultaneously does all three worse than specialized agents doing one each
- No parallelism — Sequential processing is slow when subtasks are independent
- Fragile error recovery — A single failure kills the entire pipeline
Multi-agent systems solve each of these. A supervisor agent breaks work into subtasks. Worker agents execute those subtasks in parallel. A reviewer agent validates outputs before they leave the system. When a worker fails, the supervisor reroutes — the rest of the pipeline keeps running.
This is the core pattern OpenClaw makes approachable for beginners.
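The supervisor-worker loop described above can be sketched without any framework. This is illustrative plain Python, not OpenClaw code: the `worker` function and the fallback string stand in for real LLM-backed agents and rerouting logic.

```python
# Framework-free sketch of the supervisor/worker pattern (illustrative only).
from concurrent.futures import ThreadPoolExecutor


def worker(subtask: str) -> str:
    # A real worker would call an LLM; here we simulate one flaky subtask.
    if subtask == "subtask-2":
        raise RuntimeError("worker timeout")
    return f"result for {subtask}"


def supervise(subtasks: list[str]) -> dict[str, str]:
    """Dispatch subtasks in parallel; reroute failures instead of crashing."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(worker, t): t for t in subtasks}
        for future, subtask in futures.items():
            try:
                results[subtask] = future.result()
            except RuntimeError:
                failed.append(subtask)  # the supervisor notes the failure and moves on
    for subtask in failed:
        results[subtask] = f"fallback result for {subtask}"  # reroute/retry pass
    return results
```

The point is the shape, not the details: work fans out in parallel, failures are collected rather than propagated, and the supervisor decides what happens next.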
What Is OpenClaw?
OpenClaw is an open-source Python framework designed for structured multi-agent coordination. Its design philosophy prioritizes explicitness over magic: every routing decision, memory write, and agent invocation is logged and inspectable.
Key concepts you’ll use in this workshop:
| Concept | What It Does |
|---|---|
| Orchestrator | Central coordinator that owns the task graph |
| Agent | A named unit of work backed by an LLM + tools |
| TaskQueue | Ordered work list the orchestrator dispatches from |
| SharedMemory | Key-value store all agents can read and write |
| HarnessPolicy | Rules governing agent behavior (budget, retries, scope) |
OpenClaw doesn’t enforce a specific LLM provider — you wire in OpenAI, Anthropic, or a local model through a simple adapter interface.
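To make the adapter idea concrete, here is a hypothetical sketch of what such an interface could look like. The `LLMAdapter` protocol, `EchoAdapter` class, and `invoke` helper are assumptions for illustration, not OpenClaw's actual API.

```python
# Hypothetical provider-adapter sketch; OpenClaw's real interface may differ.
from typing import Protocol


class LLMAdapter(Protocol):
    def complete(self, system: str, user: str) -> str: ...


class EchoAdapter:
    """Stand-in for an OpenAI, Anthropic, or local-model adapter."""

    def complete(self, system: str, user: str) -> str:
        # A real adapter would call the provider's SDK here.
        return f"[{system[:20]}] {user}"


def invoke(adapter: LLMAdapter, system: str, user: str) -> str:
    # The orchestrator depends only on the complete() contract,
    # so swapping providers means swapping one class.
    return adapter.complete(system, user)
```

Whatever the exact signatures, the design benefit is the same: agents code against a narrow contract, and the provider becomes a configuration detail.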
Workshop Overview
We’ll build a Research Pipeline with four specialized agents:
- Planner — Breaks a research question into subtopics
- Researcher — Fetches and summarizes information per subtopic
- Critic — Flags gaps or contradictions in the research
- Writer — Synthesizes everything into a final report
This mirrors real production architectures used for competitive intelligence, due diligence, and automated content pipelines.
Part 1: Installation and Project Setup
Start by creating a project directory and installing dependencies:
```bash
mkdir openclaw-workshop && cd openclaw-workshop
python -m venv .venv && source .venv/bin/activate
pip install openclaw openai python-dotenv
```
Create a .env file for your API credentials:
```
# .env
OPENAI_API_KEY=sk-...
OPENCLAW_LOG_LEVEL=INFO
```
Initialize your project structure:
```bash
mkdir -p agents tools memory
touch main.py agents/__init__.py tools/__init__.py
```
Your final structure will look like:
```
openclaw-workshop/
├── agents/
│   ├── planner.py
│   ├── researcher.py
│   ├── critic.py
│   └── writer.py
├── tools/
│   └── search.py
├── memory/
├── main.py
└── .env
```
Part 2: Defining Agent Roles
The Planner Agent
The Planner receives the user’s research question and produces a list of subtopics. It’s the entry point of your pipeline.
```python
# agents/planner.py
import json

from openclaw import Agent, AgentConfig
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a research planner. Given a research question,
break it into 3-5 focused subtopics that together provide comprehensive coverage.
Return a JSON object with a single key "subtopics" whose value is a list of
subtopic strings. No explanation — just the object."""


def create_planner() -> Agent:
    config = AgentConfig(
        name="planner",
        role="planning",
        max_tokens=512,
        temperature=0.3,
    )

    def run(task: str, memory: dict) -> dict:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Research question: {task}"},
            ],
            response_format={"type": "json_object"},
        )
        # json_object mode guarantees a JSON *object*, not a bare list, so the
        # prompt asks for {"subtopics": [...]}; re-serialize the list so the
        # handoff payload stays a predictable JSON string.
        data = json.loads(response.choices[0].message.content)
        subtopics = json.dumps(data.get("subtopics", []))
        return {"subtopics": subtopics, "original_question": task}

    return Agent(config=config, run_fn=run)
```
Notice that the Planner writes its output as structured data. This is a key discipline in multi-agent design: agents should hand off structured data, not prose, so downstream agents don’t need to parse freeform text.
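One lightweight way to enforce that discipline is to validate each handoff payload before the next agent sees it. The `validate_handoff` helper below is illustrative, not part of OpenClaw's API.

```python
# Illustrative handoff validation; not an OpenClaw API.
def validate_handoff(payload: dict, required: set[str]) -> dict:
    """Reject freeform or incomplete payloads at the agent boundary."""
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"handoff missing keys: {sorted(missing)}")
    return payload
```

Calling this at each hop turns a vague downstream failure ("the Writer produced nonsense") into a precise upstream one ("the Planner dropped `original_question`").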
The Researcher Agent
The Researcher receives a single subtopic and returns a focused summary. In production you’d plug in a real search API; here we’ll simulate it with a stub tool to keep the workshop self-contained.
```python
# tools/search.py
def web_search(query: str) -> str:
    """Stub: replace with SerpAPI, Tavily, or Exa in production."""
    return (
        f"[Simulated search results for: {query}]\n"
        "Key findings: This topic has significant recent developments in 2025-2026, "
        "with practitioners reporting improved reliability when using structured "
        "orchestration patterns. Major frameworks have converged on supervisor-worker "
        "architectures as the default production pattern."
    )
```
```python
# agents/researcher.py
from openclaw import Agent, AgentConfig
from openai import OpenAI

from tools.search import web_search

client = OpenAI()

SYSTEM_PROMPT = """You are a research specialist. Given a subtopic, search for
relevant information and produce a concise 150-200 word summary. Focus on
facts, specific examples, and cited patterns. Be direct."""


def create_researcher() -> Agent:
    config = AgentConfig(
        name="researcher",
        role="research",
        max_tokens=400,
        temperature=0.2,
    )

    def run(task: str, memory: dict) -> dict:
        search_results = web_search(task)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {
                    "role": "user",
                    "content": f"Subtopic: {task}\n\nSearch results:\n{search_results}",
                },
            ],
        )
        summary = response.choices[0].message.content
        return {"subtopic": task, "summary": summary}

    return Agent(config=config, run_fn=run)
```
The Critic Agent
The Critic reads all researcher outputs and identifies gaps, contradictions, or low-confidence claims. This is the quality gate before writing begins.
```python
# agents/critic.py
import json

from openclaw import Agent, AgentConfig
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a critical reviewer of research summaries. Given a set of
subtopic summaries, identify: (1) logical contradictions, (2) missing important angles,
(3) unsupported claims. Return a JSON object with keys: 'issues' (list of strings)
and 'confidence' (float 0-1). If confidence < 0.7, flag for human review."""


def create_critic() -> Agent:
    config = AgentConfig(
        name="critic",
        role="review",
        max_tokens=512,
        temperature=0.1,
    )

    def run(task: str, memory: dict) -> dict:
        summaries = memory.get("research_summaries", [])
        combined = "\n\n".join(
            f"**{s['subtopic']}**\n{s['summary']}" for s in summaries
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Review these summaries:\n\n{combined}"},
            ],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content)

    return Agent(config=config, run_fn=run)
```
The Writer Agent
The Writer synthesizes all summaries into a polished final report, informed by the Critic’s feedback.
```python
# agents/writer.py
from openclaw import Agent, AgentConfig
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a technical writer. Given research summaries and a critic's
feedback, write a well-structured report (400-600 words) with an executive summary,
key findings, and a conclusion. Use clear headings. Address any issues the critic raised."""


def create_writer() -> Agent:
    config = AgentConfig(
        name="writer",
        role="synthesis",
        max_tokens=1024,
        temperature=0.5,
    )

    def run(task: str, memory: dict) -> dict:
        summaries = memory.get("research_summaries", [])
        critique = memory.get("critique", {})
        combined = "\n\n".join(
            f"**{s['subtopic']}**\n{s['summary']}" for s in summaries
        )
        # Fall back to "None identified" when the issues list is missing OR empty.
        issues = "\n".join(critique.get("issues") or ["None identified"])
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {
                    "role": "user",
                    "content": (
                        f"Original question: {task}\n\n"
                        f"Research summaries:\n{combined}\n\n"
                        f"Critic's issues to address:\n{issues}"
                    ),
                },
            ],
        )
        return {"report": response.choices[0].message.content}

    return Agent(config=config, run_fn=run)
```
Part 3: Wiring the Orchestrator
Now that agents are defined, the Orchestrator connects them into a pipeline. This is where OpenClaw’s HarnessPolicy comes in — it governs retry behavior, budget limits, and escalation rules.
```python
# main.py
import json

from dotenv import load_dotenv
from openclaw import Orchestrator, SharedMemory, HarnessPolicy, TaskQueue

from agents.planner import create_planner
from agents.researcher import create_researcher
from agents.critic import create_critic
from agents.writer import create_writer

load_dotenv()


def build_pipeline() -> Orchestrator:
    policy = HarnessPolicy(
        max_retries=2,
        retry_backoff_seconds=2,
        budget_limit_usd=1.00,  # Hard stop at $1 for this workshop
        escalate_on_low_confidence=True,
        confidence_threshold=0.7,
    )
    memory = SharedMemory()
    orchestrator = Orchestrator(
        name="research-pipeline",
        policy=policy,
        memory=memory,
    )
    orchestrator.register_agent("planner", create_planner())
    orchestrator.register_agent("researcher", create_researcher())
    orchestrator.register_agent("critic", create_critic())
    orchestrator.register_agent("writer", create_writer())
    return orchestrator


def run_research(question: str) -> str:
    orchestrator = build_pipeline()
    memory = orchestrator.memory

    # Step 1: Plan
    print("Step 1/4: Planning subtopics...")
    plan_result = orchestrator.invoke("planner", task=question)
    subtopics = json.loads(plan_result["subtopics"])
    print(f"  → {len(subtopics)} subtopics identified")

    # Step 2: Research (parallel dispatch)
    print("Step 2/4: Researching subtopics in parallel...")
    queue = TaskQueue(tasks=subtopics, agent="researcher")
    research_results = orchestrator.dispatch_parallel(queue)
    memory.write("research_summaries", research_results)
    print(f"  → {len(research_results)} summaries collected")

    # Step 3: Critique
    print("Step 3/4: Running quality review...")
    critique = orchestrator.invoke("critic", task=question)
    memory.write("critique", critique)
    confidence = critique.get("confidence", 1.0)
    print(f"  → Confidence score: {confidence:.2f}")
    if confidence < 0.7:
        print("  ⚠ Low confidence — flagging for human review")
        # In production, this would trigger a Slack alert or pause for review
        print("  Continuing for workshop purposes...")

    # Step 4: Write
    print("Step 4/4: Synthesizing final report...")
    write_result = orchestrator.invoke("writer", task=question)
    return write_result["report"]


if __name__ == "__main__":
    question = "What are the emerging best practices for AI agent harness engineering in 2026?"
    print(f"\nResearch question: {question}\n{'=' * 60}\n")
    report = run_research(question)
    print("\nFINAL REPORT\n" + "=" * 60)
    print(report)
```
Run it:
```bash
python main.py
```
You’ll see each step log to the console, with the final report printed at the end. The entire pipeline is observable, inspectable, and budget-capped.
Part 4: Handling Failures Gracefully
Real pipelines fail. Researchers time out. LLMs return malformed JSON. APIs rate-limit you. OpenClaw’s HarnessPolicy handles retries automatically, but you should also build fallback logic at the orchestration level.
Add this helper to main.py:
```python
from openclaw.exceptions import AgentFailureError, BudgetExceededError


def safe_invoke(orchestrator, agent_name: str, task: str, fallback=None):
    try:
        return orchestrator.invoke(agent_name, task=task)
    except AgentFailureError as e:
        print(f"  ✗ Agent '{agent_name}' failed after retries: {e}")
        return fallback or {}
    except BudgetExceededError:
        print("  ✗ Budget limit reached — stopping pipeline")
        raise
```
Replace bare orchestrator.invoke() calls with safe_invoke(). If a Researcher fails on one subtopic, the pipeline continues with the remaining results rather than crashing entirely.
This is the difference between a fragile demo and a production-ready system.
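It helps to see what HarnessPolicy's `max_retries` and `retry_backoff_seconds` settings do conceptually. Here is a framework-free sketch of retry-with-backoff; the linear backoff schedule is an assumption for illustration, not necessarily OpenClaw's implementation.

```python
# Framework-free sketch of retry-with-backoff (illustrative, not OpenClaw internals).
import time


def with_retries(fn, max_retries: int = 2, backoff_seconds: float = 2.0):
    """Call fn(); on failure, wait and retry up to max_retries times, then re-raise."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure to the caller
            time.sleep(backoff_seconds * (attempt + 1))  # linear backoff
```

The harness wraps every agent call in something like this, which is why your `run` functions can stay free of retry boilerplate.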
Part 5: Inspecting Shared Memory
One of OpenClaw’s most useful debugging features is its memory inspector. After a pipeline run, you can dump the full shared memory state:
```python
# At the end of run_research()
print("\n--- Memory Snapshot ---")
for key, value in orchestrator.memory.snapshot().items():
    print(f"{key}: {str(value)[:120]}...")
```
This lets you trace exactly what each agent read and wrote. In production, you’d persist this snapshot to a database for post-run auditing — a core requirement for any regulated use case.
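Persisting that snapshot does not require a database to start with. A minimal sketch that writes a timestamped JSON audit record (the `persist_snapshot` helper and its file layout are assumptions, not OpenClaw features):

```python
# Illustrative audit persistence; swap the file write for a database insert in production.
import json
import time


def persist_snapshot(snapshot: dict, path: str) -> str:
    """Write a timestamped copy of the shared-memory snapshot to disk."""
    record = {"timestamp": time.time(), "memory": snapshot}
    with open(path, "w") as f:
        # default=str keeps non-JSON-serializable values from crashing the audit write
        json.dump(record, f, indent=2, default=str)
    return path
```

Even this flat-file version gives you a replayable record of every run, which is usually enough while you are iterating.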
What to Build Next
You’ve completed the core workshop. Here’s how to extend this into a production system:
- Add real search tools — Swap the stub web_search() function for Tavily or Exa, both of which have Python SDKs designed for agent use.
- Add streaming output — OpenClaw supports streaming agent outputs so users see progress in real-time rather than waiting for the full pipeline.
- Persist memory to a database — Replace SharedMemory with RedisSharedMemory or PostgresSharedMemory (available as OpenClaw extensions) so pipeline state survives restarts.
- Add human-in-the-loop checkpoints — Use orchestrator.pause_for_review() at the Critic step. When confidence is low, the pipeline pauses and sends a webhook notification before proceeding.
- Deploy as an API — Wrap run_research() in a FastAPI endpoint and deploy to a VPS or cloud function. The stateless design means horizontal scaling is trivial.
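Of these extensions, streaming is the easiest to prototype before touching the framework. A generator-based sketch of the pattern (hypothetical, not OpenClaw's streaming API):

```python
# Generator-based streaming sketch; chunk size and function name are illustrative.
from typing import Iterator


def stream_tokens(text: str, chunk: int = 8) -> Iterator[str]:
    """Yield output in small chunks so callers can render progress live."""
    for i in range(0, len(text), chunk):
        yield text[i : i + chunk]
```

Once each agent yields chunks instead of returning a single string, the orchestrator can forward them to the user as they arrive, and the rest of the pipeline is unchanged.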
Key Takeaways
After this workshop, you should be able to:
- Define specialized agents with clear role boundaries and structured I/O
- Wire agents into a pipeline using an Orchestrator with shared memory
- Dispatch work in parallel using TaskQueues for independent subtasks
- Enforce behavioral guardrails through HarnessPolicy (budget, retries, escalation)
- Handle failures gracefully so one broken agent doesn’t kill the pipeline
The biggest shift in thinking from single-agent to multi-agent development is treating agents as services with contracts rather than clever prompts. The Planner promises to return a JSON list of subtopics. The Researcher promises to return a summary dict. The Critic promises to return confidence scores. When agents honor these contracts, the orchestrator can manage them reliably — and when they don’t, the harness catches it.
That’s harness engineering at its core.
Continue Learning
Ready to go deeper? These resources pair well with this workshop:
- What Is Harness Engineering? — The discipline explained from first principles
- Agent Harness Framework Comparisons — How OpenClaw compares to LangGraph, CrewAI, and AutoGen
- Becoming a Harness Engineer: Career Guide — Skills, job titles, and learning roadmap
Have questions about this workshop? Drop them in the comments below. Every pipeline you build teaches you something the docs don’t — share what you find.
Kai Renner is a senior AI/ML engineering leader and the author behind harnessengineering.academy. He writes tutorials and career guides for engineers learning to build reliable, production-grade AI agent systems.