Multi-agent orchestration: when one Claude isn't enough
In brief
Subagents, orchestrators, parallelism, and state management. The patterns that work and the ones that look good until they hit production.
A single Claude call handles a lot. But some tasks are too long for one context window, benefit from parallelism, or require specialized sub-tasks that should not pollute a single context. That is when you reach for multi-agent patterns.
Multi-agent does not mean "use multiple models for everything." It means decomposing a task so that multiple focused agents each do one thing well, coordinated by an orchestrator that manages the overall workflow.
The core pattern: orchestrator + subagents
The orchestrator receives the top-level task, breaks it into subtasks, dispatches subagents to handle each one, collects results, and produces a final output. Subagents are narrow: they know how to do one thing.
```python
import anthropic
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

client = anthropic.Anthropic()

def run_subagent(task: str, context: str, system: str) -> str:
    """A focused agent that handles one subtask."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nTask: {task}"}],
    )
    return response.content[0].text

def orchestrate(user_request: str) -> str:
    # Step 1: Orchestrator plans the work
    plan_response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=(
            "You are a task planner. Break the user's request into 2-4 independent subtasks "
            "that can be done in parallel. Return only JSON, no prose: "
            '{"subtasks": [{"id": "1", "task": "...", "specialist": "..."}]}'
        ),
        messages=[{"role": "user", "content": user_request}],
    )
    # Models sometimes wrap JSON in markdown fences; strip them before parsing
    raw = plan_response.content[0].text.strip()
    if raw.startswith("```"):
        raw = raw.strip("`").removeprefix("json").strip()
    plan = json.loads(raw)

    # Step 2: Run subtasks in parallel
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = {
            executor.submit(
                run_subagent,
                subtask["task"],
                user_request,
                f"You are a specialist in {subtask['specialist']}. Be thorough and specific.",
            ): subtask["id"]
            for subtask in plan["subtasks"]
        }
        for future in as_completed(futures):
            task_id = futures[future]
            results[task_id] = future.result()

    # Step 3: Synthesize
    synthesis_input = "\n\n".join(
        f"Subtask {tid}:\n{result}" for tid, result in sorted(results.items())
    )
    final = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="Synthesize the subtask results into a coherent, complete response.",
        messages=[{
            "role": "user",
            "content": f"Original request: {user_request}\n\nSubtask results:\n{synthesis_input}",
        }],
    )
    return final.content[0].text
```
When parallelism actually helps
Parallel subagents reduce wall-clock time when subtasks are independent and each takes meaningful time. Common cases:
- Research across multiple sources: agent A searches domain X, agent B searches domain Y, orchestrator combines
- Processing a list in parallel: each item analyzed by its own subagent, results merged
- Perspective generation: agent A argues for, agent B argues against, orchestrator synthesizes
- Multi-step pipelines with independent branches: draft + fact-check + format can all run simultaneously if they operate on different inputs
Parallelism does not help when subtasks are sequential (each depends on the previous result) or when the orchestration overhead exceeds the time saved.
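The list-processing case above reduces to a parallel map. A minimal sketch, with a stub `analyze_item` standing in for a real subagent call (the function name and return format are illustrative, not from the original):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_item(item: str) -> str:
    """Stand-in for a per-item subagent call; a real version would hit the model API."""
    return f"analysis of {item}"

def parallel_map(items: list[str], max_workers: int = 4) -> list[str]:
    # executor.map preserves input order, so results line up with the input list
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(analyze_item, items))
```

Because `executor.map` returns results in input order, the merge step is trivial; with `as_completed` you would need to track which result belongs to which item, as the orchestrator code above does.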
State management
Multi-agent systems fail at state. Each agent call is stateless — the subagent does not remember previous calls unless you explicitly pass context. You are responsible for:
Passing relevant context to each subagent. Do not assume subagents know what the orchestrator knows. Pass the original request, any relevant intermediate results, and any constraints that should shape the output.
```python
context = {
    "original_request": user_request,
    "constraints": ["max 500 words", "cite sources", "avoid jargon"],
    "prior_results": results_so_far,  # if sequential
}
```
Storing intermediate results. For long-running workflows, save subagent outputs to persistent storage (database, file) rather than holding them in memory. If a subagent fails mid-workflow, you can resume from the last saved state rather than restarting from scratch.
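A minimal sketch of save-and-resume using a JSON file (the `workflow_state.json` path and helper names are assumptions; a real system might use a database or object store):

```python
import json
from pathlib import Path

STATE_FILE = Path("workflow_state.json")  # hypothetical location for persisted state

def save_result(task_id: str, result: str) -> None:
    """Persist each subagent result as soon as it arrives."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    state[task_id] = result
    STATE_FILE.write_text(json.dumps(state))

def pending_tasks(all_task_ids: list[str]) -> list[str]:
    """On resume, only tasks without a saved result need to run."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    return [tid for tid in all_task_ids if tid not in state]
```

On restart, the orchestrator calls `pending_tasks` and dispatches only what is missing, instead of re-running completed subagents.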
Passing the right amount of context. Context windows are large but not infinite. If you pass every prior result to every subsequent agent, context grows until it overflows or performance degrades. Pass only what is necessary for each step.
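One simple way to cap context growth is to pass excerpts of prior results rather than full texts. A sketch (the character-based truncation is a crude stand-in; in practice you might summarize with a cheap model instead):

```python
def trim_context(prior_results: dict[str, str], per_result_chars: int = 500) -> str:
    """Build a context string that caps each prior result at a fixed size."""
    parts = []
    for tid, text in sorted(prior_results.items()):
        if len(text) > per_result_chars:
            text = text[:per_result_chars] + " ...[truncated]"
        parts.append(f"Result {tid}: {text}")
    return "\n".join(parts)
```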
Handling subagent failures
Subagents fail. Network errors, model errors, unexpected output formats. Your orchestrator needs to handle this without failing the whole workflow:
```python
import time

def run_subagent_safe(task: str, context: str, system: str, retries: int = 2) -> dict:
    for attempt in range(retries + 1):
        try:
            result = run_subagent(task, context, system)
            return {"success": True, "result": result}
        except Exception as e:
            if attempt == retries:
                return {"success": False, "error": str(e), "task": task}
            time.sleep(2 ** attempt)  # brief exponential backoff before retrying
```
```python
# In orchestrator: handle partial failures
results = {}
failed = []
for future in as_completed(futures):
    task_id = futures[future]
    outcome = future.result()
    if outcome["success"]:
        results[task_id] = outcome["result"]
    else:
        failed.append(outcome)

if failed:
    # Decide: proceed with partial results, retry failed tasks, or surface the error
    pass
```
Human-in-the-loop checkpoints
For consequential workflows — ones that write to a database, send communications, execute financial transactions — add explicit checkpoints where a human reviews before the agent proceeds.
```python
import json

def checkpoint(description: str, payload: dict) -> bool:
    """Show the human what the agent is about to do. Return True to proceed."""
    print("\n--- CHECKPOINT ---")
    print(f"About to: {description}")
    print(f"Payload: {json.dumps(payload, indent=2)}")
    response = input("Proceed? [y/n]: ")
    return response.lower() == "y"

# In your agent loop:
if checkpoint("Send email to client", {"to": email, "subject": subject, "body": body}):
    send_email(email, subject, body)
else:
    print("Cancelled by user")
```
In production, checkpoints surface in a UI rather than a terminal prompt — but the pattern is the same. The agent pauses, the human reviews, the agent continues or stops.
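In a UI-driven system, the pause typically becomes a queue: the agent enqueues a proposed action and a human approves or rejects it later. A minimal in-memory sketch (all names here are hypothetical, not a real framework API):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PendingAction:
    description: str
    payload: dict
    execute: Callable[[dict], None]  # what to run if the human approves

@dataclass
class ApprovalQueue:
    """In-memory stand-in for a review UI: the agent enqueues, a human decides later."""
    pending: list[PendingAction] = field(default_factory=list)

    def request(self, action: PendingAction) -> None:
        self.pending.append(action)

    def approve(self, index: int) -> None:
        action = self.pending.pop(index)
        action.execute(action.payload)

    def reject(self, index: int) -> None:
        self.pending.pop(index)
```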
The patterns that look good but fail
Too many agents for a simple task. Orchestration adds latency, complexity, and failure surface. If a single well-prompted Claude call can do the job, use it. Multi-agent is for tasks where the single-call approach genuinely falls short.
Agents that call each other in cycles. Agent A calls Agent B, which calls Agent C, which calls Agent A. You get a loop. Design flows as DAGs (directed acyclic graphs) — each agent's output feeds forward, never backward.
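If you declare each agent's upstream dependencies explicitly, the standard library can both order the flow and catch cycles for you. A sketch using `graphlib` with a hypothetical draft/fact-check/format/publish flow:

```python
from graphlib import TopologicalSorter

# Each agent maps to the set of agents whose output it consumes (illustrative flow)
flow = {
    "draft": set(),
    "fact_check": {"draft"},
    "format": {"draft"},
    "publish": {"fact_check", "format"},
}

def execution_order(graph: dict[str, set[str]]) -> list[str]:
    # static_order() yields predecessors first; raises CycleError if A -> B -> ... -> A
    return list(TopologicalSorter(graph).static_order())
```

Agents at the same depth (here `fact_check` and `format`) have no ordering constraint between them, which is exactly where the parallel dispatch from earlier applies.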
Passing full conversation history through agents. If Agent A's full 10,000-token conversation is passed as context to Agent B, your context consumption compounds quickly. Extract summaries or structured outputs, not raw histories.
No output validation between stages. If Agent A produces malformed JSON that Agent B expects to parse, Agent B fails and the error is hard to trace. Validate outputs at each handoff point.
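A sketch of validating a planner's output at the handoff, assuming the JSON shape used by the orchestrator above (the required keys and error messages are illustrative):

```python
import json

REQUIRED_KEYS = {"id", "task", "specialist"}  # expected fields on each planned subtask

def validate_plan(raw: str) -> list[dict]:
    """Parse and check planner output before dispatching subagents.

    Raises ValueError with a traceable message at the handoff point,
    instead of letting a downstream agent fail mysteriously."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"Planner returned malformed JSON: {e}") from e
    subtasks = plan.get("subtasks")
    if not isinstance(subtasks, list) or not subtasks:
        raise ValueError("Plan missing a non-empty 'subtasks' list")
    for st in subtasks:
        if not isinstance(st, dict):
            raise ValueError("Each subtask must be a JSON object")
        missing = REQUIRED_KEYS - st.keys()
        if missing:
            raise ValueError(f"Subtask missing keys: {missing}")
    return subtasks
```

On failure, the orchestrator can re-prompt the planner with the error message rather than crashing mid-workflow.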
Choosing the right model per agent
You do not need to use the same model for every agent. Use the most capable model where quality matters most (the synthesis step, complex reasoning) and cheaper/faster models where precision is less critical (classification, routing, simple extraction):
```python
ORCHESTRATOR_MODEL = "claude-sonnet-4-6"      # planning and synthesis
SUBAGENT_MODEL = "claude-haiku-4-5-20251001"  # fast, cheap parallel workers
JUDGE_MODEL = "claude-haiku-4-5-20251001"     # eval/validation
```
This can cut cost 60-80% on parallelized workloads with no meaningful quality loss on the subtasks themselves.
Further reading
- Building multi-agent systems: when and how to use them — Anthropic's guide to multi-agent architectures
- Multi-agent coordination patterns — five approaches and when to use each
- Building effective agents — the foundational engineering guide