Claude Managed Agents update: multiagent sessions, outcomes, and dreaming
In brief
On May 6, 2026, Anthropic shipped three new capabilities for Claude Managed Agents: multiagent sessions (public beta), outcomes (public beta), and dreaming (research preview). Multiagent lets a lead agent delegate to specialist subagents on a shared filesystem; outcomes turns a rubric into a self-correction loop; dreaming lets an agent review its past sessions overnight and curate its memory.
Claude Managed Agents reached general availability on April 8, 2026. Memory followed on April 23. The May 6, 2026 release adds three features that turn the product from a single-agent harness with state into something closer to a coordinated team.
All three sit under the same beta header — managed-agents-2026-04-01 — that you already use for the rest of the API.
Multiagent sessions (public beta)
A multiagent session has a lead agent and one or more specialist subagents. The lead delegates work; the subagents run in parallel; everything writes to a shared filesystem and a shared event log.
The shape, in pseudo-API terms:
POST /v1/managed-agents/sessions
{
  "agent_id": "agent_lead_xxx",
  "subagents": [
    { "agent_id": "agent_research_xxx", "name": "research" },
    { "agent_id": "agent_writer_xxx", "name": "writer" },
    { "agent_id": "agent_reviewer_xxx", "name": "reviewer" }
  ],
  "shared_filesystem": true,
  "memory_stores": ["mem_xxx"]
}
The lead receives the user request. It decides which subagent to call, what filesystem path each one writes to, and how the outputs feed each other. Each subagent gets its own context window and tool set. The shared filesystem (mounted at /mnt/shared/) is the integration surface.
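In client code, the same request is a single POST carrying the beta header from the intro. A minimal sketch in Python, assuming the standard Anthropic base URL and x-api-key auth; the endpoint and header value come from the announcement, while the response field names are assumptions:

import os
import requests

# Sketch only: the endpoint and beta header are from the announcement; the base URL,
# auth header, and response shape are assumptions about the standard API surface.
BASE_URL = "https://api.anthropic.com"
HEADERS = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-beta": "managed-agents-2026-04-01",
    "content-type": "application/json",
}

session_spec = {
    "agent_id": "agent_lead_xxx",
    "subagents": [
        {"agent_id": "agent_research_xxx", "name": "research"},
        {"agent_id": "agent_writer_xxx", "name": "writer"},
        {"agent_id": "agent_reviewer_xxx", "name": "reviewer"},
    ],
    "shared_filesystem": True,  # subagents integrate through /mnt/shared/
    "memory_stores": ["mem_xxx"],
}

resp = requests.post(f"{BASE_URL}/v1/managed-agents/sessions",
                     headers=HEADERS, json=session_spec)
resp.raise_for_status()
session_id = resp.json().get("id")  # assumed field name for the session id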
Three details that matter:
Different models per agent. A lead on Opus 4.7 can dispatch to a Haiku 4.5 specialist for a fast, cheap subtask. The token bill is per-agent.
Persistent event memory. Every session writes to an event stream. A subagent that ran an hour ago can be inspected — what it read, what it wrote, what tool it called. Useful for debugging and for the agent itself to read back what its peers did.
Console visibility. The Claude Console shows the full agent tree per session: which agent fired what, in what order, with what rationale. This is the difference between "an agent did something I can't explain" and "I can see exactly where the run forked."
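The event log is also readable programmatically, which is how you (or the agent itself) check what a peer did an hour ago. A sketch, reusing BASE_URL, HEADERS, and session_id from the session example above; the events endpoint and the event fields are hypothetical, not documented API:

# Hypothetical sketch: assumes a GET endpoint for session events and events with
# "agent", "type", and "summary" fields. Neither is confirmed by the announcement.
events_resp = requests.get(
    f"{BASE_URL}/v1/managed-agents/sessions/{session_id}/events",
    headers=HEADERS,
)
events_resp.raise_for_status()

for event in events_resp.json().get("events", []):
    # e.g. research read /mnt/shared/brief.md, writer wrote /mnt/shared/draft.md
    print(event.get("agent"), event.get("type"), event.get("summary"))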
When this matters: you have a workflow that is too long for one context window, or that benefits from specialization (research vs. writing vs. review) more than from monolithic intelligence. When it doesn't: a single agent with memory and tools is enough. Don't reach for multiagent for tasks Opus 4.7 alone can finish in one session.
Outcomes (public beta)
Outcomes is the API surface for rubric-graded agents. You define what success looks like; a separate grader scores the agent's output; if the score is below threshold, the agent retries with the grader's notes.
POST /v1/managed-agents/outcomes
{
  "name": "deck_quality",
  "rubric": [
    { "criterion": "Each slide has one clear claim", "weight": 0.3 },
    { "criterion": "Numbers are sourced from the brief", "weight": 0.4 },
    { "criterion": "Tone is appropriate for an executive audience", "weight": 0.3 }
  ],
  "passing_score": 0.80,
  "max_attempts": 3
}
You attach the outcome to a session. The agent runs; the grader evaluates the output in its own context window (so the grader doesn't share the agent's blind spots); if the score is below the threshold, the grader's structured feedback is fed back into the next attempt.
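Attaching is a small addition to the session-creation call. In the sketch below (reusing BASE_URL and HEADERS from earlier), the outcomes field on the session body and the outcome_xxx id format are assumptions; the announcement describes attachment but does not show the exact field name:

# Hypothetical sketch: attach an existing outcome to a session at creation time.
# The "outcomes" field name is an assumption, not confirmed API.
resp = requests.post(
    f"{BASE_URL}/v1/managed-agents/sessions",
    headers=HEADERS,
    json={
        "agent_id": "agent_writer_xxx",
        "outcomes": ["outcome_xxx"],
    },
)
resp.raise_for_status()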
Anthropic reports +8.4% on .docx generation success and +10.1% on .pptx generation success against their internal benchmark, with up to 10 points of improvement on more complex tasks. Concrete and modest — not the headline number a vendor would lead with if they were inflating it.
When this matters: the task has a clear quality definition (a brief, a spec, a contract) and the cost of a wrong output is higher than a few extra grader calls. When it doesn't: open-ended creative work where the rubric is the hard part.
A practical note: the grader is a separate billed call. For high-volume jobs, profile the cost. A 3-attempt rubric with a Sonnet 4.6 grader on a Haiku 4.5 worker can still be cheaper than a single Opus 4.7 call — but you have to measure.
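One way to do that profiling is a back-of-the-envelope model like the one below. Every price and token count in it is a placeholder assumption; substitute your measured usage and the published per-model rates before drawing conclusions:

# Placeholder cost model. All prices and token counts are illustrative assumptions;
# replace them with measured token usage and current per-model pricing.
PRICE_PER_MTOK = {            # USD per million tokens: (input, output)
    "small_worker":  (1.0, 5.0),
    "mid_grader":    (3.0, 15.0),
    "large_single":  (15.0, 75.0),
}

def call_cost(model, input_tokens, output_tokens):
    p_in, p_out = PRICE_PER_MTOK[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# Worst case for the rubric loop: max_attempts worker runs, each followed by a grader pass.
attempts = 3
rubric_cost = attempts * (call_cost("small_worker", 20_000, 4_000)
                          + call_cost("mid_grader", 25_000, 1_000))
single_call_cost = call_cost("large_single", 20_000, 4_000)

print(f"rubric loop worst case ~${rubric_cost:.2f}, single large-model call ~${single_call_cost:.2f}")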
Dreaming (research preview)
Dreaming is the most speculative of the three. Per the announcement: "a scheduled process that reviews your agent sessions and memory stores, extracts patterns, and curates memories so your agents improve over time."
What this means in practice: between sessions (typically overnight), the dreaming process reads your agent's session history and memory stores. It pulls out patterns — recurring user preferences, converged workflows, recurring mistakes — and writes them back to the memory stores. Next session, the agent has a higher-signal memory to read from.
This is a research preview, not a public beta. You request access; not everyone gets it. The interface, the cadence, and the reliability characteristics are likely to change.
Two reasons to care anyway:
It is a different memory model. Existing memory (covered here) is write-during-session. Dreaming is curate-between-sessions. They compose: an agent writes facts during the day; dreaming compresses them at night; the compressed version is what next-day's agent reads.
It ships in the standard product surface. Unlike a separate "long-term learning" SDK, this is just another control on the existing memory store. Once it leaves research preview, it should be a small change for teams already using memory.
Two honest cautions: this is the feature most likely to change shape between research preview and public beta, and the value depends on whether your sessions actually have patterns to extract. A single-purpose agent that does one task the same way every time has nothing to dream about.
How to think about all three together
A practical mental model:
- Memory is per-agent state.
- Outcomes is per-task quality control.
- Dreaming is per-agent learning over time.
- Multiagent is task decomposition across agents.
Most production agent setups don't need all four. The progression most teams will follow:
- Single agent with tools — start here.
- Add memory when the agent forgets things between sessions that the user shouldn't have to repeat.
- Add outcomes when output quality on long-tail edge cases drops below an acceptable bar.
- Add multiagent when one context window or one model class is genuinely the constraint.
- Wait on dreaming until it leaves research preview.
The features compose well because they share the same primitives — sessions, events, memory stores, and the existing beta header. You don't need a parallel SDK to use them.
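As a composed sketch, one session can carry all of it in a single request (reusing BASE_URL and HEADERS from earlier; the outcomes field is still a hypothetical name, the rest mirrors the shapes shown above):

# One session using the shared primitives together. "outcomes" is a hypothetical
# field name (see above); the other fields mirror the shapes shown earlier.
composed = {
    "agent_id": "agent_lead_xxx",
    "subagents": [
        {"agent_id": "agent_research_xxx", "name": "research"},
        {"agent_id": "agent_writer_xxx", "name": "writer"},
    ],
    "shared_filesystem": True,
    "memory_stores": ["mem_xxx"],   # per-agent state; dreaming would curate these between sessions
    "outcomes": ["outcome_xxx"],    # hypothetical: per-task quality control
}
resp = requests.post(f"{BASE_URL}/v1/managed-agents/sessions",
                     headers=HEADERS, json=composed)
resp.raise_for_status()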
Related reading
- Claude Managed Agents — the original April 8 launch
- Claude Managed Agents memory — the memory store API this update builds on
- Multi-agent orchestration basics — the conceptual framing for when multiagent helps
- Multi-agent failure handling — what to plan for when subagents break
Source: New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration, Anthropic blog, May 6, 2026. Public beta features available under the managed-agents-2026-04-01 beta header.