AI Agent
Also: autonomous agent
An AI system set up to take a sequence of actions to complete a goal, rather than answer a single question. Beyond responding to your message, an agent can search the web, read documents, write code, send requests to other systems, and keep working through multiple steps until it finishes the task. Claude can act as an agent when given the right tools. The key difference from a regular chatbot: agents do, not just say.
In practice
Instead of you asking Claude "what should I do next?" and acting on its answer yourself, an AI agent does the acting. You give it a goal, such as "monitor this inbox and draft replies to anything tagged urgent", and it runs, takes actions, checks results, and keeps going until the job is done.
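To make the "takes actions, checks results, keeps going" cycle concrete, here is a minimal sketch of an agent loop in Python. Everything in it is an assumption made for illustration: the in-memory INBOX, the list_urgent and draft_reply tools, and the choose_action function, which stands in for the model with a hard-coded rule. It is not how Claude or any particular product implements agents. In a real agent, choose_action would be a model call that reads the goal and the results so far.

```python
from typing import Callable, Optional

# Hypothetical in-memory inbox standing in for a real mail system.
INBOX = [
    {"id": 1, "tag": "urgent", "subject": "Server down"},
    {"id": 2, "tag": "normal", "subject": "Lunch?"},
    {"id": 3, "tag": "urgent", "subject": "Contract deadline"},
]

def list_urgent(_: str) -> str:
    """Tool: return the ids of urgent messages as a comma-separated string."""
    return ",".join(str(m["id"]) for m in INBOX if m["tag"] == "urgent")

def draft_reply(message_id: str) -> str:
    """Tool: produce a draft reply for one message."""
    msg = next(m for m in INBOX if m["id"] == int(message_id))
    return f"Draft for '{msg['subject']}': acknowledged, looking into it."

# The actions the agent is allowed to take (illustrative names).
TOOLS: dict[str, Callable[[str], str]] = {
    "list_urgent": list_urgent,
    "draft_reply": draft_reply,
}

Action = tuple[str, str]  # (tool name, argument)

def choose_action(history: list[tuple[Action, str]]) -> Optional[Action]:
    """Stand-in for the model: pick the next tool call, or None when done."""
    if not history:
        return ("list_urgent", "")
    # Check earlier results to decide what still needs doing.
    urgent_ids = [i for i in history[0][1].split(",") if i]
    drafted = {action[1] for action, _ in history if action[0] == "draft_reply"}
    for message_id in urgent_ids:
        if message_id not in drafted:
            return ("draft_reply", message_id)
    return None  # every urgent message has a draft: the job is done

def run_agent(max_steps: int = 10) -> list[tuple[Action, str]]:
    """Decide, act, record the result, and repeat until finished."""
    history: list[tuple[Action, str]] = []
    for _ in range(max_steps):  # hard cap so the loop cannot run forever
        action = choose_action(history)
        if action is None:
            break
        tool_name, argument = action
        result = TOOLS[tool_name](argument)
        history.append((action, result))
    return history

if __name__ == "__main__":
    for (tool, arg), result in run_agent():
        print(tool, arg, "->", result)
```

The max_steps cap is a design choice in this sketch: an agent that decides its own next step needs some outer limit in case it never reaches a stopping condition.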
Where AI Agent shows up
28 articles
Most developers focus on the model. The engineers building production AI applications focus on everything around it. Here is what the agent harness is, why it determines whether your app actually works, and where to start building it intentionally.
A new Claude API feature lets Sonnet or Haiku call Opus mid-task when they need help. You pay Opus rates only for those calls — everything else runs at Sonnet or Haiku cost. Here's what it does and when to use it.
Anthropic now runs the full agent loop for you — sandboxed execution, built-in tools, and event streaming included. Here's what you get and when it makes sense over building the loop yourself.
System prompts, ticket workflows, escalation patterns, and QBR prep — the operational guide for deploying Claude across a customer success team.
Claude Code's desktop app was rebuilt for running multiple coding tasks at once. A new sidebar manages sessions across repos, an integrated terminal and diff viewer replace external tools, and side chat lets you branch conversations without interrupting ongoing work.
Routines let you configure a Claude Code task once — a nightly bug triage, a PR review on every push, an alert fix triggered by your monitoring system — and have it run in the cloud on its own schedule. Here's how they work and what they're useful for.
Agent Teams let you run multiple Claude Code instances with distinct roles — a frontend dev, a backend dev, a QA reviewer — all coordinating in parallel. There's a real setup cost and most tasks don't need it. Here's how to tell when it's worth it.
Aaron Levie said career counselors should be figuring out how to help students get these jobs. The path exists — 800% hiring growth, $180K–$700K+ comp, every major AI company hiring. It just hasn't been written down anywhere useful. Until now.
A GitHub full of side projects tells an FDE hiring manager that you can code. What they actually want to see is evidence that you can build in the real world — against legacy systems, ambiguous requirements, and non-technical stakeholders. These five projects show exactly that.
Most people handed AI responsibility try to do everything at once and ship nothing reliable, or wait for a perfect plan and never start. The 90-day path is simpler: one team, one workflow, one agent that actually works. Then you expand.
Most agents in production have never been formally tested. The person who set them up tried a few examples and it seemed fine. That's how you end up with a contract review agent that hallucinates clause details. Evaluation doesn't require code — it requires a spreadsheet and 30 minutes a week.
Most Agent Operators think connecting their internal systems to Claude requires an engineer. For the majority of use cases, it doesn't. Four levels of integration exist — and Level 2 (native connectors for Google/Microsoft) or Level 3 (Zapier) solve 80% of what you need.
Your Claude bill went from $200 to $2,000 and you can't explain why. The four cost drivers — bloated system prompts, unnecessary context loading, high failure rates, and no usage monitoring — each have fixes. Cost per task is the metric that matters, not total spend.
Aaron Levie said career counselors should quickly figure out how to get students into forward deployed engineer roles. The role exists, it's exploding, and it pays $150K–$700K+ total comp. The career infrastructure just hasn't caught up yet. Here's what to tell students.
Your CEO doesn't want a technology update. They want to know if the investment is working and whether to do more. Three types of evidence actually work: time saved, error rate improvement, and throughput. Here's how to measure them and how to present them.
There's a clean line between a model that responds to questions and one that takes actions in the world. Understanding that line is the most important thing to know about building with AI right now.
Anthropic and OpenAI both launched billion-dollar deployment companies in the same week — and both are built around the same type of engineer: someone who moves into a company, builds production AI systems against their actual messy environment, and leaves something that lasts. That engineer has a name now.
Aaron Levie says 500,000 to 1 million companies will hire for this role. Most won't call it 'Agent Operator.' Some will call it an AI program manager, an automation lead, an AI systems admin. Whatever the title, the job is the same: you are responsible for making AI agents actually work inside your company.
Building an AI agent that demos well is easy. Building one that works reliably in production is hard. The gap between the two is almost always one of the same five problems.
Agents fail differently than APIs. When a sub-agent times out halfway through a pipeline, you don't just get an error — you get partial state. The patterns that make multi-agent systems actually recover.
When your agent starts producing bad outputs, the instinct is to assume the model got worse. It usually didn't. 90% of agent failures are context failures or prompt failures — both of which you can diagnose and fix without any technical help.
Building the agent is the easy part. Getting people to use it is where most Agent Operators fail. Three types of resistance — trust, speed, job fear — each with a different fix. And one thing that kills adoption faster than anything else.
An agent isn't just a chatbot that can click buttons. It's a fundamentally different relationship between a human and an AI. Here's what that looks like when it's working.
A CS manager who uses Claude well can do meaningful work on renewals, QBRs, and escalations in the gaps between other work. Here's what that workflow actually looks like across a full day.
Where Claude genuinely saves hours for marketing managers, where it falls flat, and what the actual workflow looks like.
By default, a Managed Agents session starts fresh and forgets everything when it ends. Memory stores change that — they're workspace-scoped document collections the agent reads and writes across sessions. The feature entered public beta on April 23, 2026.
On May 5, 2026 Anthropic shipped 10 ready-made agent templates for financial services — pitch builder, KYC screener, month-end closer, and seven more. Each one bundles skills, connectors, and subagents in a way that's reusable beyond finance. Here's what shipped and what the pattern teaches anyone designing agents.
On May 6, 2026 Anthropic shipped three new capabilities for Claude Managed Agents: multiagent sessions (public beta), outcomes (public beta), and dreaming (research preview). Multiagent lets a lead agent delegate to specialist subagents on a shared filesystem; outcomes turns a rubric into a self-correction loop; dreaming lets an agent review its past sessions overnight and curate its memory.