Failure Modes

When your agent breaks — how to diagnose it and fix it

In brief

When your agent starts producing bad outputs, the instinct is to assume the model got worse. It usually didn't. 90% of agent failures are context failures or prompt failures — both of which you can diagnose and fix without any technical help.


It's Tuesday afternoon. Your agent worked fine on Friday. Now the customer service team is sending you screenshots of outputs that are clearly wrong — wrong product names, wrong procedures, outdated pricing. Or maybe they're not wrong so much as off: the format changed, the tone changed, the outputs are longer than they used to be and nobody knows why.

You don't have a developer to call. You have your system prompt, a few documents, and the admin console. You have to figure this out yourself.

The first and most important thing to understand: this is almost certainly not a model failure. The model didn't get dumber. Claude doesn't randomly degrade on a Tuesday afternoon. What changed is almost always one of five things — and all five are diagnosable and fixable without any code.

Here's the systematic approach.


The five failure modes

Failure Mode 1: Stale context

What it looks like: The agent gives outdated information. It references policies you changed last month, discontinued products, last quarter's pricing, a process you updated after a compliance audit. Everything else about the output looks fine — format, tone, structure — but the facts are wrong.

Why it happens: The documents or data the agent uses as context haven't been updated. The agent can only work with what it knows. If it knows the old version of your supplier policy, it will give answers based on the old version, confidently and consistently.

How to diagnose it: Ask yourself: when did the underlying information change? When did the agent last get updated context? If those dates don't line up, you've found your problem.

How to fix it: Update the context document. Then put a process in place so this doesn't happen again. At minimum, set a recurring calendar reminder that matches how often the underlying data changes. If the data changes frequently, look at connecting a live source rather than maintaining a static document (see wiring-internal-systems-to-agents).


Failure Mode 2: Prompt drift

What it looks like: The output format changed. The agent used to produce structured bullet points; now it produces paragraphs. It used to be concise; now it adds lengthy caveats. The content quality may be the same, but the form is different — and users notice.

Why it happens: Someone changed the system prompt. This happens more often than you'd think. If multiple people have access to the Claude Project admin settings, prompt changes can happen without coordination. Even small changes — adding a sentence, reordering instructions — can change the output format significantly.

How to diagnose it: Check who has admin access to the agent's Project. Check whether any changes were made recently. If you're the only admin and you haven't changed it, check whether you accidentally edited it (this happens — it's easier than you think to accidentally modify a system prompt in the admin console).

How to fix it: Restore the original prompt. If you don't have a copy, you'll need to reconstruct it — this is a reminder to always save your system prompts somewhere outside the admin console (a Google Doc works fine). After restoring, add a comment at the top: "Do not modify without running test cases first." Then run your test set to confirm the fix.
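
If you or a colleague can run a few lines of Python, a quick way to confirm or rule out prompt drift is to compare your saved copy against whatever is in the admin console right now. Here is a minimal sketch, assuming you've pasted both versions into plain text files; the file names are placeholders.

```python
# Compare the saved copy of the system prompt against the live one.
# Both file names are placeholders -- use whatever you actually saved.
import difflib

with open("system_prompt_saved.txt") as f:
    saved = f.read().splitlines()
with open("system_prompt_current.txt") as f:  # paste the live prompt into this file
    current = f.read().splitlines()

diff = list(difflib.unified_diff(
    saved, current,
    fromfile="saved copy", tofile="admin console", lineterm=""))

if diff:
    print("\n".join(diff))  # lines starting with + or - are what drifted
else:
    print("No differences found -- prompt drift is not your problem.")
```

If the diff comes back empty, move on to the other failure modes rather than staring at the prompt.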


Failure Mode 3: Out-of-scope inputs

What it looks like: The agent was designed to handle one type of task and users are asking it to do related but different things. It's trying to answer but producing low-quality outputs. Or it's answering a different question than the one the user actually had, because it's interpreting out-of-scope inputs through the lens of what it knows how to do.

Why it happens: Users naturally push boundaries. Your invoice triage agent gets asked to draft supplier emails. Your policy lookup agent gets asked to make judgment calls about policy exceptions. Your scheduling agent gets asked about payroll. The agent doesn't know to redirect — it tries its best, and its best at out-of-scope tasks is usually poor.

How to diagnose it: Pull the actual user inputs from the last two weeks and read through them. Look for the pattern: are users asking for things that were never in the original design? If more than 10% of inputs are asking for something outside scope, you have scope creep.
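
Reading the inputs yourself is the important part, but if you want a rough number against that 10% threshold, you can have Claude do a first pass over an exported list. A minimal sketch using the Anthropic Python SDK; the file name, model alias, and one-line scope description are placeholders you'd swap for your own.

```python
# Rough scope audit: flag recent user inputs that fall outside the agent's job.
# The file name, model alias, and SCOPE description are placeholders.
import anthropic

SCOPE = "triaging incoming supplier invoices and assigning them to a category"
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

inputs = [line.strip() for line in open("recent_inputs.txt") if line.strip()]
out_of_scope = 0

for text in inputs:
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",  # use whatever model you have access to
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": (f"The agent's job is: {SCOPE}.\n"
                        f"Is the following request within that job? Answer YES or NO.\n\n{text}"),
        }],
    )
    if "NO" in reply.content[0].text.upper():
        out_of_scope += 1

print(f"{out_of_scope} of {len(inputs)} inputs look out of scope "
      f"({out_of_scope / len(inputs):.0%})")
```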

How to fix it: Two options depending on whether the new use cases are worth supporting. If they are: update the system prompt to cover them explicitly, add test cases for them, and verify quality. If they're not: add explicit redirect instructions to the system prompt. "For questions about [X], I can't help directly — please contact [name/channel]." Specific redirects work much better than general ones.


Failure Mode 4: Missing context for specific case types

What it looks like: The agent works for most inputs but consistently fails on a specific type. Your contract review agent works for standard contracts but fails on contracts with non-standard clause numbering. Your supplier lookup works for domestic suppliers but gives wrong results for international ones. It's not random failure — it's patterned.

Why it happens: The agent's context covers the common case well but has gaps for certain edge cases. The agent doesn't know what it doesn't know — it makes its best attempt with incomplete information.

How to diagnose it: Find the pattern in the failures. What do they have in common? Is it a specific category, a specific format, a specific type of input? If you can describe the pattern in one sentence, you've found the gap.

How to fix it: Add examples of the failing case type to the agent's context. Add explicit instructions to the system prompt that cover the edge case. "When the contract uses non-standard clause numbering (e.g., A.1, A.2 instead of 1, 2), use the section header to identify clauses, not the number." Explicit examples of edge cases in the system prompt consistently fix these pattern failures.


Failure Mode 5: Model upgrade without testing

What it looks like: Everything was working fine, and then it wasn't. You can't point to anything that changed — no documents were updated, no prompts were edited. But the outputs are subtly different: longer, or more conservative, or formatted differently.

Why it happens: Claude's underlying models get upgraded periodically. When Anthropic releases an improved version, behavior can change even for the same prompt. Most of the time these changes are improvements. Occasionally they break something specific to your use case.

How to diagnose it: Check when the failure started. If you have a log of your test set results, look at when the pass rate dropped. Check the Anthropic changelog for model updates around that time.

How to fix it: Run your test set on the new model version. If specific cases are failing that weren't before, adjust your system prompt to restore the original behavior. Add explicit instructions where the new defaults diverge from what you need.

This is the single best argument for having a test set before you need one. If you run your 20 test cases every week and you catch a regression the day after a model upgrade, you know immediately what changed and when. Without a test set, you find out when users complain, which is usually 2–3 weeks after the change.
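
If you have API access, the weekly run doesn't have to be manual. Here is a minimal sketch of a test-set runner, assuming your test cases live in a CSV with an "input" column and a "must_contain" column holding a phrase a correct output should include; the file names, column names, and pass criterion are all placeholders to adapt to however you defined your test cases.

```python
# Weekly test-set run: send each test input through the agent's system prompt
# and check the output against a simple pass criterion.
# File names, column names, and the criterion are placeholders.
import csv
import anthropic

MODEL = "claude-3-5-sonnet-latest"  # set this to the model your agent uses
SYSTEM_PROMPT = open("system_prompt_saved.txt").read()

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
passed, failures = 0, []

with open("test_cases.csv") as f:
    for case in csv.DictReader(f):
        response = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            messages=[{"role": "user", "content": case["input"]}],
        )
        output = response.content[0].text
        if case["must_contain"].lower() in output.lower():
            passed += 1
        else:
            failures.append(case["input"][:60])

print(f"{passed} passed, {len(failures)} failed")  # note this number and the date
for item in failures:
    print("FAIL:", item)
```

Keeping a simple log of that weekly pass rate is what turns Step 2 of the workflow below ("narrow to when it started") into a five-minute task instead of guesswork.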


The 45-minute debugging workflow

When an agent breaks and you don't immediately know why, work through this sequence:

Step 1 (5 min): Reproduce it with a specific example.
Don't debug in the abstract. Find one specific input that produced a bad output. Run it again now. Does it still produce the bad output? If yes, continue. If it's intermittent, run it 5 more times and note the pattern.

Step 2 (5 min): Narrow to when it started.
Ask users when they first noticed the problem. Check your test set history if you have one. Knowing when it started tells you what changed — it narrows the universe of causes from everything to "something that happened in this time window."

Step 3 (10 min): Check the context.
When were the documents or data the agent uses last updated? Does anything in those documents conflict with the failing output? Is there information that's now outdated?

Step 4 (10 min): Check the prompt.
Pull the current system prompt. Does it have clear instructions that cover this type of case? Did anyone change it recently? Is there anything in it that could be causing the behavior you're seeing?

Step 5 (15 min): Check for scope creep.
Pull the last 50 user inputs. Are users asking for things outside the original design? Is the agent trying to answer questions it wasn't built to answer?

In most cases, one of steps 3–5 will reveal the problem. The fix is usually either updating a document, restoring or editing the system prompt, or adding explicit instructions for an edge case.


When to escalate

If you've run through all five steps and genuinely can't find the cause, document before you ask for help. Write down:

  • The specific input that fails
  • What the agent produces
  • What it should produce
  • When the problem started
  • What you've already checked and ruled out

"The agent is giving wrong answers" takes someone 2 hours to diagnose. "On inputs containing non-standard invoice formats, the agent assigns them to the wrong category. Started around May 8. Context documents are current. System prompt hasn't changed. Happens on about 30% of non-standard inputs, consistently." takes someone 20 minutes.


What not to do

Don't rebuild from scratch. When an agent is failing, the instinct is to throw it out and start over. Don't. The failure is almost always in the context or prompt, not the underlying architecture. Rebuilding costs you weeks and doesn't fix the root cause.

Don't swap models without testing. "Maybe Claude 3.5 would be better" is sometimes true. It's also sometimes worse for your specific use case. Never change the model without running your test set on both versions first.

Don't turn it off without communicating. If you need to take an agent offline while you debug, tell the users. "The invoice triage agent is down for maintenance until Thursday — please submit to [email] in the meantime." Silence is worse than a clear message.


Try this today

Take the agent that's been giving you the most trouble. Find one specific example of a bad output — an actual input and output pair where the output was wrong or unhelpful. Walk through steps 1–5 of the debugging workflow. Write down what you find.

If you've been blaming the model, you're about to find out it's something else.
