AI Codex

Production error handling for Claude applications

In brief

The errors you will definitely hit, the ones that will surprise you, and the patterns that make your app resilient when Claude or the API behaves unexpectedly.


Production Claude applications fail in predictable ways. The API goes down. You get rate limited. Claude generates output in an unexpected format. Users send inputs that break your prompts. Here is how to handle each category cleanly.

API error taxonomy

The Anthropic Python SDK raises specific exception types. Know them:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

try:
    response = client.messages.create(...)
except anthropic.AuthenticationError:
    # Invalid API key. Fix immediately — this is a config error.
    pass
except anthropic.PermissionDeniedError:
    # Key doesn't have access to this model or feature.
    pass
except anthropic.NotFoundError:
    # Model doesn't exist or was deprecated.
    pass
except anthropic.RateLimitError:
    # Too many requests or too many tokens. Retry with backoff.
    pass
except anthropic.UnprocessableEntityError:
    # Request was malformed — message structure invalid.
    pass
except anthropic.APIStatusError as e:
    if e.status_code >= 500:
        # Server-side error. Retry.
        pass
    else:
        # Other 4xx — don't retry.
        pass
except anthropic.APIConnectionError:
    # Network issue. Retry.
    pass
except anthropic.APITimeoutError:
    # Request timed out. Retry.
    pass

In TypeScript:

import Anthropic from '@anthropic-ai/sdk';

try {
  const response = await client.messages.create(params);
} catch (err) {
  if (err instanceof Anthropic.AuthenticationError) { /* config error */ }
  else if (err instanceof Anthropic.RateLimitError)  { /* retry */      }
  else if (err instanceof Anthropic.APIError) {
    console.error(err.status, err.message);
  }
}
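Several of the branches above say "retry". A minimal sketch of exponential backoff with jitter; the `sleep` parameter is injectable for testing, and in real code you would catch only the retryable exception types rather than bare `Exception`:

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # In practice, retry only anthropic.RateLimitError,
            # anthropic.APIConnectionError, anthropic.APITimeoutError,
            # and APIStatusError with a 5xx status code.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Wrap your `client.messages.create(...)` call in a lambda or partial and pass it as `fn`. Note that the official SDKs also retry some errors automatically; this pattern is for when you want explicit control.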

Output format errors

If you ask Claude to return JSON and it returns something else, your json.loads() (or JSON.parse()) call fails. This is common when:

  • The system prompt is ambiguous about format
  • The input is long enough to push format instructions out of context
  • Claude includes explanation text around the JSON

The robust pattern:

import json
import re

def parse_json_response(text: str) -> dict:
    """Extract JSON from Claude's response, even if surrounded by text."""
    # Try direct parse first
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        pass

    # Try extracting from markdown code block
    match = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
    if match:
        try:
            return json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            pass

    # Try finding first { ... } block
    match = re.search(r'\{[\s\S]*\}', text)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass

    raise ValueError(f"Could not parse JSON from response: {text[:200]}")
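The markdown-fence branch handles the most common failure mode: Claude wrapping valid JSON in a code fence with commentary around it. A self-contained check of that extraction pattern (the reply string is made up for illustration):

```python
import json
import re

# A made-up reply of the kind Claude often produces:
reply = (
    "Here is the summary you asked for:\n"
    '```json\n{"title": "Q3 report", "sentiment": "positive"}\n```\n'
    "Let me know if you need anything else."
)

# Same pattern as in parse_json_response above:
match = re.search(r'```(?:json)?\s*([\s\S]*?)```', reply)
data = json.loads(match.group(1).strip())
```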

Better long-term: use structured output properly with a schema in your system prompt, and validate with Pydantic:

from pydantic import BaseModel, ValidationError

class SummaryOutput(BaseModel):
    title: str
    key_points: list[str]
    sentiment: str

def get_summary(text: str) -> SummaryOutput:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system="Return a JSON object with keys: title (string), key_points (list of strings), sentiment (positive/negative/neutral). No other text.",
        messages=[{"role": "user", "content": text}]
    )
    raw = response.content[0].text
    data = parse_json_response(raw)
    try:
        return SummaryOutput(**data)
    except ValidationError as e:
        raise ValueError(f"Claude returned unexpected schema: {e}")
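Pydantic does the heavy lifting in that last step. A self-contained look at what the ValidationError path actually catches; the sample payloads are illustrative:

```python
from pydantic import BaseModel, ValidationError

class SummaryOutput(BaseModel):
    title: str
    key_points: list[str]
    sentiment: str

# Well-formed output parses cleanly:
good = SummaryOutput(
    **{"title": "Q3", "key_points": ["revenue up"], "sentiment": "positive"}
)

# Schema drift (key_points as a bare string, not a list) is caught:
try:
    SummaryOutput(
        **{"title": "Q3", "key_points": "revenue up", "sentiment": "positive"}
    )
    drift_caught = False
except ValidationError:
    drift_caught = True
```

The point: JSON that parses is not JSON that matches your schema. Validate both.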

Hallucination detection

For applications where accuracy matters, do not trust Claude blindly. Implement a verification step:

def verify_claim(claim: str, source_text: str) -> bool:
    """Ask a second Claude call to verify a claim against source text."""
    verification = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap for verification
        max_tokens=16,
        messages=[{
            "role": "user",
            "content": f"Does this source text support this claim? Answer only YES or NO.

Source: {source_text}

Claim: {claim}"
        }]
    )
    return verification.content[0].text.strip().upper().startswith("YES")

Use this for:

  • Factual claims extracted from documents (RAG applications)
  • Data pulled from tools (is this what the API actually returned?)
  • Customer-facing content where errors are costly

Handling toxic or out-of-scope user input

Claude refuses some requests automatically. Your app should handle refusals gracefully:

def is_refusal(response_text: str) -> bool:
    """Heuristic check for Claude refusals."""
    refusal_signals = [
        "i cannot", "i can't", "i'm unable", "i won't", "i will not",
        "as an ai", "i don't have the ability"
    ]
    lower = response_text.lower()
    return any(signal in lower for signal in refusal_signals)

def safe_call(user_message: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}]
    )
    text = response.content[0].text
    if is_refusal(text):
        return "I can't help with that in this context. Try rephrasing, or contact support."
    return text

Context window overflow

If your input exceeds the context window, you get an error. Guard against it:

def estimate_tokens(text: str) -> int:
    """Rough estimate: 4 characters ≈ 1 token."""
    return len(text) // 4

MAX_INPUT_TOKENS = 180_000  # leave room for output

def safe_messages(messages: list[dict], system: str = "") -> list[dict]:
    """Trim message history if approaching context limit."""
    system_tokens = estimate_tokens(system)
    budget = MAX_INPUT_TOKENS - system_tokens - 1000  # safety margin

    result = []
    total = 0
    for msg in reversed(messages):
        content = msg.get("content", "")
        tokens = estimate_tokens(str(content))
        if total + tokens > budget:
            break
        result.insert(0, msg)
        total += tokens

    return result
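The 4-characters-per-token heuristic is deliberately rough. When you need an exact number, the SDK exposes a token-counting endpoint; a sketch, assuming the current Python SDK surface and a configured API key (`exact_token_count` is a hypothetical helper name):

```python
import anthropic

client = anthropic.Anthropic()

def exact_token_count(messages: list[dict], system: str | None = None) -> int:
    """Count input tokens via the API instead of estimating."""
    kwargs = {"model": "claude-sonnet-4-6", "messages": messages}
    if system:
        kwargs["system"] = system
    result = client.messages.count_tokens(**kwargs)
    return result.input_tokens
```

This costs an extra network round trip per call, so the cheap heuristic remains useful as a fast pre-check, with the exact count reserved for borderline cases.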

User-facing error messages

Never surface raw API errors to users. Map them to friendly messages:

def user_friendly_error(error: Exception) -> str:
    if isinstance(error, anthropic.RateLimitError):
        return "We're getting a lot of requests right now. Please try again in a moment."
    if isinstance(error, anthropic.APIConnectionError):
        return "Connection issue. Please check your internet and try again."
    if isinstance(error, anthropic.APITimeoutError):
        return "This is taking longer than expected. Please try again."
    if isinstance(error, anthropic.APIStatusError) and error.status_code >= 500:
        return "Something went wrong on our end. We've been notified and are looking into it."
    return "Something went wrong. Please try again."

Robustness is not exciting to build. It is what separates a demo from a product.
