AI Codex

Prompt caching: why it matters when you're building with Claude at scale

If your application sends the same long system prompt on every request, you're paying to re-process it every time. Prompt caching stops that.


Prompt caching is a feature that most people deploying Claude in products should know about — but it mostly matters at scale, not for casual use.

Here's the honest version of what it does and when it's worth caring about.

What it actually does

When you make an API call to Claude, you typically send a system prompt, some context, and the user's message. If your system prompt is long — detailed instructions, a large knowledge base, company documentation — Claude has to process all of that on every single request.

Prompt caching lets you store a portion of that context server-side. The first request processes everything and writes the expensive part to a cache; subsequent requests reference the cache instead of re-processing it. The cache is short-lived (around five minutes by default, refreshed each time it's read), so it pays off most on steady traffic. The result: faster responses and lower costs on those cached tokens.
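In the Anthropic Messages API, you opt in by marking the block you want cached with `cache_control`. A minimal sketch, where the model name and prompt text are placeholders:

```python
# Sketch of a Messages API request payload with prompt caching enabled.
# The long system prompt is marked with cache_control so the server
# caches it after the first request; later identical requests read it
# from the cache instead of re-processing those tokens.

LONG_SYSTEM_PROMPT = "...detailed instructions, docs, knowledge base..."

def build_request(user_message: str) -> dict:
    """Build a request payload with the system prompt marked cacheable."""
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Tells the API to cache the prompt prefix up to and
                # including this block.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

# With the official SDK you would pass these fields to
# anthropic.Anthropic().messages.create(**build_request("Hello")).
```

The key detail: caching applies to the prefix up to the marker, so put your stable content (instructions, documents) before it and the per-request user turn after it.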

When this matters

If you're building a product where:

  • Every API call includes the same large system prompt (common)
  • You're passing the same long document as context on many requests
  • You're running a lot of requests per day

...then prompt caching can make a real difference to both latency and cost.

The official numbers from Anthropic: cache reads cost 90% less than standard input tokens, and cached requests come back noticeably faster. (Writing to the cache on the first request carries a modest premium over base input pricing.)
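The back-of-envelope math shows why this adds up at volume. A sketch with illustrative prices (check Anthropic's current pricing page; the write premium and traffic numbers here are assumptions, and it simplifies by assuming the cache stays warm between requests):

```python
# Rough input-token cost comparison for a cached vs. uncached
# system prompt. All prices are illustrative, not current quotes.

BASE_INPUT_PRICE = 3.00   # $ per million input tokens (illustrative)
CACHE_WRITE_MULT = 1.25   # assumed premium on the initial cache write
CACHE_READ_MULT = 0.10    # cache reads at 10% of base (90% cheaper)

def input_cost(prompt_tokens: int, requests: int, cached: bool) -> float:
    """Input-token cost for `requests` calls each sending `prompt_tokens`."""
    per_request = prompt_tokens / 1_000_000 * BASE_INPUT_PRICE
    if not cached:
        return per_request * requests
    # Simplification: the first request writes the cache, every later
    # request reads it (i.e. the cache never expires between calls).
    return (per_request * CACHE_WRITE_MULT
            + per_request * CACHE_READ_MULT * (requests - 1))

# 10k-token system prompt, 100k requests:
uncached = input_cost(10_000, 100_000, cached=False)
cached = input_cost(10_000, 100_000, cached=True)
```

Under these assumptions the cached path costs roughly a tenth of the uncached one, which is where the "90% cheaper" figure bites in practice.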

When it doesn't matter

If you're using Claude.ai (the web app) directly, there's nothing to configure — caching is an API-level feature. For infrequent API use, or short system prompts, the savings aren't worth the added complexity.

The operator question

If you're deploying Claude in a product and you've noticed API costs are significant, or latency is a problem for users, prompt caching is one of the first things to look at. It's a relatively straightforward implementation and the economics are good.
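One way to confirm caching is actually working once you've enabled it: the Messages API response reports cache activity separately in its `usage` object. A sketch of a simple check (field names per the API's usage reporting; the sample numbers are made up):

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from cache on one request.

    The Messages API `usage` object reports `cache_read_input_tokens`
    (tokens served from cache), `cache_creation_input_tokens` (tokens
    written to cache), and `input_tokens` (uncached input tokens).
    """
    read = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    fresh = usage.get("input_tokens", 0)
    total = read + written + fresh
    return read / total if total else 0.0

# Made-up sample: a warm-cache request where the large system prompt
# came from cache and only the short user turn was processed fresh.
sample = {
    "input_tokens": 50,
    "cache_read_input_tokens": 9_500,
    "cache_creation_input_tokens": 0,
}
```

If this ratio stays near zero in production, something is breaking the cache — often a prompt prefix that varies between requests, or traffic too sparse for the cache to stay warm.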

If you're not at that stage yet — early pilot, small team, low volume — file this away for later. It's not a feature you need on day one.
