Prompt Caching
Also: context caching
Prompt caching saves a portion of your prompt — typically a large system prompt or document — so Claude doesn't need to re-process it on every request. The first call includes the full context; subsequent calls reference the cached version. The practical effect: faster responses and lower API costs when you're repeatedly sending the same long context, which is common in production applications.
In practice
Your app sends the same 2,000-word system prompt on every API call. Without caching, you pay to process those 2,000 words every single time. With prompt caching, Anthropic processes the system prompt once and keeps it in memory — subsequent calls that reuse it cost 90% less for that portion. It's one of the most impactful cost levers for production apps.
Related concepts
Where Prompt Caching shows up
3 articlesPrompt caching is Claude's way of remembering the expensive part of a conversation so you don't have to re-send — and re-pay for — the same context on every request.
Prompt caching can cut your Claude API costs by 80% on the requests that matter most. Here's exactly how to implement it, why most teams cache the wrong things first, and what to fix.
If your application sends the same long system prompt on every request, you're paying to re-process it every time. Prompt caching stops that.