The context window in practice: what it means for how you work
The context window is not just a technical spec — it shapes what Claude can and can't do in any given conversation. Here is how to work with it.
The context window is the amount of text Claude can hold in working memory at once: the current conversation, any uploaded documents, your Project instructions, and Claude's own responses. Claude 3.5 Sonnet has a 200,000-token context window, which is roughly 150,000 words, or about 500 pages of text.
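Those conversions follow from two common rules of thumb: one token is about 0.75 English words, and a printed page is about 300 words. A minimal sketch of the arithmetic, with both ratios as assumptions rather than exact values:

```python
# Rough token-to-text conversions for a 200,000-token context window.
# The ratios are rules of thumb, not exact values: real token counts
# vary with vocabulary, formatting, and language.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

def tokens_to_words(tokens: int) -> int:
    """Approximate word count for a given token count."""
    return int(tokens * WORDS_PER_TOKEN)

def tokens_to_pages(tokens: int) -> int:
    """Approximate page count for a given token count."""
    return int(tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE)

print(tokens_to_words(200_000))  # 150000
print(tokens_to_pages(200_000))  # 500
```

The same arithmetic run in reverse is useful for budgeting: a 10,000-word document is roughly 13,000 tokens.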
That sounds enormous. In practice, context windows fill up faster than you expect, and how you manage them affects the quality of your outputs.
What goes into the context window
Every token in the context window takes processing and affects how Claude attends to different parts of the conversation. The context window contains, in order:
- Your system prompt or Project instructions
- Any documents you have uploaded to the Project
- The full conversation history — every message from you and every response from Claude
- Your current message
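For API users, these four pieces map directly onto the request body. A minimal sketch using the shape of the Anthropic Messages API, with all names and text illustrative:

```python
# How the pieces above combine into one request's context.
# Everything here is re-sent, and re-counted against the window, on every turn.
project_instructions = "You are a careful financial analyst."      # system prompt
uploaded_document = "Q3 report: revenue grew 12% year over year."  # Project document

# Full conversation history: every prior message from both sides.
history = [
    {"role": "user", "content": "Summarize the Q3 report."},
    {"role": "assistant", "content": "Revenue grew 12%, driven by..."},
]
current_message = {"role": "user", "content": "Now compare Q3 to Q2."}

request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    # Instructions and documents ride along with every call:
    "system": project_instructions + "\n\n<document>\n" + uploaded_document + "\n</document>",
    # History plus the new message all count against the window:
    "messages": history + [current_message],
}
```

In Claude.ai the same assembly happens behind the scenes, which is why a long conversation or a large Project keeps consuming context even when your own message is short.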
The more that is in the context window, the more Claude has to process, and the less weight it may give to early instructions as the conversation grows longer.
The practical implications
Long conversations drift. In a very long conversation (40+ exchanges), Claude may start to lose track of instructions given early in the conversation, or produce outputs that are less consistent with the original setup. This is a property of attention — the model processes the full context, but recent content gets more weight. For long, complex tasks, it is often better to start a new conversation with a fresh context than to continue an old one indefinitely.
Big documents consume context aggressively. A 50-page PDF uploaded to a conversation is 30,000+ tokens. If you upload three such documents, you have used 90,000 tokens before you say anything. For large document sets, be selective: upload only what Claude actually needs for the task at hand.
Project instructions are always present. Your Project system prompt is sent with every message. If it is 3,000 tokens, that is 3,000 tokens consumed on every turn. This is why keeping Project instructions tight matters — see the guidance on writing system prompts for how to do this well.
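These per-turn costs compound. A minimal budget sketch using the figures from the two paragraphs above; the average exchange size is an illustrative assumption:

```python
# Rough context-budget arithmetic for a long Project conversation.
# All token counts are illustrative estimates.
WINDOW = 200_000

system_prompt = 3_000          # Project instructions, re-sent every turn
documents = 3 * 30_000         # three 50-page PDFs uploaded to the Project
turns = 40
avg_tokens_per_exchange = 600  # one user message plus one reply (assumed)

history = turns * avg_tokens_per_exchange
used = system_prompt + documents + history

print(f"used {used:,} of {WINDOW:,} tokens")
```

Under these assumptions the conversation has consumed well over half the window, which is when the drift described above starts to show.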
Prompt caching helps at scale. For teams using Claude via the API, prompt caching lets you cache repeated context (like a large document that is referenced in every call) so it does not need to be re-processed each time. This is a significant cost and latency saving for high-volume use cases.
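In the Messages API, caching is requested by marking a content block with `cache_control`. A minimal sketch of the request body; the model ID and all text are placeholders, and sending the request would require an API key:

```python
# Sketch of a prompt-caching request body for the Anthropic Messages API.
# Marking the large document with cache_control lets repeated calls reuse
# its processed form instead of re-reading it on every request.
large_document = "...full text of a 50-page report..."  # placeholder

request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You answer questions about the report."},
        {
            "type": "text",
            "text": large_document,
            "cache_control": {"type": "ephemeral"},  # cache up to this block
        },
    ],
    "messages": [{"role": "user", "content": "What were the key risks?"}],
}
```

Subsequent calls that repeat the same prefix can then hit the cache, paying the full processing cost only once.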
When context length actually matters
For most everyday use — drafting emails, answering questions, producing reports — you will never notice context window limits. The 200k window is genuinely large.
Context management matters when you are:
- Working with very large documents (50+ pages)
- Running long analytical conversations
- Building workflows that involve many back-and-forth exchanges
- Using the API for high-volume automated tasks
For Claude.ai users: if a conversation is getting long and Claude seems to be losing track of earlier instructions, start a fresh conversation. Your Project instructions reload from scratch, and you get clean attention.
The quality principle
A smaller context with the right information in it produces better outputs than a large context full of noise. Claude attends better to focused, relevant input. Before uploading a document, ask: does Claude actually need all of this? Often the answer is no. For most tasks, a 5-page summary of a 50-page document works better than the full document.
This is the same principle as a good briefing: give someone exactly what they need to do the work, not everything you have.