AI Codex

How to minimise your Claude token usage without sacrificing quality

Tokens are what you pay for. Here are the practical things you can do to use fewer of them — from how you prompt to which model you choose.


Every message you send Claude, every document you upload, every response Claude generates — all of it is measured in tokens. Tokens are how usage is tracked (on Claude.ai plans) and how costs are calculated (on the API). Using fewer tokens does not mean using Claude less. It means using Claude more efficiently.

The biggest token costs (and what to do about each)

1. Long system prompts and Project instructions

Your system prompt or Project instructions are sent with every single message. If your instructions are 3,000 tokens, that is 3,000 tokens consumed on every turn of the conversation — even if the user just says "thanks."

What to do: Keep instructions focused and specific. Remove duplicate examples that illustrate the same point. Cut the "do not" lists — Claude follows positive instructions better than negative ones anyway. A tight 500-token system prompt that covers the essentials outperforms a 3,000-token one that tries to cover every edge case.
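To see why this compounds, here is a minimal sketch of the per-conversation cost. The 4-characters-per-token ratio is a common rule of thumb for English prose, not an exact Claude tokenizer count:

```python
# Rough illustration of how system-prompt length compounds over a
# conversation, since the prompt is re-sent on every turn.

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def system_prompt_cost(prompt_tokens: int, turns: int) -> int:
    """Total tokens spent re-sending the system prompt across a conversation."""
    return prompt_tokens * turns

# A 3,000-token prompt vs. a 500-token one over a 20-turn conversation.
print(system_prompt_cost(3000, 20))  # 60000
print(system_prompt_cost(500, 20))   # 10000
```

The difference — 50,000 tokens over one 20-turn conversation — is why trimming instructions is usually the highest-leverage change.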

2. Uploading large documents

When you upload a document to a Project or a conversation, the entire document is loaded into the context window. A 50-page PDF can be 30,000+ tokens.

What to do: Only upload documents Claude actually needs. If Claude only needs your product FAQ, don't also upload your entire employee handbook. For large documents, extract the relevant sections rather than uploading the whole thing. If you reference the same documents repeatedly, consider using prompt caching on the API.
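For the prompt-caching case, here is a sketch of what a Messages API request body looks like with a cached document. The `cache_control` block format follows the API's documented shape; the model ID is illustrative, and this builds the payload only — actually sending it requires the `anthropic` SDK and an API key:

```python
# Sketch: cache a large reference document with prompt caching so
# repeated requests reuse it instead of reprocessing it each time.

def build_cached_request(document_text: str, question: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # illustrative model ID
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "Answer questions using the attached FAQ."},
            {
                "type": "text",
                "text": document_text,
                # Cache breakpoint: everything up to and including this
                # block is cached and billed at a reduced rate on hits.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
    }

request = build_cached_request("...long product FAQ...", "What is the refund policy?")
```

On cache hits, the document's tokens are read from the cache at a fraction of the normal input price, so the large-document cost is paid once rather than on every request.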

3. Long conversation histories

Every message in a conversation — yours and Claude's — stays in the context window. A 40-message conversation can consume 20,000+ tokens before you type anything new.

What to do: Start fresh conversations for new topics rather than continuing old ones. In Claude.ai, each new chat in a Project still gets your Project instructions, so you don't lose context. On the API, manage conversation history deliberately — trim or summarise old messages rather than sending the entire history every time.
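On the API side, a minimal trimming strategy is to keep the most recent turns verbatim and collapse everything older into one summary message. This is a sketch: the `summarise` stub stands in for a real summarisation call (e.g. to a cheap model), and a production version would also preserve user/assistant role alternation:

```python
# Keep the last N messages verbatim; replace older ones with a summary.

def summarise(messages: list[dict]) -> str:
    """Placeholder: in practice, ask a cheap model to summarise these turns."""
    return f"[Summary of {len(messages)} earlier messages]"

def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "user", "content": summarise(older)}
    return [summary] + recent

history = [{"role": "user", "content": f"message {i}"} for i in range(40)]
trimmed = trim_history(history)
print(len(trimmed))  # 7: one summary message plus the last six turns
```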

4. Verbose Claude responses

By default, Claude gives thorough, detailed responses. If you only need a short answer, you're paying for paragraphs you don't read.

What to do: Tell Claude how you want the response formatted. "Answer in 2-3 sentences" or "give me a bulleted list, no explanation" or "just the number." Adding output format instructions to your system prompt ensures every response is appropriately concise.
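On the API, you can pair a format instruction with `max_tokens`, which is a hard ceiling on output length. A sketch (the model ID is illustrative):

```python
# Ask for concision in the system prompt, and cap output with max_tokens
# so the response length is bounded even if the instruction is ignored.

def concise_request(question: str, max_out: int = 150) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # illustrative model ID
        "max_tokens": max_out,  # hard upper bound on output tokens
        "system": "Answer in 2-3 sentences. No preamble, no recap.",
        "messages": [{"role": "user", "content": question}],
    }

request = concise_request("How many tokens is a 50-page PDF, roughly?")
```

The instruction shapes the response; `max_tokens` guarantees you never pay for more than the cap.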

Choosing the right model

Not every task needs the most powerful model. Claude has three tiers:

  • Haiku — fastest, cheapest, good for simple tasks (classification, extraction, reformatting)
  • Sonnet — balanced, good for most everyday work (drafting, summarising, analysis)
  • Opus — most capable, best for complex reasoning, nuanced judgment, and difficult problems

If your team uses Claude.ai with a Pro or Team plan, model selection is handled in the model picker. On the API, routing simple tasks to Haiku and complex tasks to Sonnet or Opus can reduce costs by 60-80% without quality loss on the simple tasks.
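The routing idea can be sketched as a simple lookup, under the assumption that tasks arrive pre-labelled by type; the model names here are tier placeholders, not current API model IDs:

```python
# Naive model router: send simple task types to the cheapest tier,
# complex ones to the most capable, everything else to the default.

SIMPLE_TASKS = {"classification", "extraction", "reformatting"}
COMPLEX_TASKS = {"reasoning", "architecture", "legal-review"}

def pick_model(task_type: str) -> str:
    if task_type in SIMPLE_TASKS:
        return "claude-haiku"   # fastest, cheapest tier
    if task_type in COMPLEX_TASKS:
        return "claude-opus"    # most capable tier
    return "claude-sonnet"      # balanced default

print(pick_model("extraction"))  # claude-haiku
print(pick_model("drafting"))    # claude-sonnet
```

In practice the hard part is the labelling; some teams use a cheap model call or simple heuristics (input length, presence of code, etc.) to classify the task before routing it.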

Connectors and token awareness

Connectors pull in external content — documents from Google Drive, messages from Slack, pages from Notion. Every piece of content a connector retrieves consumes tokens.

Best practice: Disable connectors you don't need for a given conversation. If you are writing marketing copy, you probably don't need your Jira connector active — it will pull in issue context that adds tokens without adding value. Enable only the connectors relevant to your current task.

The 80/20 rule

For most teams, two changes make the biggest difference:

  1. Tighter system prompts — cut yours in half, test, iterate
  2. Start new conversations instead of running one endlessly

Everything else is optimisation. Get these two right first.
