AI Codex
Foundation Models & LLMs · Core Definition

The unit everything in AI is priced and measured in

Tokens are how language models read and write text — and how every AI API charges you. Understanding them turns abstract pricing into something you can predict and control.

Token · 4 min read

If you've ever wondered why AI APIs charge by "tokens" instead of words or characters, this is the explanation.

What a token is

A token is a chunk of text — somewhere between a character and a word. It's the atomic unit that language models work with.

English text breaks down roughly like this:

  • Common short words are usually one token: "the," "is," "a," "in"
  • Longer words often split into two or more tokens: "token" is one token, "tokenization" might be two or three
  • Punctuation, spaces, and special characters each take tokens
  • Numbers are chunked in varying ways: a long number often splits into several multi-digit pieces rather than one token per digit

The rule of thumb: one token is about ¾ of a word, or roughly 4 characters. So 1,000 tokens ≈ 750 words, and a typical page of text is around 500–600 tokens.

Why models use tokens instead of words

Words are inconsistent. "Run" and "running" are related but different strings. "Unbelievable" is one word but contains recognizable sub-units. "New York" is two words but often functions as one concept.

Tokens let the model work at a level that captures meaningful sub-units without being split arbitrarily at spaces. The tokenizer — the system that converts text to tokens before feeding it to the model — is trained to merge the character sequences that occur most frequently in its training data, and those frequent sequences tend to line up with meaningful sub-units like roots, prefixes, and suffixes.
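The merging idea behind most modern tokenizers can be sketched in a few lines. Below is a toy byte-pair-encoding-style trainer — not Claude's actual tokenizer, just an illustration of the principle: repeatedly fuse the most frequent adjacent pair of symbols, so common fragments like "run" become single tokens.

```python
from collections import Counter

def bpe_merge_step(words):
    """One merge step of a toy BPE trainer.

    `words` is a list of symbol tuples, e.g. ('r','u','n','n','i','n','g').
    Finds the most frequent adjacent symbol pair and fuses it everywhere.
    """
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                out.append(w[i] + w[i + 1])  # fuse the pair into one symbol
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(tuple(out))
    return merged, best

# Three related word forms start as individual characters...
corpus = [tuple("running"), tuple("runner"), tuple("run")]
for _ in range(3):
    corpus, pair = bpe_merge_step(corpus)
# ...and after a few merges, "run" has fused into a single shared symbol.
```

Real tokenizers run thousands of such merges over huge corpora, which is why frequent words end up as one token while rare words split into pieces.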

How this affects you as a builder

Pricing. Every AI API, including Anthropic's, charges per token — for input (what you send) and output (what Claude generates). Understanding token counts lets you estimate and control costs before they surprise you.
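Per-token pricing makes cost a simple calculation once you know the counts. A minimal sketch — the rates below are placeholders, not Anthropic's actual prices; always check the current pricing page:

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_mtok, output_price_per_mtok):
    """Estimate one API call's cost. Prices are USD per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Placeholder rates for illustration only.
cost = estimate_cost_usd(10_000, 2_000,
                         input_price_per_mtok=3.00,
                         output_price_per_mtok=15.00)
# A 10k-token prompt with a 2k-token reply at these rates costs $0.06.
```

Note that output tokens typically cost more per token than input tokens, so long generations dominate the bill even when the prompt is short.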

Context window limits. Claude's 200,000-token context window means 200,000 tokens of combined input and output. That's roughly 150,000 words — a lot, but finite. Long documents, conversation history, and system prompts all count toward this limit.
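Because input and output share the window, a pre-flight check should budget for both. A simple sketch (the function name is ours, not an SDK call):

```python
def fits_context(input_tokens, max_output_tokens, context_window=200_000):
    """Return True if the prompt plus the reserved output budget fits."""
    return input_tokens + max_output_tokens <= context_window

fits_context(150_000, 4_096)   # a long document plus a reply: fits
fits_context(198_000, 4_096)   # prompt leaves no room for output: does not
```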

Performance. Fewer input tokens means faster responses and lower latency. Verbose prompts cost more and process more slowly than concise ones.

The practical implications

A few things worth knowing for everyday use:

Code uses more tokens than prose. Programming languages have many special characters and unusual patterns that tokenize inefficiently.

Non-English text often uses more tokens per "word" than English. Languages with complex morphology or non-Latin scripts can require significantly more tokens to express the same content.

Whitespace and formatting add up. Excessive newlines, indentation, and markdown syntax all consume tokens. Clean, tight formatting uses context window more efficiently.
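Trimming wasted whitespace before sending a prompt is an easy win. One possible helper, using only the standard library:

```python
import re

def tighten(text):
    """Strip trailing spaces and collapse runs of blank lines in a prompt."""
    text = re.sub(r"[ \t]+\n", "\n", text)   # drop trailing spaces on each line
    text = re.sub(r"\n{3,}", "\n\n", text)   # allow at most one blank line
    return text.strip()

tighten("Summarize:   \n\n\n\nThe report...")
# -> "Summarize:\n\nThe report..."
```

Don't over-strip, though: meaningful structure like markdown headings or code indentation should survive, since the model uses it too.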

How to estimate your usage

For rough estimates: take your word count, multiply by 1.3, and that's approximately your token count. For precise counts before making API calls, Anthropic provides a tokenizer tool — or you can use the count_tokens endpoint to get exact figures before committing to a request.
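The word-count heuristic is easy to code up. A sketch that combines the two rules of thumb from above (about 1.3 tokens per word, about 4 characters per token) and takes the larger as a conservative guess — our design choice, not an official formula; use the count_tokens endpoint when you need exact figures:

```python
def estimate_tokens(text):
    """Rough token estimate from the two rules of thumb.

    ~1.3 tokens per word, ~1 token per 4 characters; returns the
    larger of the two as a conservative (over-)estimate.
    """
    by_words = int(len(text.split()) * 1.3)
    by_chars = len(text) // 4
    return max(by_words, by_chars)

estimate_tokens("the quick brown fox")  # ~5 tokens
```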

Understanding tokens turns the abstract "AI cost" into something predictable. Once you can estimate token counts reliably, you can design applications that are efficient by default rather than expensive by accident.