AI Codex

Tokenization

The step where AI breaks your text into small pieces, called tokens, before it can process anything. In English text, a token averages roughly three-quarters of a word. "Hello, how are you?" is about 6 tokens, because punctuation counts too. This matters practically because API costs and usage limits are measured in tokens, not words or characters. The more text you send (and receive), the more tokens you use.
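As a rough illustration of how token counts turn into cost, here is a minimal Python sketch. It assumes the three-quarters-of-a-word rule of thumb rather than a real tokenizer, and the function names and prices are hypothetical placeholders, not any provider's API or rates.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (heuristic only)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost in whatever currency unit the per-million-token prices use."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


prompt = "Hello, how are you?"
prompt_tokens = estimate_tokens(prompt)

# Placeholder prices for illustration only, not real rates.
cost = estimate_cost(prompt_tokens, output_tokens=50,
                     input_price_per_million=1.0,
                     output_price_per_million=2.0)
print(prompt_tokens, cost)
```

Both sent and received tokens are passed in separately because, as noted above, usage counts text in both directions. For exact counts you would need the tokenizer or token-counting tool that belongs to the specific model.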

In practice

Before Claude can process your message, the text is split into tokens: small chunks that average roughly ¾ of a word each. Token boundaries do not track word length exactly. A short word like "AI" might be one token, and a long word like "Unbelievable" might be one token or several, depending on the tokenizer. At that average, a 10,000-word document is roughly 13,000 tokens (10,000 ÷ 0.75 ≈ 13,300), which affects both cost and whether it fits in the context window.
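The arithmetic above can be sketched in Python as a quick pre-flight check. This is an estimate under the ¾-word assumption, not a real tokenizer, and the function names and the context window size used here are hypothetical values chosen for illustration.

```python
import math


def words_to_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Convert a word count to an approximate token count."""
    return math.ceil(word_count / words_per_token)


def fits_in_context(word_count: int, context_window_tokens: int,
                    reserved_output_tokens: int = 0) -> bool:
    """True if the estimated input plus reserved output fits the window."""
    needed = words_to_tokens(word_count) + reserved_output_tokens
    return needed <= context_window_tokens


doc_tokens = words_to_tokens(10_000)
print(doc_tokens)  # about 13,000, matching the estimate in the text

# Hypothetical 16,000-token window, with room left for a reply.
print(fits_in_context(10_000, context_window_tokens=16_000,
                      reserved_output_tokens=2_000))
```

Reserving tokens for the reply matters because the response shares the same window as the input. Treat the result as a first-pass guess and leave headroom, since real token counts depend on the language and content of the text.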
