AI Codex
Foundation Models & LLMsDevelopers

Attention Mechanism

Also: self-attention

The core reason modern AI handles long, complex text without losing track of what's important. Without it, AI would process text left-to-right, easily forgetting earlier context. With it, every word in a passage can weigh its relevance to every other word. It's why Claude can read a 100-page document and still accurately answer a question about something on page 3 — it hasn't forgotten it. The "attention" is which parts of the context the model focuses on when generating each word of its response.

In practice

You paste a 20-page contract into Claude and ask "what are the auto-renewal clauses?" Claude doesn't scan left-to-right and forget page 1 by page 20. The attention mechanism is why it can hold the whole document in mind and know exactly which paragraphs are relevant to your question.

Related concepts