AI Codex
Infrastructure & Deployment | Developers | CTOs | Operators

Latency

The time between sending a request to Claude and getting a response back. Low latency makes AI feel fast and responsive; high latency makes users feel like they're waiting. Latency depends on the model (larger models are generally slower), the length of the response, and network conditions. Streaming, which shows Claude's response as it is being generated rather than waiting for the full answer, is the most common way to make high-latency responses feel faster. It does not shorten the total generation time; it shortens the wait before the first words appear.
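To make the difference between total latency and the wait for the first streamed words concrete, here is a minimal sketch in Python. It does not call Claude: `fake_stream` is a hypothetical stand-in that simulates a streamed response with a fixed delay per chunk, and `measure_stream` is an illustrative helper name, not part of any SDK. In a real app you would pass in the iterator of text chunks your client library returns.

```python
import time
from typing import Iterator, List, Tuple


def fake_stream(chunks: List[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streamed model response."""
    for chunk in chunks:
        time.sleep(delay_s)  # simulate the time taken to generate each chunk
        yield chunk


def measure_stream(stream: Iterator[str]) -> Tuple[str, float, float]:
    """Return (full_text, seconds_to_first_chunk, seconds_to_full_response)."""
    start = time.perf_counter()
    first_chunk_s = None
    parts = []
    for chunk in stream:
        if first_chunk_s is None:
            # With streaming, this is how long the user waits before seeing anything.
            first_chunk_s = time.perf_counter() - start
        parts.append(chunk)
    total_s = time.perf_counter() - start
    if first_chunk_s is None:  # empty stream: nothing was ever shown
        first_chunk_s = total_s
    return "".join(parts), first_chunk_s, total_s


text, first_s, total_s = measure_stream(
    fake_stream(["Hel", "lo", ", ", "world"], delay_s=0.05)
)
print(f"first chunk after {first_s:.2f}s, full response after {total_s:.2f}s")
```

Without streaming, the user waits for the full-response time before seeing anything; with streaming, they start reading after the first-chunk time, even though the total is unchanged.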

In practice

You're building a live chat support tool where customers expect responses in under 2 seconds. You test Claude Sonnet and the average response takes 4 seconds. You switch to Claude Haiku — faster, slightly less capable — and get to 1.5 seconds. That response time is latency. For real-time user-facing apps, it often matters as much as quality.
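The comparison in this scenario can be sketched as a small check against a latency budget. This is only an illustration: the labels `"sonnet"` and `"haiku"` are placeholders rather than real model identifiers, the sample timings are made-up numbers chosen to match the averages in the example, and `pick_model` is a hypothetical helper.

```python
from statistics import mean
from typing import List, Optional, Tuple


def pick_model(
    candidates: List[Tuple[str, List[float]]], budget_s: float
) -> Optional[str]:
    """Return the first candidate whose average latency fits the budget.

    `candidates` is ordered from most to least capable, so the result is
    the most capable model that is still fast enough, or None if none is.
    """
    for name, samples_s in candidates:
        if mean(samples_s) <= budget_s:
            return name
    return None


# Made-up measurements in seconds, averaging roughly 4 s and 1.5 s.
candidates = [
    ("sonnet", [3.8, 4.0, 4.2]),
    ("haiku", [1.4, 1.5, 1.6]),
]

choice = pick_model(candidates, budget_s=2.0)
print(f"model that fits a 2 s budget: {choice}")
```

Averages hide slow outliers, so for a user-facing tool it may be worth checking a high percentile of the measured latencies against the budget as well.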
