AI Codex
Infrastructure & Deployment | Developers | CTOs

Throughput

How many requests a system can handle per unit of time: its volume capacity, as opposed to how fast any single request completes. A system can have low latency (each request is fast) but low throughput (it can only handle a few at once). At production scale, you need both: fast individual responses and the ability to serve many users simultaneously. Throughput determines whether your AI feature works fine for 10 users but falls apart at 10,000.
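The relationship between latency and throughput can be made concrete with Little's law: in steady state, throughput is roughly the number of requests in flight divided by the average latency per request. The sketch below is illustrative only; the function name and both example systems are hypothetical, not measurements of any real service.

```python
def max_throughput_per_sec(concurrency: int, latency_sec: float) -> float:
    """Steady-state throughput bound from Little's law: in-flight requests / latency."""
    if latency_sec <= 0:
        raise ValueError("latency_sec must be positive")
    return concurrency / latency_sec


# Hypothetical system A: each request is fast (0.5 s), but only 4 run at once.
fast_but_narrow = max_throughput_per_sec(concurrency=4, latency_sec=0.5)

# Hypothetical system B: each request is slower (2 s), but 100 run at once.
slow_but_wide = max_throughput_per_sec(concurrency=100, latency_sec=2.0)

print(fast_but_narrow)  # 8.0 requests per second
print(slow_but_wide)    # 50.0 requests per second
```

Under these assumed numbers, the system with the slower individual responses still handles about six times as many requests per second, which is why latency alone says little about capacity.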

In practice

Your app processes insurance claims with Claude, and peak load is 500 claims per hour. If your API rate limits cap you at 300 per hour, the backlog grows by 200 claims for every hour the peak lasts, and SLAs break. Optimizing throughput means batching requests, running them in parallel, or upgrading your API tier.
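The claims scenario can be sketched in two parts: the backlog arithmetic, and one common way to parallelize work while keeping the number of in-flight calls bounded. This is a minimal sketch under stated assumptions. `process_claim` is a hypothetical stand-in that sleeps instead of calling any real API, and `max_in_flight` is an illustrative setting, not a documented limit.

```python
import asyncio


def backlog_after(hours: float, arrival_per_hour: float, capacity_per_hour: float) -> float:
    """Claims still waiting after `hours` of sustained load at the given rates."""
    return max(0.0, (arrival_per_hour - capacity_per_hour) * hours)


async def process_claim(claim_id: int) -> str:
    """Hypothetical stand-in for one model API call; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return f"claim-{claim_id}: processed"


async def process_all(claim_ids: list[int], max_in_flight: int) -> list[str]:
    """Process claims in parallel, never exceeding max_in_flight concurrent calls."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(claim_id: int) -> str:
        # The semaphore blocks here once max_in_flight calls are already running.
        async with semaphore:
            return await process_claim(claim_id)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(c) for c in claim_ids))


if __name__ == "__main__":
    # 500 claims/hour arriving, 300 claims/hour capacity, 2-hour peak.
    print(backlog_after(2, 500, 300))  # 400.0
    results = asyncio.run(process_all(list(range(20)), max_in_flight=5))
    print(len(results))  # 20
```

Bounding concurrency with a semaphore raises throughput without sending an unbounded burst of requests. It does not lift a rate limit, though: if the API tier itself caps you at 300 per hour, only a higher tier or less work per claim closes the gap.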

Related concepts