AI Codex

Model Inference

The process of running a trained AI model to get a response: it is what happens every single time you send a message. 'Inference' is just the technical word for 'using the model.' Training is the expensive, up-front process of teaching the model; inference is the ongoing, per-request process of getting answers out of it. When you pay for API usage, you are paying for inference.

In practice

Every time someone sends a message to your Claude-powered app, inference happens: your text goes into the model, the model processes it, and a response comes out. Each request triggers its own inference run, so your API bill grows with usage: you pay for the inference you consume, not for the training that built the model.
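To make the cost side concrete, here is a minimal sketch of how per-request inference costs add up. It assumes token-based billing with separate input and output rates; the prices, the `InferenceCall` type, and the function names are hypothetical and chosen only for illustration, so check your provider's pricing page for real figures.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens (assumptions, not real rates).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class InferenceCall:
    """Token counts for one request/response round trip."""
    input_tokens: int   # tokens in the text sent to the model
    output_tokens: int  # tokens in the response the model generated


def call_cost(call: InferenceCall) -> float:
    """Estimated cost in USD of a single inference call."""
    return (
        call.input_tokens * INPUT_PRICE_PER_MTOK
        + call.output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def total_cost(calls: list[InferenceCall]) -> float:
    """Estimated cost in USD of a batch of inference calls."""
    return sum(call_cost(c) for c in calls)


if __name__ == "__main__":
    # Two example messages sent to the app, with made-up token counts.
    calls = [InferenceCall(1200, 300), InferenceCall(800, 500)]
    for c in calls:
        print(f"{c.input_tokens} in / {c.output_tokens} out: ${call_cost(c):.4f}")
    print(f"total: ${total_cost(calls):.4f}")
```

Under these assumptions every message adds its own small charge, which is why inference cost tracks traffic rather than being paid once.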
