Foundation Models & LLMs · Developers · CTOs
Model Inference
The process of running a trained AI model to get a response: it is what happens every single time you send a message. 'Inference' is just the technical word for 'using the model.' Training is the expensive, up-front process of teaching the model. Inference is the ongoing, per-request process of getting answers out of it. When you're paying API costs, you're paying for inference.
In practice
Every time someone sends a message to your Claude-powered app, inference happens: your text goes into the model, the model processes it, and a response comes out. Each request is a separate inference run, so your API costs scale with usage: the more requests you serve, and the longer their inputs and outputs, the more you pay.
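To make the link between inference and cost concrete, here is a minimal Python sketch that estimates spend from token counts. It assumes token-based billing with separate input and output rates. The `Pricing` class, the function names, and the dollar figures are all illustrative placeholders, not real rates for any model, so check your provider's pricing page before relying on the numbers.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (placeholders, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimate the cost of one inference request from its token counts."""
    total = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / TOKENS_PER_MILLION


def monthly_cost(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    pricing: Pricing,
) -> float:
    """Scale the per-request estimate up to a month of traffic."""
    per_request = request_cost(avg_input_tokens, avg_output_tokens, pricing)
    return requests_per_month * per_request


if __name__ == "__main__":
    # Assumed example rates: $3 per million input tokens, $15 per million output tokens.
    pricing = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
    print(f"Per request: ${request_cost(1_000, 500, pricing):.4f}")
    print(f"Per month:   ${monthly_cost(10_000, 1_000, 500, pricing):.2f}")
```

The point of the sketch is that cost is driven by per-request volume and length, not by anything that happened during training.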
Related concepts