AI Codex
Foundation Models & LLMs · Developers · CTOs

Model Distillation

Also: knowledge distillation

Training a smaller, faster AI model to behave like a larger, slower one by having the small model learn from the big model's outputs. The result is a 'student' model that is much cheaper to run and typically retains most of the quality of the 'teacher' on the tasks it was trained for. This is how companies make AI fast and cheap enough for production use at scale while keeping much of the capability of their best models.
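To make "learning from the big model's outputs" concrete, here is a minimal sketch of one common formulation, in which the student is penalized for diverging from the teacher's temperature-softened output probabilities. This assumes you have access to the teacher's raw scores (logits), which is not always the case. The function names, the temperature value, and the three-class logits are illustrative assumptions, not taken from any particular library.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper for illustration. Many implementations also scale
    this value by temperature**2; that factor is omitted here.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's current predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up logits over three classes, for illustration only.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different class

print(distillation_loss(teacher, close_student))  # small loss
print(distillation_loss(teacher, far_student))    # much larger loss
```

Training the student means adjusting its weights to push this loss toward zero across many examples. The softened probabilities carry more information than a single correct answer, because they also show which wrong answers the teacher considers nearly right.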

In practice

You need Claude-level quality for a task but can't afford Claude Opus pricing at your volume. Distillation addresses this by training a smaller, cheaper model on Opus outputs for that task. The student is faster and cheaper to run and retains most of the teacher's quality on that specific task, though it generally will not match the teacher outside it.
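When only the teacher's text responses are available, the first practical step is usually to collect prompt and response pairs to fine-tune the student on. The sketch below shows that data-collection step only. The `teacher` argument is a hypothetical callable standing in for a request to the large model, `stub_teacher` is a placeholder so the example runs offline, and the `prompt`/`completion` record layout is an assumption. Match whatever format your fine-tuning tooling expects.

```python
import io
import json

def write_distillation_set(prompts, teacher, out):
    """Write one JSON line per prompt, pairing it with the teacher's answer.

    `teacher` is any callable mapping a prompt string to a response string.
    `out` is any writable text stream. Returns the number of records written.
    """
    written = 0
    for prompt in prompts:
        completion = teacher(prompt)
        if not completion.strip():
            continue  # skip empty answers so the student does not learn to say nothing
        record = {"prompt": prompt, "completion": completion}
        out.write(json.dumps(record) + "\n")
        written += 1
    return written

def stub_teacher(prompt):
    """Placeholder for the large model; swap in a real client call."""
    return "LABEL: " + prompt.upper()

# Illustrative prompts for a hypothetical support-ticket task.
buffer = io.StringIO()
n = write_distillation_set(["refund request", "password reset"], stub_teacher, buffer)
print(n)
print(buffer.getvalue())
```

The resulting file becomes the training set for the smaller model. Because the prompts come from your own task, the student's quality is concentrated on the kind of input you actually serve.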
