Quantization
A technique for shrinking AI models so they run faster and use less memory, by reducing the precision of the numbers (the weights) stored inside them, for example from 16-bit floating point to 8-bit or 4-bit integers. Imagine rounding pi from 3.14159 to 3.14: you lose a little precision, but the number takes up less space. Quantized models can run on laptops or phones instead of requiring data center servers. The tradeoff is usually a small drop in quality for a large gain in speed and efficiency.
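To make the memory saving concrete, here is a rough back-of-the-envelope sketch. The 7-billion-parameter model is a hypothetical example, and the estimate counts the stored weights only, ignoring any other memory a running model needs.

```python
def model_size_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
params = 7_000_000_000

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: about {model_size_gb(params, bits):.1f} GB")
```

Halving the number of bits per weight halves the storage, which is why lower-precision formats can bring a model within reach of consumer hardware.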
In practice
Suppose you want to run a capable AI model on a laptop without a GPU. Quantization converts the model's weights from a high-precision format, such as 16-bit floating point, to a lower-precision one, such as 8-bit or 4-bit integers. The model becomes roughly two to four times smaller and typically faster, at a small cost in quality, which makes it practical to run on consumer hardware.
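The idea can be sketched in a few lines of plain Python. This is a minimal illustration of one simple scheme (symmetric 8-bit quantization with a single shared scale), not how any particular tool implements it. The function names and sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Avoid dividing by zero if every weight is 0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Made-up sample weights, for illustration only.
weights = [0.82, -1.27, 0.003, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

print("stored integers:", quantized)
print("restored values:", restored)
print("largest error:  ", max(abs(w - r) for w, r in zip(weights, restored)))
```

Each weight is now stored as a small integer plus one shared scale. The restored values are close to the originals but not identical; very small weights such as 0.003 round to zero. That rounding error is the source of the quality tradeoff.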
Related concepts