What is Quantization?

Question

Accepted Answer

Quantization is a technique for shrinking AI models so they run faster and use less memory, by reducing the precision of the numbers (the weights) stored inside them, for example from 32-bit floating point to 8-bit integers. Imagine rounding pi from 3.14159 to 3.14: you lose a little precision, but the number takes up less space. Quantized models can often run on laptops or phones instead of requiring massive data center servers. The tradeoff is usually a small drop in quality for a big gain in speed and efficiency.
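
To make the rounding idea concrete, here is a minimal sketch of one simple scheme (symmetric 8-bit quantization with a single shared scale). The function names and the sample weights are made up for illustration, and real libraries use more elaborate variants, so treat this as a toy rather than how any particular tool does it.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero list, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.823, -1.27, 0.051, 0.4]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

print(quantized)  # [82, -127, 5, 40]
print(restored)   # close to the originals, but not identical
```

Each integer fits in one byte instead of the four bytes a 32-bit float needs, which is where the memory saving comes from. The small gap between `weights` and `restored` is the lost precision described above.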