The History, Present, and Future of Low-Bit Quantization for Large Language Models: A Case Study on Complex-domain Quantization
Graphical Abstract
Abstract
With the exponential growth in the parameter scale of Large Language Models (LLMs), model deployment and inference face severe memory and computational challenges. Quantization, a core compression technique, significantly reduces storage requirements and computational overhead by lowering the numerical precision of weights and activations. This paper first reviews the development of quantization techniques, from classic Int8/Int4 methods to cutting-edge extremely low-bit algorithms, summarizing the technical characteristics and performance evolution of representative methods and identifying a key challenge: traditional real-domain quantization is limited by discretization error at extremely low bit-widths, making it difficult to break through its performance ceiling. To address this limitation, this paper systematically reviews a line of work on complex-domain quantization. This line of work introduces a quantization paradigm based on the complex domain, which significantly expands the model's representational space by exploiting amplitude and phase as two degrees of freedom in the parameter representation. Experimental results demonstrate that iFairy, a representative complex-domain method, outperforms existing extremely low-bit methods on multiple benchmark datasets and effectively breaks through the real-domain performance ceiling, showcasing the potential of complex-domain quantization for achieving both efficient modeling and performance preservation. Through a systematic analysis of quantization's evolution and a case study on complex-domain quantization, this paper reveals development patterns and future trends, providing a reference for both the theoretical research and the engineering implementation of efficient LLMs.