Learning-based Model Quantization: Methods, Challenges, and Prospects
Abstract
The increasing parameter scale and structural complexity of deep neural networks (DNNs) pose significant challenges for their efficient deployment in cloud-edge-end collaborative computing architectures, where edge and terminal nodes must support real-time inference under stringent constraints on computation, storage, inference latency, and communication overhead. Although model quantization effectively reduces resource consumption, traditional approaches based on fixed rules or empirical heuristics exhibit notable limitations in accuracy preservation, training stability, and adaptability across diverse architectures and application scenarios, particularly at very low bit-widths (≤4 bits). To mitigate these issues, learning-based quantization incorporates learnable components and diverse supervisory signals, substantially improving accuracy preservation, optimization robustness, and hardware adaptability under low-bit conditions. In this paper, we systematically review representative studies on learning-based quantization, centered on two primary paradigms: post-training quantization (PTQ) and quantization-aware training (QAT). We summarize performance enhancement strategies, including learnable parameter modeling, reconstruction and approximation optimization, auxiliary supervision, and learning-driven methods under data and hardware constraints. We further organize existing technical approaches and their interrelationships from the perspectives of quantization process stages and learning signal mechanisms, and analyze the applicability of different strategies under varying data conditions, model scales, and application constraints. Finally, we discuss future research directions in learning-based quantization from three aspects: error-analysis-based optimization of quantization parameters and learning signals, the trade-off between accuracy and resources in learning-based quantization, and a unified quantization framework across model architectures.
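As background for the quantization schemes surveyed here, the following is a minimal sketch of standard uniform affine quantization, the baseline that both PTQ and QAT methods build on. It is a generic illustration, not a method from this survey; the function name and the min/max calibration rule are illustrative assumptions.

```python
import numpy as np

def quantize_uniform(x, num_bits=4):
    """Illustrative uniform affine quantization: map float values to
    integers in [0, 2^b - 1] via a scale and zero-point (calibrated
    here from the tensor's min/max), then dequantize to floats."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Scale maps the float range onto the integer grid; guard the
    # degenerate case of a constant tensor.
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    # Quantize: scale, round to the grid, shift, and clip to range.
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    # Dequantize: recover an approximation of the original floats.
    x_hat = (q.astype(np.float32) - zero_point) * scale
    return q, x_hat, scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, x_hat, scale = quantize_uniform(x, num_bits=4)
# Rounding error per element is bounded by about scale / 2 inside the
# clipping range; learning-based methods aim to reduce the resulting
# accuracy loss by optimizing scale, zero-point, and rounding itself.
```

Traditional approaches fix `scale` and `zero_point` by rules such as the min/max calibration above; learning-based methods instead treat them (and, in some PTQ methods, the rounding decision itself) as learnable quantities optimized against a reconstruction or task loss.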