

    Learning-based Model Quantization: Methods, Challenges, and Prospects


Abstract: The increasing parameter scale and structural complexity of deep neural networks (DNNs), together with the wide deployment of cloud-edge-end collaborative computing architectures, pose significant challenges for efficient inference at the edge, where edge and terminal nodes must support real-time inference under stringent constraints on computation, storage, inference latency, and communication overhead. Although model quantization effectively reduces resource consumption, traditional approaches based on fixed rules or empirical heuristics exhibit notable limitations in accuracy preservation, training stability, and adaptability across diverse architectures and application scenarios, particularly under very low bit-width conditions (≤4 bits). To mitigate these issues, learning-based quantization incorporates learnable components and diverse supervisory signals, substantially improving accuracy preservation, optimization robustness, and hardware adaptability under low-bit conditions. In this paper, we systematically review representative studies on learning-based quantization, centered on two primary paradigms: post-training quantization (PTQ) and quantization-aware training (QAT). We summarize performance-enhancement strategies including learnable parameter modeling, reconstruction and approximation optimization, auxiliary supervision, and learning-based quantization under data and hardware constraints. We further organize existing technical approaches and their interrelationships from the perspectives of quantization process stages and learning-signal mechanisms, and analyze the applicability of different strategies under varying data conditions, model scales, and application constraints. Finally, we discuss future research directions in learning-based quantization from three aspects: error-analysis-based optimization of quantization parameters and learning signals, the trade-off between accuracy and resources in learning-based quantization, and a unified quantization framework across model architectures.
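To make the "reconstruction and approximation optimization" strategy named above concrete, the following is a minimal NumPy sketch, not taken from any surveyed method: it quantizes a weight tensor symmetrically to 4 bits and searches over candidate step sizes for the one minimizing the reconstruction error, which is the basic idea behind PTQ-style scale optimization (learning-based methods replace this grid search with gradient-based learning of the scale). All function and variable names here are illustrative.

```python
import numpy as np

def quantize(w, scale, bits=4):
    """Symmetric uniform quantization: snap to the signed integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(w, bits=4, n_grid=100):
    """PTQ-style scale search: pick the step size minimizing reconstruction MSE."""
    max_abs = np.abs(w).max()
    best_scale, best_err = None, np.inf
    for ratio in np.linspace(0.5, 1.0, n_grid):     # shrink the clipping range gradually
        scale = ratio * max_abs / (2 ** (bits - 1) - 1)
        err = np.mean((w - quantize(w, scale, bits)) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

# Toy "weight tensor" drawn from a Gaussian, as real weight distributions roughly are.
rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

# Naive rule-based baseline: scale chosen so the max-magnitude weight is representable.
naive_scale = np.abs(w).max() / (2 ** 3 - 1)
naive_err = np.mean((w - quantize(w, naive_scale)) ** 2)

# Error-driven search: trades clipping error against rounding error.
scale, err = search_scale(w)
```

Because the search grid includes the naive max-abs scale (ratio = 1.0), the searched scale can never do worse than the fixed-rule baseline; at 4 bits the gap is typically substantial, which is precisely why error-driven and learned quantization parameters dominate at very low bit widths.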

       
