Abstract:
Knowledge distillation, a key technique in deep learning, achieves model compression and acceleration by transferring knowledge from a large teacher model to a smaller student model. While maintaining performance, it significantly reduces computational and storage requirements, facilitating the deployment of high-performance models on resource-constrained edge devices. First, this paper provides a systematic review of recent research on knowledge distillation and categorizes it from two perspectives: the type of knowledge and the teacher-student model architecture. We comprehensively summarize distillation methods based on three typical types of knowledge, namely output feature knowledge, intermediate feature knowledge, and relational feature knowledge, as well as distillation methods based on CNN-to-CNN, CNN-to-ViT (Vision Transformer), ViT-to-CNN, and ViT-to-ViT architectures. Next, the paper explores various learning paradigms, including offline distillation, online distillation, self-distillation, data-free distillation, multi-teacher distillation, and assistant distillation. The paper then summarizes distillation optimization methods concerning the distillation process, knowledge structure, temperature coefficient, and loss function, and analyzes the improvements to distillation brought by adversarial techniques, automated machine learning, reinforcement learning, and diffusion models, concluding with the implementation of distillation technology in common applications. Despite significant advances in knowledge distillation, numerous challenges remain in both practical applications and theoretical research. Finally, the paper provides an in-depth analysis of these issues and offers insights into future development directions.
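As a point of reference for the output feature (logit-based) distillation and the temperature coefficient mentioned above, the following is a minimal PyTorch-style sketch of the classic temperature-scaled distillation loss; the function name `kd_loss` and the choices of temperature `T` and weighting factor `alpha` are illustrative assumptions, not specifics from this survey.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Illustrative sketch of temperature-scaled logit distillation (assumed hyperparameters)."""
    # Soften both distributions with the temperature coefficient T
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # KL divergence between softened outputs, scaled by T^2 to keep gradient magnitudes comparable
    soft_loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    # Standard cross-entropy against ground-truth labels
    hard_loss = F.cross_entropy(student_logits, labels)
    # Weighted combination of soft (teacher) and hard (label) supervision
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In practice the student is trained by backpropagating this combined loss while the teacher's parameters are kept frozen (in the offline distillation paradigm); the other paradigms and knowledge types surveyed in the paper modify which signals are matched and how the teacher is obtained.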