Double-Generators Network for Data-Free Knowledge Distillation

Zhang Jing (张晶); Ju Jialiang (鞠佳良); Ren Yonggong (任永功)

doi:10.7544/issn1000-1239.202220024

Abstract: Knowledge distillation (KD) trains a "student network" under the guidance of a "teacher network" by making their output distributions as similar as possible. It has become an important technique for compressing large-scale deep networks and for deploying and applying them on near-end devices. However, growing awareness of privacy protection and increasing data transmission problems make the original training data difficult to obtain. Preserving the accuracy of the compressed network in this data-free setting has therefore become an important research direction. Data-free learning of student networks (DAFL) builds a teacher-side generator to produce a pseudo dataset whose distribution approximates that of the pretrained network's training data, and then trains the student network on it by knowledge distillation. The construction and optimization of the generator in this framework still have two problems: 1) DAFL places too much trust in the teacher network's predictions on pseudo samples that have no ground-truth labels, and the teacher network and the student network have different optimization objectives, so the student network struggles to obtain accurate and consistent optimization information; 2) the generator relies only on the training loss from the teacher network, so the generated data lack feature diversity, which reduces the generalization of the student network. To address these two problems, we propose a double-generator network framework, DG-DAFL (double generators-DAFL). DG-DAFL builds a teacher-side generator and a student-side generator and optimizes them simultaneously, so that the network tasks and optimization objectives are consistent and the discriminative performance of the student network improves. In addition, we add a loss on the difference between the sample distributions of the two generators, which uses the latent distribution prior of the teacher network to optimize the generator. This preserves the recognition accuracy of the student network and improves its generalization. Experimental results on three popular datasets show that the method achieves more effective and more robust knowledge distillation in the data-free setting. The code and models of DG-DAFL are available at https://github.com/LNNU-computer-research-526/DG-DAFL.git.
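As a rough illustration of what matching the teacher and student output distributions means, the sketch below computes a generic temperature-softened KL divergence between two sets of logits. It is only a minimal sketch under stated assumptions: the function names `softmax` and `kd_loss`, the temperature parameter, and the example logits are hypothetical and are not taken from DG-DAFL, whose actual loss terms are not given in the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature (assumed parameter)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between softened output distributions.

    A generic distillation objective, not the specific loss used by DG-DAFL.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one pseudo sample over three classes.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
loss = kd_loss(teacher, student, temperature=2.0)
```

The loss is zero when the two networks produce identical distributions and grows as the student's outputs drift away from the teacher's, which is the quantity a distillation step would try to reduce.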
