    Zhang Jing, Ju Jialiang, Ren Yonggong. Double-Generators Network for Data-Free Knowledge Distillation[J]. Journal of Computer Research and Development, 2023, 60(7): 1615-1627. DOI: 10.7544/issn1000-1239.202220024

    Double-Generators Network for Data-Free Knowledge Distillation

    • Knowledge distillation (KD) maximizes the similarity between the output distributions of a teacher network and a student network to achieve network compression, enabling large-scale networks to be deployed and applied on terminal devices. However, privacy protection and transmission constraints make the original training data difficult to collect. In this data-free scenario, where training data are scarce or unavailable, improving the performance of KD is a meaningful task. Data-free learning (DAFL) builds a teacher-generator to produce pseudo data that resemble real samples, and these pseudo data are then used to train the student network by distillation. Nevertheless, the training process of the teacher-generator raises two problems: 1) Fully trusting the discrimination outputs of the teacher network may introduce incorrect information from the unlabeled pseudo data; moreover, the teacher network and the student network have different learning targets, so it is difficult to obtain accurate and consistent information for training the student network. 2) Over-dependence on loss values originating from the teacher network yields pseudo data that lack diversity, which harms the generalization of the student network. To resolve these problems, we propose DG-DAFL, a double-generator network framework for data-free knowledge distillation. In DG-DAFL, the student network and the teacher network obtain the same learning task by optimizing the two generators simultaneously, which enhances the performance of the student network. Moreover, we construct a distribution loss between the student-generator and the teacher-generator to enrich sample diversity and further improve the generalization of the student network. Experimental results show that our method achieves more efficient and robust performance on three popular datasets. The code and models of DG-DAFL are published at https://github.com/LNNU-computer-research-526/DG-DAFL.git.
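    To make the double-generator idea concrete, the sketch below shows one possible training step in PyTorch: a frozen teacher, a student, a teacher-generator, and a student-generator, with a distribution loss tying the two generators together and a temperature-scaled distillation loss training the student on pseudo data. This is a minimal illustration of the scheme described in the abstract, not the authors' implementation (see the linked repository); the generator architecture, the confidence-style generator objective, the MSE form of the distribution loss, and all hyperparameters are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Generator(nn.Module):
        """Maps random noise z to pseudo images (assumed 3x32x32, CIFAR-like)."""
        def __init__(self, nz=100, img_ch=3, img_size=32):
            super().__init__()
            self.init_size = img_size // 4
            self.fc = nn.Linear(nz, 128 * self.init_size ** 2)
            self.net = nn.Sequential(
                nn.BatchNorm2d(128),
                nn.Upsample(scale_factor=2),
                nn.Conv2d(128, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=2),
                nn.Conv2d(64, img_ch, 3, padding=1), nn.Tanh(),
            )

        def forward(self, z):
            x = self.fc(z).view(z.size(0), 128, self.init_size, self.init_size)
            return self.net(x)

    def train_step(teacher, student, gen_t, gen_s,
                   opt_student, opt_gen_t, opt_gen_s,
                   batch_size=128, nz=100, temperature=4.0, lambda_dist=1.0,
                   device="cpu"):
        """One DG-DAFL-style update. Loss forms and weights are illustrative."""
        teacher.eval()
        for p in teacher.parameters():          # the teacher stays frozen throughout
            p.requires_grad_(False)

        z = torch.randn(batch_size, nz, device=device)

        # 1) Teacher-generator: synthesize samples the frozen teacher classifies
        #    confidently (a DAFL-like pseudo-label objective, assumed here).
        fake_t = gen_t(z)
        logits_t = teacher(fake_t)
        loss_gt = F.cross_entropy(logits_t, logits_t.argmax(dim=1))
        opt_gen_t.zero_grad()
        loss_gt.backward()
        opt_gen_t.step()

        # 2) Student-generator: the same kind of objective, but measured by the
        #    student, plus a distribution loss that keeps its samples close to
        #    the teacher-generator's (MSE is an assumption for that loss).
        fake_s = gen_s(z)
        logits_s = student(fake_s)
        loss_gs = (F.cross_entropy(logits_s, logits_s.argmax(dim=1))
                   + lambda_dist * F.mse_loss(fake_s, gen_t(z).detach()))
        opt_gen_s.zero_grad()
        loss_gs.backward()
        opt_gen_s.step()

        # 3) Student: distill from the teacher on pseudo data from both
        #    generators with a temperature-scaled KL divergence.
        pseudo = torch.cat([gen_t(z).detach(), gen_s(z).detach()], dim=0)
        with torch.no_grad():
            t_out = teacher(pseudo)
        s_out = student(pseudo)
        loss_kd = F.kl_div(F.log_softmax(s_out / temperature, dim=1),
                           F.softmax(t_out / temperature, dim=1),
                           reduction="batchmean") * temperature ** 2
        opt_student.zero_grad()
        loss_kd.backward()
        opt_student.step()
        return loss_kd.item()

    A caller would construct the four networks and three optimizers and invoke train_step repeatedly. The design point this sketch tries to reflect, following the abstract, is that each of the teacher and the student drives its own generator, while the distribution loss between the two generators enriches the diversity of the pseudo data used for distillation.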
