Zhang Jing, Ju Jialiang, Ren Yonggong. Double-Generators Network for Data-Free Knowledge Distillation[J]. Journal of Computer Research and Development, 2023, 60(7): 1615-1627. DOI: 10.7544/issn1000-1239.202220024

Double-Generators Network for Data-Free Knowledge Distillation

Funds: This work was supported by the National Natural Science Foundation of China (61902165, 61976109), the Dalian Science and Technology Innovation Fund (2018J12GX047), and the Social Science Foundation of the Ministry of Education of China (21YJC880104)
More Information
  • Author Bio:

    Zhang Jing: born in 1984. PhD, associate professor. Her main research interests include machine learning and reinforcement learning

    Ju Jialiang: born in 1998. Master. His main research interests include deep learning and machine learning

    Ren Yonggong: born in 1972. PhD, professor. His main research interests include data mining and artificial intelligence

  • Received Date: January 03, 2022
  • Revised Date: August 07, 2022
  • Available Online: February 26, 2023
  • Knowledge distillation (KD) maximizes the similarity between the output distributions of a teacher network and a student network, enabling network compression and the edge deployment of large-scale networks. However, privacy protection and transmission constraints make training data difficult to collect. In this data-free scenario, where training data are scarce, improving the performance of KD is a meaningful task. Data-free learning (DAFL) trains a teacher-side generator to produce pseudo data that resemble real samples, and these pseudo data are then used to train the student network by distillation. Nevertheless, the training process of the teacher generator raises two problems: 1) Fully trusting the discrimination outputs of the teacher network may introduce incorrect information from the unlabeled pseudo data; moreover, because the teacher network and the student network have different learning targets, it is difficult to obtain accurate and consistent information for training the student network. 2) Over-dependence on loss values originating from the teacher network yields pseudo data that lack diversity, which damages the generalization of the student network. To resolve these problems, we propose DG-DAFL, a double-generator network framework for data-free knowledge distillation. In DG-DAFL, the student network and the teacher network share the same learning task by optimizing the two generators simultaneously, which enhances the performance of the student network. Moreover, we construct a distribution loss between the student generator and the teacher generator to enrich sample diversity and further improve the generalization of the student network. Experimental results show that our method achieves more efficient and robust performance on three popular datasets. The code and models of DG-DAFL are published at https://github.com/LNNU-computer-research-526/DG-DAFL.git.

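The distillation objective described above, matching the student's output distribution to the teacher's, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it is the standard temperature-scaled KL-divergence loss of soft-label knowledge distillation, with the temperature value chosen arbitrarily for illustration:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a higher T softens the distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened output distributions.

    Minimizing this drives the student's output distribution toward the
    teacher's, which is the core objective of knowledge distillation.
    """
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# A student whose logits match the teacher incurs near-zero loss;
# a mismatched student incurs a strictly larger loss.
teacher = np.array([3.0, 1.0, 0.2])
matched = kd_loss(teacher.copy(), teacher)
mismatched = kd_loss(np.array([0.2, 1.0, 3.0]), teacher)
print(matched, mismatched)
```

In the data-free setting, the real inputs producing these logits are unavailable, so DAFL-style methods feed generator-synthesized pseudo data through both networks and apply this same matching loss.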
