Citation: Yu Ying, Wei Wei, Tang Hong, Qian Jin. Multi-Stage Training with Multi-Level Knowledge Self-Distillation for Fine-Grained Image Recognition[J]. Journal of Computer Research and Development, 2023, 60(8): 1834-1845. DOI: 10.7544/issn1000-1239.202330262
Fine-grained image recognition is characterized by large intra-class variation and small inter-class variation, and it has wide applications in intelligent retail, biodiversity protection, and intelligent transportation. Extracting discriminative multi-granularity features is the key to improving recognition accuracy. Most existing methods acquire knowledge at only a single level, overlooking how multi-level information interaction helps extract robust features. Other methods introduce attention mechanisms to locate discriminative local regions, but this inevitably increases network complexity. To address these issues, this paper proposes MKSMT (multi-level knowledge self-distillation with multi-step training), a model for fine-grained image recognition. The model first learns features in the shallow network, then learns features in the deep network, and uses self-distillation to transfer knowledge from the deep network back to the shallow network. The optimized shallow network can in turn help the deep network extract more robust features, improving the performance of the whole model. Experimental results show that MKSMT achieves classification accuracies of 92.8%, 92.6%, and 91.1% on three publicly available fine-grained image datasets, outperforming most state-of-the-art fine-grained recognition algorithms.
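The deep-to-shallow knowledge transfer described in the abstract can be illustrated with a generic self-distillation objective. The sketch below is not the MKSMT loss from the paper; it assumes that each network stage produces class logits, that the deepest stage serves as the teacher, and that the transfer uses the temperature-softened KL divergence popularized by Hinton et al. The function names and the hyperparameters `alpha` and `temperature` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of a single example against its hard label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(
    shallow_logits: Sequence[float],
    deep_logits: Sequence[float],
    label: int,
    alpha: float = 0.5,        # hypothetical weight between hard-label and distillation terms
    temperature: float = 4.0,  # hypothetical softening temperature
) -> float:
    """Loss for a shallow stage: hard-label CE plus KL toward the deep stage.

    The deep stage acts as the teacher, so its softened prediction is a fixed
    target (in an autodiff framework it would be detached). The T^2 factor keeps
    the distillation term's gradient scale comparable across temperatures.
    """
    teacher = softmax(deep_logits, temperature)
    student = softmax(shallow_logits, temperature)
    ce = cross_entropy(shallow_logits, label)
    kd = kl_divergence(teacher, student) * temperature ** 2
    return (1 - alpha) * ce + alpha * kd


def multi_level_loss(
    stage_logits: Sequence[Sequence[float]],
    label: int,
    alpha: float = 0.5,
    temperature: float = 4.0,
) -> float:
    """Hypothetical multi-level scheme: the deepest stage (last entry) gets plain CE;
    every shallower stage gets CE plus distillation from the deepest stage."""
    *shallow_stages, deepest = stage_logits
    total = cross_entropy(deepest, label)
    for logits in shallow_stages:
        total += self_distillation_loss(logits, deepest, label, alpha, temperature)
    return total


if __name__ == "__main__":
    # Two stages for one 3-class example: a shallow stage and a deeper stage.
    shallow = [1.0, 0.5, -0.2]
    deep = [2.5, 0.1, -1.0]
    print(self_distillation_loss(shallow, deep, label=0))
    print(multi_level_loss([shallow, deep], label=0))
```

When the shallow and deep predictions coincide, the KL term vanishes and only the weighted hard-label loss remains; the further the shallow stage drifts from the deep stage's softened distribution, the more the distillation term pulls it back. A real implementation would operate on batched tensors and backpropagate through the shallow branch only for the distillation term.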