Citation: Chen Guilin, Wang Guanwu, Wang Kang, Hu Minhui, Deng Junquan. KCNN: A Neural Network Lightweight Method and Hardware Implementation Architecture[J]. Journal of Computer Research and Development, 2025, 62(2): 532-541. DOI: 10.7544/issn1000-1239.202330409
Convolutional neural networks (CNNs) have become one of the most important machine learning technologies in the field of image recognition. In recent years, with the growing demand for deploying CNNs at the mobile edge, lightweighting CNNs has become a research hotspot. The mainstream CNN lightweighting methods are pruning and quantization, both of which effectively reduce the computation and storage overhead of CNN inference. However, neither of these methods fully exploits the bilateral sparsity (weight sparsity and activation sparsity) and the potential data reuse in CNNs. To address these problems, we propose a new neural network lightweighting method: the k-means algorithm is used to cluster the non-zero values of the convolution kernels and feature maps, and CNN inference then completes all convolution computations using only a limited set of cluster values as multipliers. Compared with the O(n³) computational complexity of the original convolutional layer, the complexity of the lightweighted convolutional layer is O(n²), which effectively reduces the amount of computation. Similarly, the non-zero weights of the fully connected layers are clustered, and only the cluster values and the corresponding index vectors are stored on chip, which significantly reduces the storage overhead. Finally, a customized architecture, KCNN, is designed for this lightweighting method. The architecture modularizes the different stages of CNN inference and, compared with previous accelerators, adds a non-zero clustering module; several buffers are also added to exploit the data reuse in the clustered CNN. Experimental results show that, without any loss of inference accuracy, the overall computation of AlexNet is reduced by 66% and the storage overhead is reduced by 85%.
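To make the clustering-based convolution concrete, the sketch below is a minimal illustration rather than the authors' implementation: the helper names cluster_nonzero and clustered_conv2d_single are hypothetical, scikit-learn's KMeans stands in for the paper's k-means step, and the KCNN hardware (including its non-zero clustering module and buffers) is not modeled. It clusters the non-zero weights of a single kernel and then computes each output by first accumulating the inputs that share a cluster index, so only k multiplications by the cluster values remain per window.

```python
# Minimal sketch (assumption, not the paper's code) of convolution with
# k-means-clustered non-zero weights: per output window, inputs are grouped
# by their weight's cluster index and each partial sum is multiplied once
# by the cluster value, reducing the number of multiplications to k.
import numpy as np
from sklearn.cluster import KMeans

def cluster_nonzero(weights, k):
    """Cluster the non-zero weights into k shared values; return (centroids, index map)."""
    nz = weights[weights != 0].reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(nz)
    centroids = km.cluster_centers_.ravel()
    # index 0 is reserved for zero weights; non-zero weights get indices 1..k
    idx = np.zeros(weights.shape, dtype=np.int64)
    idx[weights != 0] = km.predict(nz) + 1
    return centroids, idx

def clustered_conv2d_single(x, centroids, idx):
    """Valid 2D convolution of one channel using the clustered weight approximation."""
    kh, kw = idx.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    k = len(centroids)
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i:i + kh, j:j + kw]
            # accumulate inputs per cluster index (additions only) ...
            sums = np.bincount(idx.ravel(), weights=window.ravel(), minlength=k + 1)
            # ... then only k multiplications by the shared cluster values
            out[i, j] = np.dot(sums[1:], centroids)
    return out

# usage: approximate a sparse 5x5 kernel with 4 shared weight values
rng = np.random.default_rng(0)
w = rng.standard_normal((5, 5)) * (rng.random((5, 5)) > 0.5)
centroids, idx = cluster_nonzero(w, k=4)
x = rng.standard_normal((16, 16))
y_approx = clustered_conv2d_single(x, centroids, idx)
```

For an n×n window this replaces the n² multiplications of a dense convolution with n² additions plus k multiplications, which is the intuition behind the multiplication reduction described in the abstract; the same idea of storing only cluster values plus index vectors applies to the fully connected layers.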