Citation: Chen Guilin, Wang Guanwu, Wang Kang, Hu Minhui, Deng Junquan. KCNN: A Neural Network Lightweight Method and Hardware Implementation Architecture[J]. Journal of Computer Research and Development, 2025, 62(2): 532-541. DOI: 10.7544/issn1000-1239.202330409
Convolutional neural networks (CNNs) have become one of the most important machine learning technologies in the field of image recognition. In recent years, with the growing demand for deploying CNNs at the mobile edge, lightweighting CNNs has become a research hotspot. The mainstream CNN lightweighting methods are pruning and quantization, both of which effectively reduce the computation and storage overhead of CNN inference. However, neither of these methods fully exploits the bilateral sparsity (weight sparsity and activation sparsity) and the potential data reuse in CNNs. To address these problems, we propose a new neural network lightweighting method: the k-means algorithm is used to cluster the non-zero values of the convolution kernels and feature maps, and CNN inference then completes all convolution computations using only a limited set of cluster values as multipliers. Compared with the O(n³) computational complexity of the original convolutional layer, the complexity of the lightweighted convolutional layer is O(n²), which effectively reduces the amount of computation. Similarly, the non-zero weights of the fully connected layers are clustered, and only the cluster values and the corresponding index vectors are stored on chip, which significantly reduces the storage overhead. Finally, a customized architecture, KCNN, is designed for this lightweighting method. The architecture modularizes the different stages of CNN inference and, compared with previous accelerators, adds a non-zero clustering module; several buffers are also added to exploit the data reuse in the clustered CNN. Experimental results show that, without any loss of inference accuracy, the overall computation of AlexNet is reduced by 66% and the storage overhead is reduced by 85%.
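To make the clustering-based convolution concrete, the sketch below is a minimal illustration rather than the authors' implementation: the helper names cluster_nonzero and clustered_conv2d_single are hypothetical, scikit-learn's KMeans stands in for the paper's k-means step, and the KCNN hardware (including its non-zero clustering module and buffers) is not modeled. It clusters the non-zero weights of a single kernel and then computes each output by first accumulating the inputs that share a cluster index, so only k multiplications by the cluster values remain per window.

```python
# Minimal sketch (assumption, not the paper's code) of convolution with
# k-means-clustered non-zero weights: per output window, inputs are grouped
# by their weight's cluster index and each partial sum is multiplied once
# by the cluster value, reducing the number of multiplications to k.
import numpy as np
from sklearn.cluster import KMeans

def cluster_nonzero(weights, k):
    """Cluster the non-zero weights into k shared values; return (centroids, index map)."""
    nz = weights[weights != 0].reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(nz)
    centroids = km.cluster_centers_.ravel()
    # index 0 is reserved for zero weights; non-zero weights get indices 1..k
    idx = np.zeros(weights.shape, dtype=np.int64)
    idx[weights != 0] = km.predict(nz) + 1
    return centroids, idx

def clustered_conv2d_single(x, centroids, idx):
    """Valid 2D convolution of one channel using the clustered weight approximation."""
    kh, kw = idx.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    k = len(centroids)
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i:i + kh, j:j + kw]
            # accumulate inputs per cluster index (additions only) ...
            sums = np.bincount(idx.ravel(), weights=window.ravel(), minlength=k + 1)
            # ... then only k multiplications by the shared cluster values
            out[i, j] = np.dot(sums[1:], centroids)
    return out

# usage: approximate a sparse 5x5 kernel with 4 shared weight values
rng = np.random.default_rng(0)
w = rng.standard_normal((5, 5)) * (rng.random((5, 5)) > 0.5)
centroids, idx = cluster_nonzero(w, k=4)
x = rng.standard_normal((16, 16))
y_approx = clustered_conv2d_single(x, centroids, idx)
```

For an n×n window this replaces the n² multiplications of a dense convolution with n² additions plus k multiplications, which is the intuition behind the multiplication reduction described in the abstract; the same idea of storing only cluster values plus index vectors applies to the fully connected layers.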