Abstract:
Convolutional neural networks (CNNs) have become one of the most important machine learning technologies in the field of image recognition. In recent years, with the increasing demand for deploying CNNs at the mobile edge, lightweight CNN design has become a research hotspot. The mainstream CNN lightweight methods are pruning and quantization, both of which can effectively reduce the computation and storage overhead of CNN inference. However, neither of these methods fully exploits the bilateral sparsity (weight sparsity and activation sparsity) and the potential data reuse in CNNs. To address these problems, this paper proposes a new lightweight method for neural networks: the k-means algorithm is used to cluster the non-zero values of the convolution kernels and feature maps, so that CNN inference uses only a limited set of cluster values as multipliers to complete all convolutional computations. Compared with the O(n^3) computational complexity of the original convolutional layer, the computational complexity of the lightweight convolutional layer is O(n^2), which effectively reduces the amount of computation.
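As an illustration of the clustering idea, the following Python sketch clusters the non-zero weights of a convolution kernel with k-means and computes one output value by first accumulating the activations that share a cluster value and only then multiplying, so each output needs only k multiplications instead of one per non-zero weight. This is a minimal sketch, not the paper's implementation; the helper names cluster_nonzero and conv_pixel_grouped, the cluster count k = 8, and the toy kernel size are assumptions.

```python
# Minimal sketch (not the paper's implementation): cluster non-zero conv weights with
# k-means, then compute one output value by summing the activations that share a cluster
# value before multiplying, so multiplications per output drop from K*K*C to k clusters.
import numpy as np
from sklearn.cluster import KMeans

def cluster_nonzero(weights, k=8, seed=0):
    """Cluster the non-zero weights; return the k centroids and a per-weight cluster index."""
    nz = weights[weights != 0].reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(nz)
    labels = np.full(weights.shape, -1, dtype=int)      # -1 marks pruned (zero) weights
    labels[weights != 0] = km.labels_
    return km.cluster_centers_.ravel(), labels

def conv_pixel_grouped(patch, centroids, labels):
    """One output value: accumulate activations per cluster (additions), then k multiplications."""
    acc = np.zeros(len(centroids))
    for c in range(len(centroids)):
        acc[c] = patch[labels == c].sum()               # additions only
    return float(acc @ centroids)                       # exactly k multiplications

# Toy example: a sparse 3x3x16 kernel and one input patch of the same shape.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 16)) * (rng.random((3, 3, 16)) > 0.6)   # roughly 60% zeros
x = rng.normal(size=(3, 3, 16))
centroids, labels = cluster_nonzero(w, k=8)
print(conv_pixel_grouped(x, centroids, labels))         # equals (centroid-quantized w * x).sum()
```

The grouping trades most multiplications for additions, which is the effect behind the claimed drop from O(n^3) to O(n^2) per convolutional layer.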
Similarly, the non-zero weights of the fully connected layers are also clustered, and only the cluster values and the corresponding index vectors are stored on chip, which significantly reduces the storage overhead. Finally, a customized architecture, KCNN, is designed for this lightweight method. The architecture modularizes the different stages of CNN processing and, compared with previous accelerators, adds a non-zero clustering module; in addition, several caches are added to exploit the data reuse in the clustered CNN. The experimental results show that, without loss of inference accuracy, the overall computation of AlexNet is reduced by 66% and the storage by 85%.
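For the fully connected layers, the sketch below illustrates the storage idea: keep only the k cluster values plus a small index per non-zero weight, and compare the footprint with dense fp32 storage. The CSR-like layout, the bit widths, and the cluster count k = 16 are assumptions for the example, not the paper's on-chip format.

```python
# Minimal sketch (illustrative, not the paper's on-chip format): store a sparse fully
# connected layer as k cluster values plus a small per-weight index, and compare the
# footprint with a dense fp32 matrix. Byte counts are rough estimates.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
W = rng.normal(size=(256, 512)) * (rng.random((256, 512)) > 0.7)   # ~70% of weights pruned to zero

k = 16
km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(W[W != 0].reshape(-1, 1))
centroids = km.cluster_centers_.ravel()
rows, cols = np.nonzero(W)                  # positions of the remaining non-zero weights
codes = km.labels_                          # cluster index per non-zero weight (log2(16) = 4 bits each)

dense_bytes   = W.size * 4                                      # dense fp32 storage
indexed_bytes = k * 4 + len(codes) * 0.5 + len(cols) * 2        # centroids + 4-bit codes + 16-bit column indices
print(f"dense: {dense_bytes} B  vs  clustered+indexed: {indexed_bytes:.0f} B")

# Inference multiplies only by the k cluster values:
x = rng.normal(size=W.shape[1])
y = np.zeros(W.shape[0])
np.add.at(y, rows, centroids[codes] * x[cols])   # gather centroid, multiply, scatter-add per non-zero

# Cross-check against the explicitly reconstructed (centroid-quantized) matrix.
W_q = np.zeros_like(W)
W_q[rows, cols] = centroids[codes]
print(np.allclose(y, W_q @ x))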