
KCNN: A Neural Network Lightweight Method and Hardware Architecture

Abstract: Convolutional neural networks (CNNs) have become one of the most important machine learning technologies in the field of image recognition. In recent years, with the growing demand for deploying CNNs on edge devices, CNN lightweighting has become a research hotspot. The mainstream lightweight methods are pruning and quantization, both of which effectively reduce the computation and storage overhead of CNN inference. However, neither fully exploits the bilateral sparsity (weight sparsity and activation sparsity) and the potential data reuse in a CNN. To address these problems, this paper proposes a new neural network lightweight method: the k-means algorithm clusters the non-zero values of the convolution kernels and feature maps, so that the entire inference process uses only a limited set of cluster values as multipliers to complete all convolution computations. Compared with the O(n^3) computational complexity of the original convolutional layer, the complexity of the lightweight convolutional layer is only O(n^2), greatly reducing the amount of computation. Similarly, the non-zero weights of the fully connected layers are also clustered, and only the cluster values and the corresponding index vectors are stored on chip, which significantly reduces the storage overhead. Finally, a customized hardware architecture, KCNN, is designed for this lightweight method. The architecture implements the different processing stages of a CNN as modules; compared with previous accelerators it adds a non-zero-value clustering module, and several buffers are designed to exploit the data reuse in the clustered network. Experimental results show that, without any loss of inference accuracy, the overall computation of AlexNet is reduced by 66% and its storage by 85%.
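The clustering step at the heart of the method can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it runs a simple k-means over the non-zero entries of a weight tensor and keeps only the k centroids plus an index per non-zero position, which matches the cluster-value-plus-index-vector storage scheme the abstract describes for fully connected layers. The cluster count k = 16 and the toy tensor sizes are assumptions made for illustration.

```python
# Minimal sketch (assumed details, not the paper's implementation): k-means over
# the non-zero entries of a weight tensor, storing only centroids + indices.
import numpy as np

def cluster_nonzero(weights: np.ndarray, k: int = 16, iters: int = 20):
    """k-means over the non-zero entries of a weight tensor.

    Returns (centroids, indices, mask): the k shared cluster values, one index
    per non-zero entry, and the non-zero positions. Together these replace the
    dense weights in on-chip storage.
    """
    flat = weights.ravel()
    mask = flat != 0
    nz = flat[mask]
    # Initialize centroids evenly over the range of non-zero values.
    centroids = np.linspace(nz.min(), nz.max(), k)
    for _ in range(iters):
        # Assign each non-zero value to its nearest centroid.
        idx = np.abs(nz[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned values.
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = nz[idx == c].mean()
    return centroids, idx, mask

def reconstruct(centroids, idx, mask, shape):
    """Rebuild the (approximate) weight tensor from cluster values + indices."""
    flat = np.zeros(mask.size, dtype=centroids.dtype)
    flat[mask] = centroids[idx]
    return flat.reshape(shape)

# Toy usage: a sparse 64x3x3 kernel stack compressed to 16 shared multipliers.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 3, 3)) * (rng.random((64, 3, 3)) > 0.6)
cents, idx, mask = cluster_nonzero(w, k=16)
w_hat = reconstruct(cents, idx, mask, w.shape)
print("unique multipliers:", np.unique(w_hat[w_hat != 0]).size)  # at most 16
```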
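The claim that inference "uses only a limited set of cluster values as multipliers" can also be made concrete. Since both activations and weights take on only a limited number of cluster values, every product in a convolution is one of kw × ka possibilities. One plausible realization, offered here as our hedged reading rather than the paper's actual datapath, precomputes that product table once and replaces per-element multiplication with table lookups plus accumulation, while skipping zeros on both sides to exploit the bilateral sparsity.

```python
# Hedged illustration of the bounded-multiplier idea: with ka activation
# clusters and kw weight clusters, only ka*kw real multiplications are needed,
# computed once; the convolution itself is lookups and additions.
import numpy as np

def conv_via_product_table(act_idx, w_idx, act_cents, w_cents):
    """1-D valid convolution over cluster indices instead of raw values.

    act_idx, w_idx: integer cluster indices; -1 marks a zero that is skipped.
    act_cents, w_cents: the cluster centroid values.
    """
    table = np.outer(act_cents, w_cents)  # ka*kw multiplications, done once
    out = np.zeros(len(act_idx) - len(w_idx) + 1)
    for i in range(len(out)):
        acc = 0.0
        for j, wj in enumerate(w_idx):
            aj = act_idx[i + j]
            if aj >= 0 and wj >= 0:   # bilateral sparsity: skip zero operands
                acc += table[aj, wj]  # lookup instead of a multiply
        out[i] = acc
    return out

# Toy usage with 4 activation clusters and 3 weight clusters (assumed sizes).
act_cents = np.array([0.5, 1.0, 1.5, 2.0])
w_cents = np.array([-1.0, 0.25, 0.75])
act_idx = np.array([0, -1, 2, 3, 1, -1, 0])  # -1 = zero activation
w_idx = np.array([1, -1, 2])                 # -1 = pruned weight
print(conv_via_product_table(act_idx, w_idx, act_cents, w_cents))
```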

       
