ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2021, Vol. 58, Issue 7: 1504-1517. doi: 10.7544/issn1000-1239.2021.20200112


Acceleration of Sparse Convolutional Neural Network Based on Coarse-Grained Dataflow Architecture

Wu Xinxin1,2,3, Ou Yan1,2,3, Li Wenming1,2, Wang Da1,2, Zhang Hao1,2, Fan Dongrui1,2,3   

  1. State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 3. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
  • Online: 2021-07-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61732018, 61872335, 61802367, 61672499), the Strategic Priority Research Program of Chinese Academy of Sciences (XDC05000000), the International Partnership Program of Chinese Academy of Sciences (171111KYSB20170032), and the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH4408, CARCH4412).

Abstract: Convolutional neural networks (CNNs) achieve good performance in image processing, speech recognition, natural language processing, and other fields. Large-scale neural network models, however, often run into computing and storage resource constraints, and sparse neural networks have emerged to relieve the demand for both. Although existing domain-specific accelerators handle sparse networks effectively, they obtain high energy efficiency by tightly coupling the algorithm to the structure, sacrificing architectural flexibility. A coarse-grained dataflow architecture can implement different neural network applications through flexible instruction scheduling. On this architecture, the regular computation pattern of dense convolution allows different channels to share the same set of instructions during execution. In sparse networks, however, zero-valued weights turn part of these shared instructions into invalid, zero-related instructions that the existing execution method cannot skip automatically, producing wasted computation. At the same time, existing instruction mapping methods lead to unbalanced load on the computing array when executing irregular sparse networks. Both problems limit the achievable performance on sparse networks. In this paper, under the premise that different channels share one set of instructions, we add an instruction control unit that exploits the data and instruction characteristics of sparse networks to detect and skip zero-related instructions in the weight data, and we use a load-balanced instruction mapping algorithm to resolve the uneven instruction execution of sparse networks. Experiments show that, compared with dense networks, sparse networks achieve an average performance improvement of 1.55X and an energy reduction of 63.77%. In addition, our design achieves speedups of 2.39X (AlexNet) and 2.28X (VGG16) over a GPU (cuSparse), and 1.14X (AlexNet) and 1.23X (VGG16) over Cambricon-X.
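
The two mechanisms summarized above can be illustrated with a minimal, hypothetical sketch (not the paper's hardware design): zero-valued weights are filtered out before execution, and the surviving per-channel instruction blocks are assigned to processing elements (PEs) with a greedy least-loaded heuristic so that no single PE becomes the bottleneck. All function and variable names below are illustrative assumptions.

```python
# Hypothetical sketch of (1) skipping zero-weight-related instructions and
# (2) load-balanced mapping of instruction blocks onto PEs.
from heapq import heappush, heappop

def prune_zero_weight_ops(instructions):
    """Keep only multiply-accumulate ops whose weight operand is non-zero."""
    return [ins for ins in instructions if ins["weight"] != 0]

def map_blocks_balanced(blocks, num_pes):
    """Greedy mapping: assign the largest remaining instruction block
    to the currently least-loaded PE."""
    heap = [(0, pe) for pe in range(num_pes)]          # (current load, PE id)
    assignment = {pe: [] for pe in range(num_pes)}
    for block_id, block in sorted(enumerate(blocks),
                                  key=lambda kv: len(kv[1]), reverse=True):
        load, pe = heappop(heap)
        assignment[pe].append(block_id)
        heappush(heap, (load + len(block), pe))
    return assignment

if __name__ == "__main__":
    # Toy example: four output-channel instruction blocks, zero-skipped,
    # then mapped onto two PEs.
    raw_blocks = [
        [{"weight": w} for w in (0.5, 0.0, 1.2, 0.0)],
        [{"weight": w} for w in (0.0, 0.0, 0.7, 0.0)],
        [{"weight": w} for w in (2.1, 0.3, 0.0, 1.1)],
        [{"weight": w} for w in (0.0, 0.9, 0.0, 0.0)],
    ]
    blocks = [prune_zero_weight_ops(b) for b in raw_blocks]
    print(map_blocks_balanced(blocks, num_pes=2))
```

The greedy least-loaded heuristic is only one way to balance irregular block sizes; the paper's instruction mapping algorithm may differ in detail.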

Key words: domain-specific accelerator, coarse-grained dataflow, sparse convolutional neural network, instruction mapping, instruction control
