ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (6): 1192-1204.

Special Issue: 2019面向人工智能的计算机体系结构专题

### Accelerating Fully Connected Layers of Sparse Neural Networks with Fine-Grained Dataflow Architectures

Xiang Taoran1,2, Ye Xiaochun1, Li Wenming1, Feng Yujing1,2, Tan Xu1,2 , Zhang Hao1， Fan Dongrui1,2

1. 1（State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190）；2（University of Chinese Academy of Sciences, Beijing 100049)
• Online:2019-06-01
• Supported by:
This work was supported by the National Key Research and Development Plan of China (2018YFB1003501), the National Natural Science Foundation of China (61732018, 61872335, 61802367), the International Partnership Program of Chinese Academy of Sciences (171111KYSB20170032), and the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH3303, CARCH3407, CARCH3502, CARCH3505).

Abstract: Deep neural network (DNN) is a hot and state-of-the-art algorithm which is widely used in applications such as face recognition, intelligent monitoring, image recognition and text recognition. Because of its high computational complexity, many efficient hardware accelerators have been proposed to exploit high degree of parallel processing for DNN. However, the fully connected layers in DNN have a large number of weight parameters, which imposes high requirements on the bandwidth of the accelerator. In order to reduce the bandwidth pressure of the accelerator, some DNN compression algorithms are proposed. But accelerators which are implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption, making it difficult to accelerate sparse neural networks. Other accelerators, such as GPUs, are general enough, but they lead to higher power consumption. Fine-grained dataflow architectures, which break conventional Von Neumann architectures, show natural advantages in processing DNN-like algorithms with high computational efficiency and low power consumption. At the same time, it remains broadly applicable and adaptable. In this paper, we propose a scheme to accelerate the sparse DNN fully connected layers on a hardware accelerator based on fine-grained dataflow architecture. Compared with the original dense fully connected layers, the scheme reduces the peak bandwidth requirement of 2.44×~ 6.17×. In addition, the utilization of the computational resource of the fine-grained dataflow accelerator running the sparse fully-connected layers far exceeds the implementation by other hardware platforms, which is 43.15%, 34.57%, and 44.24% higher than the CPU, GPU, and mGPU, respectively.

CLC Number: