Xiang Taoran, Ye Xiaochun, Li Wenming, Feng Yujing, Tan Xu, Zhang Hao, Fan Dongrui. Accelerating Fully Connected Layers of Sparse Neural Networks with Fine-Grained Dataflow Architectures[J]. Journal of Computer Research and Development, 2019, 56(6): 1192-1204. DOI: 10.7544/issn1000-1239.2019.20190117

Accelerating Fully Connected Layers of Sparse Neural Networks with Fine-Grained Dataflow Architectures

Funds: This work was supported by the National Key Research and Development Plan of China (2018YFB1003501), the National Natural Science Foundation of China (61732018, 61872335, 61802367), the International Partnership Program of Chinese Academy of Sciences (171111KYSB20170032), and the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH3303, CARCH3407, CARCH3502, CARCH3505).
More Information
  • Published Date: May 31, 2019
  • Deep neural network (DNN) is a state-of-the-art algorithm widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of its high computational complexity, many efficient hardware accelerators have been proposed to exploit the high degree of parallelism in DNNs. However, the fully connected layers in a DNN contain a large number of weight parameters, which places high demands on the accelerator's memory bandwidth. To reduce this bandwidth pressure, several DNN compression algorithms have been proposed. But accelerators implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption, making it difficult for them to accelerate sparse neural networks. Other accelerators, such as GPUs, are general enough but incur higher power consumption. Fine-grained dataflow architectures, which break with conventional Von Neumann architectures, show natural advantages in processing DNN-like algorithms with high computational efficiency and low power consumption, while remaining broadly applicable and adaptable. In this paper, we propose a scheme to accelerate the sparse fully connected layers of a DNN on a hardware accelerator based on a fine-grained dataflow architecture. Compared with the original dense fully connected layers, the scheme reduces the peak bandwidth requirement by a factor of 2.44× to 6.17×. In addition, the utilization of the computational resources of the fine-grained dataflow accelerator running the sparse fully connected layers far exceeds that of implementations on other hardware platforms: it is 43.15%, 34.57%, and 44.24% higher than on the CPU, GPU, and mGPU, respectively.
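  • The bandwidth saving described above comes from storing only the nonzero weights of a pruned fully connected layer in a compressed format, so each output neuron reads only the weights that survive pruning. The sketch below illustrates this idea with a CSR (compressed sparse row) layer in NumPy; it is a simplified illustration of the general technique, not the paper's dataflow-architecture implementation, and all function names are hypothetical.

```python
import numpy as np

def dense_to_csr(w):
    """Convert a dense weight matrix to CSR: (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]          # columns with surviving (nonzero) weights
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))      # end offset of this row's nonzeros
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def sparse_fc(values, col_idx, row_ptr, x, bias):
    """Sparse fully connected layer y = W @ x + b, touching only nonzero weights."""
    y = bias.copy()
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] += values[start:end] @ x[col_idx[start:end]]
    return y

# Example: prune ~90% of weights, so only ~10% must be fetched from memory.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
w[rng.random(w.shape) < 0.9] = 0.0
x = rng.standard_normal(128)
b = np.zeros(64)

vals, cols, ptrs = dense_to_csr(w)
assert np.allclose(sparse_fc(vals, cols, ptrs, x, b), w @ x + b)
```

  The peak-bandwidth reduction reported in the abstract tracks the ratio of dense to stored weights: here roughly 10% of the weights remain, so the weight traffic drops by about 10× (minus the overhead of the index and pointer arrays that the compressed format adds).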
