• China Premium Science and Technology Journal
  • CCF-recommended Class A Chinese journal
  • T1-class high-quality science and technology journal in computing
Xiang Taoran, Ye Xiaochun, Li Wenming, Feng Yujing, Tan Xu, Zhang Hao, Fan Dongrui. Accelerating Fully Connected Layers of Sparse Neural Networks with Fine-Grained Dataflow Architectures[J]. Journal of Computer Research and Development, 2019, 56(6): 1192-1204. DOI: 10.7544/issn1000-1239.2019.20190117

Accelerating Fully Connected Layers of Sparse Neural Networks with Fine-Grained Dataflow Architectures

Funds: This work was supported by the National Key Research and Development Plan of China (2018YFB1003501), the National Natural Science Foundation of China (61732018, 61872335, 61802367), the International Partnership Program of Chinese Academy of Sciences (171111KYSB20170032), and the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH3303, CARCH3407, CARCH3502, CARCH3505).
More Information
  • Published Date: May 31, 2019
  • Abstract: Deep neural networks (DNNs) are state-of-the-art algorithms widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of their high computational complexity, many efficient hardware accelerators have been proposed to exploit the high degree of parallelism in DNNs. However, the fully connected layers of a DNN contain a large number of weight parameters, which places high demands on accelerator bandwidth. To reduce this bandwidth pressure, several DNN compression algorithms have been proposed. But accelerators implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption, making it difficult for them to accelerate sparse neural networks, while general-purpose accelerators such as GPUs incur higher power consumption. Fine-grained dataflow architectures, which break with the conventional von Neumann architecture, show natural advantages in processing DNN-like algorithms, offering high computational efficiency and low power consumption while remaining broadly applicable and adaptable. In this paper, we propose a scheme for accelerating sparse DNN fully connected layers on a hardware accelerator based on a fine-grained dataflow architecture. Compared with the original dense fully connected layers, the scheme reduces the peak bandwidth requirement by 2.44×–6.17×. In addition, the computational-resource utilization of the fine-grained dataflow accelerator running the sparse fully connected layers far exceeds that of other hardware platforms, being 43.15%, 34.57%, and 44.24% higher than the CPU, GPU, and mGPU, respectively.
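The abstract describes pruning fully connected layers and computing with only the retained weights. As a rough illustration of the idea (this is not the paper's accelerator design or data layout, and all names below are invented for the example), the sketch stores a pruned weight matrix in compressed sparse row (CSR) form and evaluates the layer from the nonzeros alone, which is why sparsity lowers the weight bandwidth an accelerator must fetch:

```python
# Minimal sketch, not the paper's implementation: a pruned fully connected
# layer stored in CSR format. Only nonzero weights are kept, so the memory
# traffic per forward pass scales with the nonzero count, not the dense size.

def dense_to_csr(weights):
    """Convert a dense weight matrix (list of rows) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in weights:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)   # retained weight
                col_idx.append(j)  # its input index
        row_ptr.append(len(values))  # end of this output neuron's weights
    return values, col_idx, row_ptr

def sparse_fc_forward(values, col_idx, row_ptr, x, bias):
    """Compute y = W @ x + b using only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = bias[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# Example: a pruned 3x4 weight matrix keeping 5 of 12 weights.
W = [[0.5, 0.0, 0.0, 1.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 0.5]]
vals, cols, ptr = dense_to_csr(W)
y = sparse_fc_forward(vals, cols, ptr, [1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0])
```

Here the dense layer would fetch 12 weights per input vector, while the CSR form fetches 5 values plus their indices; the 2.44×–6.17× peak-bandwidth reduction reported in the abstract comes from this kind of saving at the sparsity levels the authors evaluate.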
  • Related Articles

    [1]Han Songshen, Guo Songhui, Xu Kaiyong, Yang Bo, Yu Miao. Perturbation Analysis of the Vital Region in Speech Adversarial Example Based on Frame Structure[J]. Journal of Computer Research and Development, 2024, 61(3): 685-700. DOI: 10.7544/issn1000-1239.202221034
    [2]Li Ru, Wang Zhiqiang, Li Shuanghong, Liang Jiye, Collin Baker. Chinese Sentence Similarity Computing Based on Frame Semantic Parsing[J]. Journal of Computer Research and Development, 2013, 50(8): 1728-1736.
    [3]Zhou Jingang, Zhao Dazhe, Xu Li, Liu Jiren. Frame Refinement: Combining Frame-Based Software Development with Stepwise Refinement[J]. Journal of Computer Research and Development, 2013, 50(4): 711-721.
    [4]Zhang Yan, Yu Shengyang, Zhang Chongyang, Yang Jingyu. Extraction and Removal of Frame Line in Form Bill[J]. Journal of Computer Research and Development, 2008, 45(5): 909-914.
    [5]Mi Congjie, Liu Yang, Xue Xiangyang. Video Texts Tracking and Segmentation Based on Multiple Frames[J]. Journal of Computer Research and Development, 2006, 43(9): 1523-1529.
    [6]Zhang Dongming, Shen Yanfei, Lin Shouxun, Zhang Yongdong. Low Complexity Mode Decision for H.264 Inter Frame Encoding[J]. Journal of Computer Research and Development, 2006, 43(9): 1516-1522.
    [7]Tang Yunting, Cheng Xianyi. The Studying of Frame APRF of Pattern-Recognition Based on Agent[J]. Journal of Computer Research and Development, 2006, 43(5): 867-873.
    [8]Wang Fangshi, Xu De, Wu Weixin. A Cluster Algorithm of Automatic Key Frame Extraction Based on Adaptive Threshold[J]. Journal of Computer Research and Development, 2005, 42(10): 1752-1757.
    [9]Wang Rongrong, Jin Wanjun, Wu Lide. A Novel Video Caption Detection Approach Using Multi-Frame Integration[J]. Journal of Computer Research and Development, 2005, 42(7): 1191-1197.
    [10]Zhang Chongyang, Chen Qiang, Lou Zhen, Yang Jingyu. A Form Frame Line Removal Algorithm Based on Gray-Level Image[J]. Journal of Computer Research and Development, 2005, 42(4): 635-639.
  • Cited by

    Journal citations (2)

    1. Xie Jingming, Hu Weifang, Han Lin, Zhao Rongcai, Jing Lina. Simulation of the Quantum Fourier Transform on the "Songshan" Supercomputer System. Computer Science. 2021(12): 36-42.
    2. Ze-yao MO. Extreme-Scale Parallel Computing: Bottlenecks and Countermeasures (in English). Frontiers of Information Technology & Electronic Engineering. 2018(10): 1251-1261.

    Other citation types (1)
