Accelerating Fully Connected Layers of Sparse Neural Networks with Fine-Grained Dataflow Architectures

Xiang Taoran; Ye Xiaochun; Li Wenming; Feng Yujing; Tan Xu; Zhang Hao; Fan Dongrui

doi:10.7544/issn1000-1239.2019.20190117

Xiang Taoran¹,², Ye Xiaochun¹, Li Wenming¹, Feng Yujing¹,², Tan Xu¹,², Zhang Hao¹, Fan Dongrui¹,²

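¹(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)
²(University of Chinese Academy of Sciences, Beijing 100049) (xiangtaoran@ict.ac.cn)

Funds: This work was supported by the National Key Research and Development Plan of China (2018YFB1003501), the National Natural Science Foundation of China (61732018, 61872335, 61802367), the International Partnership Program of Chinese Academy of Sciences (171111KYSB20170032), and the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH3303, CARCH3407, CARCH3502, CARCH3505).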

CLC number: TP387
Publication date: 2019-05-31


Abstract

Abstract: Deep neural networks (DNNs) are a state-of-the-art class of algorithms widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of their high computational complexity, many hardware accelerators have been proposed in recent years to exploit the parallelism of DNN computation. However, the fully connected layers of a DNN have a large number of weight parameters, which places heavy demands on accelerator bandwidth. To reduce this bandwidth pressure, a number of DNN compression algorithms have been proposed. Dedicated DNN accelerators implemented on FPGAs and ASICs usually sacrifice hardware flexibility for higher speedup and lower energy consumption, which makes it difficult for them to accelerate sparse neural networks. Schemes based on CPUs and GPUs are more flexible but incur high energy consumption. Fine-grained dataflow architectures break the constraints of conventional control-flow (von Neumann) architectures and show natural advantages in processing DNN-like algorithms with high computational efficiency and low power consumption, while retaining a degree of flexibility. In this paper, we propose a scheme for accelerating sparse DNN fully connected layers on a hardware accelerator based on a fine-grained dataflow architecture. Compared with the original dense fully connected layers, the scheme reduces the peak bandwidth requirement by 2.44× to 6.17×. In addition, the utilization of computational units on the fine-grained dataflow accelerator when running sparse fully connected layers far exceeds that of implementations on other hardware platforms: on average it is 43.15%, 34.57%, and 44.24% higher than on the CPU, GPU, and mGPU, respectively.
- fine-grained dataflow /
- sparse neural network /
- general accelerator /
- data reuse /
- high parallelism
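
As a rough illustration of why a pruned fully connected layer can need less weight bandwidth than its dense counterpart, the following minimal Python sketch stores only the nonzero weights in compressed sparse row (CSR) form and computes the layer output from them. It is not the scheme proposed in the paper: the CSR format, the function names (`dense_to_csr`, `sparse_fc`, `weight_traffic_bytes`), and the byte widths for values, column indices, and row pointers are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_to_csr(weights: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major weight matrix to CSR (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in weights:
        for j, w in enumerate(row):
            if w != 0.0:              # keep only weights that survived pruning
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))   # end of this row's nonzeros
    return values, col_idx, row_ptr


def sparse_fc(values: List[float], col_idx: List[int],
              row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = W @ x while touching only the stored nonzero weights."""
    y: List[float] = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def weight_traffic_bytes(weights: Matrix, value_bytes: int = 4,
                         index_bytes: int = 2, ptr_bytes: int = 4) -> Tuple[int, int]:
    """Return (dense_bytes, csr_bytes) for the weights; byte widths are assumed."""
    rows, cols = len(weights), len(weights[0])
    dense = rows * cols * value_bytes
    values, _, row_ptr = dense_to_csr(weights)
    sparse = len(values) * (value_bytes + index_bytes) + len(row_ptr) * ptr_bytes
    return dense, sparse


# Hypothetical 3x4 pruned weight matrix and input vector.
W = [[0.0, 2.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 3.0],
     [0.0, 0.0, 0.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]

print(sparse_fc(*dense_to_csr(W), x))
print(weight_traffic_bytes(W))
```

In this toy case the index overhead is significant because the matrix is tiny; the saving in weight traffic grows with layer size and sparsity, and the actual figures depend on the storage format and hardware, which this sketch does not model.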
