Optimum Research on Inner-Inst Memory Access Conflict for Dataflow Architecture
Abstract: The rapid rise of artificial intelligence applications such as neural networks, image recognition, and text recognition poses major challenges to traditional processor design. Coarse-grained dataflow architectures have become a research hotspot for AI applications because they combine high instruction-level parallelism with broad applicability. However, the processing elements of a coarse-grained dataflow architecture use random access memory (RAM) as their storage structure, and most of the data operated on in neural networks is dense, so a large number of memory access conflicts arise among the operands within a single instruction (inner-inst conflicts). An analysis of the memory access behavior of typical neural networks shows that these inner-inst operand conflicts lower the utilization of the computing units. Based on this analysis, a flexible data redundancy strategy (FRS) is proposed for dataflow processors: at compile time, redundant data space is allocated for the operands that cause memory access conflicts within an instruction, which reduces the number of conflicts in the RAM and the inner-inst operand access latency. The experiments use the typical neural networks LeNet and AlexNet as benchmarks. With FRS, power efficiency improves by 30.21% and 12.37% relative to the Round-Robin and ReHash strategies without data redundancy, respectively, and by 27.95% relative to the full data redundancy strategy with two copies.
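The idea of allocating redundant space only for conflicting operands can be illustrated with a small sketch. It is not the paper's compiler pass: the four-bank RAM, the modulo (Round-Robin style) address-to-bank mapping, the one-access-per-bank-per-instruction conflict model, and the names `NUM_BANKS`, `bank_of`, `count_conflicts`, and `plan_redundancy` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

NUM_BANKS = 4  # assumed bank count; the abstract does not state one

Instr = Tuple[int, ...]            # operand addresses read by one instruction
Copies = Dict[Tuple[int, int], int]  # (instr index, operand position) -> bank of copy


def bank_of(addr: int, num_banks: int = NUM_BANKS) -> int:
    """Assumed Round-Robin style mapping: address modulo the bank count."""
    return addr % num_banks


def count_conflicts(instrs: List[Instr], copies: Optional[Copies] = None,
                    num_banks: int = NUM_BANKS) -> int:
    """Count operands that collide with an earlier operand of the same instruction."""
    copies = copies or {}
    total = 0
    for i, operands in enumerate(instrs):
        banks = [copies.get((i, pos), bank_of(addr, num_banks))
                 for pos, addr in enumerate(operands)]
        total += len(banks) - len(set(banks))
    return total


def plan_redundancy(instrs: List[Instr], num_banks: int = NUM_BANKS) -> Copies:
    """Give a redundant copy only to operands that conflict inside an instruction."""
    copies: Copies = {}
    for i, operands in enumerate(instrs):
        homes = [bank_of(addr, num_banks) for addr in operands]
        busy = set(homes)   # banks already serving this instruction
        seen = set()        # home banks claimed by earlier operands
        for pos, bank in enumerate(homes):
            if bank not in seen:
                seen.add(bank)
                continue
            free = [b for b in range(num_banks) if b not in busy]
            if free:        # if no bank is free, the conflict is left in place
                copies[(i, pos)] = free[0]
                busy.add(free[0])
    return copies


instrs = [(0, 4, 1), (2, 3, 5), (8, 12, 16)]
copies = plan_redundancy(instrs)
print(count_conflicts(instrs), len(copies), count_conflicts(instrs, copies))
```

In this toy input only three of the nine operands receive a copy, whereas a full redundancy scheme would duplicate every operand regardless of whether it conflicts.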