ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2021, Vol. 58 ›› Issue (4): 862-887. doi: 10.7544/issn1000-1239.2021.20200110

• System Architecture •

A Survey on Graph Processing Accelerators

Yan Mingyu1,2,3, Li Han1,2, Deng Lei3, Hu Xing3, Ye Xiaochun1, Zhang Zhimin1, Fan Dongrui1,2, Xie Yuan3

  1. 1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100049); 3(University of California at Santa Barbara, Santa Barbara, California, USA 93106) (yanmingyu@ict.ac.cn)
  • Online: 2021-04-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFB1003501), the National Natural Science Foundation of China (61732018, 61872335, 61802367, 61672499), the Strategic Priority Research Program of Chinese Academy of Sciences, Category C (XDC05000000), and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2019A07).

Abstract: In the big data era, graphs are used in many fields to represent data with complex relationships, and graph processing applications are widely used to mine the potential value of graph data. The irregular execution behavior of these applications gives rise to irregular workloads, intensive read-modify-write updates, irregular memory accesses, and irregular communication. Existing general-purpose architectures cannot handle these challenges effectively. To overcome them, a large number of graph processing accelerator designs have been proposed. These designs tailor the computation pipeline, memory subsystem, storage subsystem, and communication subsystem to graph processing applications. Thanks to these hardware customizations, graph processing accelerators have achieved significant improvements in performance and energy efficiency compared with state-of-the-art software frameworks running on general-purpose architectures. To give researchers a comprehensive understanding of graph processing accelerators, this article first classifies and summarizes existing work from top to bottom, following the pyramid organization of a computer system, and uses several complete architecture examples to analyze how the optimization techniques applied at different levels relate to one another. It then discusses accelerator design for an emerging graph processing application, the graph neural network, through specific graph neural network accelerator cases. Finally, it summarizes the frontier research directions in the field and discusses future trends in graph processing accelerator design.

Key words: graph processing, graph neural network, accelerator, irregular memory access, data locality, dynamic data access scheduling, workload balance
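
The challenges named in the abstract can be made concrete with a small sketch. The Python below is not taken from the survey. It is a hypothetical push-style propagation step over a graph stored in compressed sparse row (CSR) form, and the names `row_ptr`, `col_idx`, and `push_step` are assumptions chosen for illustration. The comments mark where irregular workload, irregular memory access, and read-modify-write updates would arise in such a kernel.

```python
# Illustrative sketch only: a push-style propagation step on a CSR graph.
# row_ptr, col_idx and push_step are hypothetical names, not from the survey.

def push_step(row_ptr, col_idx, value):
    """Each vertex splits its value evenly among its out-neighbors."""
    num_vertices = len(row_ptr) - 1
    nxt = [0.0] * num_vertices
    for src in range(num_vertices):
        begin, end = row_ptr[src], row_ptr[src + 1]
        degree = end - begin      # irregular workload: degree differs per vertex
        if degree == 0:
            continue              # vertices without out-edges push nothing
        share = value[src] / degree
        for e in range(begin, end):
            dst = col_idx[e]      # irregular access: dst depends on graph data
            nxt[dst] += share     # read-modify-write update on a shared array
    return nxt


# Tiny graph with edges 0->1, 0->2, 0->3, 1->2, 3->0 (vertex 2 has no out-edges).
row_ptr = [0, 3, 4, 4, 5]
col_idx = [1, 2, 3, 2, 0]
value = [3.0, 1.0, 1.0, 1.0]
print(push_step(row_ptr, col_idx, value))  # prints [1.0, 1.0, 2.0, 1.0]
```

In a parallel version of this loop, the `nxt[dst] += share` line would need atomic or otherwise conflict-free updates, which is one reason the read-modify-write pattern is singled out as a challenge.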

CLC number: