面向高通量计算机的图算法优化技术

张承龙; 曹华伟; 王国波; 郝沁汾; 张洋; 叶笑春; 范东睿

doi:10.7544/issn1000-1239.2020.20200115

面向高通量计算机的图算法优化技术

¹(计算机体系结构国家重点实验室(中国科学院计算技术研究所) 北京 100190)
²(中国科学院大学计算机与控制学院北京 100049) (caohuawei@ict.ac.cn)

基金项目: 国家重点研发计划项目(2018YFB1003501)；国家自然科学基金项目(11904370，61732018，61672499)；计算机体系结构国家重点实验室创新项目(CARCH4509)

详细信息

中图分类号: TP301
计量
- 文章访问数: 1502
- HTML全文浏览量: 3
- PDF下载量: 693
出版历程
- 发布日期: 2020-05-31

Efficient Optimization of Graph Computing on High-Throughput Computer

¹(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)
²(School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049)

Funds: This work was supported by the National Key Research and Development Program of China (2018YFB1003501), the National Natural Science Foundation of China (11904370, 61732018, 61672499), and the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH4509).

摘要

摘要: 随着互联网技术的蓬勃发展，图数据的规模呈爆炸式增长.如何高效地处理大规模图数据逐渐成为工业界和学术界关注的焦点.宽度优先搜索算法是解决图遍历问题的经典算法，也是Graph500基准的核心测试程序之一.高通量计算机采用ARM架构的众核体系结构，具有高并发、强实时、低功耗等适于大数据计算的特点.在单节点上，BFS算法的优化已取得一系列进展，首先对现有的优化技术进行系统的介绍，并在此基础上提出2种面向高通量计算机的优化手段，通过减少冗余访存和提高缓存局部性，有效提高了算法的访存效率.通过这些优化手段，在高通量计算机上对BFS算法的性能进行了系统的评估.对于顶点规模为230的Kronecker图(顶点数为230，边数为234)，优化后的BFS算法在高通量计算机上的平均性能为24.26 GTEPS.与两路x86架构服务器相比，单节点具有1.18倍的性能优势.在性能功耗比方面，高通量计算机的结果为181.04 MTEPS/W.在2019年6月份的Green Graph500面向大数据集的排行榜上取得第2名的成绩.综上，高通量计算机的高并发和低功耗等特点非常适合处理大规模图计算等数据密集型应用.
- 宽度优先搜索 /
- 高通量 /
- Graph500 /
- 图算法 /
- 超算
Abstract: With the rapid development of computing technology, the scale of graph increases explosively and large-scale graph computing has been the focus in recent years. Breadth first search (BFS) is a classic algorithm to solve graph traverse problem. It is the main kernel of Graph500 benchmark that evaluates the performance of supercomputers and servers in terms of data-intensive applications. High-throughput computer (HTC) adopts ARM-based many-core architecture, which has the characteristics of high concurrency, strong real-time, low-power consumption. The optimization of BFS algorithm has made a series of progress on single-node systems. In this paper, we first introduce parallel BFS algorithm and existing optimizations. Then we propose two optimization techniques for HTC to improve the efficiency of data access and data locality. We systematically evaluate the performance of BFS algorithm on HTC. For the Kronecker graph with 2scale=230whose vertices are 230 and edges are 234, the average performance on HTC is 24.26 GTEPS and 1.18 times faster than the two-way x86 server. In terms of energy efficiency, the result on HTC is 181.04 MTEPS/W and rank 2nd place on the June 2019 Green Graph500 big data list. To our best knowledge, this is the first work that evaluates BFS performance on HTC platform. HTC is suitable for data intensive applications such as large-scale graph computing.
- breadth first search (BFS) /
- high throughput /
- Graph500 /
- graph algorithm /
- super computing

HTML全文

参考文献(0)

施引文献(12)

期刊类型引用(10)

1.	张园，曹华伟，张婕，申玥，孙一鸣，敦明，安学军，叶笑春. 面向多核CPU与GPU平台的图处理系统关键技术综述. 计算机研究与发展. 2024(06): 1401-1428 . 本站查看
2.	姚鹏程，廖小飞，金海，周宇航，徐鹏，张伟，曾圳，潘晨高，朱冰. 一种冗余感知的高能效图计算加速器. 中国科学:信息科学. 2024(06): 1369-1385 . 百度学术
3.	轩瑞，陈磊，石海鹤. 图类算法可重用设计及其实现. 江西师范大学学报(自然科学版). 2023(01): 52-60 . 百度学术
4.	郭晓龙，牛晋宇，杜永萍. 基于树莓派的高效卷积优化方法. 计算机技术与发展. 2023(05): 96-104 . 百度学术
5.	刘勇. 基于图算法的定向越野数据智能分析系统设计. 自动化与仪器仪表. 2022(08): 159-164 . 百度学术
6.	李策，章隆兵. 基于顶点度数的图数据分区域重排序. 高技术通讯. 2022(09): 903-913 . 百度学术
7.	李策，章隆兵. 基于社区结构的图数据预取器设计. 高技术通讯. 2022(12): 1251-1261 . 百度学术
8.	孙学军. 基于大数据的计算机基础教学改革研究. 电脑知识与技术. 2021(28): 237-238+249 . 百度学术
9.	谭雯，甘新标，白皓，肖调杰，陈旭光，雷书梦，刘杰. 面向超级计算机系统的大规模图遍历优化. 西安电子科技大学学报. 2021(06): 84-95 . 百度学术
10.	刘建友，蒋春霞. 一种基于高通量计算机的图算法优化技术. 信息与电脑(理论版). 2020(22): 69-71 . 百度学术