
Large-Scale Graph Processing on Multi-GPU Platforms

Zhang Heng, Zhang Libo, Wu Yanjun

Zhang Heng, Zhang Libo, Wu Yanjun. Large-Scale Graph Processing on Multi-GPU Platforms[J]. Journal of Computer Research and Development, 2018, 55(2): 273-288. DOI: 10.7544/issn1000-1239.2018.20170697
Zhang Heng, Zhang Libo, Wu Yanjun. Large-Scale Graph Processing on Multi-GPU Platforms[J]. Journal of Computer Research and Development, 2018, 55(2): 273-288. CSTR: 32373.14.issn1000-1239.2018.20170697


Funding Project: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06010600)
  • CLC Number: TP316.4

Large-Scale Graph Processing on Multi-GPU Platforms

  • Abstract: Building efficient algorithms and systems for large-scale graph processing on GPU-equipped high-performance nodes has become an increasingly active research topic. Using GPU co-processors as the computational core provides not only a massively threaded parallel environment but also high-throughput memory and cache access mechanisms. As graphs grow larger, the comparatively limited device memory of a GPU can no longer cache the entire graph, which has given rise to many large-scale graph processing systems whose main focus is out-of-core I/O optimization on a single node. To cope with this bottleneck, existing algorithms and systems adopt compact partitions of the graph (i.e., shards) for data transfer and iterative computation. However, when such work is extended to multi-GPU platforms, its performance is limited by a heavy dependence on PCI-E bandwidth, and it lacks scalability because of load imbalance across the GPUs. To address these challenges, we propose and design GFlow, an efficient and scalable large-scale graph processing system for multi-GPU platforms. GFlow introduces a new grid partitioning strategy for graph data on multi-GPU platforms together with a 2-level sliding-window algorithm: after caching the graph's attribute data (vertex state sets and vertex/edge weights) in the device memory of each GPU, it sequentially loads the graph's topology data (vertex/edge sets) into the GPUs. Through the 2-level sliding windows, GFlow dynamically loads data blocks from SSD storage into GPU device memory, and sequentially aggregates and applies the updates generated by each GPU during processing. Experimental results on 9 real-world graph datasets show that GFlow clearly outperforms other related out-of-core graph processing systems on multi-GPU platforms, achieving speedups of 25.6X and 20.3X over the CPU-based GraphChi and X-Stream respectively, and 1.3~2.5X over a single GPU running the GPU-based out-of-core system GraphReduce. GFlow also scales well across multiple GPUs.
    Abstract: GPU-based nodes have emerged as a promising direction toward efficient large-scale graph processing, relying on the high computational power and scalable caching mechanisms of GPUs. Out-of-core graphs are graphs that exceed both main-memory and GPU-resident memory capacity. To handle them, most existing GPU-based systems employ compact partitions of fixed-size ordered edge sets (i.e., shards) for data movement and computation. However, when scaling to platforms with multiple GPUs, these systems place a high demand on interconnect (PCI-E) bandwidth, suffer from GPU underutilization, and exhibit scalability and performance bottlenecks. This paper presents GFlow, an efficient and scalable graph processing system for out-of-core graphs on multi-GPU nodes. In GFlow, we propose a novel 2-level streaming-window method, which keeps the graph's attribute data resident in the device memory of each GPU and then streams the graph's topology data (shards) to the GPUs. With the 2-level streaming windows, GFlow streams shards dynamically from SSDs to the GPUs' device memories over the PCI-E fabric and applies on-the-fly updates while processing the graph, thus reducing the amount of data movement required for computation. Detailed evaluations demonstrate that GFlow significantly outperforms other competing out-of-core systems for a wide variety of graphs and algorithms in a multi-GPU environment, yielding average speedups of 25.6X and 20.3X over the CPU-based GraphChi and X-Stream respectively, and 1.3~2.5X over the GPU-based GraphReduce (single GPU). Meanwhile, GFlow exhibits excellent scalability as we increase the number of GPUs in the node.
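The streaming mechanism sketched in the abstracts, attribute data resident on the GPU while topology shards slide through a small set of transfer buffers, can be illustrated with a short CUDA program. The following is a minimal single-GPU sketch under assumed names (Edge, process_shard, and the shard and vertex counts are all illustrative); it is not GFlow's actual implementation, which partitions the graph into a grid, reads shards from SSD, and coordinates several GPUs, but it shows the basic double-buffered overlap of PCI-E transfers and computation that a sliding-window scheme builds on.

// A minimal single-GPU sketch (hypothetical, not GFlow's actual API) of
// double-buffered shard streaming: vertex attributes stay resident on the
// GPU while edge shards are staged through pinned host buffers and copied
// asynchronously so that the transfer of shard s+1 overlaps work on shard s.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

struct Edge { int src; int dst; float weight; };

constexpr int NUM_SHARDS   = 16;        // topology split into fixed-size shards
constexpr int SHARD_EDGES  = 1 << 20;   // edges per shard (illustrative)
constexpr int NUM_VERTICES = 1 << 22;

// Toy per-edge update: push the source value along the edge.
__global__ void process_shard(const Edge* edges, int n, float* vertex_value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&vertex_value[edges[i].dst],
                  edges[i].weight * vertex_value[edges[i].src]);
}

int main() {
    // Attribute data (vertex values) lives in device memory for the whole run.
    float* d_value;
    cudaMalloc((void**)&d_value, NUM_VERTICES * sizeof(float));
    std::vector<float> init(NUM_VERTICES, 1.0f);
    cudaMemcpy(d_value, init.data(), NUM_VERTICES * sizeof(float),
               cudaMemcpyHostToDevice);

    // Two pinned host buffers, two device buffers and two streams form the
    // sliding window over the shard sequence.
    Edge *h_buf[2], *d_buf[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaMallocHost((void**)&h_buf[b], SHARD_EDGES * sizeof(Edge));
        cudaMalloc((void**)&d_buf[b], SHARD_EDGES * sizeof(Edge));
        cudaStreamCreate(&stream[b]);
    }

    for (int s = 0; s < NUM_SHARDS; ++s) {
        int b = s & 1;                       // alternate between the two buffers
        cudaStreamSynchronize(stream[b]);    // wait until buffer b is free again

        // In the real system this is where a shard would be read from SSD;
        // here we synthesize a ring of edges so the kernel input is valid.
        for (int e = 0; e < SHARD_EDGES; ++e)
            h_buf[b][e] = { (s * SHARD_EDGES + e) % NUM_VERTICES,
                            (s * SHARD_EDGES + e + 1) % NUM_VERTICES, 1.0f };

        cudaMemcpyAsync(d_buf[b], h_buf[b], SHARD_EDGES * sizeof(Edge),
                        cudaMemcpyHostToDevice, stream[b]);
        int threads = 256, blocks = (SHARD_EDGES + threads - 1) / threads;
        process_shard<<<blocks, threads, 0, stream[b]>>>(d_buf[b], SHARD_EDGES,
                                                         d_value);
    }
    cudaDeviceSynchronize();
    printf("processed %d shards\n", NUM_SHARDS);

    for (int b = 0; b < 2; ++b) {
        cudaFreeHost(h_buf[b]);
        cudaFree(d_buf[b]);
        cudaStreamDestroy(stream[b]);
    }
    cudaFree(d_value);
    return 0;
}

The point of the two buffers is that the copy of shard s+1 can proceed in one stream while the kernel for shard s runs in the other, which is what keeps both the PCI-E link and the GPU busy; a multi-GPU version would additionally partition the shard sequence across devices and merge the per-GPU updates, as the abstract describes.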

Metrics
  • Article views:  1325
  • Full-text HTML views:  4
  • PDF downloads:  1070
  • Citations: 14
Publication History
  • Publication date:  2018-01-31
