一种带缓冲区的分布式流式图划分方法
A Method of Distributed Buffered Streaming Graph Partitioning
-
摘要: 图划分是大图并行处理关键技术之一.现有图划分方法存在划分质量和效率难以平衡的问题,主要体现在离线划分方法划分质量高但耗时长;在线(也称流式)划分方法相对高效但划分质量不理想.为此,本文提出一种带缓冲区的分布式流式划分方法.它采用多加载器-多划分器架构,多个加载器并行读取图数据,提高图数据加载效率;每个划分器维护一个缓冲区,缓存相应加载器发来的图顶点,并按顶点度数高度排序,为划分器提供更多决策依据. 划分器预置有4条流式启发式规则,围绕不同目标,对缓冲区中的顶点实施并行划分,借助重流机制微调划分结果,改进划分质量. 分布式系统环境下的划分质量与性能实验表明:本文提出方法的划分质量(割边比)比当前最好的在线划分方法提升超过18.8%,并将图数据加载时间在划分总时间的占比,从单划分器-单加载器架构流式划分算法的平均30.8%缩减至平均20.1%.Abstract: Graph partitioning is one of the key technologies for parallel processing of big graphs. Existing offline partitioning algorithms are time-consuming while online ones cannot offer high-quality partitions. To this end, we propose a distributed buffered streaming graph partitioning method. It adopts a multi-loader-multi-partitioner architecture with multiple loaders reading graph data in parallel to accelerate data loading. Each partitioner maintains a buffer, where the vertex stream read by the corresponding loader is buffered and sorted with the aim of providing more valuable information for partitioning decisions. Then the buffered vertices are assigned to different partitions by using one of our proposed four streaming heuristics with different goals. To further improve the quality of graph partitions, we design a re-streaming mechanism. Experimental results show that the proposed algorithm outperforms the state-of-the-art online ones in terms of partition quality by at least 18.8% and reduce the ratio of graph data loading time to the total partition time from an average of 30.8% to 20.1%.
-
Keywords:
- Big graph /
- streaming partitioning /
- distributed /
- buffering /
- restreaming
-
计量
- 文章访问数: 5
- HTML全文浏览量: 0
- PDF下载量: 2