高级检索

    一种带缓冲区的分布式流式图划分算法

    An Algorithms of Distributed Buffered Streaming Graph Partitioning

    • 摘要: 图划分是大图并行处理关键技术之一. 现有图划分算法存在划分质量和效率难以平衡的问题,主要体现在离线划分算法划分质量高但耗时长;在线(也称流式)划分算法相对高效但划分质量不理想. 为此,提出一种带缓冲区的分布式流式划分算法. 它采用多加载器-多划分器架构,多个加载器并行读取图数据,提高图数据加载效率;每个划分器维护一个缓冲区,缓存相应加载器发来的图顶点,并按顶点度数高度排序,为划分器提供更多决策依据. 划分器预置有4条流式启发式规则,围绕不同目标,对缓冲区中的顶点实施并行划分,借助重流机制微调划分结果,改进划分质量. 分布式系统环境下的划分质量与性能实验表明:提出算法的划分质量(割边比)比当前最好的在线划分算法改善超过18.8%,并将图数据加载时间在划分总时间的占比,从单划分器-单加载器架构流式划分算法的平均30.8%缩减至平均20.1%.

       

      Abstract: Graph partitioning is a key technique for parallel processing of large graphs. Existing graph partitioning algorithms struggle to balance partition quality and efficiency. Offline partitioning algorithms tend to achieve high partition quality but are time-consuming, while online (or streaming) partitioning algorithms are relatively efficient but suffer from suboptimal partition quality. To address the above problem, we in this work propose a distributed streaming partitioning algorithm with a buffering mechanism. The algorithm utilizes a multi-loader and multi-partitioner architecture, where multiple loaders read graph data in parallel to improve loading efficiency. Each partitioner maintains a buffer to store graph vertices received from the corresponding loader and sorts them in descending order based on vertex degree, providing better decision-making data for partitioning. Four streaming heuristic rules are pre-configured in the partitioners, which perform parallel partitioning on vertices in the buffer with different goals in mind. A reflow mechanism is employed to fine-tune the partitioning results and improve quality. Experimental results in a distributed system environment show that the proposed algorithm improves partition quality (edge-cut ratio) by more than 18.8% compared to the best existing online partitioning algorithm. Additionally, it reduces the proportion of graph data loading time in the total partitioning time from an average of 30.8% in a single-loader single-partitioner architecture to an average of 20.1%.

       

    /

    返回文章
    返回