Abstract:
Graph partitioning is a key technique for parallel processing of large graphs. Existing graph partitioning algorithms struggle to balance partition quality and efficiency. Offline partitioning algorithms tend to achieve high partition quality but are time-consuming, while online (or streaming) partitioning algorithms are relatively efficient but suffer from suboptimal partition quality. To address the above problem, we in this work propose a distributed streaming partitioning algorithm with a buffering mechanism. The algorithm utilizes a multi-loader and multi-partitioner architecture, where multiple loaders read graph data in parallel to improve loading efficiency. Each partitioner maintains a buffer to store graph vertices received from the corresponding loader and sorts them in descending order based on vertex degree, providing better decision-making data for partitioning. Four streaming heuristic rules are pre-configured in the partitioners, which perform parallel partitioning on vertices in the buffer with different goals in mind. A reflow mechanism is employed to fine-tune the partitioning results and improve quality. Experimental results in a distributed system environment show that the proposed algorithm improves partition quality (edge-cut ratio) by more than 18.8% compared to the best existing online partitioning algorithm. Additionally, it reduces the proportion of graph data loading time in the total partitioning time from an average of 30.8% in a single-loader single-partitioner architecture to an average of 20.1%.