Abstract:
The minimum spanning tree(MST) algorithm is one of the most classic algorithms in the graph theory. Some complex graph algorithms based on MST including clustering, classification and shortest path queries have been improved significantly in terms of efficiency and quality. However, with the rapid development of Internet, the scales of graphs have been becoming larger and larger. Large scale graphs which contain millions or even billions of vertices have become more common. Therefore, how to implement query processing and data mining algorithms on large scale graphs has become a problem to be solved urgently. In addition, because of the dynamic properties of large-scale graphs, how to maintain results dynamically has also become one of the most attractive problems. However, the state of the art MST algorithms can’t handle such massive and dynamic graph data. In this paper, we propose a vertex-driven parallel MST algorithm called PB based on a partition Prim algorithm named PP, and demonstrate the correctness of PB. Moreover, we implement the whole process of PB algorithm on the MapReduce and BSP framework respectively. Taking deletion-only graphs into consideration, we put forward a maintenance algorithm for MST which can conduct efficient incremental evaluation. All the computation and maintenance costs are further analyzed and compared. Finally, experiments based on real and synthetic data sets demonstrate the reliabilty, efficiency and scalability of our algorithms.