Abstract:
Graphs, which naturally capture complex relationships among entities, are used across a wide range of applications. As Internet applications continue to evolve and data volumes grow, distributed graph computing systems have come to outperform traditional single-machine systems in aspects such as computational efficiency and resource scheduling. In recent years, the growing demand for distributed graph computing systems that can handle large-scale graph data has made graph partitioning a focus of academic research. Based on a comprehensive analysis of graph partitioning techniques for distributed graph computing, we first explain the technical background of graph partitioning in these systems. We then define the key concepts related to graph partitioning in modern distributed graph computing systems and present a classification of existing computational models, offering insight into the current state of distributed graph computing paradigms. Subsequent sections examine the different graph partitioning methods in detail and analyze their respective strengths and weaknesses in the context of various distributed graph computing frameworks. Finally, we discuss the open challenges and future research directions for graph partitioning in distributed graph computing systems.