Citation: | Shang Junlin, Zhang Zhenyu, Qu Wenwen, Wang Xiaoling. Survey of Graph Partitioning Techniques for Distributed Graph Computing[J]. Journal of Computer Research and Development, 2025, 62(1): 90-103. DOI: 10.7544/issn1000-1239.202330790 |
The graph data structure, which is adept at encapsulating intricate relationships among entities, has been widely used in a vast array of application scenarios. With the incessant progression of Internet applications and the concomitant surge in data scales, distributed graph computing systems have demonstrated superior performance compared with traditional single-machine systems in various aspects, including computational efficiency and resource scheduling. In recent years, the increasing demand for distributed graph computing systems designed for handling large-scale graph data has brought graph partitioning technology to the forefront of academic research. Based on a comprehensive analysis of graph partitioning techniques for distributed graph computing, we explain the technological backdrop of graph partitioning in these systems. We provide definitions for key concepts related to graph partitioning in modern distributed graph computing systems and present a classification scheme for existing computational models, offering insights into the current status of distributed graph computing paradigms. Subsequent sections delve into the complexities of different graph partitioning methodologies, conducting a thorough analysis to determine their respective strengths and weaknesses within the context of various distributed graph computing frameworks. Finally, we discuss the current challenges and future research directions of graph partitioning technology in distributed graph computing systems.
[1] |
Malewicz G, Austern M H, Bik A J C, et al. Pregel: A system for large-scale graph processing[C]//Proc of the 36th ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2010: 135−146
|
[2] |
Low Y, Gonzalez J, Kyrola A, et al. Distributed GraphLab: A framework for machine learning in the cloud[J]. Proceedings of the VLDB Endowment, 2012, 5(8): 716−727 doi: 10.14778/2212351.2212354
|
[3] |
Gonzalez J E, Low Y, Gu Haijie, et al. PowerGraph: Distributed graph-parallel computation on natural graphs[C]//Proc of the 10th Symp on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2012: 17−30
|
[4] |
Gonzalez J E, Xin R S, Dave A, et al. GraphX: Graph processing in a distributed dataflow framework[C]// Proc of the 11th Symp on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2014: 599−613
|
[5] |
Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs[J]. SIAM Journal on Scientific Computing, 1998, 20(1): 359−392 doi: 10.1137/S1064827595287997
|
[6] |
Ching A, Edunov S, Kabiljo M, et al. One trillion edges: Graph processing at Facebook-scale[J]. Proceedings of the VLDB Endowment, 2015, 8(12): 1804−1815 doi: 10.14778/2824032.2824077
|
[7] |
Tian Yuanyuan, Balmin A, Corsten S A, et al. From “think like a vertex” to “think like a graph”[J]. Proceedings of the VLDB Endowment, 2013, 7(3): 193−204 doi: 10.14778/2732232.2732238
|
[8] |
Yan Da, Cheng J, Lu Yi, et al. Blogel: A block-centric framework for distributed computation on real-world graphs[J]. Proceedings of the VLDB Endowment, 2014, 7(14): 1981−1992 doi: 10.14778/2733085.2733103
|
[9] |
Khayyat Z, Awara K, Alonazi A, et al. Mizan: A system for dynamic load balancing in large-scale graph processing[C]// Proc of the 8th ACM European Conf on Computer Systems. New York: ACM, 2013: 169−182
|
[10] |
Yan Da, Huang Yuzhen, Liu Miao, et al. GraphD: Distributed vertex-centric graph processing beyond the memory limit[J]. IEEE Transactions on Parallel and Distributed Systems, 2017, 29(1): 99−114
|
[11] |
Kyrola A, Blelloch G, Guestrin C. GraphChi: Large-scale graph computation on just a PC[C]// Proc of the 10th Symp on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2012: 31−46
|
[12] |
李金吉,张岩峰,巩树凤,等. 流式处理的异步图处理框架[J]. 软件学报,2018,29(3):528−544
Li Jinji, Zhang Yanfeng, Gong Shufeng, et al. Streamlined asynchronous graph processing framework[J]. Journal of Software, 2018, 29(3): 528−544(in Chinese)
|
[13] |
Roy A, Mihailovic I, Zwaenepoel W. X-Stream: Edge-centric graph processing using streaming partitions[C]//Proc of the 24th ACM Symp on Operating Systems Principles. New York: ACM, 2013: 472−488
|
[14] |
Cheng Jiefeng, Liu Qin, Li Zhengou, et al. VENUS: Vertex-centric streamlined graph computation on a single PC[C]// Proc of the 31st Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2015: 1131−1142
|
[15] |
Roy A, Bindschaedler L, Malicevic J, et al. Chaos: Scale-out graph processing from secondary storage[C]//Proc of the 25th Symp on Operating Systems Principles. New York: ACM, 2015: 410−424
|
[16] |
Gao Pin, Zhang Mingxing, Chen Kang, et al. High performance graph processing with locality oriented design[J]. IEEE Transactions on Computers, 2017, 66(7): 1261−1267 doi: 10.1109/TC.2017.2652465
|
[17] |
Bouhenni S, Yahiaoui S, Nouali-Taboudjemat N, et al. Efficient parallel edge-centric approach for relaxed graph pattern matching[J]. Journal of Supercomputing, 2022, 78(2): 1642−1671 doi: 10.1007/s11227-021-03938-7
|
[18] |
Kang S, Hastings C, Eaton J, et al. cuGraph C++ primitives: Vertex/edge-centric building blocks for parallel graph computing[C]//Proc of the 37th IEEE Int Parallel and Distributed Processing Symp Workshops (IPDPSW). Piscataway, NJ: IEEE, 2023: 226−229
|
[19] |
Kernighan B W, Lin Shen. An efficient heuristic procedure for partitioning graphs[J]. The Bell System Technical Journal, 1970, 49(2): 291−307 doi: 10.1002/j.1538-7305.1970.tb01770.x
|
[20] |
Fiduccia C M, Mattheyses R M. A linear-time heuristic for improving network partitions [C]//Proc of the 19th Design Automation Conf. Piscataway, NJ: IEEE, 1982: 175−181
|
[21] |
Karypis G, Kumar V. Multilevel k-way hypergraph partitioning[C]//Proc of the 36th Annual ACM/IEEE Design Automation Conf. New York: ACM, 1999: 343−348
|
[22] |
Shao Bin, Wang Haixun, Li Yaotao. Trinity: A distributed graph engine on a memory cloud [C]// Proc of the 39th ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2013: 505−516
|
[23] |
Stanton I, Kliot G. Streaming graph partitioning for large distributed graphs[C]//Proc of the 18th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2012: 1222−1230
|
[24] |
Tsourakakis C, Gkantsidis C, Radunovic B, et al. Fennel: Streaming graph partitioning for massive scale graphs[C]// Proc of the 7th ACM Int Conf on Web Search and Data Mining. New York: ACM, 2014: 333−342
|
[25] |
Nishimura J, Ugander J. Restreaming graph partitioning: Simple versatile algorithms for advanced balancing[C]// Proc of the 19th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2013: 1106−1114
|
[26] |
Jain N, Liao Guangdeng, Willke T L. Graphbuilder: Scalable graph ETL framework[C]//Proc of the 1st Int Workshop on Graph Data Management Experiences and Systems. New York: ACM, 2013: 19−24
|
[27] |
Schlag S, Henne V, Heuer T, et al. K-way hypergraph partitioning via n-level recursive bisection[C]//Proc of the 18th Workshop on Algorithm Engineering and Experiments (ALENEX). Philadelphia, PA: SIAM, 2016: 53−67
|
[28] |
Wang Lu, Xiao Yanghua, Shao Bin, et al. How to partition a billion-node graph[C]// Proc of the 30th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2014: 568−579
|
[29] |
Alistarh D, Iglesias J, Vojnovic M. Streaming Min-Max hypergraph partitioning[C]//Proc of the 28th Int Conf on Neural Information Processing Systems-Volume 2. Cambridge, MA: MIT, 2015: 1900−1908
|
[30] |
Huang Jiewen, Abadi D J. Leopard: Lightweight edge-oriented partitioning and replication for dynamic graphs[J]. Proceedings of the VLDB Endowment, 2016, 9(7): 540−551 doi: 10.14778/2904483.2904486
|
[31] |
Petroni F, Querzoni L, Daudjee K, et al. HDRF: Stream-based partitioning for power-law graphs[C] //Proc of the 24th ACM Int on Conf on Information and Knowledge Management. New York: ACM, 2015: 243−252
|
[32] |
Xie Cong, Yan Ling, Li Wujun, et al. Distributed power-law graph computing: Theoretical and empirical analysis[C]//Proc of the 27th Int Conf on Neural Information Processing Systems-Volume 1. Cambridge, MA: MIT, 2014: 1673−1681
|
[33] |
Mayer C, Mayer R, Bhowmik S, et al. HYPE: Massive hypergraph partitioning with neighborhood expansion[C]//Proc of the 6th IEEE Int Conf on Big Data (Big Data). Piscataway, NJ: IEEE, 2018: 458−467
|
[34] |
Kabiljo I, Karrer B, Pundir M, et al. Social Hash partitioner: A scalable distributed hypergraph partitioner[J]. Proceedings of the VLDB Endowment, 2017, 10(11): 1418−1429 doi: 10.14778/3137628.3137650
|
[35] |
Dai Dong, Zhang Wei, Chen Yong. IOGP: An incremental online graph partitioning algorithm for distributed graph databases[C]// Proc of the 26th Int Symp on High-Performance Parallel and Distributed Computing. New York: ACM, 2017: 219−230
|
[36] |
Zhang Chenzi, Wei Fan, Liu Qin, et al. Graph edge partitioning via neighborhood heuristic[C]//Proc of the 23rd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2017: 605−614
|
[37] |
Hanai M, Suzumura T, Tan W J, et al. Distributed edge partitioning for trillion-edge graphs[J]. arXiv preprint, arXiv: 1908.05855, 2019
|
[38] |
Zhang Yiming, Wang Haonan, Jia M, et al. TopoX: Topology refactorization for minimizing network communication in graph computations[J]. IEEE/ACM Transactions on Networking, 2020, 28(6): 2768−2782 doi: 10.1109/TNET.2020.3020813
|
[39] |
Chen Rong, Shi Jiaxin, Chen Yanchen, et al. Powerlyra: Differentiated graph computation and partitioning on skewed graphs[J]. ACM Transactions on Parallel Computing, 2019, 5(3): 1−39
|
[40] |
Fan Wenfei, Jin Ruochun, Liu Muyang, et al. Application driven graph partitioning[C]// Proc of the 46th ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2020: 1765−1779
|
[41] |
Nazi A, Hang W, Goldie A, et al. GAP: Generalizable approximate graph partitioning framework[J]. arXiv preprint, arXiv: 1903.00614, 2019
|
[42] |
Avdiukhin D, Pupyrev S, Yaroslavtsev G. Multi-dimensional balanced graph partitioning via projected gradient descent[J]. Proceedings of the VLDB Endowment, 2019, 12(8): 906−919 doi: 10.14778/3324301.3324307
|
[43] |
Guo Zhenyu, Xiao Mingyu, Zhou Yi, et al. Enhancing balanced graph edge partition with effective local search[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 12336−12343
|
[44] |
Mayer R, Jacobsen H A. Hybrid edge partitioner: Partitioning large power-law graphs under memory constraints[C]//Proc of the 47th Int Conf on Management of Data. New York: ACM, 2021: 1289−1302
|
[45] |
Ayall T, Duan Hanchong, Liu Changhong, et al. Taking heuristic based graph edge partitioning one step ahead via OffStream partitioning approach[C]//Proc of the 37th IEEE Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2021: 2081−2086
|
[46] |
Gatti A, Hu Zhixiong, Smidt T, et al. Graph partitioning and sparse matrix ordering using reinforcement learning and graph neural networks[J]. Journal of Machine Learning Research, 2022, 23(1): 13675−13702
|
[47] |
Bui T N, Jones C. A heuristic for reducing fill-in in sparse matrix factorization[R]. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM), 1993
|
[48] |
Cheng C K, Wei Y C A. An improved two-way partitioning algorithm with stable performance[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1991, 10(12): 1502−1511 doi: 10.1109/43.103500
|
[49] |
Shi Jianbo, Malik J. Normalized cuts and image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888−905 doi: 10.1109/34.868688
|
[50] |
Spielman D A, Teng Shanghua. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems[C]//Proc of the 36th Annual ACM Symp on Theory of Computing. New York: ACM, 2004: 81−90
|
[51] |
Orecchia L, Vishnoi N K. Towards an SDP-based approach to spectral methods: A nearly-linear-time algorithm for graph partitioning and decomposition[C]//Proc of the 22nd Annual ACM-SIAM Symp on Discrete Algorithms. Philadelphia, PA: SIAM, 2011: 532−545
|
[52] |
Slota G M, Madduri K, Rajamanickam S. PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks[C]//Proc of the 2nd IEEE Int Conf on Big Data (Big Data). Piscataway, NJ: IEEE, 2014: 481−490
|
[53] |
Slota G M, Rajamanickam S, Devine K, et al. Partitioning trillion-edge graphs in minutes[C]//Proc of the 31st IEEE Int Parallel and Distributed Processing Symp (IPDPS). Piscataway, NJ: IEEE, 2017: 646−655
|
[54] |
Karypis G, Kumar V. A parallel algorithm for multilevel graph partitioning and sparse matrix ordering[J]. Journal of Parallel and Distributed Computing, 1998, 48(1): 71−95 doi: 10.1006/jpdc.1997.1403
|
[55] |
Ye Chang, Li Yuchen, He Bingsheng, et al. Large-scale graph label propagation on GPUs[J/OL]. IEEE Transactions on Knowledge and Data Engineering, 2023[2023-12-01]. https://ieeexplore.ieee.org/abstract/document/10330123
|
[56] |
Kim D H, Nagi R, Chen Deming. Thanos: High-performance CPU-GPU based balanced graph partitioning using cross-decomposition[C]//Proc of the 25th Asia and South Pacific Design Automation Conf (ASP-DAC). Piscataway, NJ: IEEE, 2020: 91−96
|
[57] |
Massri M, Miklos Z, Raipin P, et al. Clock-G: A temporal graph management system with space-efficient storage technique[C]//Proc of the 38th IEEE Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2022: 2263−2276
|
[58] |
李贺,刘延娜,袁航,等. 动态图划分算法研究综述[J]. 软件学报,2023,34(2):539−564
Li He, Liu Yanna, Yuan Hang, et al. Research on dynamic graph partitioning algorithms: A survey[J]. Journal of Software, 2023, 34(2): 539−564 (in Chinese)
|
[1] | Fang Haotian, Li Chunhua, Wang Qing, Zhou Ke. A Method of Microservice Performance Anomaly Detection Based on Deep Learning[J]. Journal of Computer Research and Development, 2024, 61(3): 600-613. DOI: 10.7544/issn1000-1239.202330543 |
[2] | Yue Wenjing, Qu Wenwen, Lin Kuan, Wang Xiaoling. Survey of Cardinality Estimation Techniques Based on Machine Learning[J]. Journal of Computer Research and Development, 2024, 61(2): 413-427. DOI: 10.7544/issn1000-1239.202220649 |
[3] | Wang Rui, Qi Jianpeng, Chen Liang, Yang Long. Survey of Collaborative Inference for Edge Intelligence[J]. Journal of Computer Research and Development, 2023, 60(2): 398-414. DOI: 10.7544/issn1000-1239.202110867 |
[4] | Liu Qixu, Chen Yanhui, Ni Jieshuo, Luo Cheng, Liu Caiyun, Cao Yaqin, Tan Ru, Feng Yun, Zhang Yue. Survey on Machine Learning-Based Anomaly Detection for Industrial Internet[J]. Journal of Computer Research and Development, 2022, 59(5): 994-1014. DOI: 10.7544/issn1000-1239.20211147 |
[5] | Wang Jialai, Zhang Chao, Qi Xuyan, Rong Yi. A Survey of Intelligent Malware Detection on Windows Platform[J]. Journal of Computer Research and Development, 2021, 58(5): 977-994. DOI: 10.7544/issn1000-1239.2021.20200964 |
[6] | Chen Kerui, Meng Xiaofeng. Interpretation and Understanding in Machine Learning[J]. Journal of Computer Research and Development, 2020, 57(9): 1971-1986. DOI: 10.7544/issn1000-1239.2020.20190456 |
[7] | Liu Chenyi, Xu Mingwei, Geng Nan, Zhang Xiang. A Survey on Machine Learning Based Routing Algorithms[J]. Journal of Computer Research and Development, 2020, 57(4): 671-687. DOI: 10.7544/issn1000-1239.2020.20190866 |
[8] | Liu Junxu, Meng Xiaofeng. Survey on Privacy-Preserving Machine Learning[J]. Journal of Computer Research and Development, 2020, 57(2): 346-362. DOI: 10.7544/issn1000-1239.2020.20190455 |
[9] | Xu Xiaoxiang, Li Fanzhang, Zhang Li, Zhang Zhao. The Category Representation of Machine Learning Algorithm[J]. Journal of Computer Research and Development, 2017, 54(11): 2567-2575. DOI: 10.7544/issn1000-1239.2017.20160350 |
[10] | Wen Guihua. Relative Transformation for Machine Learning[J]. Journal of Computer Research and Development, 2008, 45(4): 612-618. |