Citation: | Xue Xin, Zhu Tianchen, Sun Qingyun, Zhou Haoyi, Li Jianxin. Efficient Subgraph Matching Algorithm with Graph Neural Network[J]. Journal of Computer Research and Development, 2025, 62(3): 694-708. DOI: 10.7544/issn1000-1239.202330732 |
Subgraph matching is an optimization problem in graph, which is to find all matching subgraphs of the query graph in a large target graph. Although subgraph matching is an NP-Hard problem, the problem is common in many fields such as social networks, biochemistry, and cognitive science. Backtracking searching algorithms for subgraph matching have high time complexity, and the pruning strategy is essential to reduce the operating time. However, complex expanding in the existing pruning strategy leads to high complexity of time and space. To balance the efficiency and the effectiveness, only limited neighborhood structure information can be used in conflicting judging, which lets lots of useless states pass the pruning judging and wastes time. An efficient, accurate, and adaptive subgraph matching algorithm is proposed. The algorithm captures the detail structure of the whole graph by graph neural network, builds structure connections, and generates pruning possibilities for all candidate searching states. It replaces complex expanding pruning method with inferring by the neural network model to rapidly estimate the probability of pruning during searching. A data sampling mechanism is designed to alleviate the problem of network training collapse. Experiments show that using our pruning method in traditional backtracking search can improve search efficiency.
[1] |
Snijders T, Pattison P, Robins G, et al. New specifications for exponential random graph models[J]. Sociological Methodology, 2006, 36(1): 99−153 doi: 10.1111/j.1467-9531.2006.00176.x
|
[2] |
Xue Jiawei, Jiang Nan, Liang Senwei, et al. Quantifying spatial homogeneity of urban road networks via graph neural networks[J]. Nature Machine Intelligence, 2022, 4(3): 246−257 doi: 10.1038/s42256-022-00462-y
|
[3] |
Przulj N, Corneil D, Jurisica I. Efficient estimation of graphlet frequency distributions in protein-protein interaction networks[J]. Bioinformatics, 2006, 22(8): 974−980 doi: 10.1093/bioinformatics/btl030
|
[4] |
郭方方,王欣悦,王慧强,等. 基于最大频繁子图挖掘的动态污点分析方法[J]. 计算机研究与发展,2020,57(3):631−638 doi: 10.7544/issn1000-1239.2020.20180846
Guo Fangfang, Wang Xinyue, Wang Huiqiang, et al. A dynamic stain analysis method on maximal frequent sub graph minin[J]. Journal of Computer Research and Development, 2020, 57(3): 631−638 (in Chinese) doi: 10.7544/issn1000-1239.2020.20180846
|
[5] |
Webber J. A programmatic introduction to Neo4j[C]//Proc of the 2012 ACM Conf on Systems, Programming, and Applications: Software for Humanity. New York: ACM, 2018: 217−218
|
[6] |
Anikin D, Borisenko O, Nedumov Y. Labeled property graphs: SQL or NoSQL[C]//Proc of the 2019 Ivannikov Memorial Workshop. Piscataway, NJ: IEEE, 2019: 7−13
|
[7] |
Singh R, Xu Jinbo, Berger B. Pairwise global alignment of protein interaction networks by matching neighborhood topology[C]//Proc of the 11th Annual Int Conf on Research in Computational Molecular Biology. New York: ACM, 2007: 16−31
|
[8] |
Zhang Xiaolin, Yuan Haochen, Li Zhuolin, et al. Distributed subgraph matching privacy preserving method for dynamic social network[C]//Proc of the 7th CCF Conf of BigData. Berlin: Springer, 2019: 159−173
|
[9] |
Nguyen H, Jeong H. An area partitioning and subgraph growing (APSG) approach to the conflation of road networks[J]. Sensors, 2022, 22(4): 1501−1526 doi: 10.3390/s22041501
|
[10] |
Pavlov D, Shturts I. Chemical substructure search screening with fingerprints built with subGraph enumeration[J]. Reviews on Advanced Materials Science, 2009, 20: 37−41
|
[11] |
Conte D, Foggia P, Sansone C, et al. Thirty years of graph matching in pattern recognition[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2004, 18(3): 265−298 doi: 10.1142/S0218001404003228
|
[12] |
Nakai, Kenta. Yeast : UCI Machine learning repository[DB/OL]. 1996[2023-03-01]. https://archive.ics.uci.edu/ml/datasets/Yeast
|
[13] |
Yang J, Leskovec J. Defining and evaluating network communities based on ground-truth[C]//Proc of the 12th IEEE Int Conf on Data Mining. Piscataway, NJ: IEEE, 2012: 745−754
|
[14] |
Sun Shixuan, Luo Qiong. In-memory subgraph matching: An in-depth study[C]//Proc of the 2020 Int Conf on Management of Data. New York: ACM, 2020: 1083−1098
|
[15] |
Cordella L, Foggia P, Sansone C, et al. A (sub)graph isomorphism algorithm for matching large graphs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(10): 1367−1372 doi: 10.1109/TPAMI.2004.75
|
[16] |
Carletti V, Foggia P, Vento M. VF2 plus: An improved version of VF2 for biological graphs[C]//Proc of the 10th Int Workshop on Graph Based Representations in Pattern Recognition. Berlin: Springer, 2015: 168−177
|
[17] |
Juttner A, Madarasi P. VF2++ ―An improved subgraph isomorphism algorithm[J]. Discrete Applied Mathematics, 2018, 242: 69−81 doi: 10.1016/j.dam.2018.02.018
|
[18] |
Vinyals O, Fortunato M, Jaitly N. Pointer networks[C]//Proc of the 28th Annual Conf on Neural Information Processing Systems. New York: ACM, 2015: 2692−2700
|
[19] |
Liu Xin, Pan Haojie, He Mutian, et al. Neural subgraph isomorphism counting[C]//Proc of the 26th ACM Int Conf on Knowledge Discovery Data Mining. New York: ACM, 2019: 1959−1969
|
[20] |
Shang Haichuan, Zhang Ying, Lin Xuemin, et al. Taming verification hardness: An efficient algorithm for testing subgraph isomorphism[J]. Very Large Data Base Endowment, 2008, 1(1): 364−375
|
[21] |
Abdelaziz I, Al-Harbi R, Salihoglu S, et al. SPARTex: A vertex-centric framework for RDF data analytics[J]. Very Large Data Base Endowment, 2015, 8(12): 1880−1883
|
[22] |
Ngo H, Porat E, Re C, et al. Worst-case optimal join algorithms[J]. Journal of ACM, 2018, 65(3): 16: 1−16: 40
|
[23] |
Carletti V, Foggia P, Saggese A, et al. Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with VF3[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 804−818 doi: 10.1109/TPAMI.2017.2696940
|
[24] |
Aberger C, Lamb A, Tu S, et al. EmptyHeaded: A relational engine for graph processing[C]//Proc of the 2016 Int Conf on Management of Data. New York: ACM, 2015: 431−446
|
[25] |
Aref M, Cate B, Green T, et al. Design and implementation of the LogicBlox system[C]//Proc of the 2015 ACM Int Conf on Management of Data. New York: ACM, 2015: 1371−1382
|
[26] |
Scarselli F, Gori M, Tsoi A, et al. The graph neural network mode[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61−80 doi: 10.1109/TNN.2008.2005605
|
[27] |
Kipf T, Welling M. Semi-supervised classifification with graph convolutional networks[J]. arXiv preprint, arXiv: 1609.02907, 2016
|
[28] |
Schlichtkrull M, Kipf T, Bloem P, et al. Modeling relational data with graph convolutional networks[C]//Proc of the 15th Int Conf of ESWC. Berlin: Springer, 2018: 593−607
|
[29] |
Yan Sijie, Xiong Yuanjun, Lin Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Proc of the 8th AAAI Conf on Artificial Intelligence. New York: ACM, 2018: 7444−7452
|
[30] |
Hamilton W, Ying R, Leskovec J. Inductive representation learning on large graphs[C]//Proc of the 31st Int Conf on Neural Information Processing Systems. New York: ACM, 2017: 1025−1035
|
[31] |
Xu Keyulu, Hu Weihua, Leskovec J, et al. How powerful are graph neural networks[J]. arXiv preprint, arXiv: 1810.00826, 2018
|
[32] |
Velickovic P, Cucurull G, Casanova A, et al. Graph attention networks[J]. arXiv preprint, arXiv: 1710.10903, 2017
|
[33] |
Wang Shuohang, Jiang Jing. Machine comprehension using Match-LSTM and answer pointer[J]. arXiv preprint, arXiv: 1608.0790, 2016
|
[34] |
Ma Qiang, Ge Suwen, He Danyang, et al. Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning[J]. arXiv preprint, arXiv: 1911.04936, 2019
|
[35] |
Khalil E, Dai H, Zhang Y, et al. Learning combinatorial optimization algorithms over graphs[C]//Proc of the 30th Annual Conf on Neural Information Processing System. La Jolla, CA: NIPS, 2017, 6348−6358
|
[36] |
Wang Hanchen, Zhang Ying, Qin Lu, et al. Reinforcement learning based query vertex ordering model for subgraph matching[C]//Proc of the 38th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2022: 245−258
|
[37] |
Ying R, Lou Zhaoyu, You Jiaxuan, et al. Neural subgraph matching[J]. arXiv preprint, arXiv: 2007.03092, 2020
|
[38] |
Roy I, Velugoti V, Chakrabarti S, et al. Interpretable neural subgraph matching for graph retrieval[C]//Proc of the 36th AAAI Conf on Artificial Intelligence. New York: ACM, 2022: 8115−8123
|
[39] |
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proc of the 30th Annual Conf on Neural Information Processing Systems. New York: ACM, 2017: 5998−6008
|
[40] |
Devlin J, Chang Mingwei, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proc of the 2019 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2019: 4171−4186
|
[41] |
Beltagy I, Peters M, Cohan A. Longformer: The long-document transformer[J]. arXiv preprint, arXiv: 2004.05150, 2020
|
[42] |
Dai Zihang, Yang Zhilin, Yang Yiming, et al. Transformer-XL: Attentive language models beyond a fixed-length context[C]//Proc of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 2978−2988
|
[43] |
Wang Wenhai, Xie Enze, Li Xiang, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C]//Proc of the 2021 IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2021: 548−558
|