Abstract:
Topological imbalance in graphs, which arises from the non-uniform and asymmetric distribution of nodes in topological space, significantly hampers the performance of graph neural networks. Current research focuses predominantly on labeled nodes and pays comparatively little attention to unlabeled nodes. To address this gap, we propose a self-supervised learning method based on random walk paths that tackles the issues posed by topological imbalance, including the constraints imposed by homogeneity assumptions, topological distance decay, and annotation attenuation. Our method introduces multi-hop paths within the subgraph neighborhood to comprehensively capture relationships and local features among nodes. First, by aggregating between paths, we learn both homogeneous and heterogeneous features within multi-hop paths, preserving the nodes' original attributes as well as their initial structural connections in the random walk sequences. Second, by combining multi-path aggregation of subgraph samples with a structured contrastive loss, we maximize the intrinsic features of local subgraphs for the same node, enhancing the expressive power of the graph representations. Experimental results validate the effectiveness and generalization performance of our method across various imbalanced scenarios. This research provides a novel approach and perspective for addressing topological imbalance.
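
To make the pipeline summarized above more concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this work. It assumes uniform random walks, mean pooling for both within-path and between-path aggregation, and an InfoNCE-style objective standing in for the structured contrastive loss, whose exact form the abstract does not specify. All function names (`sample_walks`, `subgraph_embedding`, `info_nce`) and parameters are hypothetical.

```python
import math
import random


def sample_walks(adj, start, num_walks, walk_length, seed=0):
    """Sample uniform random walks of at most `walk_length` hops from `start`.

    `adj` maps each node to a list of its neighbors (assumed format).
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        path = [start]
        for _ in range(walk_length):
            neighbors = adj[path[-1]]
            if not neighbors:  # dead end: stop this walk early
                break
            path.append(rng.choice(neighbors))
        walks.append(path)
    return walks


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def subgraph_embedding(features, walks):
    """Pool node features along each path, then pool across paths.

    Mean pooling is an assumption made for this sketch only.
    """
    path_vectors = [mean_pool([features[n] for n in path]) for path in walks]
    return mean_pool(path_vectors)


def cosine(a, b):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is closer to the positive
    than to the negatives. A stand-in for the structured contrastive loss."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy graph: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
features = {n: [1.0, 0.0] if n < 3 else [0.0, 1.0] for n in adj}

# Two subgraph samples of node 0 form a positive pair; node 5 is a negative.
view_a = subgraph_embedding(features, sample_walks(adj, 0, 4, 2, seed=1))
view_b = subgraph_embedding(features, sample_walks(adj, 0, 4, 2, seed=2))
other = subgraph_embedding(features, sample_walks(adj, 5, 4, 2, seed=3))
loss = info_nce(view_a, view_b, [other])
```

In this sketch, two independently sampled sets of walks around the same node act as the positive pair, which is one plausible reading of contrasting local subgraphs "for the same node"; no labels are used at any step.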