Advanced Search
    He Yulin, Wu Bo, Wu Dingming, Huang Zhexue, Philippe Fournier-Viger. Edge Hashing Distributed Sampling Algorithm for Triangle Counting in Large-scale Dynamic Graph Stream[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440120
    Citation: He Yulin, Wu Bo, Wu Dingming, Huang Zhexue, Philippe Fournier-Viger. Edge Hashing Distributed Sampling Algorithm for Triangle Counting in Large-scale Dynamic Graph Stream[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440120

    Edge Hashing Distributed Sampling Algorithm for Triangle Counting in Large-scale Dynamic Graph Stream

    • Triangle counting is a classical problem in large-scale graph analysis and widely applied in areas such as social network analysis, anomaly detection, and graph mining. In dynamic graph data streams, the topological structure of graph is constantly changing, involving edge insertions and deletions. Recent researches mainly focus on static streaming graphs and the corresponding streaming graph sampling algorithms only deal with edge insertions. Only a few single-machine sampling algorithms with low accuracy can handle edge deletion operations. We propose the edge hashing assignment-based distributed sampling (EHADS) algorithm, which is a distributed streaming algorithm designed for estimating the number of triangles in dynamic graph data streams. EHADS algorithm can rapidly and accurately estimate the counts of global triangles and local triangles associated with each vertex in dynamic streaming graphs. EHADS algorithm processes the input graph stream only once and samples edges across multiple machines. Compared with state-of-the-art single-machine streaming algorithms, the experimental results reflect that EHADS algorithm has the following advantages: 1) with the same sample capacity, EHADS algorithm achieves smaller estimation errors with less runtime, reducing the average estimation error of global triangle count by 31.79% and local triangle count by 23.35%; 2) EHADS algorithm provides unbiased estimations of the triangle count in streaming graph, and mathematical proofs demonstrate that the above-mentioned unbiased estimation has smaller variance.
    • loading

    Catalog

      Turn off MathJax
      Article Contents

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return