时序图流上的快速子图近似计数算法

王晶晶; 王延昊; 姜文君; 曾一夫; 祝团飞

doi:10.7544/issn1000-1239.202330788

时序图流上的快速子图近似计数算法

Fast Algorithm for Approximate Motif Counting in Temporal Graph Streams

摘要

摘要: 图数据中包含丰富的时间信息，其拓扑结构随时间动态演变，通常建模为时序图流. 时序图流由一组节点和一系列带时间戳的有向边组成，节点、时序边随时间动态增加. 其中时序子图是由传统子图模式推广而来，不仅考虑拓扑结构，同时将时序边的顺序和持续时间纳入考量. 在时序图流中计算时序子图的出现次数是时序图研究中的一个基础问题. 然而，传统流式子图计数方法不支持时序匹配，仅适用于不包含时间信息的简单无向图或有向图；并且，现有时序子图计数算法在不断产生新数据的时序图流场景下效率不高. 因此，对时序图流上时序子图近似计数问题进行了研究，提出了基于蓄水池采样的流式边采样（streaming edge sampling, SES）算法，并从期望、方差、时间复杂度3个方面对SES算法进行了理论分析. 最后，在4个真实数据集上进行了大量实验. 实验结果表明，与基线方法相比，SES虽然返回的计数相对误差略大，但计算效率取得了最高3个数量级的大幅提升.

Abstract: Graphs often have rich temporal information and evolve dynamically over time, which can be modeled as temporal graph streams. A temporal graph stream consists of a set of vertices and a series of timestamped and directed edges, where new vertices and edges arrive continuously over time. Temporal motifs are generalizations of subgraph patterns in static graphs which take into account edge orderings and durations in addition to topologies. Counting the number of occurrences of temporal motifs is a fundamental problem for temporal graph analysis. However, traditional streaming subgraph counting methods cannot support temporal matching, and are only suitable for simple graphs that do not contain temporal information. In addition, existing temporal motifs counting methods suffer from poor performance in temporal graph streams. We thus study approximate temporal motif counting via random sampling in temporal graph streams. We propose a generic streaming edge sampling (SES) algorithm to estimate the number of instances of any temporal motif in a given temporal graph stream. We then provide comprehensive analyses of the theoretical bounds and time complexities of SES. Finally, we perform extensive experimental evaluations for SES on four real world datasets. The results show that SES achieves up to three orders of magnitude speedups over the state-of-the-art sampling methods while having comparable estimation errors for temporal motif counting in the streaming setting.

HTML全文

参考文献(26)

施引文献

资源附件(0)