高级检索

    基于混合增量计算模式的流式图并行处理

    Parallel Streaming Graph Processing with Hybrid Incremental Computation

    • 摘要: 流式图能够对现实生活中数据快速变化的场景进行有效建模,在社交网络分析、内容推荐、异常检测等领域得到了广泛应用.基于流式图更新前后的两个图快照具有大量相同数据的事实,增量计算通过对历史计算结果进行存储和复用来降低迭代计算过程中的访存量和计算量,从而有效提升流式图处理的性能.然而,现有对图算法进行增量计算优化的研究,往往受限于满足特定性质的图算法,而难以应用于一般图算法.针对一般图算法的增量计算优化问题,将增量计算进一步划分为基于修正和基于重计算的增量计算模式,理论上刨析了二者的异同点,实验上在不同图数据集、图算法和更新场景设置下测试了二者的性能差异,提出了混合增量计算模式,设计了确保切换正确性的算法,并通过随机森林分类器准确地预测切换时机.性能评估和切换效果分析表明一般图算法在混合增量计算模式下能够进行有效切换,并相比先进的流式图处理系统DZIG实现了平均1.25×的加速比.

       

      Abstract: Streaming graphs can effectively model rapidly changing data in real life, and have been widely used in social network analysis, content recommendation, anomaly detection, and other fields. Based on the fact that the two graph snapshots before and after the graph update have a large amount of the same data, incremental computation reduces the amount of memory accesses and computation in the iterations by storing and reusing the historical results, which effectively improves the performance of streaming graph processing. However, the existing solutions for incremental computation optimization of graph algorithms are often limited by graph algorithms that satisfy specific properties, thus it is difficult to apply them to general graph algorithms. In this paper, the incremental computation is further divided into two modes of refinement and recomputation. The similarities and differences between the two modes are analyzed theoretically, and the performance differences between them are tested experimentally with different graph datasets, graph algorithms, and graph update settings. Based on the two modes, the hybrid incremental computation mode is designed to ensure the correctness of switching, and the switching time is predicted accurately by a random forest classifier. Through performance evaluation and switching effect analysis, it is proved that the general graph algorithms can effectively switch under the hybrid mode. Compared with the start-of-the-art solution DZIG, the average speedup is 1.25×.

       

    /

    返回文章
    返回