高级检索
    张承龙, 曹华伟, 王国波, 郝沁汾, 张洋, 叶笑春, 范东睿. 面向高通量计算机的图算法优化技术[J]. 计算机研究与发展, 2020, 57(6): 1152-1163. DOI: 10.7544/issn1000-1239.2020.20200115
    引用本文: 张承龙, 曹华伟, 王国波, 郝沁汾, 张洋, 叶笑春, 范东睿. 面向高通量计算机的图算法优化技术[J]. 计算机研究与发展, 2020, 57(6): 1152-1163. DOI: 10.7544/issn1000-1239.2020.20200115
    Zhang Chenglong, Cao Huawei, Wang Guobo, Hao Qinfen, Zhang Yang, Ye Xiaochun, Fan Dongrui. Efficient Optimization of Graph Computing on High-Throughput Computer[J]. Journal of Computer Research and Development, 2020, 57(6): 1152-1163. DOI: 10.7544/issn1000-1239.2020.20200115
    Citation: Zhang Chenglong, Cao Huawei, Wang Guobo, Hao Qinfen, Zhang Yang, Ye Xiaochun, Fan Dongrui. Efficient Optimization of Graph Computing on High-Throughput Computer[J]. Journal of Computer Research and Development, 2020, 57(6): 1152-1163. DOI: 10.7544/issn1000-1239.2020.20200115

    面向高通量计算机的图算法优化技术

    Efficient Optimization of Graph Computing on High-Throughput Computer

    • 摘要: 随着互联网技术的蓬勃发展,图数据的规模呈爆炸式增长.如何高效地处理大规模图数据逐渐成为工业界和学术界关注的焦点.宽度优先搜索算法是解决图遍历问题的经典算法,也是Graph500基准的核心测试程序之一.高通量计算机采用ARM架构的众核体系结构,具有高并发、强实时、低功耗等适于大数据计算的特点.在单节点上,BFS算法的优化已取得一系列进展,首先对现有的优化技术进行系统的介绍,并在此基础上提出2种面向高通量计算机的优化手段,通过减少冗余访存和提高缓存局部性,有效提高了算法的访存效率.通过这些优化手段,在高通量计算机上对BFS算法的性能进行了系统的评估.对于顶点规模为230的Kronecker图(顶点数为230,边数为234),优化后的BFS算法在高通量计算机上的平均性能为24.26 GTEPS.与两路x86架构服务器相比,单节点具有1.18倍的性能优势.在性能功耗比方面,高通量计算机的结果为181.04 MTEPS/W.在2019年6月份的Green Graph500面向大数据集的排行榜上取得第2名的成绩.综上,高通量计算机的高并发和低功耗等特点非常适合处理大规模图计算等数据密集型应用.

       

      Abstract: With the rapid development of computing technology, the scale of graph increases explosively and large-scale graph computing has been the focus in recent years. Breadth first search (BFS) is a classic algorithm to solve graph traverse problem. It is the main kernel of Graph500 benchmark that evaluates the performance of supercomputers and servers in terms of data-intensive applications. High-throughput computer (HTC) adopts ARM-based many-core architecture, which has the characteristics of high concurrency, strong real-time, low-power consumption. The optimization of BFS algorithm has made a series of progress on single-node systems. In this paper, we first introduce parallel BFS algorithm and existing optimizations. Then we propose two optimization techniques for HTC to improve the efficiency of data access and data locality. We systematically evaluate the performance of BFS algorithm on HTC. For the Kronecker graph with 2scale=230whose vertices are 230 and edges are 234, the average performance on HTC is 24.26 GTEPS and 1.18 times faster than the two-way x86 server. In terms of energy efficiency, the result on HTC is 181.04 MTEPS/W and rank 2nd place on the June 2019 Green Graph500 big data list. To our best knowledge, this is the first work that evaluates BFS performance on HTC platform. HTC is suitable for data intensive applications such as large-scale graph computing.

       

    /

    返回文章
    返回