    Liu Xu, Yang Zhang, Yang Yang. A Nested Partitioning Load Balancing Algorithm for Tianhe-2[J]. Journal of Computer Research and Development, 2018, 55(2): 418-425. DOI: 10.7544/issn1000-1239.2018.20160877

    A Nested Partitioning Load Balancing Algorithm for Tianhe-2

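    • Abstract: Large-scale heterogeneous cooperative computing on 10-petaflops-class supercomputers such as Tianhe-2 places three requirements on load balancing algorithms: low algorithmic complexity, adaptation to multi-level nested data transfer systems, and support for heterogeneous cooperative computing. By combining a three-level nested load balancing framework, a greedy partitioning algorithm, and an inner-outer subdomain partitioning algorithm, we design a load balancing algorithm that meets all three requirements at once. Model tests show that the algorithm achieves load balancing efficiency above 90%. Tests on 32 nodes of Tianhe-2 show that the algorithm keeps communication overhead low. Tests of 5 typical applications on Tianhe-2 at up to 936,000 cores show that the algorithm enables the applications to scale efficiently, with parallel efficiency reaching up to 80%.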
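The abstract reports load balancing efficiency but does not define it on this page. A minimal sketch, assuming the common definition E = (mean load) / (maximum load): under that assumption, E of at least 0.9 means the most heavily loaded process carries no more than about 11% more work than the average. The function name `load_balance_efficiency` and the sample loads are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def load_balance_efficiency(loads: Sequence[float]) -> float:
    """Mean load divided by maximum load; 1.0 means perfectly balanced.

    Assumed definition for illustration, not necessarily the paper's metric.
    """
    if not loads:
        raise ValueError("loads must be non-empty")
    peak = max(loads)
    if peak <= 0:
        raise ValueError("the maximum load must be positive")
    return (sum(loads) / len(loads)) / peak


if __name__ == "__main__":
    # Hypothetical per-process work measured after partitioning.
    per_process_work = [95.0, 100.0, 90.0, 99.0]
    print(f"efficiency = {load_balance_efficiency(per_process_work):.3f}")
```

Here the mean load is 96 and the maximum is 100, so the sketch reports an efficiency of 0.960.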

       

      Abstract: As energy consumption becomes a major design concern for supercomputers, three design trends are emerging in supercomputer architectures: massive parallelism, deep memory and network hierarchies, and heterogeneous computing. Large-scale computing on supercomputers such as Tianhe-2 therefore requires load balancing algorithms with three properties: low algorithmic cost, minimal data movement cost, and load balance among heterogeneous devices such as CPU cores and accelerators. Meanwhile, multi-physics and multi-scale applications are becoming ubiquitous in many challenging scientific simulations; they produce non-uniform load distributions and demand powerful load balancing algorithms. In this paper, we propose a load balancing algorithm with all three properties by combining a nested partitioning scheme, a greedy partitioning algorithm, and an inner-outer subdomain partitioning algorithm. Model experiments show that our algorithm achieves load balance efficiency above 90%. Experiments on 32 nodes of Tianhe-2 show that it keeps communication cost low. Finally, experiments with 5 real applications on Tianhe-2, using up to 936 thousand CPU and MIC cores, show that our algorithm supports large-scale simulations efficiently, with parallel efficiency of up to 80%.
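The abstract names a greedy partitioning algorithm without giving its details. The sketch below shows one common greedy heuristic, longest-processing-time-first (LPT) assignment extended to devices of different speeds, to illustrate how a greedy partitioner can balance work between CPU cores and faster accelerators. It is not the authors' algorithm. It assumes each work unit has a known cost and each device a known relative speed. The function `greedy_partition` and the sample numbers are hypothetical, and the sketch does not model the paper's three-level nested framework or its inner-outer subdomain partitioning.

```python
from typing import List, Sequence, Tuple


def greedy_partition(
    work: Sequence[float], speeds: Sequence[float]
) -> Tuple[List[int], float]:
    """Assign each work unit to a device; return (owner per unit, efficiency).

    Hypothetical LPT-style heuristic, not the paper's method: the heaviest
    units are placed first, each on the device that would finish it earliest
    given that device's relative speed.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be non-empty and positive")
    owner = [0] * len(work)
    assigned = [0.0] * len(speeds)  # total work given to each device so far
    # Visit work units from heaviest to lightest.
    for i in sorted(range(len(work)), key=lambda k: work[k], reverse=True):
        # Pick the device with the earliest finish time after adding unit i.
        best = min(range(len(speeds)), key=lambda d: (assigned[d] + work[i]) / speeds[d])
        owner[i] = best
        assigned[best] += work[i]
    makespan = max(a / s for a, s in zip(assigned, speeds))
    ideal = sum(work) / sum(speeds)  # finish time if work were perfectly divisible
    efficiency = ideal / makespan if makespan > 0 else 1.0
    return owner, efficiency


if __name__ == "__main__":
    # Hypothetical: two CPU devices and one accelerator twice as fast.
    owner, eff = greedy_partition([8, 7, 6, 5, 4, 3, 2, 1], [1.0, 1.0, 2.0])
    print(owner, eff)
```

With these sample inputs, every device finishes at time 9, which equals the ideal time of 36 / 4, so the reported efficiency is 1.0. When all speeds are equal, this ideal-over-makespan ratio reduces to mean load divided by maximum load.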

       

