A Nested Partitioning Load Balancing Algorithm for Tianhe-2

Liu Xu (刘旭); Yang Zhang (杨章); Yang Yang (杨扬)

doi:10.7544/issn1000-1239.2018.20160877

Abstract: Large-scale heterogeneous cooperative computing on machines at the 10-petaflops scale (10^16 operations per second), such as Tianhe-2, places three requirements on load balancing algorithms: low algorithmic complexity, adaptation to a multi-level nested data transfer system, and support for heterogeneous cooperative computing. By combining a three-level nested load balancing framework, a greedy partitioning algorithm, and an inner-outer subdomain partitioning algorithm, we design a load balancing algorithm that satisfies all three requirements. Model tests show that the algorithm achieves a load balance efficiency above 90%. Tests on 32 nodes of Tianhe-2 show that the algorithm keeps communication overhead low. Tests of five typical applications on Tianhe-2, using up to 936 thousand cores, show that the algorithm supports efficient scaling, with parallel efficiency reaching up to 80%.

As energy consumption becomes a major design concern for supercomputers, three design trends have emerged in supercomputer architectures: massive parallelism, deep memory and network hierarchies, and heterogeneous computing. Large-scale computing on supercomputers such as Tianhe-2 requires load balancing algorithms with three properties: speed, minimal data movement cost, and load balance among heterogeneous devices such as CPU cores and accelerators. At the same time, multi-physics and multi-scale applications are becoming ubiquitous in challenging scientific simulations, resulting in non-uniform load distributions that demand powerful load balancing algorithms. In this paper, we propose a load balancing algorithm with the above properties by combining a nested partitioning scheme, a greedy partitioning algorithm, and an inner-outer subdomain partitioning algorithm. Model experiments show that our algorithm guarantees good load balance efficiency. Furthermore, an experiment on 32 nodes of Tianhe-2 shows that our algorithm achieves low communication cost. Finally, experiments with five real applications on Tianhe-2, using 936 thousand CPU and MIC cores, show that our algorithm supports large-scale simulations efficiently.
