ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (2): 418-425. doi: 10.7544/issn1000-1239.2018.20160877

• Software Technology •

A Nested Partitioning Load Balancing Algorithm for Tianhe-2

Liu Xu (刘旭)1,2, Yang Zhang (杨章)1,2, Yang Yang (杨扬)2

  1. Laboratory of Computational Physics (Institute of Applied Physics and Computational Mathematics), Beijing 100094
  2. High Performance Computing Center, Institute of Applied Physics and Computational Mathematics, Beijing 100094
  (liu_xu@iapcm.ac.cn)
  • Published: 2018-02-01
  • Online: 2018-02-01
  • Supported by:
    Key Program of the Major Research Plan of the National Natural Science Foundation of China (91430218)

Abstract: As energy consumption becomes a major design concern for supercomputers, three trends have emerged in supercomputer architecture: massive parallelism, deep memory and network hierarchies, and heterogeneous computing. Large-scale heterogeneous computing on 10-petaflops-class supercomputers such as Tianhe-2 requires load balancing algorithms with three properties: low algorithmic complexity, suitability for a multi-level nested data transfer system so that data movement cost stays minimal, and support for load balance among heterogeneous devices such as CPU cores and accelerators. Meanwhile, multi-physics and multi-scale applications are becoming ubiquitous in challenging scientific simulations; they produce non-uniform load distributions and demand powerful load balancing algorithms. In this paper, we propose a load balancing algorithm that meets all three requirements by combining a three-level nested partitioning framework, a greedy partitioning algorithm, and an inner-outer subdomain partitioning algorithm. Model experiments show that the algorithm achieves a load balance efficiency above 90%. Experiments on 32 nodes of Tianhe-2 show that it keeps communication cost low. Finally, experiments with 5 real applications on Tianhe-2, using up to 936 thousand CPU and MIC cores, show that the algorithm supports efficient scaling, with parallel efficiency reaching up to 80%.

Key words: parallel computing, load balance, heterogeneous computing, Tianhe-2, MIC
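
To make the terms "greedy partitioning" and "load balance efficiency" concrete, the following is a minimal sketch and not the algorithm of the paper, whose details are not given on this page. It assumes a generic heaviest-first greedy heuristic over devices of different relative speeds (for example CPU cores versus MIC coprocessors), and it assumes efficiency is defined as the ideal finish time divided by the slowest device's finish time. The names `greedy_partition`, `balance_efficiency`, `loads`, and `speeds` are hypothetical.

```python
def greedy_partition(loads, speeds):
    """Assign work items to heterogeneous devices (illustrative heuristic).

    loads  -- estimated cost of each work item (hypothetical input)
    speeds -- relative speed of each device (hypothetical input)
    Returns (parts, assigned): item indices per device and total load per device.
    """
    assigned = [0.0] * len(speeds)
    parts = [[] for _ in speeds]
    # Heaviest items first; each goes to the device that would finish it earliest.
    for i in sorted(range(len(loads)), key=lambda k: -loads[k]):
        d = min(range(len(speeds)),
                key=lambda j: (assigned[j] + loads[i]) / speeds[j])
        parts[d].append(i)
        assigned[d] += loads[i]
    return parts, assigned


def balance_efficiency(assigned, speeds):
    """Ideal finish time divided by the slowest device's finish time (assumed metric)."""
    times = [a / s for a, s in zip(assigned, speeds)]
    ideal = sum(assigned) / sum(speeds)
    return ideal / max(times)


if __name__ == "__main__":
    loads = [8, 7, 6, 5, 4, 3, 2, 1]
    speeds = [1.0, 1.0, 2.0]  # the third device is assumed twice as fast
    parts, assigned = greedy_partition(loads, speeds)
    print(parts, assigned, balance_efficiency(assigned, speeds))
```

Under this assumed metric, a value of 1.0 means every device finishes at the same time, and the "above 90%" reported in the abstract would correspond to values above 0.9.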
