ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (4): 823-832. doi: 10.7544/issn1000-1239.2015.20131332

• System Architecture •

A Petascale Scalable and Fault-Tolerant Free-Mesh Numerical Simulation System

Li Leisheng1,2, Wang Chaowei1, Ma Zhitao1, Huo Zhigang1, Tian Rong1

  1. High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049 (lileisheng@ncic.ac.cn)
  • Online: 2015-04-01
  • Supported by: the National Natural Science Foundation of China (11072241, 11111140020, 91130026) and the Director's Fund of Oak Ridge National Laboratory / National Center for Computational Sciences (MAT028)

petaPar: A Scalable and Fault Tolerant Petascale Free Mesh Simulation System

Li Leisheng1,2, Wang Chaowei1, Ma Zhitao1, Huo Zhigang1, Tian Rong1

  1. High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049
  • Online: 2015-04-01

Abstract: Driven by petascale computing power, numerical software has entered a historic turning point characterized by massive parallelism, and scalability and fault tolerance have become the two key technologies for large-scale numerical simulation. The petaPar simulation code is a next-generation general-purpose numerical simulation program developed for petascale computing, built around meshfree methods, which complement traditional numerical techniques. Within a unified framework, petaPar implements the two most mature and effective meshfree/particle algorithms, smoothed particle hydrodynamics (SPH) and the material point method (MPM), and supports a variety of strength models, failure models, and equations of state; the MPM additionally supports an improved contact algorithm that can handle the discontinuous deformation and interaction of millions of discrete bodies. The system has the following features: 1) high scalability: computation and communication are fully overlapped even in the extreme case of one Patch per core, and dynamic load balancing is supported; 2) fault tolerance: unattended restart with a varying number of processes is supported, so the computation need not be aborted when localized hardware failures occur at run time; 3) adaptation to the trend toward heterogeneous hardware architectures, supporting both the flat MPI and the MPI+Pthreads parallel models. The code was tested for scalability at full system scale on the Titan petascale supercomputer; the results show that it scales linearly to 260,000 CPU cores, with parallel efficiencies of 100% for SPH and 96% for MPM.
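To make the overlap of computation and communication described above concrete, the following is a minimal sketch (not petaPar source code) of the general technique: boundary ("ghost") particles are exchanged with non-blocking MPI calls while interior particles, which need no remote data, are updated. All names here (Particle, compute_forces, the 1-D ring of neighbors, the particle counts) are illustrative assumptions.

```c
/* Hedged sketch (not petaPar source): overlapping halo exchange with
 * computation using non-blocking MPI, the strategy the abstract credits
 * for scalability. Compile with: mpicc -o overlap overlap.c */
#include <mpi.h>
#include <stdlib.h>

typedef struct { double x[3], v[3], m; } Particle;

/* Compute forces for a range of particles; placeholder for SPH/MPM kernels. */
static void compute_forces(Particle *p, int begin, int end) {
    (void)p; (void)begin; (void)end; /* physics omitted in this sketch */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n_local = 100000, n_ghost = 1000;       /* illustrative sizes */
    Particle *local = malloc(n_local * sizeof(Particle));
    Particle *ghost = malloc(n_ghost * sizeof(Particle));

    int left  = (rank - 1 + size) % size;       /* 1-D ring of patches for illustration */
    int right = (rank + 1) % size;

    MPI_Request req[2];
    /* 1. Post non-blocking receive/send of boundary (ghost) particles. */
    MPI_Irecv(ghost, n_ghost * (int)sizeof(Particle), MPI_BYTE, left,  0,
              MPI_COMM_WORLD, &req[0]);
    MPI_Isend(local, n_ghost * (int)sizeof(Particle), MPI_BYTE, right, 0,
              MPI_COMM_WORLD, &req[1]);

    /* 2. Overlap: update interior particles that need no remote data. */
    compute_forces(local, n_ghost, n_local);

    /* 3. Finish communication, then update boundary particles. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    compute_forces(local, 0, n_ghost);

    free(local); free(ghost);
    MPI_Finalize();
    return 0;
}
```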

Key words: petascale computing, meshfree/particle simulation, high scalability, high fault tolerance, multithreading, dynamic load balancing

Abstract: With the emergence of petaflops (10^15 FLOPS) systems, numerical simulation has entered a new era, one that opens the possibility of using 10^4 to 10^6 processor cores in a single parallel run. To take full advantage of the power of petaflops and post-petaflops supercomputing infrastructures, two grand challenges, scalability and fault tolerance, must be addressed in a domain application. petaPar is a highly scalable and fault-tolerant meshfree/particle simulation code dedicated to petascale computing. Two popular particle methods, smoothed particle hydrodynamics (SPH) and the material point method (MPM), are implemented in a unified object-oriented framework. The parallelization of both SPH and MPM starts consistently from the domain decomposition of a regular background grid. The scalability of the code is assured by fully overlapping inter-process MPI communication with computation and by a dynamic load balancing strategy. petaPar supports both flat MPI and MPI+Pthreads hierarchical parallelization. Application-specific lightweight checkpointing is used to address fault tolerance: petaPar is designed to restart automatically from any number of MPI processes, allowing a dynamic change of computing resources in scenarios such as node failure and connection timeout. Experiments are performed on the Titan petaflops supercomputer. The results show that petaPar scales linearly up to 2.6×10^5 CPU cores with excellent parallel efficiencies of 100% and 96% for the multithreaded SPH and the multithreaded MPM, respectively, and that the performance of the multithreaded SPH is improved by up to 30% compared with the flat MPI implementation.
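The hybrid MPI+Pthreads model mentioned in the abstract can be illustrated with a hedged sketch: one MPI process per node initialized at the MPI_THREAD_FUNNELED level, with Pthreads workers updating disjoint slices of the shared particle array while only the main thread makes MPI calls. The thread count, array size, and trivial update rule are illustrative assumptions, not petaPar internals.

```c
/* Hedged sketch (not petaPar source): an MPI+Pthreads hybrid step in which
 * worker threads share one process's particles and only the main thread
 * touches MPI (FUNNELED level). Compile with: mpicc -pthread hybrid.c */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NPART    400000

static double pos[NPART];          /* shared particle state within one process */

typedef struct { int begin, end; } Range;

/* Each thread advances its own contiguous slice of particles. */
static void *advance(void *arg) {
    Range *r = (Range *)arg;
    for (int i = r->begin; i < r->end; ++i)
        pos[i] += 1.0e-3;          /* placeholder for the SPH/MPM update */
    return NULL;
}

int main(int argc, char **argv) {
    int provided;
    /* FUNNELED: only the main thread makes MPI calls; workers only compute. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    pthread_t tid[NTHREADS];
    Range     rng[NTHREADS];
    int chunk = NPART / NTHREADS;
    for (int t = 0; t < NTHREADS; ++t) {
        rng[t].begin = t * chunk;
        rng[t].end   = (t == NTHREADS - 1) ? NPART : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, advance, &rng[t]);
    }
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(tid[t], NULL);

    /* Main thread would perform the inter-process exchange here (funneled MPI). */
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) printf("time step completed on all threads\n");

    MPI_Finalize();
    return 0;
}
```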

Key words: petascale computing, meshless/particle simulation, high scalability, fault tolerance, MPI+Pthreads, dynamic load balancing
