    Li Leisheng, Wang Chaowei, Ma Zhitao, Huo Zhigang, Tian Rong. petaPar: A Scalable and Fault Tolerant Petascale Free Mesh Simulation System[J]. Journal of Computer Research and Development, 2015, 52(4): 823-832. DOI: 10.7544/issn1000-1239.2015.20131332

    petaPar: A Scalable and Fault Tolerant Petascale Free Mesh Simulation System

    • With the emergence of petaflops (10^15 FLOPS) systems, numerical simulation has entered a new era, opening the possibility of using 10^4 to 10^6 processor cores in a single parallel run. To take full advantage of petaflops and post-petaflops supercomputing infrastructures, a domain application must address two grand challenges: scalability and fault tolerance. petaPar is a highly scalable and fault-tolerant meshfree/particle simulation code dedicated to petascale computing. Two popular particle methods, smoothed particle hydrodynamics (SPH) and the material point method (MPM), are implemented in a unified object-oriented framework. The parallelization of both SPH and MPM consistently starts from the domain decomposition of a regular background grid. The scalability of the code is ensured by fully overlapping inter-MPI-process communication with computation and by a dynamic load-balancing strategy. petaPar supports both flat MPI and hierarchical MPI+Pthreads parallelization. Application-specific lightweight checkpointing addresses fault tolerance: petaPar can automatically restart from any number of MPI processes, allowing a dynamic change of computing resources in scenarios such as node failure or connection timeout. Experiments on the Titan petaflops supercomputer show that petaPar scales linearly up to 2.6×10^5 CPU cores with excellent parallel efficiencies of 100% and 96% for multithreaded SPH and multithreaded MPM, respectively, and that the performance of multithreaded SPH improves by up to 30% over the flat MPI implementation.
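    The abstract notes that parallelization of both SPH and MPM starts from domain decomposition of a regular background grid, with load balancing keeping per-process work even. As a minimal sketch of the idea (the 1-D block partition, function name, and cell-count balancing here are illustrative assumptions, not petaPar's actual scheme):

    ```python
    # Hypothetical sketch: divide a regular background grid of cells among
    # MPI ranks as evenly as possible. A real decomposition would be
    # multi-dimensional and weighted by particle counts per cell.

    def decompose_grid(num_cells, num_ranks):
        """Assign each rank a contiguous half-open [start, end) block of cells.

        Blocks differ in size by at most one cell, so static load is balanced.
        """
        base, extra = divmod(num_cells, num_ranks)
        ranges = []
        start = 0
        for rank in range(num_ranks):
            size = base + (1 if rank < extra else 0)  # spread the remainder
            ranges.append((start, start + size))
            start += size
        return ranges

    if __name__ == "__main__":
        # 10 cells over 3 ranks: blocks of 4, 3, and 3 cells.
        print(decompose_grid(10, 3))  # [(0, 4), (4, 7), (7, 10)]
    ```

    In petaPar this static decomposition is only the starting point; a dynamic load-balance strategy then redistributes work as particle distributions evolve during the simulation.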
