Li Leisheng, Wang Chaowei, Ma Zhitao, Huo Zhigang, Tian Rong. petaPar: A Scalable and Fault Tolerant Petascale Free Mesh Simulation System[J]. Journal of Computer Research and Development, 2015, 52(4): 823-832. DOI: 10.7544/issn1000-1239.2015.20131332

petaPar: A Scalable and Fault Tolerant Petascale Free Mesh Simulation System

  • Published Date: March 31, 2015
  • With the emergence of petaflops (10^15 FLOPS) systems, numerical simulation has entered a new era in which a single parallel run can use 10^4 to 10^6 processor cores. To take full advantage of petaflops and post-petaflops supercomputing infrastructure, a domain application must address two grand challenges: scalability and fault tolerance. petaPar is a highly scalable and fault-tolerant meshfree/particle simulation code dedicated to petascale computing. Two popular particle methods, smoothed particle hydrodynamics (SPH) and the material point method (MPM), are implemented in a unified object-oriented framework. The parallelization of both SPH and MPM starts from the domain decomposition of a regular background grid. Scalability is achieved by fully overlapping inter-process MPI communication with computation and by a dynamic load-balancing strategy. petaPar supports both flat MPI and hierarchical MPI+Pthreads parallelization. Application-specific lightweight checkpointing provides fault tolerance. petaPar is designed to restart automatically on any number of MPI processes, which accommodates dynamic changes in computing resources caused by, for example, node failures or connection timeouts. Experiments were performed on the Titan petaflops supercomputer. petaPar scales linearly up to 2.6×10^5 CPU cores, with parallel efficiencies of 100% and 96% for the multithreaded SPH and the multithreaded MPM, respectively, and the multithreaded SPH performs up to 30% better than the flat MPI implementation.
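The abstract states that parallelization starts from the domain decomposition of a regular background grid and that a run can restart on any number of MPI processes. The following is a minimal single-process sketch of how those two ideas can fit together, not petaPar's actual implementation. It assumes a 1D domain, a contiguous block mapping of grid cells to ranks, and a checkpoint that stores particle positions independently of the decomposition. All names (`owner_rank`, `partition`, `write_checkpoint`, `restart`) are hypothetical.

```python
from collections import defaultdict


def owner_rank(cell_index, n_cells, n_ranks):
    """Map a background-grid cell to a rank using contiguous, near-equal blocks."""
    return cell_index * n_ranks // n_cells


def partition(particles, domain_length, n_cells, n_ranks):
    """Assign each particle to the rank that owns its background-grid cell."""
    cell_size = domain_length / n_cells
    by_rank = defaultdict(list)
    for x in particles:
        # Clamp so a particle exactly on the upper boundary stays in the last cell.
        cell = min(int(x / cell_size), n_cells - 1)
        by_rank[owner_rank(cell, n_cells, n_ranks)].append(x)
    return dict(by_rank)


def write_checkpoint(by_rank):
    """Assumed checkpoint format: particle state only, no rank information."""
    return sorted(x for chunk in by_rank.values() for x in chunk)


def restart(checkpoint, domain_length, n_cells, n_ranks):
    """Re-decompose the saved particles for a possibly different rank count."""
    return partition(checkpoint, domain_length, n_cells, n_ranks)


# Hypothetical data: start on 4 ranks, then restart on 3 after losing a node.
particles = [0.05, 0.12, 0.33, 0.48, 0.51, 0.77, 0.93]
before = partition(particles, 1.0, n_cells=8, n_ranks=4)
snapshot = write_checkpoint(before)
after = restart(snapshot, 1.0, n_cells=8, n_ranks=3)
```

Because the assumed checkpoint carries no rank information, the restart only needs the background grid and the new process count to rebuild the decomposition. A real code would also save velocities, stresses, and other per-particle state, and would write the checkpoint in parallel.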
