
    Heterogeneous Programming and Optimization of Gyrokinetic Simulation Code on Tianhe Supercomputer

      Abstract: The magnetic confinement fusion particle-in-cell (PIC) gyrokinetic simulation code VirtEx is capable of studying the confinement and transport of Alpha particles, the fusion product, a study that is key to realizing fusion energy. Alpha particle simulation relies on the kinetic-ion part of the code, whose memory access is larger in volume and more complex than that of electron simulation. It contains both irregular accesses and atomic write-back operations, so its performance depends heavily on memory access. MT-3000, the heterogeneous accelerator of Tianhe's new-generation supercomputing platform, delivers powerful computational performance through its extremely high computational density, and porting and optimizing Alpha particle simulation on this device is therefore a great challenge. To fully exploit the computational power of the MT-3000 acceleration array, and taking into account both the accelerator architecture and the characteristics of the PIC algorithm, we design and implement several optimization methods: on-the-fly recomputation of intermediate variables, a customized software cache, cache spatial locality optimization, and hotspot function merging. These methods reduce the total amount of memory access in the program and significantly raise the compute-to-memory-access ratio of the hotspot functions. A medium-scale gyrokinetic ion benchmark shows an overall speedup of 4.2 times, with speedups of 10.9, 13.3, and 16.2 times on the hotspot functions Push, Locate, and Charge, respectively. In scalability tests, the code scales well to 5,898,240 accelerator cores on 3,840 nodes, with a parallel efficiency of 88.4%.
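The abstract names a customized software cache and cache spatial locality optimization without describing them. The toy Python sketch below illustrates the general idea under stated assumptions: a direct-mapped cache of fixed-size grid lines serves the irregular field reads of a 1D linear-interpolation gather, and sorting particles by position turns a random access pattern into a streaming one. The class `SoftwareCache`, the function `gather`, and the `line_size`/`num_lines` values are hypothetical; this is not VirtEx's implementation and does not model the real MT-3000 memory hierarchy.

```python
import random


class SoftwareCache:
    """Toy direct-mapped software cache over a read-only 1D grid.

    Each miss copies one whole line of `line_size` consecutive grid
    values into a small local buffer, standing in for a DMA transfer
    from main memory into fast on-chip memory. `line_size` and
    `num_lines` are hypothetical parameters chosen for illustration.
    """

    def __init__(self, grid, line_size=8, num_lines=16):
        self.grid = grid
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [-1] * num_lines      # which grid line each slot holds
        self.lines = [None] * num_lines   # cached copies of grid lines
        self.hits = 0
        self.misses = 0

    def read(self, index):
        line = index // self.line_size
        slot = line % self.num_lines      # direct mapping: each line has one slot
        if self.tags[slot] != line:
            # Miss: fetch the whole line, evicting whatever the slot held.
            start = line * self.line_size
            self.lines[slot] = self.grid[start:start + self.line_size]
            self.tags[slot] = line
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[slot][index - line * self.line_size]


def gather(cache, positions):
    """Linearly interpolate the grid field at each particle position."""
    last = len(cache.grid) - 2
    values = []
    for x in positions:
        i = min(int(x), last)             # left neighbour grid point
        w = x - i                         # weight of the right neighbour
        values.append((1.0 - w) * cache.read(i) + w * cache.read(i + 1))
    return values


def hit_rate(grid, positions):
    """Run one gather pass with a fresh cache and report its hit rate."""
    cache = SoftwareCache(grid)
    values = gather(cache, positions)
    return cache.hits / (cache.hits + cache.misses), values


n_grid = 1024
grid = [float(k) for k in range(n_grid)]  # field equals the coordinate
random.seed(0)
particles = [random.random() * (n_grid - 1) for _ in range(5000)]

unsorted_rate, unsorted_vals = hit_rate(grid, particles)
sorted_particles = sorted(particles)      # locality optimization: sort by position
sorted_rate, sorted_vals = hit_rate(grid, sorted_particles)
print(f"hit rate unsorted: {unsorted_rate:.2f}, sorted: {sorted_rate:.2f}")
```

In this toy setting, random particle order gives a hit rate of roughly one half, while sorted order approaches one, because consecutive particles then read the same or adjacent grid lines and each line is fetched about once. A real PIC code must also handle the scatter side (the Charge deposition with atomic write-backs), which this sketch does not model.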

       
