Abstract
VirtEx, a gyrokinetic particle-in-cell (PIC) simulation code for magnetic confinement fusion, can study the confinement and transport of alpha particles, the fusion product that is key to realizing fusion energy. Alpha particle simulation relies heavily on the kinetic ion code, whose memory access pattern is more complex than that of the electron: it involves both irregular accesses and atomic write-back operations, which makes it a memory-intensive application. The MT-3000, a new heterogeneous accelerator provided by Tianhe's new-generation supercomputing platform, delivers powerful computational performance through its extremely high computational density. Porting alpha particle simulations to this device is a great challenge. To fully exploit the computational power of the MT-3000 acceleration array, we design and implement several optimization methods tailored to the application's characteristics, including recalculation of intermediate variables, a customized software cache design, memory locality optimization, and hotspot function merging, to reduce the total amount of memory access in the program. A medium-scale benchmark with gyrokinetic ions shows an overall speedup of 4.2 times, with speedups of 10.9, 13.3, and 16.2 times on the hotspot functions "push", "locate", and "charge", respectively. It also shows good scalability, with 88.4% efficiency on 5,898,240 accelerator cores across 3,840 nodes.