Abstract:
Sunway TaihuLight, with a sustained performance of 93 PFLOPS, is now the No. 1 supercomputer in the latest Top500 list. It provides a high-level directive language, Sunway OpenACC, that is compatible with the OpenACC 2.0 standard and adds some customized extensions. GTC-P is a discovery-science-capable, real-world application code based on the particle-in-cell (PIC) algorithm, which is well established in the HPC area. Our motivation is to port the GTC-P code to the TaihuLight supercomputer with OpenACC. Because the Sunway OpenACC compiler cannot at present resolve the performance bottlenecks of GTC-P when the code is ported directly onto TaihuLight, we have applied three optimizations to an “intermediate” version of the code generated by the compiler: 1) elimination of atomic operations; 2) avoidance of expensive global memory access instructions; 3) manual addition of SIMD intrinsics. The results of our numerical experiments show that these optimizations produce speed-ups of 1.6X and 8.6X on 64 CPE cores, compared with a single MPE core, for the key charge and push kernels of the PIC algorithm, respectively. Overall, this acceleration makes the entire GTC-P code 2.5X faster. Our findings demonstrate that manual optimizations of the “intermediate” code are important for achieving significantly improved performance of PIC applications on TaihuLight with OpenACC.
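As an illustration of the first optimization only, the following is a minimal C sketch of one common way to remove atomic updates from a PIC charge-deposition loop: each worker scatters into a private copy of the grid, and the copies are summed afterwards. It is not the GTC-P or Sunway code; the function name `deposit_private`, the 1D linear weighting, and the serial loop over workers (standing in for the CPE cores) are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical 1D charge deposition without atomic operations.
 * Assumes every position x[p] lies in [0, n_cells - 1), so that both
 * neighbouring grid points i and i + 1 are valid indices.
 * Returns 0 on success, -1 if the private buffers cannot be allocated. */
int deposit_private(const double *x, const double *q, size_t n_particles,
                    double *grid, size_t n_cells, size_t n_workers)
{
    assert(n_cells >= 2 && n_workers > 0);

    /* One zero-initialised private grid per worker. */
    double *priv = calloc(n_workers * n_cells, sizeof *priv);
    if (priv == NULL)
        return -1;

    /* Scatter phase: written serially here; on a many-core machine each
     * value of t would run on its own core. No two workers touch the same
     * private grid, so no atomic update is needed. */
    for (size_t t = 0; t < n_workers; ++t) {
        size_t begin = t * n_particles / n_workers;
        size_t end = (t + 1) * n_particles / n_workers;
        double *g = priv + t * n_cells;
        for (size_t p = begin; p < end; ++p) {
            size_t i = (size_t)x[p];        /* left grid point */
            double w = x[p] - (double)i;    /* linear weight toward i + 1 */
            g[i] += q[p] * (1.0 - w);
            g[i + 1] += q[p] * w;
        }
    }

    /* Reduction phase: sum the private copies into the shared grid. */
    for (size_t c = 0; c < n_cells; ++c) {
        double sum = 0.0;
        for (size_t t = 0; t < n_workers; ++t)
            sum += priv[t * n_cells + c];
        grid[c] += sum;
    }

    free(priv);
    return 0;
}
```

The trade-off in this sketch is extra memory (one grid copy per worker) and a reduction pass in exchange for conflict-free updates; whether that pays off depends on the grid size and the memory available to each core.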