Porting and Optimizing GTC-P on TaihuLight Supercomputer with OpenACC

Wang Yichao (王一超); Lin Xinhua (林新华); Cai Linjin (蔡林金); Tang William; Ethier Stephane; Wang Bei (王蓓); Shi Zhongwei (施忠伟); Matsuoka Satoshi (松岗聪)

doi:10.7544/issn1000-1239.2018.20160871

Porting and Optimizing GTC-P on TaihuLight Supercomputer with OpenACC

Abstract

Sunway TaihuLight, with a measured performance of about 93 PFLOPS, is the No. 1 supercomputer in the latest Top500 list. The system provides a directive-based parallel programming tool, Sunway OpenACC, that is compatible with the OpenACC 2.0 standard and adds some customized extensions. GTC-P is a discovery-science-capable real-world application code based on the particle-in-cell (PIC) method, which is widely used in the HPC area. We ported GTC-P to TaihuLight with Sunway OpenACC. Because the Sunway OpenACC compiler cannot yet resolve the performance bottlenecks that appear when GTC-P is ported directly, we applied three optimizations to the intermediate code generated by the compiler: 1) eliminating atomic operations; 2) avoiding expensive global memory accesses; 3) manually adding SIMD intrinsics. Experiments show that, on 64 CPE cores (slave cores) compared with one MPE core (master core), the optimized charge and push kernels achieve speedups of 1.6x and 8.6x, respectively, and the entire GTC-P code runs 2.5x faster. These results show that manual optimization of the intermediate code is important for the performance of PIC applications ported to TaihuLight with Sunway OpenACC.
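The abstract does not say how the atomic operations in the charge kernel were eliminated. As an illustration only, one common way to avoid atomics in PIC charge deposition is to give every worker a private copy of the grid and sum the copies afterwards. The sketch below is plain serial C that mimics this pattern; the function name `deposit_charge_private`, the 1D periodic grid, the linear weighting, and the constant `NUM_WORKERS` are all assumptions made for the example and are not taken from GTC-P or from the Sunway toolchain.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_WORKERS 64 /* assumption: one worker per CPE core */

/* Deposit particle charge on a 1D periodic grid with linear weighting.
 * Every worker writes only to its own private grid copy, so no two
 * workers update the same address and no atomic add is required.
 * Assumes 0 <= pos[p] < n_cells * dx. Returns 0 on success, -1 on
 * allocation failure. */
int deposit_charge_private(const double *pos, const double *q,
                           size_t n_particles, double *grid,
                           size_t n_cells, double dx)
{
    double *priv = calloc((size_t)NUM_WORKERS * n_cells, sizeof *priv);
    if (priv == NULL)
        return -1;

    /* Phase 1: scatter. The outer loop is serial here; on a many-core
     * device each value of w would run on its own core. */
    for (size_t w = 0; w < NUM_WORKERS; ++w) {
        double *mine = priv + w * n_cells;
        for (size_t p = w; p < n_particles; p += NUM_WORKERS) {
            double x = pos[p] / dx;
            size_t i = (size_t)x;        /* index of the left cell */
            double frac = x - (double)i; /* distance into the cell */
            mine[i % n_cells] += q[p] * (1.0 - frac);
            mine[(i + 1) % n_cells] += q[p] * frac;
        }
    }

    /* Phase 2: reduce the private copies into the shared grid. Each
     * cell has a single writer, so this loop is also free of atomics. */
    for (size_t c = 0; c < n_cells; ++c) {
        double sum = 0.0;
        for (size_t w = 0; w < NUM_WORKERS; ++w)
            sum += priv[w * n_cells + c];
        grid[c] = sum;
    }

    free(priv);
    return 0;
}
```

The trade-off in this sketch is memory: it needs `NUM_WORKERS` copies of the grid plus a reduction pass, which pays off only when contention on atomic updates is the dominant cost.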
