ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2018, Vol. 55 ›› Issue (2): 400-408. doi: 10.7544/issn1000-1239.2018.20160872

• Software Technology •

基于OpenMP 4.0的发动机燃烧模拟软件异构并行优化

杨梅芳1, 车永刚1,2, 高翔1   

  1(国防科技大学计算机学院 长沙 410073); 2(国防科技大学并行与分布处理重点实验室 长沙 410073)
  • Published: 2018-02-01

Heterogeneous Parallel Optimization of an Engine Combustion Simulation Application with the OpenMP 4.0 Standard

Yang Meifang1, Che Yonggang1,2, Gao Xiang1   

  1(College of Computer, National University of Defense Technology, Changsha 410073); 2(Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073)
  • Online: 2018-02-01

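Abstract (translated from the Chinese): LESAP is a numerical simulation application for scramjet engine combustion. It simulates the combustion chemical reactions and supersonic flow inside the engine combustor, has practical engineering value, and is computationally very demanding. Targeting a new class of heterogeneous many-core platforms composed of general-purpose CPUs and Intel Many Integrated Core (MIC) coprocessors, we port LESAP to heterogeneous parallel platforms using the new OpenMP 4.0 programming standard, and optimize its performance with SIMD vectorization, data transfer optimization, and load balancing based on grid-block partitioning. Performance tests show that the heterogeneous version outperforms the CPU-only version. On one node of the Tianhe-2 supercomputer (two 12-core Intel Xeon E5-2692 CPUs plus three Intel Xeon Phi 31S1P coprocessors), for a real scramjet combustion simulation problem with a grid of 5.32 million cells, the average execution time per time step drops from 64.72 s for the original CPU-only version to 21.06 s, a speedup of about 3.07.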

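Key words (translated from the Chinese): numerical simulation of engine combustion, heterogeneous many-core platform, Intel Many Integrated Core, OpenMP 4.0, performance optimization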

Abstract: LESAP is a combustion simulation application that simulates the chemical reactions and supersonic flows in scramjet engines. It is used to solve practical engineering problems and involves a large amount of computation. In this paper, we port and optimize LESAP with the OpenMP 4.0 accelerator model, targeting a heterogeneous many-core platform composed of general-purpose CPUs and Intel Many Integrated Core (MIC) coprocessors. Based on the application characteristics, we propose a series of techniques, including OpenMP 4.0 based task offloading, data movement optimization, grid-partition based load balancing, and SIMD optimization. The performance evaluation uses a real combustion simulation configuration with 5 320 896 grid cells on one Tianhe-2 supercomputer node. The results show that the heterogeneous code significantly outperforms the original CPU-only code. When the heterogeneous code runs on two Intel Xeon E5-2692 CPUs and three Intel Xeon Phi 31S1P coprocessors, the runtime per time step is reduced from 64.72 seconds to 21.06 seconds, a speedup of 3.07 over the original code running on the two Intel Xeon E5-2692 CPUs alone.

Key words: combustion simulation, heterogeneous many-core platform, Intel MIC, OpenMP 4.0, performance optimization
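
The abstract names grid-partition based load balancing and reports a speedup, but does not spell out the partitioning algorithm. The sketch below is therefore not the authors' method: it is a generic greedy heuristic that gives each grid block to the device with the lowest estimated finish time, plus the speedup ratio computed from the two runtimes quoted above. The function names, the device labels (`cpu`, `mic0`, `mic1`), the block sizes, and the relative device speeds are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Ratio of the baseline runtime to the optimized runtime."""
    return baseline_seconds / optimized_seconds


def assign_blocks(block_cells: List[int],
                  device_speed: Dict[str, float]) -> Dict[str, List[int]]:
    """Greedy block-to-device assignment (illustrative heuristic only).

    block_cells[i] is the cell count of grid block i.
    device_speed[d] is an assumed relative throughput of device d.
    Blocks are taken largest first; each goes to the device whose
    estimated finish time (assigned cells / speed) would be lowest
    after receiving it.
    """
    load = {name: 0 for name in device_speed}
    owner: Dict[str, List[int]] = {name: [] for name in device_speed}
    order = sorted(range(len(block_cells)),
                   key=lambda i: block_cells[i], reverse=True)
    for i in order:
        best = min(device_speed,
                   key=lambda d: (load[d] + block_cells[i]) / device_speed[d])
        owner[best].append(i)
        load[best] += block_cells[i]
    return owner


if __name__ == "__main__":
    # Per-time-step runtimes quoted in the abstract (seconds).
    print(round(speedup(64.72, 21.06), 2))

    # Hypothetical setup: 8 equal blocks, a CPU assumed twice as fast
    # as each of two coprocessors.
    blocks = [100] * 8
    speeds = {"cpu": 2.0, "mic0": 1.0, "mic1": 1.0}
    print(assign_blocks(blocks, speeds))
```

With these assumed speeds the CPU ends up with twice as many cells as each coprocessor, so all three devices have the same estimated finish time. In a real port the relative speeds would have to be measured per device rather than assumed.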