高级检索
    杨梅芳, 车永刚, 高翔. 基于OpenMP 4.0的发动机燃烧模拟软件异构并行优化[J]. 计算机研究与发展, 2018, 55(2): 400-408. DOI: 10.7544/issn1000-1239.2018.20160872
    引用本文: 杨梅芳, 车永刚, 高翔. 基于OpenMP 4.0的发动机燃烧模拟软件异构并行优化[J]. 计算机研究与发展, 2018, 55(2): 400-408. DOI: 10.7544/issn1000-1239.2018.20160872
    Yang Meifang, Che Yonggang, Gao Xiang. Heterogeneous Parallel Optimization of an Engine Combustion Simulation Application with the OpenMP 4.0 Standard[J]. Journal of Computer Research and Development, 2018, 55(2): 400-408. DOI: 10.7544/issn1000-1239.2018.20160872
    Citation: Yang Meifang, Che Yonggang, Gao Xiang. Heterogeneous Parallel Optimization of an Engine Combustion Simulation Application with the OpenMP 4.0 Standard[J]. Journal of Computer Research and Development, 2018, 55(2): 400-408. DOI: 10.7544/issn1000-1239.2018.20160872

    基于OpenMP 4.0的发动机燃烧模拟软件异构并行优化

    Heterogeneous Parallel Optimization of an Engine Combustion Simulation Application with the OpenMP 4.0 Standard

    • 摘要: LESAP是一个超燃冲压发动机燃烧数值模拟软件,可模拟发动机燃烧室内的燃烧化学反应与超声速流动,具有实际工程应用价值,其计算量巨大.面向通用CPU与Intel集成众核协处理器(many integrated core, MIC)构成的新型异构众核平台,使用新的OpenMP 4.0编程标准,实现了LESAP软件面向异构并行平台的移植,并采用SIMD向量化、数据传输优化、基于网格块划分的负载均衡等技术进行了性能优化.性能测试结果表明异构版本比纯CPU版本性能更佳.在天河二号超级计算机的1个结点(含2个12核的Intel Xeon E5-2692 CPU加3块Intel Xeon Phi 31S1P协处理器)上,对一个实际超燃发动机燃烧数值模拟问题,网格规模为532万单元时,每时间步的平均执行时间从原来纯CPU版的64.72s减少到21.06s,性能加速比达到约3.07.

       

      Abstract: LESAP is a combustion simulation application capable of simulating the chemical reactions and supersonic flows in the scramjet engines. It can be used to solve practical engineering problems and involve a large amount of computations. In this paper, we port and optimize LESAP with the OpenMP 4.0 accelerator model, targeting the heterogeneous many-core platform composed of general CPU and Intel Many Integrated Core (MIC). Based on the application characteristics, a series of techniques are proposed, including OpenMP 4.0 based task offloading, data movement optimization, grid-partition based load-balancing and SIMD optimization. The performance evaluation is done for a real combustion simulation configuration, with 5 320 896 grid cells, on one Tianhe-2 supercomputer node. The results show that the resulting heterogenous code significantly outperforms the original CPU only code. When the heterogenous code runs on two Intel Xeon E5-2692 CPUs and three Intel Xeon Phi 31S1P coprocessors, the runtime per time-steep is reduced from 64.72 seconds to 21.06 seconds. The heterogeneous computing achieves a speedup of 3.07 times over the original code that only runs on the two Intel Xeon E5-2692 CPUs.

       

    /

    返回文章
    返回