    Citation: Luo Hongbing, Zhang Xiaoxia, Wang Wei, and Wu Linping. Instruction Level Parallel Optimizing for Scientific Computing Application[J]. Journal of Computer Research and Development, 2014, 51(6): 1263-1269.

    Instruction Level Parallel Optimizing for Scientific Computing Application

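    • Abstract (translated from the Chinese): Although the performance of high-performance computers is rising ever faster, it is difficult for scientific computing applications to obtain a matching performance gain. Improving the execution performance of these applications requires optimization targeted at the characteristics of the high-performance computer architecture, and single-core instruction-level optimization is one important aspect of that work. Taking the Euler program implemented on the JASMIN (J adaptive structured meshes applications infrastructure) framework as an example, this paper examines the concrete performance problems of scientific computing applications on the Intel Xeon microprocessor platform, discusses instruction-level parallel optimization methods, and substantially improves the single-core performance of the Euler program. After optimization, the total run time of the two physical models (one 2D, one 3D) is 21%~34% lower than before, and the execution time of the core module Gas1dapproxy is reduced by more than 50%. The optimization experiments show that pipeline efficiency has become an important factor in the computational efficiency of real scientific computing applications, and that it needs to be improved by reducing the dependences among computation statements, reducing the number of long-latency computations, and similar methods.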

       

      Abstract: Achieving a high fraction of peak performance on supercomputers is difficult for real scientific computing applications. An application must be optimized to exploit the characteristics of the architecture, such as inter-node communication, intra-node connection, the memory hierarchy, and the architecture of a single processor core. On a cluster built from Intel Xeon multi-core processors, we explore how to improve the instruction-level parallel efficiency of a scientific application on a single processor core. Taking the Euler program, which is based on a software infrastructure named J adaptive structured meshes applications infrastructure (JASMIN), as an example, we identify the performance hotspots of the application with performance analysis tools, analyze the performance monitoring data to derive the performance bottlenecks, and tailor the code to fit the characteristics of the single-core architecture. After a few attempts, the performance of the Euler program improves greatly. The execution time of the Gas1dapproxy module of the program is shortened by 60% to 62%, and the total execution time of the program is shortened by 21% to 34% across a 2D physical model and a 3D physical model. The experimental results show that pipeline efficiency is one of the key factors in achieving higher performance for scientific computing applications. It can be improved by reducing the degree of dependence among computation statements and by decreasing the number of long-latency operations, for example by replacing division with multiplication.
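      The two techniques named at the end of the abstract can be illustrated with a minimal C sketch. It is not taken from the Euler program or from Gas1dapproxy; the function names, the array `flux`, and the spacing `dx` are hypothetical, chosen only for illustration. The baseline divides inside the loop and accumulates into one variable, so each addition depends on the previous one. The second version computes the reciprocal once and multiplies by it, and spreads the sum over four independent accumulators so that a pipelined core can overlap the additions.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one division per element and a single accumulator, so every
   addition has to wait for the previous one (a serial dependence chain). */
double scaled_sum_baseline(const double *flux, size_t n, double dx)
{
    assert(dx != 0.0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += flux[i] / dx;
    return sum;
}

/* Sketch of the two ideas: replace the per-element division with a
   multiplication by a precomputed reciprocal, and split the sum over
   four accumulators that do not depend on each other. */
double scaled_sum_optimized(const double *flux, size_t n, double dx)
{
    assert(dx != 0.0);
    const double inv_dx = 1.0 / dx;   /* the only division left */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: four independent multiply-add chains per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += flux[i]     * inv_dx;
        s1 += flux[i + 1] * inv_dx;
        s2 += flux[i + 2] * inv_dx;
        s3 += flux[i + 3] * inv_dx;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i)
        s0 += flux[i] * inv_dx;

    return (s0 + s1) + (s2 + s3);
}
```

      Both changes alter floating-point rounding and summation order, so the two functions can differ in the last few bits. That is also why compilers generally do not apply these transformations on their own unless relaxed floating-point options are enabled. Whether the rewrite pays off on a given core is something to confirm by measurement, as the paper does with performance monitoring data.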

       
