Abstract:
Achieving a high fraction of peak performance on supercomputers is difficult for real scientific computing applications. An application must be optimized to exploit the characteristics of the architecture, such as inter-node communication, intra-node interconnection, the hierarchical memory structure, and the microarchitecture of a single processor core. On a cluster composed of Intel Xeon multi-core processors, we explore how to improve the instruction-level parallel efficiency of a scientific application on a single processor core. Taking as an example an Euler program built on the J adaptive structured meshes applications infrastructure (JASMIN), we use performance analysis tools to identify the performance hotspots of the application, analyze the performance monitoring data to derive the performance bottlenecks, and tailor the code to the characteristics of the single-core architecture. After a few optimization attempts, the performance of the Euler program improves greatly: the execution time of its Gas1dapproxy module is reduced by 60% and 62%, and the total execution time of the program is reduced by 21% and 34%, for a 2D and a 3D physical model, respectively. The experimental results show that pipeline efficiency is one of the key factors in achieving higher performance for scientific computing applications. Pipeline efficiency can be improved by reducing data dependences in the computation code and by decreasing the number of long-latency operations, for example by replacing division with multiplication.
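The two pipeline-oriented transformations mentioned in the abstract can be illustrated with a minimal C sketch. This is not code from the paper, JASMIN, or the Gas1dapproxy module; the function names (`scale_div`, `scale_mul`, `sum_chain`, `sum_split`) are hypothetical and chosen only for illustration. The first pair hoists a loop-invariant division out of a hot loop and replaces it with a multiplication by the reciprocal. The second pair breaks a single serial dependence chain into four independent accumulators so that several additions can be in flight in the pipeline at once.

```c
#include <stddef.h>

/* Baseline: one floating-point division per element. On typical x86 cores
   division has much higher latency and lower throughput than multiplication,
   so it can stall the pipeline inside a hot loop. */
void scale_div(double *restrict out, const double *restrict in,
               size_t n, double d)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / d;
}

/* Reciprocal hoisted out of the loop: one division in total, then one
   multiplication per element. The result may differ from scale_div in the
   last bit, because x * (1/d) is rounded twice. */
void scale_mul(double *restrict out, const double *restrict in,
               size_t n, double d)
{
    const double inv_d = 1.0 / d;
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * inv_d;
}

/* Single accumulator: every addition depends on the previous one, so the
   loop is bounded by the latency of one floating-point add per element. */
double sum_chain(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Four independent accumulators shorten the dependence chain, letting the
   core overlap several additions. Floating-point addition is not
   associative, so the result can differ slightly from sum_chain. */
double sum_split(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i) /* leftover elements when n is not a multiple of 4 */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

Because both rewrites can change floating-point rounding, a compiler will normally not apply them on its own under strict IEEE semantics (flags such as GCC's `-ffast-math` relax this). Doing them by hand, as sketched here, keeps the change explicit and lets the numerical impact be checked against the original code.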