Simulating large-scale computational fluid dynamics (CFD) applications on contemporary supercomputers with many-core heterogeneous architectures, such as Tianhe-2, remains a great challenge and is an active research topic in this field. In this paper, we explore techniques for efficient parallel simulation of large-scale CFD applications with high-order accurate schemes on heterogeneous high-performance computing (HPC) platforms. We propose performance optimization approaches and strategies matched to both the characteristics of CFD applications and the architecture of heterogeneous HPC platforms, covering task decomposition, exploitation of parallelism, optimization of multi-threaded execution, vectorization with single-instruction multiple-data (SIMD) instructions, and optimization of the cooperation between CPUs and co-processors, among others. To evaluate these techniques, numerical experiments are performed on the Tianhe-2 supercomputer system, with a maximum of 1.228×10¹¹ grid points and a total of 590,000 processors and/or co-processors. To the best of our knowledge, a CFD simulation with a high-order accurate scheme has never before been attempted at this scale. The results show that the optimized code achieves a speedup of 2.6X on the hybrid CPU and co-processor platform over the CPU-only platform, and perfect scalability is also observed in the test results. The present work redefines the frontier of high-performance computing for fluid dynamics simulations on heterogeneous platforms.