Abstract:
Weather forecast, atmosphere or ocean simulations have much output data during the iterative computation for the intermediate status or check point. However, an unreasonable output design limits the performance of the earth science application in large-scale parallel computation. In this paper, we propose an overlap store optimization to solve this problem. The key issue of this overlap store optimization is setting some I/O processes to hide the I/O cost. This optimization has three main advantages: first, we hide the I/O operation through the overlap of output and computing; second, we limit the cost of gather operation, break though the bottleneck of gather communication bandwidth and memory size; third, the I/O process is flexible to use different high-performance parallel I/O API. We use this method to optimize WRF, ROMS_AGRIF and GRAPES in Tianhe II super computer, and test their performance after the optimization. The result of the tests shows that we obtain about 30% to 900% improvement in the peak. We also discuss the best proportion of computer process and I/O process when the total number of processes is fixed. The optimized version is very easy to used, and the only cost is the scientists need to setup two more variables in the namelist.