Parallel Optimization for Large-Scale Ocean Data Assimilation

Cai Di (蔡迪); Hong Xuehai (洪学海); Xiao Junmin (肖俊敏); Tan Guangming (谭光明)

doi:10.7544/issn1000-1239.202111185

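Abstract: Ocean data assimilation is an effective method for correcting ocean data by using ocean observations and an ocean numerical model together; the processed data are closer to the true state of the ocean. At high resolution, parallel assimilation programs based on the LASG/IAP climate ocean model (LICOM), developed by the Institute of Atmospheric Physics, Chinese Academy of Sciences (IAP) and the State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics (LASG), often involve large amounts of file reading, communication, and computation. Previous studies optimized these aspects, but because those optimizations remained at the upper algorithmic level and did not consider the underlying file system or the architecture of the supercomputer cluster, the gains were limited. To address these problems, this work combines the data and computational characteristics of ocean data assimilation with the architectural characteristics of the supercomputing platform used. On this basis, exploiting temporal and spatial locality, it proposes a load-balancing strategy based on a computing topology graph, a parallel optimization strategy based on the Lustre file storage architecture and the characteristics of the supercomputer cluster, and a three-layer overlapping strategy for computation, reading and communication, and write-back. Finally, the proposed algorithm is tested on the Tianhe-2 supercomputer cluster with a high-resolution dataset. Compared with the existing algorithm, it improves overall assimilation performance 18-fold on 4000 cores. Tests were also carried out on the Sugon 7000 supercomputer cluster: on 4000 DCU accelerator cards, the proposed algorithm improves overall computing performance about 8-fold compared with the existing algorithm.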
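The Lustre-based strategy is only summarized above. As background, Lustre's default RAID-0 layout stores stripe k of a file on the (k mod stripe_count)-th OST of that file's layout, so a reader whose byte range starts and ends on stripe boundaries touches whole stripes only. The following minimal sketch is not the authors' implementation. It computes which layout slots a byte range touches and splits a file into stripe-aligned contiguous ranges for a given number of readers. The function names and the even-split policy are illustrative assumptions.

```python
def ost_slots_for_range(offset, length, stripe_size, stripe_count):
    """Layout slots (0..stripe_count-1) touched by bytes [offset, offset+length)
    under round-robin RAID-0 striping: stripe k lives in slot k % stripe_count.
    Slots are indices within the file's layout, not global OST indices."""
    if length <= 0:
        return []
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    return sorted({k % stripe_count for k in range(first_stripe, last_stripe + 1)})


def stripe_aligned_partition(file_size, n_readers, stripe_size):
    """Split [0, file_size) into n_readers contiguous (offset, length) ranges whose
    boundaries fall on stripe boundaries (hypothetical even-split policy)."""
    n_stripes = -(-file_size // stripe_size)          # ceiling division
    base, extra = divmod(n_stripes, n_readers)
    ranges, next_stripe = [], 0
    for r in range(n_readers):
        count = base + (1 if r < extra else 0)       # first `extra` readers get one more stripe
        start = min(next_stripe * stripe_size, file_size)
        end = min((next_stripe + count) * stripe_size, file_size)
        ranges.append((start, end - start))
        next_stripe += count
    return ranges


if __name__ == "__main__":
    # Hypothetical example: 10 MiB file, 1 MiB stripes over 4 OSTs, 4 readers.
    MiB = 1 << 20
    for off, length in stripe_aligned_partition(10 * MiB, 4, MiB):
        print(off // MiB, length // MiB, ost_slots_for_range(off, length, MiB, 4))
```

Aligning each reader's range to stripe boundaries keeps two readers from contending for the same stripe. How readers should actually be mapped to OSTs on a given cluster is left open here.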

Abstract: Ocean data assimilation is an effective method for processing ocean data that uses ocean observations and an ocean numerical model simultaneously; the processed data are closer to the real state of the ocean. At high resolution, parallel assimilation programs based on the LASG/IAP climate ocean model (LICOM) often involve a large amount of file reading, communication, and computation. Although previous studies have optimized these aspects, those optimizations remained at the upper algorithmic level. Because they did not consider the underlying file system or the architecture of the supercomputer cluster, they had significant limitations and yielded only modest gains. In this paper, we combine the data and computational characteristics of ocean data assimilation with the architectural characteristics of the supercomputing platform used. On this basis, exploiting temporal and spatial locality, we propose a load-balancing strategy based on a computing topology graph, a parallel optimization strategy based on the storage architecture of the Lustre parallel file system and the characteristics of supercomputer clusters, and a three-layer strategy that overlaps computation, reading and communication, and write-back. Finally, we test our algorithm on the Tianhe-2 supercomputer cluster using high-resolution datasets. Compared with the existing ocean assimilation program, our algorithm improves overall performance 18-fold on 4000 cores. We also test on the Sugon 7000 supercomputer cluster with up to 4000 DCU cards, where overall performance improves about 8-fold compared with the existing program.
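The three-layer overlap can be pictured as a software pipeline: while block i is being computed, block i+1 is being read (and its data exchanged) and block i-1 is being written back. The following Python sketch is a generic double-buffering illustration, not the paper's code. `read_block`, `compute_block`, and the in-memory write-back are hypothetical stand-ins for file I/O plus communication, the assimilation update, and the output stage. Threads give real overlap here only because, in practice, the read and write stages would be I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor


def read_block(i):
    # Hypothetical stand-in for reading block i and exchanging its halo data.
    return list(range(i * 4, i * 4 + 4))


def compute_block(data):
    # Hypothetical stand-in for the assimilation update of one block.
    return [x * 2 for x in data]


def pipelined_assimilation(n_blocks, read=read_block, compute=compute_block):
    """Three-stage pipeline: compute(i) overlaps read(i+1) and write_back(i-1)."""
    results = {}

    def write_back(i, out):
        results[i] = out  # hypothetical stand-in for writing to the file system

    if n_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=2) as io_pool:  # one read + one write in flight
        next_read = io_pool.submit(read, 0)
        pending_write = None
        for i in range(n_blocks):
            data = next_read.result()                    # wait until block i has arrived
            if i + 1 < n_blocks:
                next_read = io_pool.submit(read, i + 1)  # prefetch block i+1
            out = compute(data)                          # compute block i on this thread
            if pending_write is not None:
                pending_write.result()                   # finish writing block i-1
            pending_write = io_pool.submit(write_back, i, out)
        pending_write.result()                           # drain the last write
    return results
```

For example, `pipelined_assimilation(3)` processes three blocks. Block 0 holds `[0, 1, 2, 3]` and is stored doubled under key 0. Capping the pool at two workers keeps at most one read and one write outstanding, which bounds buffer memory to three blocks.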
