Ocean data assimilation combines ocean observations with an ocean numerical model to produce ocean data that are closer to the real state of the ocean. At high resolution, parallel assimilation programs based on the LASG/IAP Climate Ocean Model (LICOM) involve heavy file reading, communication, and computation. Although previous studies have optimized these aspects, their optimizations operate only at the upper level, without considering the underlying file system or the architecture of the supercomputer cluster, so their effect is limited. In this paper, we combine the data and computing characteristics of ocean data assimilation with the architectural characteristics of the target supercomputer platform. On this basis, and exploiting both temporal and spatial locality, we propose three strategies: a load-balancing strategy based on computing topology; a parallel optimization strategy based on the storage architecture of the Lustre parallel file system and the characteristics of supercomputer clusters; and a three-layer overlapping strategy of computing, reading and communication, and writing back. We test our algorithm on the Tianhe-2 supercomputer cluster using high-resolution datasets. Compared with the existing ocean assimilation program, our algorithm improves overall performance by a factor of 18 on 4000 cores. We also test on the Sugon 7000 supercomputer cluster, using up to 4000 DCU cards; compared with the existing program, overall performance improves by a factor of about 8.
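The idea of overlapping reading, computing, and writing back can be illustrated with a minimal sketch. This is not the implementation described in the paper: it assumes a simplified single-process setting in which each stage runs in its own thread and passes data blocks through bounded queues, and the names `run_pipeline`, `read`, `compute`, `write`, and `depth` are hypothetical. In a real MPI program on a cluster, the same overlap would more likely be built on non-blocking I/O and communication.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the block stream


def run_pipeline(blocks, read, compute, write, depth=2):
    """Overlap read, compute, and write stages over a sequence of blocks.

    Hypothetical helper: each stage runs in its own thread, and bounded
    queues of size `depth` let one stage work on block k+1 while the
    next stage is still handling block k. Error handling is omitted.
    """
    to_compute = queue.Queue(maxsize=depth)
    to_write = queue.Queue(maxsize=depth)

    def reader():
        # Stage 1: load each block and hand it to the compute stage.
        for b in blocks:
            to_compute.put(read(b))
        to_compute.put(_DONE)

    def worker():
        # Stage 2: process blocks as they arrive, forward the results.
        while True:
            item = to_compute.get()
            if item is _DONE:
                to_write.put(_DONE)
                return
            to_write.put(compute(item))

    def writer():
        # Stage 3: write results back in arrival order.
        while True:
            item = to_write.get()
            if item is _DONE:
                return
            write(item)

    threads = [threading.Thread(target=f) for f in (reader, worker, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With one thread per stage and FIFO queues, results are written in the same order the blocks were read. The `depth` bound keeps the reader from running arbitrarily far ahead of the compute stage, which limits memory use at the cost of some overlap.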