Abstract:
Today, scientific research has moved from computational science to the era of data science. Discovering laws from massive data and breaking through bottlenecks in scientific development are the main goals of the data science paradigm. At the same time, high-performance computers (HPC) are also paying more and more attention to intelligent computing power. Integrating AI algorithms on the basis of traditional high performance computing methods (HPC+AI) is more conducive to solving practical science problems in the ear of data science, and can give full play to the intelligent computing power of high-performance computers. However, on domestic HPC systems, especially on HPC systems constructed by the new generation of domestic heterogeneous many-core processors, there are many challenges to support HPC+AI programs. In this paper, we propose a data flow computing system for domestic heterogeneous many-core processors, which is called swFLOWpro. The system supports the use of TensorFlow interface to build data flow programs, realizes many-core parallel acceleration transparent to users, and implements two-level parallel strategy based on the whole processor perspective. Tested on sw26010pro processor, swFLOWpro can get up to 545X single core group(CG) many-core speedup ratio for typical OP, 346X for typical deep learning models. Compared with the single CG of sw26010pro, we execute ResNet50 model on all the 6 CGs for one whole processor, and the speedup ration is up to 4.96X, whose parallel efficiency is 82.6%. Experiments show that swFLOWpro can support the efficient execution of data flow programs represented by deep learning on domestic heterogeneous many-core processors.