Advanced Search
    Xiao Qian, Zhao Meijia, Li Mingfan, Shen Li, Chen Junshi, Zhou Wenhao, Wang Fei, An Hong. A Dataflow Computing System for New Generation of Domestic Heterogeneous Many-Core Processors[J]. Journal of Computer Research and Development, 2023, 60(10): 2405-2417. DOI: 10.7544/issn1000-1239.202220562
    Citation: Xiao Qian, Zhao Meijia, Li Mingfan, Shen Li, Chen Junshi, Zhou Wenhao, Wang Fei, An Hong. A Dataflow Computing System for New Generation of Domestic Heterogeneous Many-Core Processors[J]. Journal of Computer Research and Development, 2023, 60(10): 2405-2417. DOI: 10.7544/issn1000-1239.202220562

    A Dataflow Computing System for New Generation of Domestic Heterogeneous Many-Core Processors

    • Today, scientific research has moved from the era of computational science to the era of data science. Discovering laws from massive data and breaking through bottlenecks in scientific development are the main goals of the data science paradigm. At the same time, high performance computers are also paying more and more attention on intelligent computing power. Integrating AI algorithms on the basis of traditional high performance computing methods (HPC+AI) is more conducive to solving practical science problems in the era of data science, and can give full play to the intelligent computing power of high performance computers. However, on domestic HPC systems, especially on HPC systems constructed by the new generation of domestic heterogeneous many-core processors, there are many challenges to support HPC+AI programs. In this paper, we propose a data flow computing system for domestic heterogeneous many-core processors, which is called swFLOWpro. The system supports the use of TensorFlow interface to build data flow programs, and realizes many-core parallel acceleration transparent to users, and implements two-level parallel strategy based on the whole processor perspective. Testing on sw26010pro processor, swFLOWpro can get up to 545 times single core group (CG) many-core speedup ratio for typical OP, 346 times for typical deep learning models. Compared with the single CG of sw26010pro, we execute ResNet50 model on all the 6 CGs for one whole processor, and the speedup ration is up to 4.96 times, whose parallel efficiency is 82.6%. Experiments show that swFLOWpro can support the efficient execution of data flow programs represented by deep learning on domestic heterogeneous many-core processors.
    • loading

    Catalog

      Turn off MathJax
      Article Contents

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return