Xiao Qian, Zhao Meijia, Li Mingfan, Shen Li, Chen Junshi, Zhou Wenhao, Wang Fei, An Hong. A Dataflow Computing System for New Generation of Domestic Heterogeneous Many-Core Processors[J]. Journal of Computer Research and Development, 2023, 60(10): 2405-2417. DOI: 10.7544/issn1000-1239.202220562

A Dataflow Computing System for New Generation of Domestic Heterogeneous Many-Core Processors

More Information
  • Author Bio:

    Xiao Qian: born in 1988. PhD candidate. His main research interests include compiler optimization, dataflow computing, and AI frameworks

    Zhao Meijia: born in 1992. Master. Her main research interest is AI frameworks

    Li Mingfan: born in 1995. PhD candidate. His main research interests include dataflow systems and parallel and distributed computing in heterogeneous environments

    Shen Li: born in 1981. PhD candidate. Her main research interests include compilers and basic AI software

    Chen Junshi: born in 1990. PhD. His main research interests include high performance computing, general-purpose processor architecture, and optimization of scientific applications on large-scale systems

    Zhou Wenhao: born in 1992. Master. His main research interests include compiler optimization and many-core programming environments

    Wang Fei: born in 1981. PhD candidate. His main research interests include compiler optimization and many-core programming environments

    An Hong: born in 1963. PhD, professor, PhD supervisor. Her main research interests include parallel computing systems and many-core chip architecture

  • Received Date: June 15, 2022
  • Revised Date: January 15, 2023
  • Available Online: May 22, 2023
  • Abstract: Today, scientific research has moved from the era of computational science into the era of data science, whose central goals are to discover laws in massive data and to break through bottlenecks in scientific development. At the same time, high performance computers are placing increasing emphasis on intelligent computing power. Integrating AI algorithms with traditional high performance computing methods (HPC+AI) is well suited to solving practical scientific problems in the data science era and can fully exploit the intelligent computing power of high performance computers. However, supporting HPC+AI programs on domestic HPC systems, especially systems built from the new generation of domestic heterogeneous many-core processors, poses many challenges. In this paper, we propose a dataflow computing system for domestic heterogeneous many-core processors, called swFLOWpro. The system supports building dataflow programs through the TensorFlow interface, provides many-core parallel acceleration that is transparent to users, and implements a two-level parallelism strategy designed from a whole-processor perspective. On the sw26010pro processor, swFLOWpro achieves a many-core speedup of up to 545 times over a single core group (CG) for typical operators, and up to 346 times for typical deep learning models. Executing the ResNet50 model on all six CGs of one whole processor yields a speedup of up to 4.96 times over a single CG, a parallel efficiency of 82.6%. Experiments show that swFLOWpro supports the efficient execution of dataflow programs, represented by deep learning workloads, on domestic heterogeneous many-core processors.
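
As a concrete illustration of the programming model described in the abstract, the sketch below builds a tiny dataflow graph through the standard TensorFlow 1.x graph-mode interface, which the abstract says swFLOWpro accepts. This is our own minimal sketch under stated assumptions: the operator choice, shapes, and names are illustrative, the abstract shows no swFLOWpro code, and the system's actual bindings and its dispatch of operators to core groups may differ.

    # A minimal sketch (illustrative, not code from the paper): building a
    # dataflow program through the standard TensorFlow 1.x graph-mode
    # interface, which the abstract says swFLOWpro supports. On swFLOWpro,
    # graph execution would be accelerated on the many-core CGs transparently;
    # stock TensorFlow simply runs the same graph on the host.
    import numpy as np
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()  # build an explicit dataflow graph, TF1-style

    # Graph construction: each call adds an operator node to the dataflow graph.
    x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3], name="input")
    w = tf.Variable(tf.random.truncated_normal([3, 3, 3, 64], stddev=0.1))
    conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")
    y = tf.nn.relu(conv)

    # Graph execution: the runtime schedules each operator once its inputs are
    # ready -- the point at which a dataflow backend can take over.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(y, feed_dict={x: np.zeros((1, 224, 224, 3), np.float32)})
        print(out.shape)  # (1, 224, 224, 64)

Because the program is declared as a graph rather than executed eagerly, a runtime like the one the abstract describes can map operators onto core groups without any change to user code, which is the transparency property the abstract claims.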

  • [1]
    Jia Weile, Wang Han, Chen Mohan, et al. Pushing the limit of molecular dynamic with abinitio accuracy to 100 million atoms with machine learning[C] //Proc of the 33rd Int Conf for High Performance Computing, Networking, Storge and Analysis(SC’20) . Piscataway, NJ: IEEE, 2020: 1−14
    [2]
    Abadi M, Barham P, Chen Jianmin , et al. TensorFlow: A system for large-scale machine learning[C] //Proc of the 12th USENIX Symp on Operating Ssytems Design and Implementation(OSDI’2016). Berkeley, CA: USENIX Association, 2016: 265−283
    [3]
    Paszke A, Gross S, Massa F, et al. Pytorch: An imperative style high-performance deep learning libray[J]. Advances in Neural Information Processing Systems, 2019, 32: 8026−8037
    [4]
    Liu Yong, Liu Xin, Li Fang, et al. Closing the “quantum supremacy” gap: Achieving real-time simulation of a random quantum circuit using a new Sunway supercomputer [C/OL] //Proc of the 34th Int Conf for High Performance Computing, Networking, Storge and Analysis(SC’21) . Piscataway, NJ: IEEE, 2021[2023-01-11].https://arvix.org/abs/2110.14502
    [5]
    胡向东,柯希明,尹飞,等. 高性能众核处理器申威26010[J]. 计算机研究与发展,2021,58(6):1155−1165 doi: 10.7544/issn1000-1239.2021.20201041

    Hu Xiangdong, Ke Ximing, Yin Fei, et al. Shenwei-26010: A high-performance many-core prcocessor[J]. Journal of Computer Research and Development, 2021, 58(6): 1155−1165 (in Chinese) doi: 10.7544/issn1000-1239.2021.20201041
    [6]
    Tony N, Vinay G, Karthikeyan S, et al. Exploring the potential of heterogeneous von neumann/dataflow execution models[C] //Proc of the 42nd Annual Int Symp on Computer Architecture(ISCA’15). Piscataway, NJ: IEEE, 2015: 298−310
    [7]
    Sankaralingam K, Nagarajan R, Liu Haiming, et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture[C] //Proc of the 30th Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE , 2003: 422−433
    [8]
    Burger D, Keckler S W, Mckinley K S, et al. Scaling to the end of silicon with EDGE architectures[J]. Computer, 2004, 37(7): 44−55 doi: 10.1109/MC.2004.65
    [9]
    Zuckerman S, Suetterlein J, Knauerhase R, et al. Using a Codelet program execution model for exascale machines [C] //Proc of the 1st Int Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era. New York: ACM, 2011: 64−69
    [10]
    Lauderdale C, Khan R. Towards a Codelet-based runtime for exascale computing[C]//Proce of the 2nd Int Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era. New York: ACM, 2012: 21−26
    [11]
    Su Zhichao, Chen Junshi, Lin Han, et al. A dataflow-based runtime support on a 100P actual system [C] //Proc of the 15th Int Symp on Parallel and Distributed Processing with Applications. Piscataway, NJ: IEEE, 2017: 599−606
    [12]
    Zhao Wenlai, Fu Haohuan, Fang Jiarui, et al. Optimizing convolutional neural networks on the Sunway Taihulight supercomputer[J]. ACM Transactions on Architecture and Code Optimization, 2018, 15(1): 13−25
    [13]
    Li Liandeng, Fang Jiarui, Fu Haohuan, et al. swCaffe: A parallel framework for accelerating deep learning applications on Sunway Taihulight[C] //Proc of the 20th IEEE Int Conf on Cluster Computing (CLUSTER). Piscataway, NJ: IEEE, 2018: 413−422
    [14]
    Jia Yangqing, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding[C]// Proc of the 22nd ACM Int Conf on Multimedia. New York: ACM, 2014: 675−678
    [15]
    Li Mingfan, Lin Han, Chen Junshi, et al. swFLOW: A large-scale distributed framework for deep learning on Sunway Taihulight supercomputer[J]. Information Sciences, 2021, 570(9): 831−847
    [16]
    Deepak N, Mohammad S, Jared C, et al. Efficient large-scale language model training on GPU clusters using megatron-LM [C] //Proc of the 34th Int Conf for High Performance Computing, Networking, Storage and Analysis. (SC’21). Piscataway, NJ: IEEE, 2021: 401−412
    [17]
    Xu Yuanzhong, HyoukJoong L, Chen Dehao, et al. Gspmd: General and scalable parallelization for ml computation graphs [J]. arXiv preprint, arXiv: 2105.04663, 2021
    [18]
    Fan Shiqing, Yi Rong, Meng Chen, et al. DAPPLE: A pipelined data parallel approach for training large models [C] //Proc of the 26th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2021: 431–445
    [19]
    Zheng Lianmin, Li Zhuohan, Zhang Hao, et al. Alpa: Automating inter- and intra-operator parallelism for distributed deep learning[C] //Proc of the 16th USENIX Symp on Operating Systems Design and Implementation. New York: ACM, 2022: 559–578
    [20]
    Krizhevsky A, Sutskever I, Hinton G, et al. ImageNet classification with deep convolutional neural networks[C] //Proc of the 26th Annual Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2012: 1097−1105
    [21]
    Alexis C, Holger S, LeCun Y, et al. Very deep convolutional networks for large-scale image recognition[C]//Proc of the 15th European Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2017: 1107−1116
    [22]
    He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition[C] //Proc of the 33rd IEEE Conf on Computer Vision and Pattern Recognition(CVPR). Piscataway, NJ: IEEE, 2016: 770−778
    [23]
    Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the Inception architecture for computer vision[C] //Proc of the 33rd IEEE Conf on Computer Vision and Pattern Recognition(CVPR). Piscataway, NJ: IEEE, 2016: 551−561
    [24]
    Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[J]. arXiv preprint, arXiv: 1602.07261, 2016