• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Liu Shifang, Zhao Yonghua, Yu Tianyu, Huang Rongfeng. Efficient Implementation of Parallel Symmetric Matrix Tridiagonalization Algorithm on GPU Cluster[J]. Journal of Computer Research and Development, 2020, 57(12): 2635-2647. DOI: 10.7544/issn1000-1239.2020.20190731
Citation: Liu Shifang, Zhao Yonghua, Yu Tianyu, Huang Rongfeng. Efficient Implementation of Parallel Symmetric Matrix Tridiagonalization Algorithm on GPU Cluster[J]. Journal of Computer Research and Development, 2020, 57(12): 2635-2647. DOI: 10.7544/issn1000-1239.2020.20190731

Efficient Implementation of Parallel Symmetric Matrix Tridiagonalization Algorithm on GPU Cluster

Funds: This work was supported by National Key Research and Development Program of China (2017YFB0202202) and the Strategic Priority Research Program of Chinese Academy of Sciences (C) (XDC01040000).
More Information
  • Published Date: November 30, 2020
  • The symmetric matrix tridiagonalization is the key computational process for solving dense eigenproblems. This paper presents the implementation of the dense symmetric matrix tridiagonal hybrid parallel blocked algorithm based on MPI(message passing interface)+CUDA(compute unified device architecture) for GPU cluster. The parallel algorithm design uses a two-level parallel method of MPI cluster level and GPU level. In the MPI-level parallelism, the communication performance of the tridiagonal MPI parallel algorithm is improved by designing the global data communication between the row-column communication domains in the two-dimensional communication domain as a completely parallel point-to-point data communication method. Moreover, by improving the original matrix tridiagonalized MPI parallel algorithm, the use of irregular matrix-vector operation in GPU-level parallelism is avoided, and the parallel performance of this part is improved by about 1 time. What’s more, the small-grained computing existing in the GPU parallel is merged into a larger granularity calculation, which fully utilize the computing power of the GPU by increasing the computational intensity, thereby increasing the utilization of the GPU and improving the performance of the algorithm. In addition, multiple CUDA streams can be used to enable independent CUDA operations in the algorithm to be concurrently executed in different streams. Furthermore, in the parallel algorithm, the asynchronous data transmission between the CPU and the GPU is utilized, so that the data transmission and the kernel function in different streams are simultaneously executed, which hides the time of data transmission and improves the performance of the algorithm. On the supercomputer system Era of the Computer Network Center of the Chinese Academy of Sciences, each compute node is configured with 2 Nvidia Tesla K20 GPGPU cards and 2 Intel E5-2680 V2 processors, we tested the performance of the implementation of the tridiagonalization blocked algorithm with GPGPU cards. It has achieved better acceleration, performance and scalability.
  • Related Articles

    [1]Fu Hao, Long Chun, Gong Liangyi, Wei Jinxia, Huang Pan, Lin Yanzhong, Sun Degang. Malicious Domain Detection Technology Based on Semantic Graph Learning[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440375
    [2]Liu Qixu, Liu Jiaxi, Jin Ze, Liu Xinyu, Xiao Juxin, Chen Yanhui, Zhu Hongwen, Tan Yaokang. Survey of Artificial Intelligence Based IoT Malware Detection[J]. Journal of Computer Research and Development, 2023, 60(10): 2234-2254. DOI: 10.7544/issn1000-1239.202330450
    [3]Pan Jianwen, Cui Zhanqi, Lin Gaoyi, Chen Xiang, Zheng Liwei. A Review of Static Detection Methods for Android Malicious Application[J]. Journal of Computer Research and Development, 2023, 60(8): 1875-1894. DOI: 10.7544/issn1000-1239.202220297
    [4]Fan Zhaoshan, Wang Qing, Liu Junrong, Cui Zelin, Liu Yuling, Liu Song. Survey on Domain Name Abuse Detection Technology[J]. Journal of Computer Research and Development, 2022, 59(11): 2581-2605. DOI: 10.7544/issn1000-1239.20210121
    [5]Yang Zheng, Yin Qilei, Li Haoran, Miao Yuanli, Yuan Dong, Wang Qian, Shen Chao, Li Qi. Study of Wechat Sybil Detection[J]. Journal of Computer Research and Development, 2021, 58(11): 2319-2332. DOI: 10.7544/issn1000-1239.2021.20210461
    [6]Yang Wang, Gao Mingzhe, Jiang Ting. A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning[J]. Journal of Computer Research and Development, 2021, 58(5): 1021-1034. DOI: 10.7544/issn1000-1239.2021.20200912
    [7]Wang Jialai, Zhang Chao, Qi Xuyan, Rong Yi. A Survey of Intelligent Malware Detection on Windows Platform[J]. Journal of Computer Research and Development, 2021, 58(5): 977-994. DOI: 10.7544/issn1000-1239.2021.20200964
    [8]Wang Lina, Tan Cheng, Yu Rongwei, Yin Zhengguang. The Malware Detection Based on Data Breach Actions[J]. Journal of Computer Research and Development, 2017, 54(7): 1537-1548. DOI: 10.7544/issn1000-1239.2017.20160436
    [9]Li Peng, Wang Ruchuan, Wu Ning. Research on Unknown Malicious Code Automatic Detection Based on Space Relevance Features[J]. Journal of Computer Research and Development, 2012, 49(5): 949-957.
    [10]Dai Hua, Qin Xiaolin, and Bai Chuanjie. A Malicious Transaction Detection Method Based on Transaction Template[J]. Journal of Computer Research and Development, 2010, 47(5): 921-929.

Catalog

    Article views (826) PDF downloads (251) Cited by()

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return