• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Xia Tian, Fu Gelin, Qu Shaoru, Luo Zhongpei, Ren Pengju. Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability[J]. Journal of Computer Research and Development, 2023, 60(9): 1973-1987. DOI: 10.7544/issn1000-1239.202330421
Citation: Xia Tian, Fu Gelin, Qu Shaoru, Luo Zhongpei, Ren Pengju. Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability[J]. Journal of Computer Research and Development, 2023, 60(9): 1973-1987. DOI: 10.7544/issn1000-1239.202330421

Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability

Funds: This work was supported by the National Key Research and Development Program of China(2022YFB4500500)and the Key Research and Development Program of Shaanxi Province(2022ZDLGY01-08).
More Information
  • Author Bio:

    Xia Tian: born in 1988. PhD, associate professor. His main research interests include cloud-computing virtualization and innovative parallel computation architecture

    Fu Gelin: born in 1997. PhD candidate. His main research interests include system modeling and optimization, and computer architecture

    Qu Shaoru: born in 2000. Master candidate. His main research interests include high performance computing and computer architecture

    Luo Zhongpei: born in 1999. Master candidate. His main research interests include high performance computing and GPGPU technology

    Ren Pengju: born in 1984. PhD, professor. His main interests include on-chip network and computer architecture

  • Received Date: May 30, 2023
  • Revised Date: July 24, 2023
  • Available Online: July 31, 2023
  • Sparse matrix-vector multiplication (SpMV) has been widely applied in scientific computation, industry simulation and intelligent computation domains, which is the critical algorithm in all these applications. Usually, iterative computation of SpMV is required to fulfill precise numeric simulation, linear algebra solving and graph analytics requirements. However, due to the poor data locality, low cache usage and extreme irregular computation patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose to improve its performance by pursuing high predictability and low program complexity. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve the pipeline efficiency by creating regular branch patterns to make the branch prediction more accurate. Meanwhile, we flexibly lever the SIMD instructions to optimize the parallel execution and fully use CPU’s computation resources. Experimental results show that using the above optimization approaches, our SpMV kernel is effective to significantly alleviate the critical bottlenecks and improve the efficiency of CPU pipeline, cache and memory bandwidth usage. The resulting performance achieves average 2.6 times speedup against Intel’s commercial library of MKL, as well as average 1.3 times speedup against the existing best SpMV algorithm.

  • [1]
    Sarıyüce E A, Saule E, Kaya K, et al. Regularizing graph centrality computations[J]. Journal of Parallel and Distributed Computing, 2015, 76: 106−119 doi: 10.1016/j.jpdc.2014.07.006
    [2]
    Malewicz G, Austern H M, Bik J C A, et al. Pregel: A system for large-scale graph processing[C]//Proc of ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2010: 135–146
    [3]
    Lin Li, Liao Xiaofei, Tan Guang, et al. LiveRender: A cloud gaming system based on compressed graphics streaming[C]//Proc of the 22nd ACM Int Conf on Multimedia. New York: ACM, 2014: 347–356
    [4]
    Hua Qiangsheng, Fan Haoqiang, Qian Lixiang, et al. Brief announcement: A tight distributed algorithm for all pairs shortest paths and applications[C]//Proc of the 28th ACM Symp on Parallelism in Algorithms and Architectures. New York: ACM, 2016: 439–441
    [5]
    谢震, 谭光明, 孙凝晖. 基于PPR模型的稀疏矩阵向量乘及卷积性能优化研究[J]. 计算机研究与发展, 2021,58(3): 445−457

    Xie Zhen, Tan Guangming, Sun Ninghui. Research on optimal performance of sparse matrix-vector multiplication and convoulution using the probability-process-RAM model[J]. Journal of Computer Research and Development, 2021, 58(3): 445−457(in Chinese)
    [6]
    Wang Endong, Zhang Qing, Shen Bo, et al. Intel Math Kernel Library[M]//High-Performance Computing on the Intel® Xeon Phi™. Beilin: Springer, 2014: 167–188
    [7]
    Liu Xing, Smelyankiy M, Chow E, et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors[C]//Proc of the 27th Int ACM Conf on Int Conf on Supercomputing. New York: ACM, 2013: 273–282
    [8]
    Liu Weifeng, Vinter B. CSR5: An efficient storage format for cross platform sparse matrix-vector multiplication[C]//Proc of the 29th Int Conf on Supercomputing. New York: ACM, 2015: 339–350
    [9]
    Xie Biwei, Zhan Jianfeng, Liu Xu, et al. CVR: Efficient vectorization of spmv on x86 processors[C]//Proc of the 2018 Int Symp on Code Generation and Optimization. New York: ACM, 2018: 149–162
    [10]
    Buluc A, Fineman T J, Frigo M, et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks[C]//Proc of the 21st Annual Symp on Parallelism in Algorithms and Architectures. New York: ACM, 2009: 233–244
    [11]
    Choi W J, Singh A, Vuduc W R. Model-driven autotuning of sparse matrix-vector multiply on GPUs[C]//Proc of the 15th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2010: 115–126
    [12]
    Liang Yun, Tang TengWai, Zhao Ruizhe, et al. Scale-free sparse matrix-vector multiplication on many-core architectures[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017, 36(12): 2106−2119 doi: 10.1109/TCAD.2017.2681072
    [13]
    Im E J, Yelick K, Vuduc R. Sparsity: Optimization framework for sparse matrix kernels[J]. International Journal of High Performance Applications, 2004, 18(1): 135−158 doi: 10.1177/1094342004041296
    [14]
    Hong Changwan, Sukumaran-Rajam A, Nisa I, et al. Adaptive sparse tiling for sparse matrix multiplication[C]//Proc of the 24th Symp on Principles and Practice of Parallel Programming. New York: ACM, 2019: 300–314
    [15]
    Davis A T, Hu Yifan. The University of Florida sparse matrix collection[J]. ACM Transactions on Mathematical Software, 2011, 38(1): 1−25
    [16]
    Dyksen R W, Ribbens J C. Interactive ELLPACK: An interactive problem-solving environment for elliptic partial differential equations[J]. ACM Transactions on Mathematical Software, 1987, 13(2): 113−132 doi: 10.1145/328512.328515
    [17]
    Kourtis K, Karakasis V, Goumas G, et al. CSX: An extended compression format for SpMV on shared memory systems[C]//Proc of the 16th ACM Symp on Principles and Practice of Parallel Programming. New York: ACM, 2011: 247–256
    [18]
    Ashari A, Sedaghati N, Eisenlohr J, et al. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs[C]//Proc of the 28th ACM Int Conf on Supercomputing. New York: ACM, 2014: 273–282
    [19]
    Chen Tien-Fu, Baer L J. Effective hardware-based data prefetching for high-performance processors[J]. IEEE Transactions on Computers, 1995, 44(5): 609−623 doi: 10.1109/12.381947
    [20]
    Xia Tian, Fu Gelin, Li Chenyang, et al. A comprehensive performance model of sparse matrix-vector multiplication to guide kernel optimization[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34(2): 519−534 doi: 10.1109/TPDS.2022.3225230
    [21]
    Karkhanis S T, Smith E J. A first-order superscalar processor model[C]//Proc of 31st Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2004: 338–349
    [22]
    Tang Tengwai, Zhao Ruizhe, Lu Mian et al. Optimizing and auto-tuning scale-free sparse matrix vector multiplication on Intel Xeon Phi[C]// Proc of IEEE/ACM Int Symp on Code Generation and Optimization. New York: ACM, 2015: 136–145
    [23]
    Seznez A. A new case for the TAGE branch predictor[C]//Proc of the 44th Annual IEEE/ACM Int Symp on Microarchitecture. New York: ACM, 2011: 117–127
    [24]
    Bian Haodong, Huang Jianqiang, Dong Runting et al. CSR2: A new format for SIMD-accelerated SpMV[C]//Proc of IEEE/ACM Int Symp on Cluster, Cloud and Internet Computing (CCGRID). Piscataway, NJ: IEEE, 2020: 350−359.
  • Related Articles

    [1]Li Qinxin, Wu Wenhao, Wang Zhaohua, Li Zhenyu. DNS Recursive Resolution Service Security: Threats, Defenses, and Measurements[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440158
    [2]Research on Malicious Domain Detection Technology Based on Semantic Graph Learning[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440375
    [3]Wei Jinxia, Long Chun, Fu Hao, Gong Liangyi, Zhao Jing, Wan Wei, Huang Pan. Malicious Domain Name Detection Method Based on Enhanced Embedded Feature Hypergraph Learning[J]. Journal of Computer Research and Development, 2024, 61(9): 2334-2346. DOI: 10.7544/issn1000-1239.202330117
    [4]Pan Jianwen, Cui Zhanqi, Lin Gaoyi, Chen Xiang, Zheng Liwei. A Review of Static Detection Methods for Android Malicious Application[J]. Journal of Computer Research and Development, 2023, 60(8): 1875-1894. DOI: 10.7544/issn1000-1239.202220297
    [5]Fan Zhaoshan, Wang Qing, Liu Junrong, Cui Zelin, Liu Yuling, Liu Song. Survey on Domain Name Abuse Detection Technology[J]. Journal of Computer Research and Development, 2022, 59(11): 2581-2605. DOI: 10.7544/issn1000-1239.20210121
    [6]Yang Wang, Gao Mingzhe, Jiang Ting. A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning[J]. Journal of Computer Research and Development, 2021, 58(5): 1021-1034. DOI: 10.7544/issn1000-1239.2021.20200912
    [7]Peng Chengwei, Yun Xiaochun, Zhang Yongzheng, Li Shuhao. Detecting Malicious Domains Using Co-Occurrence Relation Between DNS Query[J]. Journal of Computer Research and Development, 2019, 56(6): 1263-1274. DOI: 10.7544/issn1000-1239.2019.20180481
    [8]Dai Hua, Qin Xiaolin, and Bai Chuanjie. A Malicious Transaction Detection Method Based on Transaction Template[J]. Journal of Computer Research and Development, 2010, 47(5): 921-929.
    [9]Li Qianmu and Liu Fengyu. A Risk Detection and Fault Analysis Method for the Strategic Internet[J]. Journal of Computer Research and Development, 2008, 45(10): 1718-1723.
    [10]Zhang Xiaoning and Feng Dengguo. Intrusion Detection for Ad Hoc Routing Based on Fuzzy Behavior Analysis[J]. Journal of Computer Research and Development, 2006, 43(4): 621-626.
  • Cited by

    Periodical cited type(1)

    1. 余莎莎,肖辉,郑清,赵幽. 基于威胁情报的DNS助力医院网络安全建设实践. 中国卫生信息管理杂志. 2024(06): 909-914 .

    Other cited types(1)

Catalog

    Article views (310) PDF downloads (145) Cited by(2)

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return