Citation: Wang Haotian, Ding Yan, He Xianhao, Xiao Guoqing, Yang Wangdong. SparseMode: A Sparse Compiler Framework for Efficient SpMV Vectorized Code Generation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550139

SparseMode: A Sparse Compiler Framework for Efficient SpMV Vectorized Code Generation

Funds: 

This work was supported by the National Key Research and Development Program of China (2023YFB3001805), the National Natural Science Foundation of China (62321003, 62227808), and the China Postdoctoral Science Foundation (2024M750879, BX20240111).

More Information
  • Author Bio:

    Wang Haotian: born in 1997. PhD, assistant researcher and postdoctoral fellow. Member of CCF. His main research interests include high performance computing and compiler optimization

    Ding Yan: born in 1992. PhD, assistant professor. Member of CCF. His main research interests include parallel computing, mobile edge computing, and artificial intelligence

    He Xianhao: born in 1995. PhD candidate. His main research interests include sparse compilers and artificial intelligence

    Xiao Guoqing: born in 1988. PhD, professor. His main research interests include high performance computing and intelligent computing

    Yang Wangdong: born in 1974. PhD, professor. His main research interests include high performance computing and parallel numerical algorithms

  • Received Date: February 28, 2025
  • Revised Date: April 10, 2025
  • Available Online: April 18, 2025
  • Abstract: Sparse matrix-vector multiplication (SpMV) is a core operation in numerical computing, widely used in scientific computing, engineering simulation, and machine learning. Its performance is constrained mainly by irregular sparse patterns, and traditional optimizations rely on manually designed storage formats, computation strategies, and memory access patterns. Existing tensor compilers such as TACO and TVM use domain-specific languages (DSLs) to generate high-performance operators, relieving developers of much of this burdensome manual work; however, their support for sparse computation remains limited, making it difficult to adapt optimizations to different sparse patterns. To address these issues, we propose SparseMode, a sparse compiler framework that generates efficient vectorized code for SpMV based on the matrix's sparse pattern and adaptively adjusts its optimization strategies to the characteristics of the hardware platform. The framework first provides a domain-specific language, SpMV-DSL, which expresses sparse matrices and SpMV computations efficiently and concisely. It then introduces a sparse pattern-aware method that dynamically selects computation strategies based on the matrix storage format and non-zero distribution defined in SpMV-DSL. Finally, it generates efficient parallel SpMV operator code through sparse pattern analysis and scheduling optimizations, making full use of SIMD instructions (illustrative kernel sketches follow below). Experiments on different hardware platforms show that the SpMV code generated by SparseMode achieves speedups of up to 2.44 times over the existing TACO and TVM tensor compilers.
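
For readers unfamiliar with the kernel being generated, the following minimal sketch shows a baseline scalar SpMV over the compressed sparse row (CSR) format. It is illustrative only: the function name, argument layout, and index types are our assumptions, not SparseMode's SpMV-DSL or its emitted code.

```c
#include <stddef.h>

/* Baseline scalar CSR SpMV: y = A * x.
 * row_ptr has nrows+1 entries; the non-zeros of row i occupy
 * positions row_ptr[i] .. row_ptr[i+1]-1 of cols/vals. */
void spmv_csr(size_t nrows,
              const size_t *row_ptr,
              const size_t *cols,
              const double *vals,
              const double *x,
              double *y)
{
    for (size_t i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += vals[k] * x[cols[k]];   /* irregular access into x */
        y[i] = sum;
    }
}
```

The indirect loads x[cols[k]] are what make SpMV hard to vectorize. Below is a hedged AVX2 sketch of the kind of gather-based row kernel a sparse compiler might emit for CSR; this is a plausible shape for such generated code under our assumptions (32-bit column indices, double-precision values), not SparseMode's actual output.

```c
#include <immintrin.h>

/* AVX2 CSR SpMV sketch: 4 double-precision non-zeros per step,
 * FMA accumulation (compile with -mavx2 -mfma). */
void spmv_csr_avx2(int nrows,
                   const int *row_ptr,
                   const int *cols,
                   const double *vals,
                   const double *x,
                   double *y)
{
    for (int i = 0; i < nrows; ++i) {
        int k = row_ptr[i], end = row_ptr[i + 1];
        __m256d acc = _mm256_setzero_pd();
        for (; k + 4 <= end; k += 4) {
            __m256d v  = _mm256_loadu_pd(&vals[k]);               /* contiguous values */
            __m128i ix = _mm_loadu_si128((const __m128i *)&cols[k]);
            __m256d xv = _mm256_i32gather_pd(x, ix, 8);           /* gather from x     */
            acc = _mm256_fmadd_pd(v, xv, acc);
        }
        /* Horizontal sum of the 4 partial products. */
        __m128d lo = _mm256_castpd256_pd128(acc);
        __m128d hi = _mm256_extractf128_pd(acc, 1);
        lo = _mm_add_pd(lo, hi);
        double sum = _mm_cvtsd_f64(_mm_add_sd(lo, _mm_unpackhi_pd(lo, lo)));
        for (; k < end; ++k)   /* scalar tail when row length % 4 != 0 */
            sum += vals[k] * x[cols[k]];
        y[i] = sum;
    }
}
```

A pattern-aware framework of the kind the abstract describes would choose among such variants (and among storage formats) based on the non-zero distribution: gather-based vectorization pays off mainly on rows long enough to amortize the horizontal reduction, while short or highly irregular rows often favor the scalar kernel or a different format.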

  • [1]
    Yan Zhiyuan, Xie Biwei, Bao Yungang. HVMS: A hybrid vectorization-based optimization mechanism for SpMV[J]. Journal of Computer Research and Development, 2024, 61(12): 2969−2984 (in Chinese)
    [2]
    Chen Jie. Graph neural preconditioners for iterative solutions of sparse linear systems[J]. arXiv preprint, arXiv: 2406.00809, 2024
    [3]
    Shi Xiangting, Zhang Yakang, Pujahari A, et al. When latent features meet side information: A preference relation based graph neural network for collaborative filtering[J]. Expert Systems with Applications, 2025, 260: 125423 doi: 10.1016/j.eswa.2024.125423
    [4]
    Tanveer M, Rajani T, Rastogi R, et al. Comprehensive review on twin support vector machines[J]. Annals of Operations Research, 2024, 339(3): 1223−1268 doi: 10.1007/s10479-022-04575-w
    [5]
    Li Jiajia, Zhang Xiuxia, Tan Guangming, et al. Research on selecting optimal storage formats for sparse matrix multiplication[J]. Journal of Computer Research and Development, 2014, 51(4): 882−894 (in Chinese) doi: 10.7544/issn1000-1239.2014.20120857
    [6]
    Yuan E, Zhang Yunquan, Liu Fangfang, et al. Research on automatic performance optimization implementation techniques for SpMV and its applications[J]. Journal of Computer Research and Development, 2009, 46(7): 1117−1126 (in Chinese)
    [7]
    Li Kenli, Yang Wangdong, Li Keqin. Performance analysis and optimization for SpMV on GPU using probabilistic modeling[J]. IEEE Transactions on Parallel and Distributed Systems, 2014, 26(1): 196−205
    [8]
    Lattner C, Adve V. LLVM: A compilation framework for lifelong program analysis & transformation[C]//Proc of the Int Symp on Code Generation and Optimization (CGO 2004). Piscataway, NJ: IEEE, 2004: 75−86
    [9]
    Gough B J, Stallman R. An Introduction to GCC[M]. Network Theory Limited, 2004
    [10]
    Kjolstad F, Kamil S, Chou S, et al. The tensor algebra compiler[C]//Proc of the ACM on Programming Languages (OOPSLA). New York: ACM, 2017: 1−29
    [11]
    Chen Tianqi, Moreau T, Jiang Ziheng, et al. TVM: An automated end-to-end optimizing compiler for deep learning[C]//Proc of the 13th USENIX Symp on Operating Systems Design and Implementation (OSDI 18). Berkeley, CA: USENIX Association, 2018: 578−594
    [12]
    Sun Xiangzheng, Zhang Yunquan, Wang Ting, et al. Adaptive performance optimization for SpMV on diagonal sparse matrices[J]. Journal of Computer Research and Development, 2013, 50(3): 648−656 (in Chinese) doi: 10.7544/issn1000-1239.2013.20110104
    [13]
    Gan Xinbiao, Tan Wen, Liu Jie. CSR-based large-scale graph storage optimization using bidirectional bitmaps[J]. Journal of Computer Research and Development, 2021, 58(3): 458−466 (in Chinese)
    [14]
    Wang Y E, Wei G Y, Brooks D. Benchmarking TPU, GPU, and CPU platforms for deep learning[J]. arXiv preprint, arXiv: 1907.10701, 2019
    [15]
    Gao Jianhua, Liu Bingjie, Ji Weixing, et al. A systematic literature survey of sparse matrix-vector multiplication[J]. arXiv preprint, arXiv: 2404.06047, 2024
    [16]
    Bian Haodong, Huang Jianqiang, Liu Lingbin, et al. ALBUS: A method for efficiently processing SpMV using SIMD and load balancing[J]. Future Generation Computer Systems, 2021, 116: 371−392 doi: 10.1016/j.future.2020.10.036
    [17]
    Tang Waiteng, Zhao Ruizhe, Lu Mian, et al. Optimizing and auto tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi[C]//Proc of the 13th Annual IEEE/ACM Int Symp on Code Generation and Optimization. Piscataway, NJ: IEEE, 2015: 136−145
    [18]
    Xie Biwei, Zhan Jianfeng, Liu Xu, et al. CVR: Efficient vectorization of SpMV on x86 processors[C]//Proc of the 16th Annual IEEE/ACM Int Symp on Code Generation and Optimization. New York: ACM, 2018: 149−162
    [19]
    Eberhardt R, Hoemmen M. Optimization of block sparse matrix-vector multiplication on shared-memory parallel architectures[C]//Proc of the 2016 IEEE Int Parallel and Distributed Processing Symp Workshops (IPDPSW). Piscataway, NJ: IEEE, 2016: 663−672
    [20]
    Yan Shengen, Li Chao, Zhang Yunquan, et al. yaSpMV: Yet another SpMV framework on GPUs[J]. ACM SIGPLAN Notices, 2014, 49(8): 107−118 doi: 10.1145/2692916.2555255
    [21]
    Yesil S, Heidarshenas A, Morrison A, et al. Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization[C/OL]//Proc of the 2020 Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2020 [2023-07-31]. https://ieeexplore.ieee.org/document/9355205
    [22]
    Filippone S, Cardellini V, Barbieri D, et al. Sparse matrix-vector multiplication on GPGPUs[J]. ACM Transactions on Mathematical Software, 2017, 43(4): 1−49
    [23]
    Mohammed T, Mehmood R. Performance enhancement strategies for sparse matrix-vector multiplication (SpMV) and iterative linear solvers[J]. arXiv preprint, arXiv: 2212.07490, 2022
    [24]
    Langr D, Tvrdik P. Evaluation criteria for sparse matrix storage formats[J]. IEEE Transactions on Parallel and Distributed Systems, 2015, 27(2): 428−440
    [25]
    Lattner C, Amini M, Bondhugula U, et al. MLIR: Scaling compiler infrastructure for domain specific computation[C]//Proc of 2021 IEEE/ACM Int Symp on Code Generation and Optimization (CGO). Piscataway, NJ: IEEE, 2021: 2−14
    [26]
    Li Yingsong, Wang Yanyan, Jiang Tao. Sparse-aware set-membership NLMS algorithms and their application for sparse channel estimation and echo cancelation[J]. AEU-International Journal of Electronics and Communications, 2016, 70(7): 895−902
    [27]
    Kumar K, Pandey R, Karthik M L N S, et al. Robust and sparsity-aware adaptive filters: A review[J]. Signal Processing, 2021, 189: 108276 doi: 10.1016/j.sigpro.2021.108276
    [28]
    Shomron G, Gabbay F, Kurzum S, et al. Post-training sparsity-aware quantization[J]. Advances in Neural Information Processing Systems, 2021, 34: 17737−17748
    [29]
    Mariappan M, Che J, Vora K. DZiG: Sparsity-aware incremental processing of streaming graphs[C]//Proc of the 16th European Conf on Computer Systems. New York: ACM, 2021: 83−98
    [30]
    Won J, Mendis C, Emer J S, et al. WACO: Learning workload-aware co-optimization of the format and schedule of a sparse tensor program[C]//Proc of the 28th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2023: 920−934
