Citation: Wang Haotian, Ding Yan, He Xianhao, Xiao Guoqing, Yang Wangdong. SparseMode: A Sparse Compiler Framework for Efficient SpMV Vectorized Code Generation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550139
Sparse matrix-vector multiplication (SpMV) is a core operation in numerical computing, widely used in scientific computing, engineering simulation, and machine learning. SpMV performance is constrained mainly by irregular sparse patterns, and traditional optimizations rely on manually designed storage formats, computation strategies, and memory access patterns. Existing tensor compilers such as TACO and TVM use domain-specific languages (DSLs) to generate high-performance operators, relieving developers of burdensome manual optimization; however, their support for sparse computation remains limited, making it difficult to adapt optimizations to different sparse patterns. To address these issues, we propose SparseMode, a sparse compiler framework that generates efficient vectorized code for SpMV based on a matrix's sparse pattern and adaptively adjusts its optimization strategies to the characteristics of the hardware platform. The framework first defines a domain-specific language, SpMV-DSL, which expresses sparse matrices and SpMV computations efficiently and concisely. It then introduces a sparse-pattern-aware method that dynamically selects computation strategies based on the matrix storage format and the non-zero element distribution specified in SpMV-DSL. Finally, it generates efficient parallel SpMV operator code through sparse pattern analysis and scheduling optimizations, fully exploiting SIMD instructions. Experimental results on different hardware platforms show that the SpMV operator code generated by SparseMode achieves speedups of up to 2.44× over the existing tensor compilers TACO and TVM.
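The abstract does not reproduce any generated code. Purely as an illustration of the kind of SIMD-vectorized SpMV kernel such a compiler targets, the sketch below implements CSR-format SpMV in C using AVX2 gathers and FMA. The function name spmv_csr_avx2, the CSR layout, and the 8-wide single-precision vectorization are assumptions made for illustration; this is not SparseMode's actual output, nor its SpMV-DSL.

#include <immintrin.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative CSR SpMV: y = A*x, single precision.
 * row_ptr has n_rows+1 entries; col_idx/vals hold the nonzeros row by row.
 * Each row is processed 8 nonzeros at a time with AVX2 gathers; a scalar
 * tail loop handles rows whose nonzero count is not a multiple of 8. */
void spmv_csr_avx2(size_t n_rows, const int *row_ptr,
                   const int *col_idx, const float *vals,
                   const float *x, float *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        int start = row_ptr[i], end = row_ptr[i + 1];
        __m256 acc = _mm256_setzero_ps();
        int j = start;
        for (; j + 8 <= end; j += 8) {
            __m256  v   = _mm256_loadu_ps(&vals[j]);
            __m256i idx = _mm256_loadu_si256((const __m256i *)&col_idx[j]);
            __m256  xv  = _mm256_i32gather_ps(x, idx, 4); /* x[col_idx[j..j+7]] */
            acc = _mm256_fmadd_ps(v, xv, acc);            /* acc += v * xv */
        }
        /* Horizontal sum of the 8 partial products. */
        __m128 lo = _mm256_castps256_ps128(acc);
        __m128 hi = _mm256_extractf128_ps(acc, 1);
        __m128 s  = _mm_add_ps(lo, hi);
        s = _mm_hadd_ps(s, s);
        s = _mm_hadd_ps(s, s);
        float sum = _mm_cvtss_f32(s);
        for (; j < end; ++j)            /* scalar tail */
            sum += vals[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

int main(void) {
    /* A = [[1,0,2],[0,3,0]] in CSR; with x = [1,1,1] we expect y = [3, 3]. */
    int   row_ptr[] = {0, 2, 3};
    int   col_idx[] = {0, 2, 1};
    float vals[]    = {1.0f, 2.0f, 3.0f};
    float x[]       = {1.0f, 1.0f, 1.0f}, y[2];
    spmv_csr_avx2(2, row_ptr, col_idx, vals, x, y);
    printf("y = [%g, %g]\n", y[0], y[1]);
    return 0;
}

Compiled with, e.g., gcc -O3 -mavx2 -mfma, the example prints y = [3, 3] for the 2×3 matrix encoded in main. A pattern-aware generator of the kind the abstract describes would instead choose the storage format and vector strategy per matrix, rather than fixing CSR and 8-wide gathers as this sketch does.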