Citation: Wang Haotian, Ding Yan, He Xianhao, Xiao Guoqing, Yang Wangdong. SparseMode: A Sparse Compiler Framework for Efficient SpMV Vectorized Code Generation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550139
Sparse matrix-vector multiplication (SpMV) is a core operation in numerical computing, widely used in scientific computing, engineering simulation, and machine learning. SpMV performance is constrained mainly by irregular sparse patterns, and traditional optimizations rely on manually designed storage formats, computation strategies, and memory access patterns. Existing tensor compilers such as TACO and TVM use domain-specific languages (DSLs) to generate high-performance operators, relieving developers of burdensome manual optimization; however, their support for sparse computation remains limited, making it difficult to adapt optimizations to different sparse patterns. To address these issues, we propose SparseMode, a sparse compiler framework that generates efficient vectorized code for SpMV based on a matrix's sparse pattern and adaptively adjusts its optimization strategies to the characteristics of the hardware platform. The framework first defines a domain-specific language, SpMV-DSL, which expresses sparse matrices and SpMV computations efficiently and concisely. It then introduces a sparse-pattern-aware method that dynamically selects computation strategies based on the matrix storage format and the non-zero element distribution specified in SpMV-DSL. Finally, it generates efficient parallel SpMV operator code through sparse pattern analysis and scheduling optimizations, fully exploiting SIMD instructions. Experimental results on different hardware platforms show that the SpMV operator code generated by SparseMode achieves speedups of up to 2.44× over the existing tensor compilers TACO and TVM.
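The abstract does not reproduce any generated code. Purely as an illustration of the kind of SIMD-vectorized SpMV kernel such a compiler targets, the sketch below implements CSR-format SpMV in C using AVX2 gathers and FMA. The function name spmv_csr_avx2, the CSR layout, and the 8-wide single-precision vectorization are assumptions made for illustration; this is not SparseMode's actual output, nor its SpMV-DSL.

#include <immintrin.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative CSR SpMV: y = A*x, single precision.
 * row_ptr has n_rows+1 entries; col_idx/vals hold the nonzeros row by row.
 * Each row is processed 8 nonzeros at a time with AVX2 gathers; a scalar
 * tail loop handles rows whose nonzero count is not a multiple of 8. */
void spmv_csr_avx2(size_t n_rows, const int *row_ptr,
                   const int *col_idx, const float *vals,
                   const float *x, float *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        int start = row_ptr[i], end = row_ptr[i + 1];
        __m256 acc = _mm256_setzero_ps();
        int j = start;
        for (; j + 8 <= end; j += 8) {
            __m256  v   = _mm256_loadu_ps(&vals[j]);
            __m256i idx = _mm256_loadu_si256((const __m256i *)&col_idx[j]);
            __m256  xv  = _mm256_i32gather_ps(x, idx, 4); /* x[col_idx[j..j+7]] */
            acc = _mm256_fmadd_ps(v, xv, acc);            /* acc += v * xv */
        }
        /* Horizontal sum of the 8 partial products. */
        __m128 lo = _mm256_castps256_ps128(acc);
        __m128 hi = _mm256_extractf128_ps(acc, 1);
        __m128 s  = _mm_add_ps(lo, hi);
        s = _mm_hadd_ps(s, s);
        s = _mm_hadd_ps(s, s);
        float sum = _mm_cvtss_f32(s);
        for (; j < end; ++j)            /* scalar tail */
            sum += vals[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

int main(void) {
    /* A = [[1,0,2],[0,3,0]] in CSR; with x = [1,1,1] we expect y = [3, 3]. */
    int   row_ptr[] = {0, 2, 3};
    int   col_idx[] = {0, 2, 1};
    float vals[]    = {1.0f, 2.0f, 3.0f};
    float x[]       = {1.0f, 1.0f, 1.0f}, y[2];
    spmv_csr_avx2(2, row_ptr, col_idx, vals, x, y);
    printf("y = [%g, %g]\n", y[0], y[1]);
    return 0;
}

Compiled with, e.g., gcc -O3 -mavx2 -mfma, the example prints y = [3, 3] for the 2×3 matrix encoded in main. A pattern-aware generator of the kind the abstract describes would instead choose the storage format and vector strategy per matrix, rather than fixing CSR and 8-wide gathers as this sketch does.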