SparseMode: A Sparse Compiler Framework for Efficient SpMV Vectorized Code Generation

Wang Haotian; Ding Yan; He Xianhao; Xiao Guoqing; Yang Wangdong

doi:10.7544/issn1000-1239.202550139

Abstract: Sparse matrix-vector multiplication (SpMV) is a core operation in numerical computing, widely used in scientific computing, engineering simulation, and machine learning. SpMV performance is constrained mainly by irregular sparsity patterns, and traditional optimization relies on hand-designed storage formats, computation strategies, and memory access patterns. Existing tensor compilers such as TACO and TVM use domain-specific languages (DSLs) to generate high-performance operators, relieving developers of tedious manual optimization; however, their support for sparse computation remains limited, and they struggle to adapt optimizations to different sparsity patterns. To address these issues, we propose SparseMode, a sparse compiler framework that generates efficient vectorized SpMV code based on the sparsity pattern of the matrix and adapts its optimization strategy to the characteristics of the target hardware platform. The framework first introduces SpMV-DSL, a domain-specific language that concisely expresses the sparse matrices and computational operations of SpMV. It then applies a sparsity-pattern-aware method that dynamically selects a computation strategy based on the storage format and non-zero distribution of the matrix defined in SpMV-DSL. Finally, it generates efficient parallel SpMV operator code through sparsity pattern analysis and scheduling optimization, making full use of SIMD instructions. Experiments on different hardware platforms show that the SpMV code generated by SparseMode achieves speedups of up to 2.44× over the TACO and TVM tensor compilers.
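To make the idea of sparsity-pattern-aware strategy selection concrete, the following is a minimal Python sketch. It is not SparseMode's implementation and does not use SpMV-DSL. It assumes a CSR input and a hypothetical selector, `choose_strategy`, that picks a kernel from the coefficient of variation of the row lengths, with an arbitrary threshold of 0.5. The `spmv_ell` kernel only imitates SIMD lanes by processing padded rows in fixed-width chunks. All function names are illustrative.

```python
from statistics import mean, pstdev


def csr_from_dense(dense):
    """Build CSR arrays (row_ptr, col_idx, vals) from a dense list of rows."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def spmv_csr(row_ptr, col_idx, vals, x):
    """Row-by-row CSR SpMV: y[i] = sum of vals[k] * x[col_idx[k]] over row i."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def spmv_ell(row_ptr, col_idx, vals, x, lanes=4):
    """ELL-style SpMV: pad every row to the longest row (rounded up to a
    multiple of `lanes`), then process `lanes` slots per step, imitating
    fixed-width SIMD lanes."""
    n = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n)), default=0)
    width = -(-width // lanes) * lanes  # round up to a multiple of lanes
    y = [0.0] * n
    for i in range(n):
        start, length = row_ptr[i], row_ptr[i + 1] - row_ptr[i]
        # Padding slots point at column 0 with value 0.0, so they add nothing.
        cols = col_idx[start:start + length] + [0] * (width - length)
        data = vals[start:start + length] + [0.0] * (width - length)
        for base in range(0, width, lanes):
            # One "vector" step: multiply `lanes` pairs, then reduce them.
            y[i] += sum(data[base + l] * x[cols[base + l]] for l in range(lanes))
    return y


def choose_strategy(row_ptr, cv_threshold=0.5):
    """Hypothetical heuristic: use ELL when row lengths are uniform (low
    coefficient of variation), otherwise fall back to plain CSR."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    if not lengths or mean(lengths) == 0:
        return "csr"
    cv = pstdev(lengths) / mean(lengths)
    return "ell" if cv <= cv_threshold else "csr"


def spmv(row_ptr, col_idx, vals, x):
    """Dispatch to a kernel based on the sparsity pattern; return (strategy, y)."""
    strategy = choose_strategy(row_ptr)
    kernel = spmv_ell if strategy == "ell" else spmv_csr
    return strategy, kernel(row_ptr, col_idx, vals, x)


if __name__ == "__main__":
    rp, ci, v = csr_from_dense([[1, 2, 0], [0, 3, 4], [5, 0, 6]])
    print(spmv(rp, ci, v, [1, 1, 1]))  # uniform rows, so the ELL path is chosen
```

Both kernels compute the same `y`. Only the traversal differs: padding to a common row width gives every row an identical loop shape, which is what makes lane-wise vectorization straightforward. On highly irregular rows, however, padding wastes work, so the sketch falls back to CSR. A real code generator would emit SIMD intrinsics or vector IR rather than Python loops. The selection features and threshold here are assumptions, not SparseMode's actual rules.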