HVMS: A Hybrid Vectorization-Optimized Mechanism of SpMV

颜志远; 解壁伟; 包云岗

doi:10.7544/issn1000-1239.202330204

Abstract: Sparse matrix-vector multiplication (SpMV) plays an important role in a wide range of scientific computing and engineering applications. Executing SpMV efficiently on vector processors is difficult because matrix sparsity leads to irregular memory access. We analyze this problem in detail and summarize the main factors that affect the efficiency of SpMV vectorization. Besides the irregular distribution of non-zero elements within a single sparse matrix, the distribution characteristics of non-zero elements also vary widely from one matrix to another, so a single vector optimization strategy is hard to apply to matrices with diverse characteristics. In addition, vector processors differ in their vector features and instructions, which limits the generality of SpMV vector optimization methods. The primary challenge of SpMV vectorization is mapping irregular sparse matrices onto regular vector hardware. To address it, we propose a hybrid vectorization-optimized mechanism of SpMV (HVMS). HVMS first builds an abstract model of the characteristics of the vector hardware and, based on the abstracted basic operations, designs rules that guide the conversion of sparse matrices into a more regular form. HVMS divides a sparse matrix into parts according to their characteristics, which reduces the irregularity within each part, and applies a different optimization strategy to each part to maximize the vectorization efficiency of SpMV and thereby improve performance. We implement HVMS on an Intel Xeon platform and evaluate it on 30 commonly used sparse matrices. The results show that HVMS achieves average speedups of 1.60x, 1.72x, and 1.93x over the representative existing approaches CVR, SELL-C-σ, and Intel MKL, respectively.
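
To make the irregular memory access concrete, the following is a minimal sketch of a baseline scalar SpMV over the standard compressed sparse row (CSR) layout. It is not the HVMS method; it only illustrates the access pattern that makes vectorization hard. The function name `spmv_csr` and the small example matrix are assumptions chosen for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Row lengths differ, so the inner trip count is irregular.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: an indirect (gather) access.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x4 example matrix:
# [[1, 0, 2, 0],
#  [0, 0, 0, 3],
#  [4, 5, 0, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 3, 0, 1, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
```

In this sketch the three rows hold 2, 1, and 3 non-zeros, and the reads of `x` follow the column indices rather than a contiguous stride. Uneven row lengths and indirect reads are the two properties that a fixed-width vector unit has trouble filling efficiently.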
