选择稀疏矩阵乘法最优存储格式的研究

李佳佳; 张秀霞; 谭光明; 陈明宇

选择稀疏矩阵乘法最优存储格式的研究

Study of Choosing the Optimal Storage Format of Sparse Matrix Vector Multiplication

摘要

摘要: 稀疏矩阵向量乘法(sparse matrix vector multiplication, SpMV)是科学和工程领域中重要的核心子程序之一，也是稀疏基本线性代数子程序(basic linear algebra subprograms, BLAS)库的重要函数.目前很多SpMV的优化工作在不同程度上获得了性能提升，但大多数优化工作针对特定存储格式或一类具有特定特征的稀疏矩阵缺乏通用性.因此高性能的SpMV实现并没有广泛地应用于实际应用和数值解法器中.另外，稀疏矩阵具有众多存储格式，不同存储格式的SpMV存在较大性能差异.根据以上现象，提出一个SpMV的自动调优器(SpMV auto-tuner, SMAT).对于一个给定的稀疏矩阵，SMAT结合矩阵特征选择并返回其最优的存储格式.应用程序通过调用SMAT来得到合适的存储格式，从而获得性能提升，同时随着SMAT中存储格式的扩展，更多的SpMV优化工作可以将性能优势在实际应用中发挥作用.使用佛罗里达大学的2366个稀疏矩阵作为测试集，在Intel上SMAT分别获得9.11GFLOPS(单精度)和2.44GFLOPS(双精度)的最高浮点性能，在AMD平台上获得了3.36GFLOPS(单精度)和1.52GFLOPS(双精度)的最高浮点性能.相比Intel的核心数学函数库(math kernel library, MKL)数学库，SMAT平均获得1.4~1.5倍的性能提升.

Abstract: Sparse matrix vector multiplication (SpMV) is one of the most important kernels in scientific and engineering applications. It is also one of the most essential subprograms of sparse BLAS library. A lot of work has been dedicated in optimizing SpMV, and some has achieved significant performance improvement. Since most of the optimization methods are less of generalization and only suitable for a specific type of sparse matrices, the optimized SpMV kernels have not been widely used in real applications and numerical solvers. Besides, there are many storage formats of a sparse matrix and most of them achieve diverse performance on different SpMV kernels. In this paper, considering different sparse matrix features, we present an SpMV auto-tuner (SMAT) to choose the optimal storage format for a given sparse matrix on different computer architectures. The optimal storage format releasing the highest SpMV performance is helpful to enhance the performance of applications. Moreover, SMAT is also extensible to new formats, which will make full use of the achievements of SpMV optimization in literatures. We test SMAT using 2366 sparse matrices from the University of Florida. SMAT achieves 9.11 GFLOPS (single) and 2.44 GFLOPS (double) on Intel platform, and 3.36 GFLOPS (single) and 1.52 GFLOPS(double) on AMD platform. Compared with Intel MKL library, the speedup of SMAT is 1.4 to 1.5 times.

HTML全文

参考文献(0)

施引文献

资源附件(0)