Abstract:
Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in scientific and engineering applications, and one of the most essential subprograms of the sparse BLAS library. Much work has been dedicated to optimizing SpMV, and some of it has achieved significant performance improvements. However, since most optimization methods lack generality and are only suitable for specific types of sparse matrices, the optimized SpMV kernels have not been widely used in real applications and numerical solvers. Moreover, a sparse matrix can be represented in many storage formats, and different formats yield widely varying performance across SpMV kernels. In this paper, taking different sparse matrix features into account, we present an SpMV auto-tuner (SMAT) that chooses the optimal storage format for a given sparse matrix on different computer architectures. The optimal storage format, which delivers the highest SpMV performance, helps enhance the performance of applications. Moreover, SMAT is extensible to new formats, allowing it to make full use of the achievements of SpMV optimization in the literature. We evaluate SMAT using 2366 sparse matrices from the University of Florida sparse matrix collection. SMAT achieves 9.11 GFLOPS (single precision) and 2.44 GFLOPS (double precision) on the Intel platform, and 3.36 GFLOPS (single precision) and 1.52 GFLOPS (double precision) on the AMD platform. Compared with the Intel MKL library, SMAT achieves speedups of 1.4 to 1.5 times.