Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability

Xia Tian; Fu Gelin; Qu Shaoru; Luo Zhongpei; Ren Pengju

doi:10.7544/issn1000-1239.202330421

Xia Tian, Fu Gelin, Qu Shaoru, Luo Zhongpei, Ren Pengju. Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability[J]. Journal of Computer Research and Development, 2023, 60(9): 1973-1987. DOI: 10.7544/issn1000-1239.202330421
Abstract

Sparse matrix-vector multiplication (SpMV) is widely used in scientific computing, industrial simulation, and intelligent computing, and it is a critical kernel in all of these applications. Precise numerical simulation, linear-algebra solvers, and graph analytics typically require SpMV to be computed iteratively. However, the highly sparse and random distribution of nonzeros causes poor data locality, low cache utilization, and extremely irregular computation patterns, which make SpMV optimization one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose to improve its performance by pursuing high predictability and low program complexity. Specifically, we improve memory access regularity and locality by creating serialized access patterns, which makes data prefetching more effective and improves cache utilization. We also improve pipeline efficiency by creating regular branch patterns that make branch prediction more accurate. In addition, we flexibly leverage SIMD instructions to optimize parallel execution and make full use of the CPU's computational resources. Experimental results show that, with these optimizations, our SpMV kernel significantly alleviates the critical bottlenecks and improves the efficiency of the CPU pipeline, the cache, and memory bandwidth usage. It achieves an average speedup of 2.6 times over Intel's commercial MKL library and an average speedup of 1.3 times over the best existing SpMV algorithm.
