Abstract:
Many high performance computing (HPC) applications such as solving sparse linear systems involve the calculation of sequences of Sparse Matrix-Vector Multiplications (SpMV) like
Ax,
A2x, …,
Asx, which is known as the sparse Matrix Power Kernel (MPK). Since MPK calls SpMV using the same sparse matrix, reusing the matrix elements in cache, instead of repeatedly loading them from main memory, can potentially alleviate the memory-constrained problem for SpMV and enhance the performance of MPK. However, reusing the matrix introduces data dependencies between subsequent SpMVs. Prior work mainly either focus on optimizing an individual SpMV invocation, or introduce significant overheads for cache data reuse in MPK. This paper proposes a Cache-aware MPK (Ca-MPK) to optimize MPK via cache data reuse. Based on the dependency graph of a sparse matrix, an architecture-aware recursive partitioning of the graph is designed to obtain subgraphs/submatrices which are fit into cache. Separating subgraphs (i.e., separators) are constructed to decouple data dependencies among subgraphs. Then cache data reuse is achieved by executing sequences of SpMVs on subgraphs with a specified order. Performance evaluation demonstrates that Ca-MPK outperforms the MPK implementations based on the Intel OneMKL library and the state-of-the-art approach, with an average speedup of up to about 1.57 times and 1.40 times, respectively.