Xie Zhen, Tan Guangming, Sun Ninghui. Research on Optimal Performance of Sparse Matrix-Vector Multiplication and Convoulution Using the Probability-Process-Ram Model[J]. Journal of Computer Research and Development, 2021, 58(3): 445-457. DOI: 10.7544/issn1000-1239.2021.20180601
Citation:
Xie Zhen, Tan Guangming, Sun Ninghui. Research on Optimal Performance of Sparse Matrix-Vector Multiplication and Convoulution Using the Probability-Process-Ram Model[J]. Journal of Computer Research and Development, 2021, 58(3): 445-457. DOI: 10.7544/issn1000-1239.2021.20180601
Xie Zhen, Tan Guangming, Sun Ninghui. Research on Optimal Performance of Sparse Matrix-Vector Multiplication and Convoulution Using the Probability-Process-Ram Model[J]. Journal of Computer Research and Development, 2021, 58(3): 445-457. DOI: 10.7544/issn1000-1239.2021.20180601
Citation:
Xie Zhen, Tan Guangming, Sun Ninghui. Research on Optimal Performance of Sparse Matrix-Vector Multiplication and Convoulution Using the Probability-Process-Ram Model[J]. Journal of Computer Research and Development, 2021, 58(3): 445-457. DOI: 10.7544/issn1000-1239.2021.20180601
1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)
2(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)
3(School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049)
Funds: This work was supported by the National Key Research and Development Program of China (2018YFB0204400), the Strategic Priority Research Program of Chinese Academy of Sciences (C)(XDC05010100), and the National Natural Science Foundation of China (62032023, 61972377, 61702483).
Performance models provide insightful perspectives to allow us to predict performance and propose optimization guidance. Although there has been much research, pinpointing bottlenecks of various memory access patterns and reaching high performance of both regular and irregular programs on various hardware configurations are still not trivial. In this work, we propose a novel model called probability-process-ram (PPR) to quantify the amount of compute and data transfer time on general-purpose multicore processors. The PPR model predicts the number of instruction for single-core and probability of memory access between each memory hierarchy through a newly designed cache simulator. By using the automatically extracted best optimization method and expectation, we use PPR model for analyzing and optimizing sparse matrix-vector multiplication and 1D convolution as case study for typical irregular and regular computational kernels. Then we obtain best block sizes for sparse matrices with various sparsity structures, as well as optimal optimization guidance for 1D convolution with different instruction sets support and data sizes. Comparison with Roofline model and ECM model, the proposed PPR model greatly improves prediction accuracy by the newly designed cache simulator and achieves comprehensive feedback ability.
Huan Dandan, Li Zusong, Hu Weiwu, Liu Zhiyong. A Cache Adaptive Write Allocate Policy[J]. Journal of Computer Research and Development, 2007, 44(2): 348-354.