Abstract:
TRSM (the triangular matrix equation solver) is a commonly used algorithm for solving systems of linear equations and a core routine of many scientific computing libraries and mathematical software packages, with wide use in scientific computing, engineering computing, and machine learning. Small-scale irregular TRSM narrows the problem scope: it targets the efficient handling of small-scale, irregular data inputs. As high-performance computing becomes more personalized and fine-grained, the demand for small-scale irregular TRSM computation in the scientific and industrial communities is becoming increasingly evident. Traditional algorithms are better suited to large-scale, regular TRSM computation, which leaves room for improvement in the computational efficiency of small-scale irregular TRSM. In this paper, we propose an optimization scheme for small-scale irregular TRSM that combines the characteristics of the hardware architecture and the application scenario. We design a high-performance kernel from the perspectives of register blocking, boundary processing, and vectorized computation, and on this basis build SI_TRSM (small-scale irregular TRSM), an algorithm library covering double-precision real and double-precision complex numbers, which greatly improves the performance of the algorithm. Experimental results show that, compared with comparable routines in MKL (Intel Math Kernel Library), the SI_TRSM library developed in this paper improves average performance by 29.4 times for double-precision real small-scale irregular TRSM and by 24.6 times for double-precision complex small-scale irregular TRSM.
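For readers unfamiliar with the operation, the following is a minimal reference sketch of what one TRSM variant computes: solving A X = B for X, where A is lower triangular and B holds several right-hand sides. It is a naive forward substitution in plain Python, not the SI_TRSM kernel described in this paper; it applies none of the register blocking, boundary processing, or vectorization mentioned above. The function name `trsm_lower_left` and the example matrices are hypothetical and chosen only for illustration.

```python
def trsm_lower_left(A, B):
    """Solve A X = B for X by forward substitution.

    A: n x n lower-triangular matrix (list of lists) with a nonzero diagonal.
    B: n x m matrix of right-hand sides (list of lists).
    Entries may be float or complex. Returns X as a new n x m list of lists.
    """
    n = len(A)
    m = len(B[0]) if n else 0
    X = [row[:] for row in B]  # copy B so the caller's input is not modified
    for i in range(n):          # rows are solved top to bottom
        for j in range(m):      # each right-hand-side column is independent
            s = X[i][j]
            for k in range(i):  # subtract contributions of already-solved rows
                s -= A[i][k] * X[k][j]
            X[i][j] = s / A[i][i]
    return X


# A small example: a 3 x 3 triangular system with 2 right-hand sides.
A = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, 5.0, 8.0]]
B = [[2.0, 4.0],
     [9.0, 6.0],
     [37.0, 15.0]]
X = trsm_lower_left(A, B)
```

Because the arithmetic is generic, the same sketch accepts Python `complex` entries, which mirrors the real and complex cases the library covers. An optimized kernel would restructure these loops substantially; this version only fixes the mathematical definition of the result.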