高级检索

    一种改进的适合并行计算的TFQMR算法

    An Improved TFQMR Algorithm for Large Linear Systems Suited to Parallel Computing

    • 摘要: TFQMR算法是一种Krylov子空间算法,常用来求解大型稀疏线性方程组.通过改变TFQMR算法的计算次序,提出了一种改进的TFQMR(ITFQMR)算法.对比TFQMR算法,ITFQMR算法的数值稳定性和TFQMR算法相同,几乎没有增加计算量,但考虑了在MIMD并行机上实现时并行算法的性能,其同步开销减少为TFQMR算法的一半,并且所有内积计算以及矩阵向量乘是独立的,没有数据相关性,可以进行计算与通信的重叠.从理论和实验两个角度来讨论ITFQMR算法的性能,当处理机台数较多时,ITFQMR算法的计算速度快于TFQMR算法.实验说明了在有64台处理机机群上进行,最快的并行ITFQMR算法的计算速度大约比TFQMR算法快20%.

       

      Abstract: The transpose-free QMR(TFQMR) algorithm is a Krylov subspace algorithm that can be used to obtain fast solutions for linear systems with very large and very sparse coefficient matrices. In this paper, by changing the computation sequence in the TFQMR algorithm, an improved transpose-free QMR (ITFQMR) algorithm is proposed. The numerical stability of the ITFQMR algorithm is the same as TFQMR algorithm, but the synchronization overhead that represents the bottleneck of the parallel performance is effectively reduced by a factor of two. And all inner products of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time of vector updates. From the theoretical and experimental analysis it is found that the ITFQMR algorithm is faster than the TFQMR algorithm as the number of processors increases. The experiments performed on a 64-processor cluster indicate that the ITFQMR is approximately 20% faster than the TFQMR.

       

    /

    返回文章
    返回