Abstract:
The transpose-free QMR(TFQMR) algorithm is a Krylov subspace algorithm that can be used to obtain fast solutions for linear systems with very large and very sparse coefficient matrices. In this paper, by changing the computation sequence in the TFQMR algorithm, an improved transpose-free QMR (ITFQMR) algorithm is proposed. The numerical stability of the ITFQMR algorithm is the same as TFQMR algorithm, but the synchronization overhead that represents the bottleneck of the parallel performance is effectively reduced by a factor of two. And all inner products of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time of vector updates. From the theoretical and experimental analysis it is found that the ITFQMR algorithm is faster than the TFQMR algorithm as the number of processors increases. The experiments performed on a 64-processor cluster indicate that the ITFQMR is approximately 20% faster than the TFQMR.