Abstract:
The graphics processing unit (GPU) has attracted increasing attention as a highly parallel processor architecture, giving rise to various general-purpose GPU computing technologies, of which NVIDIA CUDA is representative. Multi-GPU parallel computing has also drawn many researchers. Because it involves coordination and interaction among devices, multi-GPU computing places higher demands on data partitioning and data communication. To reduce the complexity of developing multi-GPU software, the authors propose a template-based code generation technology that produces OpenMP+CUDA source code, supporting the porting of single-GPU programs to multiple GPUs. It uses simple directive statements to describe the data partitioning and communication, and generates a multi-threaded CUDA memory-management API in C that wraps the CUDA memory operations and emulates CPU-GPU and GPU-GPU data transfers. This makes it straightforward to auto-generate multi-GPU programs from single-GPU ones. Finally, the authors provide a sample CUDA program that solves the Laplace equation by Gauss-Seidel iteration as input to the template system. The porting results show that although the generated code may contain more synchronization statements, the performance loss is small and can be neglected. By using the template system, the authors therefore achieve a notable reduction in the cost of multi-GPU CUDA programming and an improvement in CUDA programmers' productivity.
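The OpenMP+CUDA structure that the generated code targets can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' generated code: one OpenMP host thread is bound to each GPU, and boundary rows are exchanged through a shared host buffer between iterations; the grid size `N`, the buffer name `halo`, and the halo-exchange scheme are assumptions for illustration.

```cuda
// Sketch of the one-host-thread-per-GPU pattern (assumed names/sizes).
#include <cuda_runtime.h>
#include <omp.h>
#include <stdlib.h>

#define N 1024  // global grid width in columns (assumed)

int main(void) {
    int ngpu = 0;
    cudaGetDeviceCount(&ngpu);
    // Host staging buffer through which GPU-GPU transfers are emulated.
    float *halo = (float *)calloc((size_t)ngpu * N, sizeof(float));

    #pragma omp parallel num_threads(ngpu)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);                 // bind this host thread to one GPU

        size_t rows = N / ngpu + 2;         // local slice plus two halo rows
        float *d_u;
        cudaMalloc(&d_u, rows * N * sizeof(float));

        // ... launch the Gauss-Seidel iteration kernel on this slice ...

        // Copy this device's first interior row into the shared host
        // buffer, then barrier so every slice's halo is visible before
        // any thread reads its neighbour's row back to its device.
        cudaMemcpy(&halo[dev * N], d_u + N, N * sizeof(float),
                   cudaMemcpyDeviceToHost);
        #pragma omp barrier
        // ... copy the neighbouring slice's row from halo[] back with
        //     cudaMemcpy(..., cudaMemcpyHostToDevice) ...

        cudaFree(d_u);
    }
    free(halo);
    return 0;
}
```

The design choice the abstract describes, wrapping these memory operations behind a generated API, hides the `cudaSetDevice`/`cudaMemcpy`/barrier bookkeeping shown above, which is where most of the extra synchronization statements in the generated code come from.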