A Template Technology for Transplanting from Single-GPU Programs to Multi-GPU Programs

李建江; 李兴钢; 路川; 樊少明

Abstract

Abstract: The graphics processing unit (GPU) has drawn increasing attention as a highly parallel processor architecture, and a range of general-purpose GPU computing technologies has emerged, with NVIDIA CUDA as a representative. Multi-GPU parallel computing is also already in practical use. Because it requires coordination and interaction between the CPU and the GPUs, multi-GPU computing places higher demands on the programmer, particularly for data division and data communication. To reduce the complexity of developing multi-GPU software, the authors propose a template-based source code generation technique that produces OpenMP+CUDA source code in order to support the parallelizing transplantation of a single-GPU program. Simple guide statements describe the data division and communication, and from them the system generates a multi-threaded CUDA memory management API in C that wraps the CUDA memory operations and simulates CPU-GPU and GPU-GPU data transfer. Multi-GPU programs can therefore be generated automatically from single-GPU ones. Finally, the authors use a sample CUDA program that solves the Laplace equation by Gauss-Seidel iteration as input to the template system. The transplanted result shows that the framework can automatically generate code equivalent to a hand-written program, and that although the generated code may contain more synchronization statements, the performance loss is small enough to be negligible. The template system therefore notably lowers the cost of multi-GPU CUDA programming and improves the productivity of CUDA programmers.
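The data division that the guide statements describe can be illustrated with a small host-side sketch. This is not the authors' template system or its generated API: the names `RowSlice`, `partition_rows`, and `slice_bytes` are hypothetical, and the row-wise split with a one-row halo on each interior boundary is an assumption chosen because a Gauss-Seidel sweep over a 2D Laplace grid reads neighbouring rows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for the block of grid rows owned by one GPU. */
typedef struct {
    size_t first_row;   /* index of the first row owned by this device */
    size_t row_count;   /* number of rows owned by this device */
    size_t halo_above;  /* 1 if a boundary row is needed from the previous device */
    size_t halo_below;  /* 1 if a boundary row is needed from the next device */
} RowSlice;

/* Split `rows` grid rows across `devices` GPUs as evenly as possible.
 * Assumes rows >= devices, so that every device owns at least one row. */
RowSlice partition_rows(size_t rows, size_t devices, size_t dev)
{
    assert(devices > 0 && dev < devices && rows >= devices);

    size_t base  = rows / devices;   /* rows every device receives */
    size_t extra = rows % devices;   /* the first `extra` devices get one more */

    RowSlice s;
    s.row_count  = base + (dev < extra ? 1 : 0);
    s.first_row  = dev * base + (dev < extra ? dev : extra);
    s.halo_above = (dev > 0) ? 1 : 0;
    s.halo_below = (dev + 1 < devices) ? 1 : 0;
    return s;
}

/* Bytes of device memory one slice needs, halo rows included,
 * for a grid of `cols` single-precision values per row. */
size_t slice_bytes(RowSlice s, size_t cols)
{
    return (s.row_count + s.halo_above + s.halo_below) * cols * sizeof(float);
}
```

In an OpenMP+CUDA program of the kind the abstract mentions, each host thread would typically select its device with `cudaSetDevice`, allocate `slice_bytes(...)` bytes, and copy in the rows starting at `first_row`. Halo rows would then have to be exchanged between neighbouring devices between sweeps, which is where additional synchronization statements would be expected.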
