Generalized Stencil Auto-Tuning Framework on GPU Platform

Sun Qingxiao (孙庆骁); Yang Hailong (杨海龙)

doi:10.7544/issn1000-1239.202440612

Abstract: Stencil computations are widely used in scientific applications, and many HPC platforms rely on the high compute capability of GPUs to accelerate them. In recent years, Stencils have become more complex in terms of Stencil order, memory access, and computation patterns. To adapt Stencil computations to GPU architectures, the research community has proposed a variety of optimization techniques based on streaming and tiling. Because of the diversity of Stencil computation patterns and GPU architectures, no single optimization technique fits all Stencil instances. Researchers have therefore proposed Stencil auto-tuning mechanisms that search the parameters of a given combination of optimization techniques. However, existing mechanisms incur large offline profiling costs and online prediction overhead, and they cannot be flexibly generalized to arbitrary Stencil patterns. To address these problems, we propose GeST, a generalized Stencil auto-tuning framework that pushes Stencil computations toward peak performance on GPU platforms. Specifically, GeST constructs a global search space in a zero-padding format and quantifies parameter correlations with the coefficient of variation to generate parameter groups. GeST then iteratively selects parameter values from these groups, adjusts the sampling ratio according to a reward policy, and avoids redundant executions through hash encoding. Experimental results show that, compared with other state-of-the-art auto-tuning approaches, GeST identifies better-performing parameter settings in a short time.
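The abstract names four mechanisms (a zero-padded global search space, coefficient-of-variation-based parameter grouping, reward-adjusted sampling, and hash-based deduplication) without giving their details. The following is a minimal toy sketch of how such ideas could fit together, written under assumptions of our own and not taken from GeST: the parameter names in `SPACE`, the synthetic `measure` cost function standing in for timing a real Stencil kernel, the spread-based interaction test (coefficient of variation, CV = standard deviation / mean), and the 1.5×/0.9× reward factors are all hypothetical.

```python
"""Toy sketch: CV-based parameter grouping plus reward-guided, hash-deduplicated sampling.
Everything here (SPACE, measure, spread_cv, group_parameters, tune) is a hypothetical
illustration of ideas named in the GeST abstract, not GeST's actual implementation.
"""
import hashlib
import itertools
import random
import statistics

# Hypothetical global search space: parameters of all optimizations share one
# vector, and the value 0 pads out a parameter whose optimization is disabled.
SPACE = {
    "block_x": [0, 32, 64, 128],
    "block_y": [0, 2, 4, 8],
    "unroll": [0, 1, 2, 4],
    "stream": [0, 1],
}

def measure(cfg):
    """Synthetic runtime (ms) standing in for compiling and timing a Stencil kernel."""
    bx = cfg["block_x"] or 16  # 0 = disabled, fall back to a default
    by = cfg["block_y"] or 1
    # block_x and block_y interact through the thread-block size; the rest is additive.
    penalty = abs(bx * by - 256) / 256
    return 1.0 + penalty + 0.1 / (cfg["unroll"] + 1) - 0.2 * cfg["stream"]

def spread_cv(p, q, base):
    """CV of p's runtime spread across the values of q, other parameters fixed at base."""
    spreads = []
    for vq in SPACE[q]:
        times = [measure({**base, q: vq, p: vp}) for vp in SPACE[p]]
        spreads.append(max(times) - min(times))
    mean = statistics.mean(spreads)
    # If p's effect does not depend on q, every spread is equal and the CV is ~0.
    return statistics.pstdev(spreads) / mean if mean > 0 else 0.0

def group_parameters(base, threshold=0.1):
    """Merge parameters whose pairwise interaction CV exceeds threshold into one group."""
    groups = [{p} for p in SPACE]
    for p, q in itertools.combinations(SPACE, 2):
        if max(spread_cv(p, q, base), spread_cv(q, p, base)) > threshold:
            gp = next(g for g in groups if p in g)
            gq = next(g for g in groups if q in g)
            if gp is not gq:
                gp |= gq
                groups.remove(gq)
    return [sorted(g) for g in groups]

def config_key(cfg):
    """Hash-encode a configuration so a repeated candidate is never re-executed."""
    return hashlib.sha1(repr(sorted(cfg.items())).encode()).hexdigest()

def tune(groups, base, budget=40, seed=0):
    """Greedy local search that re-samples one parameter group per step."""
    rng = random.Random(seed)
    weights = [1.0] * len(groups)  # sampling ratio of each group
    best, best_t = dict(base), measure(base)
    seen = {config_key(best)}
    runs = 1
    for _ in range(budget * 10):  # attempt cap in case every neighbour was already seen
        if runs >= budget:
            break
        gi = rng.choices(range(len(groups)), weights=weights)[0]
        cand = dict(best)
        for p in groups[gi]:  # correlated parameters are re-sampled jointly
            cand[p] = rng.choice(SPACE[p])
        key = config_key(cand)
        if key in seen:  # skip redundant execution
            continue
        seen.add(key)
        t = measure(cand)
        runs += 1
        if t < best_t:
            best, best_t = cand, t
            weights[gi] *= 1.5  # reward: sample a productive group more often
        else:
            weights[gi] = max(0.1, weights[gi] * 0.9)  # mild decay, keep exploring
    return best, best_t, runs

if __name__ == "__main__":
    base = {p: vals[1] for p, vals in SPACE.items()}
    groups = group_parameters(base)
    print(groups)  # [['block_x', 'block_y'], ['unroll'], ['stream']]
    best, best_t, runs = tune(groups, base)
    print(best, round(best_t, 3), runs)
```

In this toy model only `block_x` and `block_y` interact (through the thread-block size), so the CV test places them in one group and the tuner re-samples them jointly, while `unroll` and `stream` are tuned independently. A real tuner would replace `measure` with compiling and timing the kernel on the GPU, which is exactly why skipping already-seen configurations by their hash matters.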
