Abstract:
Stencil computations are widely used in scientific applications, and many HPC platforms leverage the high computational capability of GPUs to accelerate them. In recent years, stencils have become more complex in terms of stencil order, memory access, and computation patterns. To adapt stencil computations to GPU architectures, the research community has proposed a variety of optimization techniques based on streaming and tiling. Because stencil computation patterns and GPU architectures are so diverse, no single optimization technique fits all stencil instances. Researchers have therefore proposed stencil auto-tuning mechanisms that search the parameter space for a given combination of optimization techniques. However, existing mechanisms incur substantial offline profiling costs and online prediction overhead, and they cannot flexibly adapt to arbitrary stencil patterns. To address these problems, this paper proposes GeST, a generalized stencil auto-tuning framework that delivers thorough performance optimization of stencil computations on GPU platforms. Specifically, GeST constructs a global search space through a zero-padding format and quantifies parameter correlations via the coefficient of variation to generate parameter groups. It then iteratively selects parameter values from these groups, adjusting the sampling ratio according to a reward policy and avoiding redundant executions through hash coding. Experimental results show that GeST identifies better-performing parameter settings in a short time compared with other state-of-the-art auto-tuning approaches.
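To make two of the mechanisms named above concrete, the following is a minimal Python sketch of coefficient-of-variation sensitivity probing and hash-coded deduplication of parameter settings. It is illustrative only: the parameter names, the synthetic cost model, and the plain random-search loop are assumptions for demonstration, not GeST's actual search space construction or reward policy.

```python
import hashlib
import random
import statistics

def coefficient_of_variation(samples):
    """CV = std / mean; a large CV for a parameter suggests its value
    strongly influences performance (the basis for grouping parameters)."""
    mean = statistics.mean(samples)
    return statistics.stdev(samples) / mean if mean > 0 else 0.0

def setting_key(setting):
    """Hash-code a parameter setting so already-executed configurations
    can be recognized and skipped."""
    return hashlib.sha1(repr(sorted(setting.items())).encode()).hexdigest()

def probe_cv(param_space, measure, probes=8):
    """Estimate per-parameter sensitivity: fix one parameter, randomize the
    rest, and compute the CV of mean runtimes across that parameter's values."""
    cvs = {}
    for p, values in param_space.items():
        runtimes = []
        for v in values:
            samples = []
            for _ in range(probes):
                s = {q: random.choice(vs) for q, vs in param_space.items()}
                s[p] = v
                samples.append(measure(s))
            runtimes.append(statistics.mean(samples))
        cvs[p] = coefficient_of_variation(runtimes)
    return cvs

def tune(param_space, measure, budget=100):
    """Sample settings, skipping hash-coded duplicates, and keep the fastest."""
    seen, best = set(), (float("inf"), None)
    for _ in range(budget):
        s = {p: random.choice(vs) for p, vs in param_space.items()}
        k = setting_key(s)
        if k in seen:  # avoid redundant execution of the same setting
            continue
        seen.add(k)
        t = measure(s)
        if t < best[0]:
            best = (t, s)
    return best

if __name__ == "__main__":
    # Hypothetical tuning knobs for a GPU stencil kernel.
    space = {"block_x": [16, 32, 64], "block_y": [4, 8, 16], "unroll": [1, 2, 4]}
    # Synthetic cost model standing in for a real GPU kernel measurement.
    fake_runtime = lambda s: abs(s["block_x"] - 32) + abs(s["block_y"] - 8) + s["unroll"]
    print("parameter sensitivity (CV):", probe_cv(space, fake_runtime))
    print("best setting found:", tune(space, fake_runtime))
```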