ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (12): 2621-2634.doi: 10.7544/issn1000-1239.2020.20190734

Previous Articles     Next Articles

A Parallel Star Stencil Algorithm Based on Tessellating

Cao Hang1,2, Yuan Liang1, Huang Shan1,2, Zhang Yunquan1, Xu Yongjun1, Lu Pengqi1,2, Zhang Guangting1   

  1. 1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049)
  • Online:2020-12-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2016YFB0200803), the Strategic Priority Research Program of Chinese Academy of Sciences (C) (XDC01040100), the National Natural Science Foundation of China (61972376, 62072431, 61432018), and the Beijing Natural Science Foundation (L182053).

Abstract: The Stencil computation, i.e. the structured grid computing, is a very common kind of loop nesting algorithm in scientific and engineering applications. The exhaustively studied tiling method is of great effectiveness as one of the transformation techniques to exploit the data locality and parallelism of Stencil computations. However, the state-of-the-art work of tiling often uniformly handles different Stencil shapes. We first present a concept called natural block to identify the difference between the star and box Stencils. Then we propose a new two-level tessellation scheme for star Stencils, where the natural block, as well as its successive blocks can tessellate the spatial space and their extensions along the time dimension are able to form a tessellation of the iteration space. Furthermore, a novel implementation technique called double updating is developed for star Stencils specifically, which updates each element twice continuously and improves the in-core data reuse pattern. In addition, we adopt coarsening and block reuse to enhance the parallelization performance. Theoretical analysis shows that our scheme achieves a better cache complexity than existing methods such as Girih and Pluto. The experiments on performance and bandwidth are conducted on a multicore system. The results demonstrate the effectiveness of our approach.

Key words: Stencil computation, tessellation, star Stencil, box Stencil, natural block

CLC Number: