Cao Hang, Yuan Liang, Huang Shan, Zhang Yunquan, Xu Yongjun, Lu Pengqi, Zhang Guangting. A Parallel Star Stencil Algorithm Based on Tessellating[J]. Journal of Computer Research and Development, 2020, 57(12): 2621-2634. DOI: 10.7544/issn1000-1239.2020.20190734
Citation:
Cao Hang, Yuan Liang, Huang Shan, Zhang Yunquan, Xu Yongjun, Lu Pengqi, Zhang Guangting. A Parallel Star Stencil Algorithm Based on Tessellating[J]. Journal of Computer Research and Development, 2020, 57(12): 2621-2634. DOI: 10.7544/issn1000-1239.2020.20190734
Cao Hang, Yuan Liang, Huang Shan, Zhang Yunquan, Xu Yongjun, Lu Pengqi, Zhang Guangting. A Parallel Star Stencil Algorithm Based on Tessellating[J]. Journal of Computer Research and Development, 2020, 57(12): 2621-2634. DOI: 10.7544/issn1000-1239.2020.20190734
Citation:
Cao Hang, Yuan Liang, Huang Shan, Zhang Yunquan, Xu Yongjun, Lu Pengqi, Zhang Guangting. A Parallel Star Stencil Algorithm Based on Tessellating[J]. Journal of Computer Research and Development, 2020, 57(12): 2621-2634. DOI: 10.7544/issn1000-1239.2020.20190734
1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)
Funds: This work was supported by the National Key Research and Development Program of China (2016YFB0200803), the Strategic Priority Research Program of Chinese Academy of Sciences (C) (XDC01040100), the National Natural Science Foundation of China (61972376, 62072431, 61432018), and the Beijing Natural Science Foundation (L182053).
The Stencil computation, i.e. the structured grid computing, is a very common kind of loop nesting algorithm in scientific and engineering applications. The exhaustively studied tiling method is of great effectiveness as one of the transformation techniques to exploit the data locality and parallelism of Stencil computations. However, the state-of-the-art work of tiling often uniformly handles different Stencil shapes. We first present a concept called natural block to identify the difference between the star and box Stencils. Then we propose a new two-level tessellation scheme for star Stencils, where the natural block, as well as its successive blocks can tessellate the spatial space and their extensions along the time dimension are able to form a tessellation of the iteration space. Furthermore, a novel implementation technique called double updating is developed for star Stencils specifically, which updates each element twice continuously and improves the in-core data reuse pattern. In addition, we adopt coarsening and block reuse to enhance the parallelization performance. Theoretical analysis shows that our scheme achieves a better cache complexity than existing methods such as Girih and Pluto. The experiments on performance and bandwidth are conducted on a multicore system. The results demonstrate the effectiveness of our approach.