Cao Hang, Yuan Liang, Huang Shan, Zhang Yunquan, Xu Yongjun, Lu Pengqi, Zhang Guangting. A Parallel Star Stencil Algorithm Based on Tessellating[J]. Journal of Computer Research and Development, 2020, 57(12): 2621-2634. DOI: 10.7544/issn1000-1239.2020.20190734

A Parallel Star Stencil Algorithm Based on Tessellating

Funds: This work was supported by the National Key Research and Development Program of China (2016YFB0200803), the Strategic Priority Research Program of Chinese Academy of Sciences (C) (XDC01040100), the National Natural Science Foundation of China (61972376, 62072431, 61432018), and the Beijing Natural Science Foundation (L182053).
More Information
  • Published Date: November 30, 2020
  • Abstract: Stencil computation, i.e., structured grid computation, is a very common kind of nested-loop algorithm in scientific and engineering applications. Tiling, which has been studied exhaustively, is one of the most effective transformation techniques for exploiting the data locality and parallelism of Stencil computations. However, state-of-the-art tiling work often handles different Stencil shapes uniformly. We first present a concept called the natural block to identify the difference between star and box Stencils. Then we propose a new two-level tessellation scheme for star Stencils, in which the natural block and its successive blocks tessellate the spatial space, and their extensions along the time dimension form a tessellation of the iteration space. Furthermore, a novel implementation technique called double updating is developed specifically for star Stencils; it updates each element twice in succession and improves the in-core data reuse pattern. In addition, we adopt coarsening and block reuse to enhance parallelization performance. Theoretical analysis shows that our scheme achieves a better cache complexity than existing methods such as Girih and Pluto. Experiments on performance and bandwidth conducted on a multicore system demonstrate the effectiveness of our approach.
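For readers unfamiliar with the terminology in the abstract, the sketch below illustrates only the background setting: a 2D 5-point star Stencil (Jacobi-style update) with plain square spatial blocking. It is not the paper's two-level tessellation or double-updating scheme, and the grid size N, block size B, and step count T are arbitrary illustrative values.

```c
/*
 * Background sketch only: a 2D 5-point star Stencil (Jacobi update)
 * with simple square spatial tiling. This is NOT the tessellation
 * algorithm proposed in the paper.
 */
#include <stdio.h>

#define N 1024   /* interior grid dimension (plus a halo of width 1) */
#define B 64     /* edge length of a spatial tile (block)             */
#define T 4      /* number of time steps                              */

static double a[N + 2][N + 2];  /* current time level */
static double b[N + 2][N + 2];  /* next time level    */

int main(void) {
    /* Initialize: zero interior, "hot" boundary along one edge. */
    for (int i = 0; i < N + 2; i++)
        for (int j = 0; j < N + 2; j++)
            a[i][j] = (i == 0) ? 100.0 : 0.0;

    for (int t = 0; t < T; t++) {
        /* Sweep the interior tile by tile; each point reads itself and
         * its 4 star-shaped (axis-aligned) neighbors. */
        for (int ii = 1; ii <= N; ii += B)
            for (int jj = 1; jj <= N; jj += B)
                for (int i = ii; i < ii + B && i <= N; i++)
                    for (int j = jj; j < jj + B && j <= N; j++)
                        b[i][j] = 0.2 * (a[i][j] + a[i - 1][j] + a[i + 1][j]
                                                 + a[i][j - 1] + a[i][j + 1]);

        /* Advance to the next time level (kept as a copy for clarity;
         * a pointer swap would avoid it). */
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                a[i][j] = b[i][j];
    }

    printf("a[1][1] = %f\n", a[1][1]);
    return 0;
}
```

A baseline like this performs one time step per sweep of the grid; the schemes compared in the paper (tessellation-based tiling, Girih, Pluto) restructure the loops so that several time steps are applied to a block while it stays in cache.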
