    Citation: Cao Hang, Yuan Liang, Huang Shan, Zhang Yunquan, Xu Yongjun, Lu Pengqi, Zhang Guangting. A Parallel Star Stencil Algorithm Based on Tessellating[J]. Journal of Computer Research and Development, 2020, 57(12): 2621-2634. DOI: 10.7544/issn1000-1239.2020.20190734

    A Parallel Star Stencil Algorithm Based on Tessellating

    • Abstract: Stencil computation, i.e., structured grid computing, is a common class of nested-loop algorithms in scientific and engineering applications. Tiling, which has been studied extensively, is one of the most effective transformation techniques for exploiting the data locality and parallelism of Stencil computations. However, existing tiling work usually handles different Stencil shapes in the same way. We first introduce the concept of a natural block at the spatial level to distinguish the characteristics of star and box Stencils. We then propose a new two-level tessellation scheme for star Stencils, in which the natural block and its successive blocks tessellate the spatial domain, and their extensions along the time dimension tessellate the entire iteration space. Furthermore, we develop a novel implementation technique for star Stencils called double updating, which updates each element twice in succession and improves the in-core data reuse pattern. In addition, we adopt coarsening and block reuse to enhance parallelization performance. Theoretical analysis shows that our scheme achieves a lower cache complexity than existing methods such as Girih and Pluto. Performance and bandwidth experiments on a multicore system demonstrate the effectiveness of our approach.
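    To make the star/box distinction concrete, the following is a minimal sketch of the two stencil shapes in 2D. It is not the tessellation scheme or the double-updating technique from the paper. The function names (`star_step`, `box_step`, `footprint`), the averaging coefficients, and the fixed zero boundary are all assumptions chosen for illustration. The sketch only shows that a value propagates along the axes under a star stencil (a diamond-shaped footprint) and in all directions, including diagonals, under a box stencil (a square footprint), which is the kind of shape difference a shape-specific tiling can exploit.

```python
def star_step(grid):
    """One Jacobi sweep of a 2D 5-point star stencil (interior points only)."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are carried over unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # A star stencil reads only the axis-aligned neighbours.
            new[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return new


def box_step(grid):
    """One Jacobi sweep of a 2D 9-point box stencil (interior points only)."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # A box stencil also reads the four diagonal neighbours.
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    total += grid[i + di][j + dj]
            new[i][j] = total / 9.0
    return new


def footprint(step, steps, size):
    """Count cells influenced by a single centre point after `steps` sweeps."""
    grid = [[0.0] * size for _ in range(size)]
    grid[size // 2][size // 2] = 1.0
    for _ in range(steps):
        grid = step(grid)
    return sum(1 for row in grid for value in row if value != 0.0)


if __name__ == "__main__":
    print("star:", footprint(star_step, 2, 7))  # diamond-shaped region
    print("box: ", footprint(box_step, 2, 7))   # square region
```

    On a 7x7 grid, two sweeps from a single centre point touch fewer cells under the star stencil than under the box stencil, because the star stencil never propagates a value diagonally in a single step.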

       
