ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2020, Vol. 57 ›› Issue (12): 2621-2634.doi: 10.7544/issn1000-1239.2020.20190734

• 软件技术 • 上一篇    下一篇



  1. 1(计算机体系结构国家重点实验室(中国科学院计算技术研究所) 北京 100190) 2(中国科学院大学 北京 100049) (
  • 出版日期: 2020-12-01
  • 基金资助: 
    国家重点研发计划项目(2016YFB0200803);中国科学院战略性先导科技专项(C类)(XDC01040100); 国家自然科学基金项目(61972376,62072431,61432018);北京市自然科学基金项目(L182053)

A Parallel Star Stencil Algorithm Based on Tessellating

Cao Hang1,2, Yuan Liang1, Huang Shan1,2, Zhang Yunquan1, Xu Yongjun1, Lu Pengqi1,2, Zhang Guangting1   

  1. 1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049)
  • Online: 2020-12-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2016YFB0200803), the Strategic Priority Research Program of Chinese Academy of Sciences (C) (XDC01040100), the National Natural Science Foundation of China (61972376, 62072431, 61432018), and the Beijing Natural Science Foundation (L182053).

摘要: Stencil计算(模板计算)是科学工程应用中一类常见的嵌套循环算法.分块方法是提高数据局部性和并行性的高效优化技术之一,目前已有大量针对分块方法的探索,但现有工作往往对不同Stencil形状都采用同一处理方法.首先在空间层面引出“自然块”的概念来区分星型Stencil和盒型Stencil的特征,然后提出一个新的针对星型Stencil的2层密铺方案,此方案中自然块和它的后继块可以密铺数据空间区域,这些分块沿着时间维度扩展,能够密铺整个迭代空间.此外,针对星型Stencil设计了一个新颖的“2次更新”优化技术,改善了核内数据重用模式.理论分析表明:此方案相比现有方法有更低的缓存复杂度,实验结果证实了此方案的有效性.

关键词: Stencil计算, 密铺, 星型Stencil, 盒型Stencil, 自然块

Abstract: The Stencil computation, i.e. the structured grid computing, is a very common kind of loop nesting algorithm in scientific and engineering applications. The exhaustively studied tiling method is of great effectiveness as one of the transformation techniques to exploit the data locality and parallelism of Stencil computations. However, the state-of-the-art work of tiling often uniformly handles different Stencil shapes. We first present a concept called natural block to identify the difference between the star and box Stencils. Then we propose a new two-level tessellation scheme for star Stencils, where the natural block, as well as its successive blocks can tessellate the spatial space and their extensions along the time dimension are able to form a tessellation of the iteration space. Furthermore, a novel implementation technique called double updating is developed for star Stencils specifically, which updates each element twice continuously and improves the in-core data reuse pattern. In addition, we adopt coarsening and block reuse to enhance the parallelization performance. Theoretical analysis shows that our scheme achieves a better cache complexity than existing methods such as Girih and Pluto. The experiments on performance and bandwidth are conducted on a multicore system. The results demonstrate the effectiveness of our approach.

Key words: Stencil computation, tessellation, star Stencil, box Stencil, natural block