Citation: Li Shuaijiang, Zhang Xinyuan, Zhao Jiacheng, Tian Xinghui, Shi Xiyu, Xu Xiaoxin, Cui Huimin. Automatic Insertion of High-Performance Synchronization Primitives for Ascend Processors[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440093
Instruction-level parallelism (ILP) is a classic challenge in processor architecture research. Domain-specific architectures such as the Ascend processor expose more pipeline details to upper-layer software, and compilers or programmers explicitly control inter-pipeline synchronization to exploit ILP. However, the physical synchronization resources shared between pipelines are scarce, which constrains how much ILP can be extracted. To address this issue, a high-performance automatic synchronization-primitive insertion method for the Ascend processor is proposed. By introducing the abstraction of "virtual synchronization resources," the method decouples the insertion of synchronization primitives from the selection of physical synchronization resources. First, a heuristic algorithm inserts virtual synchronization primitives into complex control-flow graphs. Then, the large number of virtual synchronization resources is mapped onto the extremely limited physical synchronization resources through techniques such as virtual-synchronization-primitive merging. At the same time, redundant synchronization primitives are removed based on the partial-order relation between instructions, while preserving program correctness and respecting stringent hardware resource constraints. Experiments on the Ascend 910A platform with instruction-level and operator-level benchmarks show that programs with automatically inserted synchronization primitives achieve performance on par with those whose primitives were manually inserted by expert programmers, while guaranteeing correctness.
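To make the decoupling concrete, the following minimal Python sketch models the mapping stage: virtual synchronization primitives, inserted without regard to resource limits, are greedily assigned to a tiny pool of physical event IDs, reusing an ID once its set-to-wait live range has ended. All names here (VirtualSync, NUM_PHYSICAL_EVENTS, map_virtual_to_physical) are hypothetical, and the greedy interval scheme, reminiscent of linear-scan register allocation, merely stands in for the paper's merging techniques rather than reproducing them.

```python
# Illustrative sketch only: virtual sync primitives are assigned to a small,
# assumed pool of physical event IDs. Real Ascend primitives and resource
# counts differ; all names below are hypothetical.
from dataclasses import dataclass

NUM_PHYSICAL_EVENTS = 4  # assumed size of the physical sync-resource pool

@dataclass
class VirtualSync:
    vid: int        # virtual event ID, effectively unbounded
    set_pos: int    # instruction index where the producer sets the flag
    wait_pos: int   # instruction index where the consumer waits on it

def map_virtual_to_physical(syncs):
    """Greedily share physical IDs: two virtual events may use the same
    physical event if their set->wait live ranges do not overlap."""
    free_until = [0] * NUM_PHYSICAL_EVENTS  # position at which each ID frees up
    mapping = {}
    for s in sorted(syncs, key=lambda s: s.set_pos):
        for pid, free in enumerate(free_until):
            if free <= s.set_pos:             # this physical ID is free again
                mapping[s.vid] = pid
                free_until[pid] = s.wait_pos  # busy until the matching wait
                break
        else:
            raise RuntimeError("pool exhausted: merge virtual syncs first")
    return mapping

# Usage: three dependencies; the first and last can share physical ID 0.
deps = [VirtualSync(0, 1, 4), VirtualSync(1, 5, 9), VirtualSync(2, 2, 7)]
print(map_virtual_to_physical(deps))  # {0: 0, 2: 1, 1: 0}
```

When the pool is exhausted, the method described above would merge virtual primitives (coarsening several dependencies into one flag) rather than fail; the sketch simply raises an error to mark where that fallback would apply.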