    You Xin, Yang Hailong, Lei Kelun, Kong Xianghao, Xu Jun, Luan Zhongzhi, Qian Depei. A Cross-Platform Fine-Grained Performance Analysis Technique for Redundant Zeros[J]. Journal of Computer Research and Development, 2023, 60(5): 1164-1176. DOI: 10.7544/issn1000-1239.202111189

    A Cross-Platform Fine-Grained Performance Analysis Technique for Redundant Zeros

    • Redundant zeros are a software inefficiency in which large numbers of zero values are loaded or used in trivial computation, wasting significant memory and compute resources. However, compiler toolchains still cannot effectively identify redundant operations on zeros, and hardware optimizations for handling redundant zeros have not yet been adopted in commercial hardware. Although ZeroSpy can detect redundant zeros buried within software and report sufficient information for performance optimization, it is limited to the Intel platform and incurs large overhead. We therefore propose DrZero, a cross-platform tool that overcomes these limitations. DrZero detects redundant zeros on both x86 and ARM platforms and implements a novel online analysis based on buffered tracing to lower the overhead. For the ARM platform, we propose floating-point estimation via dataflow analysis, which estimates the data type of a memory operand so that detection can proceed. The evaluation results show that DrZero detects redundant zeros with code-centric and data-centric analysis at performance overheads of 45.31× and 54.20× on the x86 platform and 14.12× and 13.40× on the ARM platform, respectively. In addition, on the x86 platform DrZero incurs 37.2% and 55.8% lower time overhead than ZeroSpy for code-centric and data-centric analysis, respectively. Guided by the optimization opportunities that DrZero reveals, eliminating redundant zeros in the evaluated applications yields speedups of up to 1.76× and 2.12× on the x86 and ARM platforms, respectively. DrZero is open source at https://github.com/buaa-hipo/zerospy-drcctprof.
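To make the two views named in the abstract concrete, the following is a minimal illustrative sketch, not DrZero's implementation. It assumes a hypothetical trace of load records in the form (code site, data object, loaded value) and reports the fraction of loads that read zero, grouped by code site (a code-centric view) and by data object (a data-centric view). The function name, record format, and sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def zero_load_fractions(trace):
    """Return (by_site, by_object) fractions of loads whose value is zero.

    `trace` is a hypothetical iterable of (site, obj, value) load records;
    it is not a format used by DrZero or ZeroSpy.
    """
    by_site = defaultdict(lambda: [0, 0])  # site -> [zero loads, total loads]
    by_obj = defaultdict(lambda: [0, 0])   # object -> [zero loads, total loads]
    for site, obj, value in trace:
        for table, key in ((by_site, site), (by_obj, obj)):
            table[key][1] += 1
            if value == 0:
                table[key][0] += 1

    def ratios(table):
        return {key: zeros / total for key, (zeros, total) in table.items()}

    return ratios(by_site), ratios(by_obj)


# Made-up sample trace: two of the three loads at kernel.c:12 read zero.
trace = [
    ("kernel.c:12", "A", 0.0),
    ("kernel.c:12", "A", 0.0),
    ("kernel.c:12", "A", 3.5),
    ("kernel.c:20", "B", 1.0),
]
code_centric, data_centric = zero_load_fractions(trace)
```

A code site or data object with a high zero fraction would be a candidate for closer inspection, for example to skip trivial computation or to use a sparse representation. A real tool has to collect such records through binary instrumentation, which is where the overheads reported in the abstract arise.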
