    Citation: Lin Xinhua, Wang Jie, Wang Yichao, Zuo Sicheng. A Data Distribution-Consistency-Based Estimation Method for Multiplexing Processor Hardware Performance Counters[J]. Journal of Computer Research and Development, 2022, 59(6): 1192-1201. DOI: 10.7544/issn1000-1239.20200989

    A Data Distribution-Consistency-Based Estimation Method for Multiplexing Processor Hardware Performance Counters

    • Abstract: The number of processor hardware events that can be collected simultaneously is limited by the number of processor hardware performance counters. Mainstream processors support hundreds of low-level hardware events, but because on-chip register resources are limited they offer only a small number (usually 6~12) of hardware performance counters. To ease this mismatch, hardware counter multiplexing (MPX) time-shares a small number of counter registers to estimate a large number of hardware events. In practice, however, the low accuracy of existing estimation algorithms based on time locality has kept MPX from being widely adopted. To improve MPX accuracy, we design a new estimation algorithm. Our work includes three parts: 1) using the Kolmogorov-Smirnov normality test, we find that for the same hardware event, the same code shows consistent data distributions in one counter one event (OCOE) mode and in MPX mode; 2) based on this regularity, we propose a new distribution-consistency-based estimation algorithm for MPX, outline estimation (OLE); 3) we implement OLE in the open-source MPX library NeoMPX and validate it on mainstream X86 and ARM processors. The results show that, when 16 hardware events are collected simultaneously, OLE improves accuracy over the PAPI default MPX estimation algorithm by about 10.5% on average and by up to 46.6%, and achieves 18.8% and 17.7% higher accuracy, respectively, than the other four state-of-the-art MPX estimation algorithms.
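To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the OLE algorithm and not NeoMPX or PAPI code. It assumes plain linear time scaling for a multiplexed counter (the raw count multiplied by the ratio of the measurement window to the time the counter was actually scheduled), and it computes a one-sample Kolmogorov-Smirnov distance against a normal distribution fitted to the samples. The function names `scale_mpx_count`, `normal_cdf`, and `ks_normality_statistic`, and all sample numbers, are hypothetical and chosen only for illustration.

```python
import math
import statistics


def scale_mpx_count(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed counter to the full measurement window.

    Assumption: plain linear time scaling, i.e. the event rate seen while
    the counter was scheduled is taken to hold for the whole window.
    """
    if time_running <= 0:
        raise ValueError("counter was never scheduled")
    return raw_count * time_enabled / time_running


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_normality_statistic(samples):
    """One-sample K-S distance D between the samples and a fitted normal.

    Mean and standard deviation are estimated from the samples themselves.
    A smaller D means the empirical distribution is closer to normal.
    """
    xs = sorted(samples)
    n = len(xs)
    mean = statistics.mean(xs)
    std = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, std)
        # Compare the fitted CDF with the empirical CDF just after and
        # just before the i-th sorted sample.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical per-run totals for one event in OCOE mode.
ocoe_runs = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
# Hypothetical (raw_count, time_enabled, time_running) triples in MPX mode.
mpx_raw = [(250.0, 4.0, 1.0), (255.0, 4.0, 1.0), (245.0, 4.0, 1.0),
           (252.0, 4.0, 1.0), (248.0, 4.0, 1.0)]
mpx_runs = [scale_mpx_count(c, e, r) for c, e, r in mpx_raw]

d_ocoe = ks_normality_statistic(ocoe_runs)
d_mpx = ks_normality_statistic(mpx_runs)
print(f"D(OCOE) = {d_ocoe:.3f}, D(MPX) = {d_mpx:.3f}")
```

Running the same statistic on both sets of runs gives one simple way to compare the shape of the two distributions. A real study would use many more runs and a proper significance threshold.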

       
