Abstract:
The number of processor hardware events that can be collected simultaneously is limited by the number of hardware performance counters. Modern CPUs support hundreds of low-level hardware events but, because register resources are limited, offer only a small number (usually 6 to 12) of hardware performance counters with which to collect them. To deal with this problem, multiplexing (MPX) has been proposed to estimate the counts of simultaneously collected hardware events under the constraint of limited hardware counters. However, the low accuracy of existing time-locality-based estimation algorithms prevents MPX from being widely used in practice. To improve MPX accuracy, we design a new estimation algorithm. Our work includes three parts: 1) we characterize the distributions of MPX results and one-counter-one-event (OCOE) results using the Kolmogorov-Smirnov test and find that MPX results exhibit distribution consistency; 2) we propose a new distribution-consistency-based estimation algorithm for MPX, outline estimation (OLE); 3) we validate OLE within the open-source MPX library NeoMPX on mainstream x86 and ARM processors. The results show that, when simultaneously collecting 16 processor hardware events, OLE improves accuracy by up to 46.6% over the PAPI default MPX estimation algorithm and achieves 18.8% and 17.7% higher accuracy, respectively, than four other state-of-the-art MPX estimation algorithms.
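To make the two ingredients mentioned above concrete, the following is a minimal sketch, not the OLE algorithm itself. It assumes a time-proportional baseline in which a raw count is scaled by the ratio of the total enabled time to the time the event actually held a counter (the convention behind the `time_enabled` and `time_running` fields of Linux `perf_event`), and it computes a two-sample Kolmogorov-Smirnov statistic of the kind that could be used to compare MPX samples with OCOE samples. The function names `scale_multiplexed_count` and `ks_statistic` are hypothetical and chosen only for illustration.

```python
def scale_multiplexed_count(raw_count, time_enabled, time_running):
    """Time-proportional estimate for a multiplexed event (assumed baseline).

    raw_count    -- count observed while the event held a hardware counter
    time_enabled -- total time the event was requested
    time_running -- time the event actually occupied a counter
    """
    if time_running <= 0:
        # The event never got a counter, so there is nothing to extrapolate.
        return 0.0
    return raw_count * time_enabled / time_running


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both cursors past every value <= x, so ties are handled.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap
```

A statistic near 0 indicates that the two samples have similar empirical distributions, while a value near 1 indicates that they barely overlap. Turning the statistic into a pass or fail decision additionally requires a critical value or p-value, which this sketch omits.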