ISSN 1000-1239 CN 11-1777/TP


Analyzing Application Performance with a Principal Component Linear Regression Model

Li Shengmei1, Cheng Buqi2, Gao Xingyu3, Qiao Lin1, and Tang Zhizhong1

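1 (Department of Computer Science and Technology, Tsinghua University, Beijing 100084) 2 (Programming Systems Laboratory, Intel China Research Center, Beijing 100080) 3 (Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190) (lism03@mails.tsinghua.edu.cn)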
  • Publication date: 2009-11-15

Principal Component Linear Regression Analysis on Performance of Applications

Li Shengmei1, Cheng Buqi2, Gao Xingyu3, Qiao Lin1, and Tang Zhizhong1

1 (Department of Computer Science and Technology, Tsinghua University, Beijing 100084) 2 (Programming Systems Laboratory, Intel China Research Center, Beijing 100080) 3 (Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190)
  • Online: 2009-11-15

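Abstract: Performance analysis of applications can provide effective reference and guidance for architecture designers and performance optimizers. A principal component linear regression model is used to analyze the performance of the SPEC CPU2006 integer benchmarks. The model takes the events sampled by the performance monitoring unit as independent variables and cycles per instruction (CPI) as the dependent variable, and it uses principal component analysis to eliminate the correlation among the performance events. Experimental results show that the model's goodness of fit is above 90% and that its average relative error in predicting performance is 15%. The model quantitatively analyzes how L1 and L2 cache misses, the key factors affecting performance, influence program performance.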

Key words: performance analysis, cache miss, principal component analysis, linear regression, SPEC CPU2006
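A minimal formulation of a principal component regression of the kind summarized above, written in our own notation rather than taken from the paper. It assumes that the p event rates are standardized, that only the first m principal components are kept, and that "goodness of fit" means the coefficient of determination R²; MRE stands for the average relative error. Because the component scores are uncorrelated, each β_k can be estimated independently, and γ maps the fitted coefficients back to the individual events.

```latex
\begin{aligned}
&\mathbf{X}\in\mathbb{R}^{n\times p}\ \text{(standardized event rates)},\qquad
\mathbf{R}=\tfrac{1}{n-1}\mathbf{X}^{\top}\mathbf{X}=\mathbf{W}\boldsymbol{\Lambda}\mathbf{W}^{\top}\\
&\mathbf{Z}=\mathbf{X}\mathbf{W}_m\quad\bigl(\text{first } m \text{ components; } \mathbf{Z}^{\top}\mathbf{Z}=(n-1)\boldsymbol{\Lambda}_m \text{ is diagonal}\bigr)\\
&\mathrm{CPI}_i=\beta_0+\sum_{k=1}^{m}\beta_k z_{ik}+\varepsilon_i,\qquad
\boldsymbol{\gamma}=\mathbf{W}_m\boldsymbol{\beta}\quad(\text{effect of each standardized event on CPI})\\
&R^2=1-\frac{\sum_i\bigl(\mathrm{CPI}_i-\widehat{\mathrm{CPI}}_i\bigr)^2}{\sum_i\bigl(\mathrm{CPI}_i-\overline{\mathrm{CPI}}\bigr)^2},\qquad
\mathrm{MRE}=\frac{1}{n}\sum_{i=1}^{n}\frac{\bigl|\mathrm{CPI}_i-\widehat{\mathrm{CPI}}_i\bigr|}{\mathrm{CPI}_i}
\end{aligned}
```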

Abstract: Application performance is influenced by many factors, each to a different extent. Analyzing and distinguishing how strongly each factor affects performance can guide architects in architecture design and help programmers with optimization. However, these influences are hard to separate because the factors may themselves be correlated with one another. In this paper, a principal component linear regression model of the performance of the SPEC CPU2006 integer benchmarks is built. Cycles per instruction (CPI) represents application performance, and the performance events monitored by the performance monitoring unit (PMU) represent the influencing factors. Principal component analysis is applied to eliminate the linear correlation among the performance events. A linear regression model is then built with CPI as the dependent variable and the principal components as the independent variables. The model quantitatively analyzes the influence on CPI of the following performance events: L1 data cache misses, L2 cache misses, DTLB misses, branch mispredictions, micro-fusion, and memory disambiguation events. The model is validated by the t test and the F test, and its goodness of fit exceeds 90%. Its average relative prediction error is 15%. The results show quantitatively how L1 and L2 cache misses dominate application performance.

Key words: performance analysis, cache miss, principal component analysis, linear regression analysis, SPEC CPU2006
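A minimal, standard-library-only sketch of principal component regression along the lines the abstract describes. Everything concrete here is an assumption for illustration. The functions `standardize`, `jacobi_eigen`, `pcr_fit`, and `synthetic_samples` are hypothetical. The two "events" and their coefficients are synthetic, not SPEC CPU2006 measurements. The relative error is computed in-sample rather than on held-out runs. The sketch shows the two ideas the abstract relies on: component scores are uncorrelated, so each regression slope can be estimated on its own, and the component coefficients can be mapped back to a per-event effect on CPI.

```python
# Principal component regression (PCR) sketch; NOT the paper's implementation.
# Assumptions (ours): each row of X holds one sample's PMU event rates, y holds
# the measured CPI of that sample, and only the first m components are kept.
import math
import random


def standardize(X):
    """Center every column and scale it to unit sample variance."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in X], stds


def jacobi_eigen(A, sweeps=50, tol=1e-14):
    """Cyclic Jacobi eigensolver for a small symmetric matrix.

    Returns (eigenvalues, V); eigenvector k is stored in column k of V.
    """
    p = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if A[i][j] == 0.0:
                    continue
                # Rotation that zeroes A[i][j] (smaller root of t^2 + 2*theta*t - 1 = 0).
                theta = (A[j][j] - A[i][i]) / (2.0 * A[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):  # columns: A <- A J and V <- V J
                    A[k][i], A[k][j] = c * A[k][i] - s * A[k][j], s * A[k][i] + c * A[k][j]
                    V[k][i], V[k][j] = c * V[k][i] - s * V[k][j], s * V[k][i] + c * V[k][j]
                for k in range(p):  # rows: A <- J^T A
                    A[i][k], A[j][k] = c * A[i][k] - s * A[j][k], s * A[i][k] + c * A[j][k]
    return [A[k][k] for k in range(p)], V


def pcr_fit(X, y, m):
    """Regress y on the first m principal components of the standardized X."""
    n, p = len(X), len(X[0])
    Xs, stds = standardize(X)
    corr = [[sum(r[a] * r[b] for r in Xs) / (n - 1) for b in range(p)] for a in range(p)]
    vals, V = jacobi_eigen(corr)
    keep = sorted(range(p), key=lambda k: vals[k], reverse=True)[:m]
    W = [[V[j][k] for k in keep] for j in range(p)]  # p x m loading matrix
    Z = [[sum(r[j] * W[j][c] for j in range(p)) for c in range(m)] for r in Xs]
    # Scores are centered and mutually uncorrelated, so the least-squares
    # intercept is mean(y) and each slope can be estimated separately.
    b0 = sum(y) / n
    zz = [sum(z[c] ** 2 for z in Z) for c in range(m)]
    beta = [sum(z[c] * (yi - b0) for z, yi in zip(Z, y)) / zz[c] for c in range(m)]
    fitted = [b0 + sum(b * zc for b, zc in zip(beta, z)) for z in Z]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - b0) ** 2 for yi in y)
    dof = n - m - 1
    r2 = 1.0 - sse / sst
    sigma2 = sse / dof
    gamma = [sum(W[j][c] * beta[c] for c in range(m)) for j in range(p)]
    return {
        "r2": r2,                                    # goodness of fit
        "f": (r2 / m) / ((1.0 - r2) / dof),          # overall F statistic
        "t": [b / math.sqrt(sigma2 / s) for b, s in zip(beta, zz)],  # per-component t
        "mre": sum(abs(yi - fi) / yi for yi, fi in zip(y, fitted)) / n,  # in-sample
        "raw_effects": [g / s for g, s in zip(gamma, stds)],  # d(CPI)/d(event rate)
    }


def synthetic_samples(n=200, seed=2009):
    """Hypothetical data: two correlated miss rates (per 1k instructions) and a CPI."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        l1 = rng.uniform(5.0, 40.0)            # stand-in for an L1D miss rate
        l2 = 0.3 * l1 + rng.uniform(0.0, 5.0)  # correlated with l1 by construction
        X.append([l1, l2])
        y.append(0.5 + 0.02 * l1 + 0.1 * l2 + rng.gauss(0.0, 0.02))
    return X, y


if __name__ == "__main__":
    X, y = synthetic_samples()
    for m in (1, 2):
        fit = pcr_fit(X, y, m)
        print(f"m={m}: R^2={fit['r2']:.4f} F={fit['f']:.1f} MRE={fit['mre']:.2%}")
        print("  t per component:", [round(t, 1) for t in fit["t"]])
        print("  effect per event unit:", [round(e, 4) for e in fit["raw_effects"]])
```

Keeping all components (m = p) reproduces ordinary least squares on the events. Dropping low-variance components (smaller m) gives up a little fit in exchange for more stable coefficients when events are nearly collinear.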