    贾耀仓, 武成岗, 张兆庆. 指导cache静态划分的程序性能profiling优化技术[J]. 计算机研究与发展, 2012, 49(1): 93-102.
    Jia Yaocang, Wu Chenggang, Zhang Zhaoqing. Program’s Performance Profiling Optimization for Guiding Static Cache Partitioning[J]. Journal of Computer Research and Development, 2012, 49(1): 93-102.

    指导cache静态划分的程序性能profiling优化技术

    Program’s Performance Profiling Optimization for Guiding Static Cache Partitioning

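    • Abstract (translated from the Chinese original): For multi-core processors with a shared cache, managing how each core uses the cache is key to achieving the processor's full performance. Current cache replacement policies allow programs to interfere with one another's performance; static cache partitioning addresses this interference by allocating separate cache regions to concurrently running programs. To allocate an appropriately sized region to each program, the program must first be performance-profiled, that is, run multiple times in advance to collect its performance data at each cache capacity. The overhead of such profiling is enormous, which limits its practicality. To remove the need for multiple runs, this paper proposes a profiling optimization that requires only a single run. The technique uses online phase analysis to identify the program's execution phases and avoids profiling the same phase repeatedly. It also analyzes how each phase's performance trends with cache capacity and skips profiling for capacity changes to which performance is insensitive, reducing overhead. After the run ends, the program's overall performance at each capacity is estimated from the per-phase performance at each capacity, and this estimate guides static cache partitioning. Experiments show that the technique's overhead is only 7%, that the cache partitioning it guides improves performance by 8% over no partitioning, and that performance is only 1% lower than with partitioning guided by multi-pass profiling.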

       

      Abstract: Coordinating how the cores use the cache is a key issue for multi-core processors with a shared cache. Current cache replacement methods may cause performance interference between programs running simultaneously on different cores. Static cache partitioning addresses this interference by dividing the shared cache into exclusive regions, one per program. To allocate a region of appropriate size to each program, performance profiling is needed to collect the program's performance under a variety of cache capacities, which requires running the program in multiple passes, one pass per capacity. The enormous overhead of this profiling keeps static partitioning from practical use. This paper presents a performance profiling optimization that needs only a single-pass run. Phase analysis identifies the program's phases, which eliminates redundant profiling of repeated phases as well as unnecessary profiling at cache capacities where the program's performance is insensitive to capacity. The program's overall performance under each capacity is then estimated from the per-phase results. Experimental results show that the overhead of this method is only 7%, that partitioning guided by it improves performance by 8% over no partitioning, and that performance declines by only 1% compared with partitioning guided by multi-pass profiling.
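
      The estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the profile format, the use of CPI (cycles per instruction) as the performance metric, the nearest-capacity rule for capacities that were skipped as insensitive, and the sum-of-cycles objective for choosing a partition are all assumptions made for illustration, as are the names `cpi_at`, `estimate_cycles`, and `best_static_partition`.

```python
from typing import Dict, Optional, Tuple

# Hypothetical profile format (an assumption, not taken from the paper):
# phase id -> (instructions executed in that phase, {cache ways: measured CPI}).
# A phase whose performance is insensitive to capacity may hold only a few
# entries, because profiling at the other capacities was skipped.
PhaseProfile = Dict[str, Tuple[int, Dict[int, float]]]


def cpi_at(cpi_by_ways: Dict[int, float], ways: int) -> float:
    """Return the CPI at `ways`, reusing the nearest profiled capacity
    when that exact capacity was skipped (ties go to the smaller one)."""
    nearest = min(cpi_by_ways, key=lambda w: (abs(w - ways), w))
    return cpi_by_ways[nearest]


def estimate_cycles(profile: PhaseProfile, ways: int) -> float:
    """Estimate whole-program cycles at one capacity from per-phase data."""
    return sum(instrs * cpi_at(cpis, ways) for instrs, cpis in profile.values())


def best_static_partition(
    a: PhaseProfile, b: PhaseProfile, total_ways: int
) -> Optional[Tuple[int, int, float]]:
    """Try every split of the ways between two programs and keep the one
    with the lowest summed cycle estimate (an assumed objective)."""
    best: Optional[Tuple[int, int, float]] = None
    for ways_a in range(1, total_ways):
        ways_b = total_ways - ways_a
        cost = estimate_cycles(a, ways_a) + estimate_cycles(b, ways_b)
        if best is None or cost < best[2]:
            best = (ways_a, ways_b, cost)
    return best


# Made-up numbers, for illustration only.
prog_a: PhaseProfile = {
    "p0": (1000, {1: 4.0, 2: 2.0, 3: 1.5}),  # capacity-sensitive phase
    "p1": (500, {1: 2.0}),                   # insensitive: profiled once
}
prog_b: PhaseProfile = {
    "q0": (2000, {1: 1.5, 2: 1.0}),
}

if __name__ == "__main__":
    print(best_static_partition(prog_a, prog_b, total_ways=4))
```

      With these made-up profiles, the search compares the splits 1/3, 2/2, and 3/1 of a four-way cache. A real system would take its per-phase numbers from the single-pass profiler and might optimize a different metric, such as weighted speedup.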

       
