    Citation: Zhou Qian, Feng Xiaobing, and Zhang Zhaoqing. Software Pipelining with Cache Profiling Information[J]. Journal of Computer Research and Development, 2008, 45(5): 834-840.

    Software Pipelining with Cache Profiling Information

      Abstract: Software pipelining is an important instruction scheduling technique that speeds up a loop by overlapping the execution of instructions from different iterations. As the gap between processor speed and memory speed keeps widening, memory access instructions, especially those that miss in the cache, increasingly become the bottleneck for system performance. Because the latency of these instructions is not fixed, predicting and hiding it during software pipelining is very important. Unlike earlier approaches to predicting memory access latency, this work introduces cache profiling: profile information collected at run time is used to predict memory access latency and to schedule accordingly. When the latency assumed for memory access instructions in a modulo-scheduled loop is increased, the initiation interval may also increase, so performance may not improve. The CSMS and FLMS algorithms change the memory access latency while trying to avoid increasing the initiation interval. This work improves CSMS and FLMS so that they change the memory access latency according to cache profiling information, which makes them more accurate than earlier methods. Experiments show that the new method effectively improves program performance: about 1% on average on the SPEC2000 benchmarks, and up to 11% in individual cases.
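      The trade-off described in the abstract can be illustrated with a small sketch. This is not the CSMS or FLMS algorithm from the paper. It only shows, under assumed numbers, how a profiled miss rate could be turned into a predicted load latency, and why raising the latency of a load that sits on a dependence cycle raises the recurrence bound on the initiation interval. The function names, the hit and miss latencies, and the miss rates are all hypothetical.

```python
from math import ceil

def expected_load_latency(hit_latency, miss_latency, miss_rate):
    """Predicted latency of one load from its profiled miss rate (assumed model)."""
    return round(hit_latency + miss_rate * (miss_latency - hit_latency))

def recurrence_mii(cycle_latencies, cycle_distance):
    """Lower bound on the initiation interval imposed by one dependence cycle:
    ceil(total latency on the cycle / total iteration distance of the cycle)."""
    return ceil(sum(cycle_latencies) / cycle_distance)

# Hypothetical machine parameters: 2-cycle hit, 20-cycle miss.
HIT, MISS = 2, 20

# Hypothetical profile: one load misses half the time, another never misses.
lat_hot = expected_load_latency(HIT, MISS, 0.5)
lat_cold = expected_load_latency(HIT, MISS, 0.0)

# A dependence cycle load -> add -> store that feeds the next iteration
# (iteration distance 1), with 1-cycle add and store.
ii_default = recurrence_mii([HIT, 1, 1], 1)       # load scheduled as a hit
ii_profiled = recurrence_mii([lat_hot, 1, 1], 1)  # load scheduled with predicted latency

print(lat_hot, lat_cold, ii_default, ii_profiled)
```

      In this toy case the longer predicted latency raises the bound on the initiation interval, which is the situation in which lengthening a load's latency may not pay off. A load that is not on any dependence cycle would leave this bound unchanged.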
