    Citation: Zhang Kun (张昆), Guo Feng (过锋), Zheng Fang (郑方), Xie Xianghui (谢向辉). Design of a Pipeline-Coupled Instruction Loop Cache for Many-Core Processors[J]. Journal of Computer Research and Development (计算机研究与发展), 2017, 54(4): 813-820. DOI: 10.7544/issn1000-1239.2017.20160116

    Design of a Pipeline-Coupled Instruction Loop Cache for Many-Core Processors

    • Abstract: Energy efficiency is a major challenge in the design of future high-performance computers. Because many-core processors are a key building block of such systems, optimizing their microarchitecture is critical to improving energy efficiency. This paper proposes a pipeline-coupled instruction loop cache for many-core processors: a small L0 instruction cache that provides more energy-efficient instruction fetch. As an attempt at implementation-aware microarchitecture research, the design treats hardware implementation cost as a key constraint from the outset. To limit the impact of the L0 instruction cache on pipeline performance, the loop cache uses loop-exit prefetching: when a loop is detected, the exit path of the loop is prefetched into the cache. This ensures that the low-power instruction fetch provided by the loop cache translates into an improvement in the energy efficiency of the pipeline. The instruction loop cache is modeled in the gem5 simulator. Experiments on a set of SPEC2006 benchmarks show that, without degrading pipeline performance, a typical configuration reduces instruction fetch power by 27% on average and the dynamic power of the pipeline front-end components by 31.5%.
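
      The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch, not the authors' design. It assumes a fixed 4-byte instruction width, detects a loop as a taken backward branch, and fills the cache immediately on detection; the names `LoopCache`, `capacity`, `exit_prefetch`, and `run` are hypothetical. The model only counts which level serves each fetch and does not model timing or power.

```python
from dataclasses import dataclass, field

INSTR_BYTES = 4  # assumption: fixed-width instructions


@dataclass
class LoopCache:
    capacity: int = 16      # hypothetical number of L0 entries
    exit_prefetch: int = 2  # hypothetical entries used for the loop-exit path
    lines: set = field(default_factory=set)
    l0_fetches: int = 0
    l1_fetches: int = 0

    def fetch(self, pc):
        """Count one fetch, served by L0 if the PC is resident, else by L1."""
        if pc in self.lines:
            self.l0_fetches += 1
        else:
            self.l1_fetches += 1

    def observe_branch(self, branch_pc, target_pc):
        """On a taken backward branch, capture the loop body and its exit path."""
        if target_pc > branch_pc:
            return  # forward branch: not a loop back-edge
        body = range(target_pc, branch_pc + INSTR_BYTES, INSTR_BYTES)
        if len(body) + self.exit_prefetch > self.capacity:
            return  # loop body plus exit path does not fit in L0
        exit_start = branch_pc + INSTR_BYTES
        exit_end = exit_start + self.exit_prefetch * INSTR_BYTES
        exit_path = range(exit_start, exit_end, INSTR_BYTES)
        self.lines = set(body) | set(exit_path)


def run(trace, cache):
    """Replay a list of fetched PCs and return (L0 fetches, L1 fetches)."""
    for pc, next_pc in zip(trace, trace[1:] + [None]):
        cache.fetch(pc)
        # Any non-sequential successor is treated as a taken branch.
        if next_pc is not None and next_pc != pc + INSTR_BYTES:
            cache.observe_branch(pc, next_pc)
    return cache.l0_fetches, cache.l1_fetches


# A 4-instruction loop executed 5 times, followed by 3 fall-through instructions.
trace = [0x100, 0x104, 0x108, 0x10C] * 5 + [0x110, 0x114, 0x118]
with_prefetch = run(trace, LoopCache())
without_prefetch = run(trace, LoopCache(exit_prefetch=0))
```

      In this toy trace, the first instructions after the loop are already resident in L0 when exit prefetching is enabled, whereas without it the fetch unit falls back to L1 as soon as the loop ends. This is the situation the abstract's loop-exit prefetching is meant to address.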

       
