    Citation: Xiao Junhua, Feng Zijun, Zhang Longbing. The Tradeoff Cache Between Latency and Capacity in Chip Multiprocessors[J]. Journal of Computer Research and Development, 2009, 46(1): 167-175.

    The Tradeoff Cache Between Latency and Capacity in Chip Multiprocessors

    Abstract: Chip multiprocessors (CMP) have become the mainstream microprocessor architecture. In a CMP, the cache, and especially the last-level cache, is critical to performance and has become a focus of current research. The CMP cache design faces the conflicting requirements of latency and capacity, and must trade off between techniques that reduce off-chip misses and techniques that reduce cross-chip misses. A private cache design minimizes cache access latency but reduces the total effective cache capacity, while a shared cache design maximizes the effective cache capacity but incurs long hit latencies. This paper proposes a CMP cache design, the tradeoff cache between latency and capacity (TCLC). TCLC is a hybrid of the private and shared designs: it dynamically identifies the sharing type of each cache block and applies a type-specific optimization. Private blocks are optimized with a migration policy, shared read-only blocks with a replication policy, and shared read-write blocks with a center-placement policy. The goal is to keep cache access latency close to that of the private design and effective cache capacity close to that of the shared design, mitigating the impact of wire delay and reducing the average memory access latency. Full-system simulation results show that TCLC performs on average 13.7% better than a private cache and 12% better than a shared cache.
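    The classify-then-dispatch structure summarized above (migration for private blocks, replication for shared read-only blocks, center placement for shared read-write blocks) can be illustrated with a small sketch. The C fragment below is a minimal illustration only, not the paper's implementation: the per-block bookkeeping fields, the function names, and the upgrade rules are assumptions made for the example.

```c
/* Illustrative sketch of TCLC-style block classification and policy dispatch.
 * All names and bookkeeping fields are hypothetical; the paper's actual
 * directory state and placement mechanism are not reproduced here. */
#include <stdint.h>
#include <stdio.h>

typedef enum { PRIVATE, SHARED_READ_ONLY, SHARED_READ_WRITE } sharing_type_t;

typedef struct {
    uint64_t tag;
    int owner_core;          /* core that first touched the block */
    uint32_t sharer_mask;    /* bitmap of cores that have accessed it */
    int written;             /* set once any core writes the block */
    sharing_type_t type;
} tclc_block_t;

/* Update the block's sharing type on each access (assumed upgrade rules:
 * one core only -> private; multiple cores, reads only -> shared read-only;
 * multiple cores with at least one write -> shared read-write). */
static void classify_on_access(tclc_block_t *b, int core, int is_write) {
    b->sharer_mask |= 1u << core;
    if (is_write)
        b->written = 1;

    int multi_core = (b->sharer_mask & (b->sharer_mask - 1)) != 0;
    if (!multi_core)
        b->type = PRIVATE;
    else
        b->type = b->written ? SHARED_READ_WRITE : SHARED_READ_ONLY;
}

/* Map each sharing type to the optimization named in the abstract. */
static const char *policy_for(const tclc_block_t *b) {
    switch (b->type) {
    case PRIVATE:           return "migrate toward the owning core";
    case SHARED_READ_ONLY:  return "replicate near each reader";
    case SHARED_READ_WRITE: return "place at a central bank";
    }
    return "unknown";
}

int main(void) {
    tclc_block_t b = { .tag = 0x1000, .owner_core = 0 };
    classify_on_access(&b, 0, 0);   /* core 0 reads  -> private          */
    classify_on_access(&b, 2, 0);   /* core 2 reads  -> shared read-only */
    printf("%s\n", policy_for(&b));
    classify_on_access(&b, 2, 1);   /* core 2 writes -> shared read-write */
    printf("%s\n", policy_for(&b));
    return 0;
}
```

    In a real design this classification would live in the last-level cache's directory and drive block placement in hardware; the sketch only mirrors the three-way split the abstract describes.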

       
