ISSN 1000-1239 CN 11-1777/TP


A Heterogeneous CMP Cache Based on Capacity Sharing

高 翔1,2 章隆兵1 胡伟武1   

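  1. 1(Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) 2(Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027)
  • Published: 2008-05-15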

A CapacityShared Heterogeneous CMP Cache

Gao Xiang1,2, Zhang Longbing1, and Hu Weiwu1

  1. 1(Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) 2(Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027)
  • Online: 2008-05-15

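Abstract: Cache design in multi-core environments is affected by many factors, including wire delay and application behavior, and both private and shared schemes have their own shortcomings. This paper proposes a heterogeneous CMP cache structure in which the multi-core chip is built from two types of nodes with different cache hierarchies. It also designs techniques such as cache capacity sharing based on indirect indexing, providing an on-chip memory hierarchy that is both capacity-effective and fast to access. Evaluation of SPEC CPU2000, SPLASH2, and other programs in a full-system environment shows that the heterogeneous CMP cache structure adapts to the needs of various applications, improving average performance by up to 16% for single-process applications and up to 9% for multithreaded applications. The heterogeneous CMP cache is also simple in hardware design and practical to implement, and its design ideas will be applied in future Godson multi-core processor designs.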

Key words: chip multiprocessor, memory hierarchy, heterogeneous, capacity sharing, cache coherence

Abstract: Advanced integrated circuit technologies require architects to find new ways to use large numbers of gates and to mitigate the effects of high interconnect delays. Chip multiprocessors (CMPs) exploit increasing transistor counts by placing multiple processors on a single die. As CMPs have become the trend in high-performance microprocessors, their target workloads have become increasingly diverse. Because of the wire delay problem and the diversity of applications, neither private nor shared caches can provide both large capacity and fast access in CMPs. A novel CMP cache design, the heterogeneous CMP cache (HCC), is presented, in which chips are constructed from tiles of two different categories. The L2 caches of private tiles provide the lowest hit latency, and the L2 cache of shared tiles increases the effective cache capacity for shared data. By incorporating indirect-index cache technology to share capacity between different hierarchies, HCC provides an on-chip memory subsystem that is both capacity-effective and fast to access. Detailed full-system simulations are used to analyze HCC performance for various programs, including SPEC CPU2000, SPLASH2, and commercial workloads. The results show that HCC improves performance by 16% for single-threaded benchmarks and 9% for multithreaded benchmarks. HCC is easy to implement, and its design ideas will be used in future multicore processors of the Godson series.

Key words: chip multiprocessor, memory hierarchy, heterogeneous, capacity sharing, cache coherence
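
The abstract mentions indirect-index cache technology as the basis for capacity sharing but gives no implementation detail. As a rough illustration only, the general idea of an indirect-index cache (tag entries hold a pointer into a data array, so any block can occupy any data slot) might be sketched as below. The class name `IndirectIndexCache`, its methods, and the LRU replacement policy are all assumptions made for this sketch; they are not taken from the paper and do not model HCC's private and shared tiles.

```python
from collections import OrderedDict


class IndirectIndexCache:
    """Toy model (hypothetical, not the paper's design): tags point into a data pool."""

    def __init__(self, num_data_blocks):
        # Data array plus a list of slots that no tag currently points to.
        self.data = [None] * num_data_blocks
        self.free_slots = list(range(num_data_blocks))
        # Tag store: block address -> data slot index, kept in LRU order
        # (least recently used first). LRU is an assumption of this sketch.
        self.tags = OrderedDict()

    def lookup(self, addr):
        """Return the cached value for addr, or None on a miss."""
        slot = self.tags.get(addr)
        if slot is None:
            return None
        self.tags.move_to_end(addr)  # mark as most recently used
        return self.data[slot]

    def fill(self, addr, value):
        """Install or update a block, evicting the LRU block if the pool is full."""
        if addr in self.tags:
            slot = self.tags[addr]
            self.tags.move_to_end(addr)
        elif self.free_slots:
            slot = self.free_slots.pop()
        else:
            # Reuse the data slot of the least recently used block.
            _, slot = self.tags.popitem(last=False)
        self.tags[addr] = slot
        self.data[slot] = value
```

Because the tag store only records a slot index, the data capacity is not tied to a fixed set position, which is the property that makes sharing one data pool among several users plausible. How HCC actually divides that capacity between hierarchies is not described in the abstract.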