ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (6): 1242-1253. doi: 10.7544/issn1000-1239.2015.20150140

Special topic: 2015 Architectures for Application-Domain Requirements

• System Architecture •

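Directory Cache Structure Design for Multi-Core Processors

Wang Endong, Tang Shibin, Chen Jicheng, Wang Hongwei, Ni Fan, Zhao Yaqian

  1. (State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited), Beijing 100085) (wangend@inspur.com)
  • Published: 2015-06-01
  • Funding:
    National High Technology Research and Development Program of China (863 Program) (2013AA011701)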

Directory Cache Design for Multi-Core Processor

Wang Endong, Tang Shibin, Chen Jicheng, Wang Hongwei, Ni Fan, Zhao Yaqian   

  1. (State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited), Beijing 100085)
  • Online: 2015-06-01

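Abstract (Chinese version): With the rapid development of applications such as the Internet of Things, cloud computing, and Internet public opinion analysis, big data processing has become the core workload of data centers. Data center servers generally use multi-core processors, in which the directory cache is the key component for maintaining cache coherence. Research on directory cache structures (such as the sparse directory) has focused mainly on capacity and scalability, which makes these designs better suited to compute-intensive applications such as high-performance computing. However, when a multi-core processor runs latency-sensitive big data applications, the high memory access latency of the directory cache severely limits data center quality of service. To address this problem, a new master-slave directory cache structure optimizes the coherence protocol path taken during data accesses. The master directory distinguishes shared data from private data and manages memory accesses to private data, which reduces the access latency of private data and improves the capacity utilization of the slave directory. The slave directory maintains cache coherence for shared data and uses a limited-bit tag structure, which improves its storage efficiency. The experiments evaluate the big data benchmark suite Cloudsuite-v1.0 on the Simics+GEMS simulation platform. The results show that, in an environment dominated by big data applications, the master-slave directory cache reduces hardware overhead by 24.39%, reduces cache miss latency by 28.45%, and improves processor IPC by 3.5% compared with a sparse directory of twice the capacity. Compared with an in-cache directory, it gives up 5.14% in cache miss latency and 1.1% in processor IPC but reduces hardware overhead by 42.59%.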

Key words (Chinese version): big data, multi-core processor, cache coherence, directory cache, sparse directory

Abstract: With the development of the Internet of Things, cloud computing, and Internet public opinion analysis, big data applications are becoming the critical workloads in today's data centers. The directory cache guarantees cache coherence in chip multi-processors, which are widely deployed in data centers. Previous research has proposed many innovations to improve the capacity utilization and scalability of the directory cache, making it better suited to high-performance computing. Big data workloads, however, are latency-sensitive, a requirement that previous work does not satisfy. To meet this requirement, the master-slave directory is a novel directory cache design that optimizes the path taken by memory instructions. In this design, the master directory identifies private data accesses and serves them to reduce miss latency, while the slave directory provides cache coherence for the shared memory space, improving the utilization of cache capacity and the scalability of the chip multi-processor. The experiments run the CloudSuite-v1.0 benchmark on the Simics+GEMS simulator. Compared with a sparse directory of 2× capacity, the master-slave directory reduces hardware overhead by 24.39%, reduces miss latency by 28.45%, and improves IPC by 3.5%. Compared with an in-cache directory, the master-slave directory sacrifices 5.14% in miss latency and 1.1% in IPC but reduces hardware overhead by 42.59%.

Key words: big data, multi-core processor, cache coherence, directory cache, sparse directory

Chinese Library Classification (CLC) number:
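
The division of labor between the master and slave directories described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MasterSlaveDirectory`, the `access` method, the `SHARED` marker, and the rule for promoting a block from private to shared are all hypothetical, chosen only to show the idea that private accesses stay on the master path while shared blocks are tracked by the slave directory. The slave directory's limited-bit tag structure, evictions, and write invalidations are not modeled.

```python
from dataclasses import dataclass, field

SHARED = -1  # hypothetical marker: the block is tracked by the slave directory


@dataclass
class MasterSlaveDirectory:
    """Toy model (assumption, not the paper's design in detail).

    master: block address -> owning core id, or SHARED
    slave:  block address -> set of sharer core ids
    """
    master: dict = field(default_factory=dict)
    slave: dict = field(default_factory=dict)

    def access(self, core: int, block: int) -> str:
        """Record an access and report which directory path served it."""
        owner = self.master.get(block)
        if owner is None:
            # First touch: treat the block as private to this core.
            self.master[block] = core
            return "private"
        if owner == core:
            # Still private to the same core: served on the master path.
            return "private"
        if owner != SHARED:
            # A second core touched a private block: promote it to shared
            # and start tracking sharers in the slave directory.
            self.master[block] = SHARED
            self.slave[block] = {owner}
        self.slave[block].add(core)
        return "shared"


if __name__ == "__main__":
    d = MasterSlaveDirectory()
    print(d.access(0, 0x40))  # first touch by core 0
    print(d.access(1, 0x40))  # core 1 touches the same block
    print(sorted(d.slave[0x40]))
```

In this sketch a block that only one core ever touches never occupies a slave entry, which is one way to read the abstract's claim that separating out private data improves the slave directory's capacity utilization.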