ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2021, Vol. 58 ›› Issue (6): 1166-1175.doi: 10.7544/issn1000-1239.2021.20210174

所属专题: 2021计算机芯片关键技术前沿与进展专题

• 系统结构 • 上一篇    下一篇



  1. (国防科技大学计算机学院 长沙 410073) (
  • 出版日期: 2021-06-01
  • 基金资助: 

Design and Implementation of Configurable Cache Coherence Protocol for Multi-Core Processor

Chen Zhiqiang, Zhou Hongwei, Feng Quanyou, Deng Rangyu   

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073)
  • Online: 2021-06-01
  • Supported by: 
    This work was supported by the Foundation of the Key Laboratory of Defense Technology for Parallel and Distributed Processing (WDZC20205500117).

摘要: 多核处理器需要维护缓存的一致性问题.基于目录的一致性协议具有较好的扩展性、较低的延迟,应用较多.分布式目录访问带宽高、目录查询速度快、物理实现灵活.分布式目录一致性协议设计复杂度高,验证困难,为了降低自主CPU研发和产业化的风险,提出了一种面向多核处理器的可配置分布式目录控制单元(configurable distribute directory unit, CDDU),通过微操作机制,实现动态配置缓存一致性协议.该设计增加了多核系统缓存一致性协议的灵活性与容错性,可以实现协议状态转换和协议流程的配置,能够解决由于一致性协议设计缺陷导致的功能故障,可以防止一致性协议设计不足引起的死锁.测试结果表明:设计方案展现了良好的可配置性、可扩展性,避免了死锁产生,代价是少量的性能损耗以及面积开销.主要思想在自主飞腾64核处理器中进行了实现,为确保处理器的协议正确性发挥了重要作用,同时在该芯片的多路扩展实现过程中提高了协议的鲁棒性,消除了潜在的死锁.

关键词: 多核处理器, 一致性协议, 可配置, 容错, 死锁

Abstract: In multi-core system, it is necessary to maintain the consistency of cache. Common cache coherence protocols can be divided into snoop-based protocol and directory-based protocol. Directory-based protocol has better scalability, lower latency and can be applied to more applications. According to the size of the directory, it can be divided into centralized directory and distributed directory. Distributed directory takes up less space and less time to inquiry. However, it’s hard to design and verify cache coherence based on distributed directory. To reduce the risk in designing CPU, a configurable distribute directory unit (CDDU) is proposed. It increases the flexibility and fault tolerance of the multi-core system by the way of changing state transformation and protocol flow. The special design can protect system from design defects that may lead to severe error, and it shows good performance in dealing with deadlock problems caused by cache coherence. It provides considerable fault-tolerance that can give the designer more freedom and opportunity. The simulation result indicates that it provides considerable scalability and prevents the occurrence of potential deadlock at the cost of subtle performance loss and area expense. The methodology mentioned in this paper has been used in the design of 64-core FT processor,which ensures the correctness of cache coherence protocol without totally modifying the initial design.Moreover, it improves the robustness of the protocol and eliminates the potential deadlock with a subtle impact on system performance.

Key words: multi-core processor, coherence protocol, configurable, fault-tolerance, deadlock