ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2019, Vol. 56 ›› Issue (4): 719-729.doi: 10.7544/issn1000-1239.2019.20170898

• 系统结构 • 上一篇    下一篇



  1. (国防科技大学计算机学院 长沙 410073) (
  • 出版日期: 2019-04-01
  • 基金资助: 

A Simple and Efficient Cache Coherence Protocol Based on Self-Updating

He Ximing, Ma Sheng, Huang Libo, Chen Wei, Wang Zhiying   

  1. (College of Computer, National University of Defense Technology, Changsha 410073)
  • Online: 2019-04-01

摘要: 随着片上多处理器系统核数的增加,当前一致性协议上存在的许多问题使共享存储系统复杂而低效.目前一些一致性协议极其复杂,例如MESI(modified exclusive shared or invalid)协议,存在众多的中间状态和竞争.并且这些协议还会导致额外失效通信,以及大量记录共享信息的目录存储开销(目录协议)或广播消息的网络开销(监听协议).对数据无竞争的程序实现了一种简单高效一致性协议VISU(valid/invalid states based on self-updating),这种协议基于自更新操作(self-updating)、只包含2个稳定状态(valid/invalid).所设计的两状态VISU协议消除了目录和间接事务.首先基于并行编程的数据无竞争(data race free, DRF)模型,采用在同步点进行自更新共享数据来保证正确性.其次利用动态识别私有和共享数据的技术,提出了对私有数据进行写回、对共享数据进行写直达的方案.对于私有数据,简单的写回策略能够简化不必要的片上通信.在L1 cache中,对于共享数据的写直达方式能确保LLC(last level cache)中数据最新从而消除了几乎所有的一致性状态.实现的VISU协议开销低、不需要目录、没有间接传输和众多的一致性状态,且更加容易验证,同时获得了与MESI目录协议几乎相当甚至更优的性能.

关键词: 共享存储, 片上多处理器, cache一致性协议, 自更新, VISU协议

Abstract: As the number of cores in a chip multiprocessor increases, cache coherence protocols have become a performance bottleneck of the share-memory system. The overhead and complexity of current cache coherence protocols seriously restrict the development of the share-memory system. Specifically, directory protocols need high storage overhead to keep track of sharer list and snooping protocols consume significant network bandwidth to broadcast messages. Some coherence protocols, such as MESI (modified exclusive shared or invalid) protocol, are extremely complex and have numerous transient states and data race. This paper implements a simple and efficient cache coherence protocol named VISU (valid/invalid states based on self-updating) for data-race-free programs. VISU is based on a self-updating mechanism and only includes two stable states (valid and invalid). Furthermore, the VISU protocol eliminates the directory and indirection transactions and reduces significant overheads. First, we propose self-updating shared blocks at synchronization points for correction with the data-race-free guarantee of parallel programming. Second, taking advantage of techniques that dynamically classify private data (only accessed by one processor) and shared data, we propose write-back for private data and write-through for shared data. For private data, a simple write-back policy can reduce the unnecessary on-chip network traffic. In L1 cache, a write-through policy for shared data which can keep the newest shared data in LLC, would obviate almost all coherence states. Our approach implements a truly cost-less two-state coherence protocol. The VISU protocol does not require directory or indirect transfer and is easier to verify while at the same time obtains similar even better performance of MESI directory protocol.

Key words: shared memory, chip multiprocessors, cache coherence protocol, self-updating, VISU protocol