ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (3): 660-667. doi: 10.7544/issn1000-1239.2020.20190074

• System Architecture •



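  1. 1(School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013); 2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084); 3(ZTE Corporation, Nanjing 210012)
  • Published: 2020-03-01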

A Consistency Mechanism for Distributed Persistent Memory File System

Chen Bo1,2, Lu Youyou2, Cai Tao1, Chen Youmin2, Tu Yaofeng3, Shu Jiwu2   

  1. 1(School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013);2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084);3(ZTE Corporation, Nanjing 210012)
  • Online: 2020-03-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFB1003301), the Key Program of the National Natural Science Foundation of China (61832011, 61772300), the National Natural Science Foundation of China (61806086), and the Cooperative Project of ZTE Corporation (20182002008).

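Abstract: Persistent memory (PM) and remote direct memory access (RDMA) offer high bandwidth and low latency, which opens new opportunities for designing high-performance distributed storage systems. However, these new characteristics raise several problems for efficient data consistency management. On the one hand, data consistency in persistent memory relies on the CPU actively executing hardware instructions to flush the cache, and these instructions are very costly and severely degrade CPU processing performance. On the other hand, RDMA reads and writes server memory directly without involving the server CPU, so the server CPU cannot detect write events and flush the data, and a system crash can leave the data inconsistent. To address these two problems, this paper proposes a crash consistency mechanism (CCM) for distributed persistent memory file systems. First, an operation-log-based consistency strategy is designed and implemented: the metadata of every operation is recorded in a log and persisted, which preserves a consistent system state. Second, a remote-write consistency strategy from client to server is designed, in which the server CPU actively flushes the data as the data transfer completes. Finally, asynchronous data persistence is implemented on the server to improve the system's processing capacity. Test results show that the write throughput of the CCM-based file system reaches 88% of the raw network bandwidth. Compared with the existing system Octopus, the performance degradation of CCM is kept within 1%.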

Keywords: persistent memory, remote direct memory access, consistency, operation log, distributed file system

Abstract: Persistent memory and RDMA (remote direct memory access) provide high bandwidth and low latency to storage systems, which brings new opportunities for designing high-performance distributed storage systems. However, their new features raise many challenges for data consistency management. On the one hand, to update data in persistent memory consistently, the CPU must actively execute hardware instructions that flush data out of the CPU cache, and such instructions incur extremely high overhead and seriously affect CPU performance. On the other hand, RDMA can directly read and write remote memory without the involvement of the remote CPU. The server CPU is therefore unaware of remote write events and cannot perform data flushing, so a system failure can leave the data in an inconsistent state. To address these two problems, this paper proposes CCM, a consistency mechanism for distributed persistent memory file systems. First, we design and implement a consistency strategy based on a persistent operation log, which maintains system consistency by writing the information of each operation to the log and persisting it. Second, we design a client-to-server consistency strategy that enables the remote CPU to actively flush data when the data transfer completes. Finally, we implement asynchronous data flushing on the server side to improve system performance. Our experimental results show that the write bandwidth reaches 88% of the network’s raw bandwidth. Compared with Octopus, the state-of-the-art distributed file system, CCM shows a performance reduction of less than 1%.

Key words: persistent memory, remote direct memory access, consistency, operation log, distributed file system
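
The three steps named in the abstract (log the operation, let the server CPU flush once the transfer completes, and flush asynchronously) can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions and not the paper's implementation: `ToyServer`, its fields, and the recovery rule are hypothetical, plain dictionaries stand in for the CPU cache and persistent memory, and a queue stands in for whatever notification the real system uses.

```python
import queue
import threading


class ToyServer:
    """Toy stand-in for a server node; every name here is hypothetical."""

    def __init__(self):
        self.cache = {}   # models written data still in the CPU cache (lost on crash)
        self.pm = {}      # models data already flushed to persistent memory
        self.log = []     # models the operation log (assumed persisted on append)
        self.pending = queue.Queue()  # write notifications awaiting a flush

    def remote_write(self, path, offset, data):
        """Log the operation's metadata, land the data, then notify the server."""
        entry = {"op": "write", "path": path, "offset": offset,
                 "length": len(data), "flushed": False}
        self.log.append(entry)      # metadata is recorded before the data lands
        idx = len(self.log) - 1
        self.cache[idx] = data      # data arrives without being flushed
        self.pending.put(idx)       # completion notice lets the server CPU act
        return idx

    def flush_worker(self):
        """Background loop: flush each notified write off the request path."""
        while True:
            idx = self.pending.get()
            if idx is None:         # sentinel value: stop the worker
                return
            self.pm[idx] = self.cache.pop(idx)
            self.log[idx]["flushed"] = True

    def crash_and_recover(self):
        """Discard unflushed data and return log entries that never became durable."""
        self.cache.clear()
        return [e for e in self.log if not e["flushed"]]
```

In this sketch a writer calls `remote_write` and returns immediately, while a thread running `flush_worker` makes the data durable in the background. After a simulated crash, the log identifies exactly which writes were recorded but not flushed; how the real CCM handles such entries is not described in the abstract.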