ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2021, Vol. 58 ›› Issue (2): 384-396. doi: 10.7544/issn1000-1239.2021.20200369

Special topic: 2021 Storage Systems and Intelligent Storage Technologies in the Big Data Era

• System Architecture •

A Distributed Persistent Memory File System Based on the RDMA Multicast Mechanism

陈茂棠1,郑圣安2,游理通1,王晶钰1,闫田1,屠要峰3,韩银俊3,黄林鹏1   

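  1. 1(Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240);2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084);3(ZTE Corporation, Nanjing 210012) (chenmaotang@sjtu.edu.cn)
  • Publication date: 2021-02-01
  • Funding: 
    National Key Research and Development Program of China (2018YFB1003302); Shanghai Jiao Tong University-Huawei Joint Laboratory Project (FA2018091021-202004)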

A Distributed Persistent Memory File System Based on RDMA Multicast

Chen Maotang1, Zheng Sheng’an2, You Litong1, Wang Jingyu1, Yan Tian1, Tu Yaofeng3, Han Yinjun3, Huang Linpeng1   

  1. 1(Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240);2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084);3(ZTE Corporation, Nanjing 210012)
  • Online: 2021-02-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFB1003302) and the SJTU-Huawei Innovation Research Lab Project (FA2018091021-202004).

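Abstract: Advances in persistent memory and remote direct memory access (RDMA) offer new approaches to designing efficient distributed systems. However, existing RDMA-based distributed systems do not fully exploit the multicast capability of RDMA and struggle with multi-copy file data transmission in one-to-many scenarios, which seriously degrades system performance. To address this problem, this paper proposes MTFS (RDMA multicast transmission based distributed persistent memory file system), a distributed persistent memory file system built on the RDMA multicast mechanism. Its low-latency multicast communication mechanism makes full use of the RDMA multicast capability to transmit data efficiently to multiple data nodes, avoiding the high latency of multi-copy transmission. To make transmission more flexible, MTFS includes a multi-mode multicast remote procedure call (RPC) mechanism that adaptively recognizes RPC requests and, by optimizing the return mechanism, moves some transmission operations off the critical path to further improve transmission efficiency. MTFS also provides a lightweight consistency guarantee mechanism: with its crash recovery function, data verification system, retransmission strategy, and window mechanism, it recovers quickly when a node crashes and precisely detects and corrects data when transmission errors occur, ensuring data reliability and consistency. Experiments show that MTFS improves throughput by 10.2 to 219 times over the existing system GlusterFS across the test sets. Under Redis database workloads, MTFS achieves up to 10.7% higher performance than NOVA, and it scales well in multi-threaded tests.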

Keywords: persistent memory, remote direct memory access, multicast, distributed file system, remote procedure call

Abstract: The development of persistent memory and remote direct memory access (RDMA) provides new opportunities for designing efficient distributed systems. However, existing RDMA-based distributed systems are far from fully exploiting RDMA multicast capabilities, which makes it difficult for them to handle multi-copy file data transmission in one-to-many scenarios and seriously degrades system performance. This paper proposes MTFS, an RDMA multicast transmission based distributed persistent memory file system. Its low-latency multicast transmission mechanism makes full use of the RDMA multicast capability to transmit data efficiently to multiple data nodes, avoiding the high latency caused by multi-copy file data transmission. To improve the flexibility of transmission operations, MTFS includes a multi-mode multicast remote procedure call (RPC) mechanism, which adaptively recognizes RPC requests and, by optimizing the return mechanism, moves some transmission operations off the critical path to further improve transmission efficiency. MTFS also provides a lightweight consistency guarantee mechanism. With a crash recovery mechanism, a data verification module, a retransmission scheme, and a window mechanism, MTFS recovers quickly from a node crash and detects and corrects data when transmission errors occur, ensuring data reliability and consistency. Experimental results show that MTFS improves throughput by 10.2 to 219 times over GlusterFS across the tested workloads. MTFS outperforms NOVA by up to 10.7% on the Redis workload and achieves good scalability in multi-threaded workloads.

Key words: persistent memory, remote direct memory access, multicast, distributed file system, remote procedure call
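
The abstract names a data verification module, a retransmission scheme, and a window mechanism but does not describe how they work. The following is only a minimal in-process sketch of how such pieces could fit together, not the MTFS implementation: it uses no RDMA at all, and the names `Replica`, `multicast_write`, `CHUNK`, the CRC32 checksum, and the window size are all assumptions made for illustration. One simulated multicast send reaches every replica, each replica verifies a checksum, and any chunk rejected by at least one replica is queued for retransmission.

```python
import zlib
from dataclasses import dataclass, field

CHUNK = 4  # hypothetical chunk size in bytes, kept tiny for the demo


@dataclass
class Replica:
    """Hypothetical data node that stores only chunks with a valid CRC32."""
    name: str
    corrupt_once: set = field(default_factory=set)  # chunk ids to damage on first delivery
    chunks: dict = field(default_factory=dict)

    def receive(self, seq, payload, crc):
        if seq in self.corrupt_once:
            # Simulate a transmission error by flipping the first byte once.
            self.corrupt_once.discard(seq)
            payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
        if zlib.crc32(payload) != crc:
            return False  # verification failed, acts as a NACK
        self.chunks[seq] = payload
        return True  # acts as an ACK

    def data(self):
        return b"".join(self.chunks[i] for i in sorted(self.chunks))


def multicast_write(data, replicas, window=2):
    """Send data to all replicas; return the number of sender-side sends."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    pending = list(range(len(chunks)))
    sends = 0
    while pending:
        # At most `window` chunks are in flight before acknowledgements are checked.
        batch, pending = pending[:window], pending[window:]
        results = {}
        for seq in batch:
            payload = chunks[seq]
            crc = zlib.crc32(payload)
            sends += 1  # one simulated multicast send reaches every replica
            results[seq] = [r.receive(seq, payload, crc) for r in replicas]
        # Requeue any chunk that at least one replica rejected.
        pending.extend(seq for seq in batch if not all(results[seq]))
    return sends


if __name__ == "__main__":
    replicas = [Replica("a"), Replica("b", corrupt_once={1}), Replica("c")]
    sends = multicast_write(b"persistent-memory", replicas)
    print(sends, all(r.data() == b"persistent-memory" for r in replicas))
```

In this toy model the sender issues one send per chunk plus one per retransmission, regardless of the number of replicas, whereas a copy-per-replica unicast scheme would multiply the sender's work by the replica count. Whether MTFS retransmits to the whole group or only to the failing node is not stated in the abstract; the sketch resends to everyone for simplicity.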

CLC number: