ISSN 1000-1239 CN 11-1777/TP

• System Architecture •

A Distributed Persistent Memory File System Based on an RDMA Multicast Mechanism

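1(Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240); 2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084); 3(ZTE Corporation, Nanjing 210012) (chenmaotang@sjtu.edu.cn)
• Published: 2021-02-01
• Funding:
National Key Research and Development Program of China (2018YFB1003302); SJTU-Huawei Innovation Research Lab Project (FA2018091021-202004)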

A Distributed Persistent Memory File System Based on RDMA Multicast

Chen Maotang1, Zheng Sheng’an2, You Litong1, Wang Jingyu1, Yan Tian1, Tu Yaofeng3, Han Yinjun3, Huang Linpeng1

1(Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240); 2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084); 3(ZTE Corporation, Nanjing 210012)
• Online: 2021-02-01
• Supported by:
This work was supported by the National Key Research and Development Program of China (2018YFB1003302) and the SJTU-Huawei Innovation Research Lab Project (FA2018091021-202004).

Abstract: The development of persistent memory and remote direct memory access (RDMA) provides new opportunities for designing efficient distributed systems. However, existing RDMA-based distributed systems are far from fully exploiting RDMA multicast capabilities, which makes it difficult for them to handle one-to-many transmission of multi-copy file data efficiently and degrades system performance. This paper proposes MTFS, a distributed file system based on persistent memory and RDMA multicast transmission. MTFS transmits data to different data nodes through a low-latency multicast transmission mechanism that makes full use of the RDMA multicast capability, thereby avoiding the high latency of multi-copy file data transmission. To improve the flexibility of transmission operations, MTFS introduces a multi-mode multicast remote procedure call (RPC) mechanism, which adaptively recognizes RPC requests and moves transmission operations off the critical path to further improve transmission efficiency. MTFS also provides a lightweight consistency guarantee mechanism. With a crash recovery mechanism, a data verification module, and a retransmission scheme, MTFS can recover quickly from a crash and achieves file system reliability and data consistency through error detection and data correction. Experimental results show that MTFS increases throughput by 10.2 to 219 times compared with GlusterFS, outperforms NOVA by 10.7% on the Redis workload, and scales well in multi-threaded workloads.
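The verification and retransmission idea mentioned in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not MTFS's actual design: the `Replica` class, the `multicast` and `replicate` functions, the CRC32 checksum, and the 4-byte chunk size are all hypothetical, and the "multicast" is a plain Python loop standing in for a single one-to-many RDMA send. It shows one sender pushing each chunk once to all replicas, each replica keeping only chunks whose checksum matches, and the sender resending only the chunks that some replica still lacks.

```python
import zlib

CHUNK_SIZE = 4  # hypothetical chunk size, tiny so the example is easy to trace


def split_chunks(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Replica:
    """Hypothetical data node that stores only chunks passing verification."""

    def __init__(self, name, corrupt_once=()):
        self.name = name
        self.chunks = {}
        # Sequence numbers whose first delivery arrives corrupted (simulated fault).
        self._corrupt_once = set(corrupt_once)

    def receive(self, seq, payload, checksum):
        if seq in self._corrupt_once:
            self._corrupt_once.discard(seq)
            # Flip the first byte to simulate corruption in transit.
            payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
        # Data verification: keep the chunk only if the checksum matches.
        if zlib.crc32(payload) == checksum:
            self.chunks[seq] = payload

    def missing(self, total):
        return [s for s in range(total) if s not in self.chunks]

    def assemble(self, total):
        return b"".join(self.chunks[s] for s in range(total))


def multicast(replicas, seq, payload):
    """One logical send delivered to every replica (stand-in for RDMA multicast)."""
    checksum = zlib.crc32(payload)
    for replica in replicas:
        replica.receive(seq, payload, checksum)


def replicate(data, replicas, max_rounds=3):
    """Multicast all chunks, then resend only chunks some replica still lacks.

    Returns the number of logical sends issued by the sender.
    """
    chunks = split_chunks(data)
    pending = list(range(len(chunks)))
    sends = 0
    for _ in range(max_rounds):
        for seq in pending:
            multicast(replicas, seq, chunks[seq])
            sends += 1
        pending = sorted({s for r in replicas for s in r.missing(len(chunks))})
        if not pending:
            return sends
    raise RuntimeError("replication did not converge within max_rounds")


if __name__ == "__main__":
    data = b"persistent memory"
    nodes = [Replica("a"), Replica("b", corrupt_once={1}), Replica("c", corrupt_once={1, 3})]
    print("logical sends:", replicate(data, nodes))
    print("all replicas consistent:", all(n.assemble(5) == data for n in nodes))
```

In this toy setup the sender's cost is counted in logical sends rather than per-replica copies, which is the intuition behind using multicast for multi-copy data; a unicast design would issue one send per chunk per replica before any retransmission.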