ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (2): 384-396. doi: 10.7544/issn1000-1239.2021.20200369

Special Issue: 2021 Special Topic on Storage Systems and Intelligent Storage Technologies in the Big Data Era

A Distributed Persistent Memory File System Based on RDMA Multicast

Chen Maotang¹, Zheng Sheng’an², You Litong¹, Wang Jingyu¹, Yan Tian¹, Tu Yaofeng³, Han Yinjun³, Huang Linpeng¹

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240
  2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084
  3. ZTE Corporation, Nanjing 210012
  • Online: 2021-02-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFB1003302) and the SJTU-Huawei Innovation Research Lab Project (FA2018091021-202004).

Abstract: The development of persistent memory and remote direct memory access (RDMA) provides new opportunities for designing efficient distributed systems. However, existing RDMA-based distributed systems are far from fully exploiting RDMA multicast capabilities, which makes it hard for them to handle one-to-many transmission of multi-copy file data efficiently and degrades system performance. This paper proposes MTFS, a distributed file system based on persistent memory and RDMA multicast transmission. MTFS transmits data to multiple data nodes through a low-latency multicast transmission mechanism that makes full use of the RDMA multicast capability, avoiding the high latency caused by multi-copy file data transmission operations. To improve the flexibility of transmission operations, MTFS introduces a multi-mode multicast remote procedure call (RPC) mechanism, which adaptively recognizes RPC requests and moves transmission operations off the critical path to further improve transmission efficiency. MTFS also provides a lightweight consistency guarantee mechanism: with a crash recovery mechanism, a data verification module, and a retransmission scheme, it recovers quickly from a crash and ensures file system reliability and data consistency through error detection and data correction. Experimental results show that MTFS improves throughput by 10.2 to 219 times compared with GlusterFS, outperforms NOVA by 10.7% on the Redis workload, and scales well under multi-threaded workloads.

Key words: persistent memory, remote direct memory access, multicast, distributed file system, remote procedure call
