Abstract:
RDMA (remote direct memory access) is widely used in the big data area. It allows a local host to access remote memory without involving the remote CPU, and provides extremely high bandwidth, high throughput, and low latency, thus helping to boost the performance of distributed storage systems dramatically. As a whole, RDMA-enabled distributed storage systems bring new opportunities to big data processing. In this paper, we first point out that simply replacing the network module in distributed systems cannot fully exploit the advantages of RDMA in either semantics or efficiency, and that a fundamental redesign of storage systems is urgently needed. Then, two key aspects of using RDMA efficiently are illustrated. One is the efficient management of hardware resources, including careful utilization of the NIC and CPU caches, parallel acceleration on multicore CPUs, and memory management. The other is the reformation of software by closely coupling the software design with RDMA semantics, using the new features of RDMA to redesign data placement schemes, data indexing, and distributed protocols. Related research on distributed file systems, distributed key-value stores, and distributed transactional systems is introduced to illustrate these two aspects. A summary of the paper and suggestions for future research are given at the end.