Abstract:
Distributed locks are a crucial component of distributed storage systems, and the performance of the lock protocol significantly influences the performance of the entire system. Remote direct memory access (RDMA) is an emerging data center networking technology that supports one-sided communication verbs and offers low CPU overhead, low latency, and high throughput. It presents new opportunities for designing high-performance distributed lock protocols. However, designing such protocols atop an RDMA network faces significant challenges, such as scalability and fairness. This paper addresses these challenges by proposing FeLock, an RDMA-based distributed lock protocol. FeLock achieves high performance while tackling these challenges by leveraging different RDMA communication verbs, which enable clients to communicate directly both with the server, to acquire and release locks, and with other clients, to hand over lock ownership. Specifically, first, to improve performance, FeLock introduces a per-node lock management mechanism that reduces network round trips on the critical path of the lock protocol. Second, to achieve scalability, FeLock incorporates a round-robin handover mechanism in which nodes are logically organized into a ring and clients hand over lock ownership sequentially according to their positions within the ring. Third, to ensure fairness and prevent starvation, FeLock employs a node credit mechanism that limits the number of consecutive lock acquisitions by any single node, so that clients on one node cannot be blocked indefinitely by clients on another. Experimental results demonstrate that FeLock achieves performance comparable to or exceeding that of existing one-sided RDMA lock protocols, such as DSLR, while exhibiting better fairness and scalability. With 3 to 120 clients, FeLock achieves 1.01 to 7.51 times the throughput of DSLR and improves fairness by up to 2.24 times.
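To make the interplay between the round-robin handover and the node credit mechanism concrete, the following is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not FeLock's implementation: it involves no RDMA verbs, and the class `RingLockSim`, its methods, the `credit` parameter, and the rule that an exhausted or idle node passes ownership to the next node in the ring with waiting clients are all hypothetical choices made for this example.

```python
from collections import deque


class RingLockSim:
    """Toy model of ring handover with a per-node credit (hypothetical names).

    Nodes are indexed 0..num_nodes-1 and form a logical ring. A node may
    grant the lock to at most `credit` of its own clients in a row before
    ownership moves on to the next node in the ring that has waiters.
    """

    def __init__(self, num_nodes, credit):
        self.waiting = [deque() for _ in range(num_nodes)]
        self.credit = credit
        self.current = 0  # node whose clients may currently take the lock
        self.used = 0     # consecutive grants made by the current node

    def request(self, node, client):
        """Queue a client on its node."""
        self.waiting[node].append(client)

    def grant_next(self):
        """Return (node, client) for the next holder, or None if nobody waits."""
        n = len(self.waiting)
        # Hand over when the credit is spent or the node has no local waiters.
        if self.used >= self.credit or not self.waiting[self.current]:
            for step in range(1, n + 1):  # step == n wraps back to the same node
                candidate = (self.current + step) % n
                if self.waiting[candidate]:
                    self.current = candidate
                    self.used = 0
                    break
            else:
                return None
        client = self.waiting[self.current].popleft()
        self.used += 1
        return self.current, client


# Usage: node 0 has three waiters, nodes 1 and 2 have one each, credit is 2.
sim = RingLockSim(num_nodes=3, credit=2)
for node, client in [(0, "a0"), (0, "a1"), (0, "a2"), (1, "b0"), (2, "c0")]:
    sim.request(node, client)

order = [sim.grant_next() for _ in range(6)]
print(order)
# [(0, 'a0'), (0, 'a1'), (1, 'b0'), (2, 'c0'), (0, 'a2'), None]
```

In this toy run, node 0 is cut off after two consecutive grants even though a third client is waiting, so the clients on nodes 1 and 2 are served before node 0 regains the lock. A larger credit favors local handovers, while a smaller one bounds how long other nodes can be made to wait.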