Abstract:
With the rapid growth of data volumes and Web services, clusters in datacenters keep growing in size. As clusters grow, the probability of service interruption caused by machine and network failures rises sharply, which makes building fault-tolerant distributed systems increasingly important. State machine replication is one of the most general methods for building fault-tolerant systems, and the distributed consensus problem is one of the most fundamental and central issues in replicated state machine systems. Paxos and the family of Paxos-like consensus algorithms can solve this problem effectively. In recent years, more and more systems have adopted consensus techniques to ensure their reliability and availability, and research on distributed consensus algorithms has continued to multiply. These consensus algorithms fall into two categories: leader-based consensus algorithms and leaderless consensus algorithms. With the development of network technologies such as remote direct memory access (RDMA) and hardware technologies such as field-programmable gate arrays (FPGAs), consensus algorithms that exploit these new network and hardware technologies have emerged to improve the performance of distributed systems. In this paper, we introduce the Paxos family of algorithms from the perspective of how distributed consensus algorithms have developed, discuss the advantages and disadvantages of each algorithm in different scenarios, and outline future directions for research and application.