Abstract:
Congestion control is one of the key technologies for building high-performance data center networks, as it directly affects important performance metrics such as throughput, latency, and packet loss rate. Over the past two decades, data centers have kept growing in scale, and upper-layer applications have demanded ever more from the network. As a result, deploying Remote Direct Memory Access (RDMA) over lossless underlying networks has attracted widespread attention in industry. The Priority-based Flow Control (PFC) mechanism keeps the network lossless, but it introduces problems such as head-of-line blocking, which can degrade network performance or even paralyze the network. Congestion control is a crucial complement to PFC in achieving a lossless network, so designing a practical RDMA congestion control mechanism has become a hot research topic. This paper divides the congestion control process into congestion awareness and congestion regulation and comprehensively reviews the research achievements in this field. First, representative congestion awareness algorithms are described and summarized in detail from two perspectives: explicit feedback and latency. Second, representative congestion regulation algorithms are introduced along two dimensions, rate-based and window-based, and their advantages and disadvantages are summarized. In addition, work on optimizing these algorithms and congestion control algorithms based on reinforcement learning are reviewed as a supplement. Finally, the open challenges in this field are summarized and discussed.