Abstract:
With the rapid development of Artificial Intelligence-Generated Content (AIGC) technologies and the widespread deployment of Large Language Models (LLMs), networks in artificial intelligence data centers are encountering significant challenges. Flow control is a crucial approach for optimizing network performance, enabling extremely high bandwidth and ultra-low latency. This paper reviews key issues and solutions in the field of fine-grained flow control, focusing on advances in three areas: adaptive load balancing mechanisms that dynamically distribute traffic to make full use of network resources and avoid congestion, proactive congestion control strategies designed to predict and alleviate potential congestion, and out-of-order packet reordering techniques that ensure data integrity despite non-sequential arrivals. We summarize the mainstream implementation solutions and provide a detailed comparison. Building on this, we discuss the key technical solutions currently adopted by leading artificial intelligence data centers, along with the network devices that support fine-grained flow control. We also identify unresolved challenges in this field, propose potential solutions, and explore future development trends, especially as AI technologies continue to evolve and demand more sophisticated network infrastructures. This review offers valuable insights for researchers and practitioners working to optimize network performance in AI-driven applications and highlights important directions for future research in fine-grained flow control.