    A Survey on Fine-Grained Flow Control for Artificial Intelligence Data Centers

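    • Abstract: With the rapid development of AI-generated content technologies and the widespread use of large language models, the networks of artificial intelligence data centers face severe challenges, and flow control is an important means of optimizing network performance. This paper surveys the key problems and solutions in fine-grained flow control, focusing on research progress in three areas: adaptive load balancing, proactive congestion control, and out-of-order packet reordering. Adaptive load balancing effectively avoids congestion inside the network; proactive congestion control prevents the last-hop congestion that adaptive load balancing cannot avoid; and out-of-order packet reordering handles the packet disorder that adaptive load balancing may introduce. Working together, the three mechanisms keep the network stable and efficient under complex conditions such as high load and high latency. On this basis, the paper describes the key technical solutions adopted by current mainstream artificial intelligence data centers and the network devices that currently support fine-grained flow control. Finally, it summarizes the key unsolved problems in this field and their possible solutions, and gives an outlook on future development trends.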

       

      Abstract: With the rapid development of Artificial Intelligence-Generated Content (AIGC) technologies and the widespread deployment of Large Language Models (LLMs), networks in artificial intelligence data centers are encountering significant challenges. Flow control is a crucial approach for optimizing network performance in these data centers, where workloads demand extremely high bandwidth and ultra-low latency. This paper reviews key issues and solutions in fine-grained flow control, focusing on advances in three areas: adaptive load balancing mechanisms that dynamically distribute traffic to make full use of network resources and avoid congestion inside the network, proactive congestion control strategies designed to prevent the last-hop congestion that load balancing cannot avoid, and out-of-order packet reordering techniques that restore packet order when adaptive load balancing causes non-sequential arrivals. We summarize the mainstream implementation solutions and provide a detailed comparison. Building on this, we discuss the key technical solutions currently adopted by leading artificial intelligence data centers, along with the network devices that support fine-grained flow control. We also identify unresolved challenges in this field, propose potential solutions, and explore future development trends as AI technologies continue to evolve and demand more sophisticated network infrastructures. This review is intended to inform researchers and practitioners working to optimize network performance in AI-driven applications and to highlight directions for future research in fine-grained flow control.

       
