Abstract:
Artificial intelligence data centers (AIDCs) have emerged as a new and increasingly critical form of computing infrastructure. Unlike traditional cloud data centers, which primarily virtualize general-purpose computing resources, AIDCs are designed to support high-performance, AI-centric workloads such as large-scale model training and inference. As AI models continue to grow in scale and complexity, AIDC networks are required to deliver unprecedented levels of bandwidth, low latency, and efficient cross-device coordination. This paper provides a comprehensive analysis of AIDC network characteristics, including service requirements, topology design, communication patterns, and traffic behavior, and examines the unique challenges that arise from each. Building on a network layering perspective, the paper then presents a structured overview of key enabling technologies for AIDC networks: collective communication libraries, transmission control, load balancing, data-link flow control, and fault management. Representative academic studies and industrial implementations are summarized to illustrate their strengths, limitations, and suitability for real-world AIDC deployments. Finally, the paper outlines several future development trends, such as the convergence of general-purpose and AI-oriented data centers, deeper specialization and co-design across AIDC hardware and software stacks, and the advancement of multi-tenant, shareable AI computing power. These trends are expected to play a pivotal role in shaping next-generation AI infrastructure.