    High Performance Artificial Intelligence Data Center: Studies, Challenges and Trends

    • Abstract: Artificial intelligence data centers (AIDCs) are a new form of data center. Whereas traditional cloud data centers mainly provide virtualized computing resources, AIDCs mainly supply large-scale computing power to new compute-intensive workloads, of which artificial intelligence is the leading example. This paper describes and analyzes the service requirements, topologies, communication patterns, and traffic characteristics of AIDC networks, and examines one by one the new problems and challenges that these distinctive characteristics create. Following the network layering model, it then organizes the key technologies of high-performance AIDC networks into a framework centered on collective communication, transmission control, load balancing, link-layer flow control, and fault management, and summarizes the current state, strengths, and weaknesses of each. Finally, based on this analysis of current technology, the paper proposes future development trends, including the integration of general-purpose and AI computing, the specialization of AIDC technology, and multi-tenancy for AI computing power.

       
