    Real-Time Link Fault Detection as a Service for Datacenter Network

      Abstract: In large-scale datacenter networks, link fault detection is an important means of guaranteeing network connectivity and keeping large-scale online applications running normally. Currently, link fault detection is typically provided by middleboxes or integrated directly into switches. With the development of software-defined networking (SDN) and network function virtualization (NFV), many network functions are being decoupled from dedicated hardware and deployed in the cloud as services. However, existing link fault detection methods face serious challenges: each probing round takes too long, bandwidth usage is high, and servers become overloaded. These problems make them unsuitable for cloud services with strict real-time requirements. To address these challenges, we first analyze the problems in existing link fault detection work. We then propose the concept of a probe matrix and a link fault detection method based on probe matrix optimization. We also design a service framework in which a link fault detection controller cooperates with the SDN controller, providing real-time link fault detection as a service in the cloud. Finally, simulation results show that the proposed method significantly outperforms existing work in three respects: per-round detection time, bandwidth usage, and endpoint (server) CPU load. The computational overhead of probe matrix optimization remains tolerable.
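      The abstract does not define the probe matrix, so the following is only a hypothetical illustration. It assumes the common network-tomography convention: a probe matrix is a 0/1 matrix whose rows are probe paths and whose columns are links, with a 1 wherever a probe traverses a link. Under that assumption, a single link failure can be pinpointed if every link is covered by at least one probe and no two links have identical columns. The greedy pruning step (the hypothetical `prune_probes`) is a stand-in for "probe matrix optimization": it drops redundant probes to save bandwidth and endpoint load. It is not the paper's algorithm, and all function names are illustrative.

```python
from typing import List, Optional

# Hypothetical layout: rows are probe paths, columns are links.
Matrix = List[List[int]]


def columns(matrix: Matrix) -> List[tuple]:
    """Return each link's column, i.e. which probes traverse that link."""
    n_links = len(matrix[0]) if matrix else 0
    return [tuple(row[j] for row in matrix) for j in range(n_links)]


def can_localize_single_fault(matrix: Matrix) -> bool:
    """True if every link is probed and no two links share a column,
    so any single link failure produces a unique set of failed probes."""
    cols = columns(matrix)
    if not cols:
        return False
    covered = all(any(c) for c in cols)
    distinct = len(set(cols)) == len(cols)
    return covered and distinct


def localize(matrix: Matrix, failed: List[int]) -> Optional[int]:
    """Return the index of the single failed link whose column matches
    the observed probe failures (1 = probe failed), or None if none match."""
    observed = tuple(failed)
    for j, col in enumerate(columns(matrix)):
        if col == observed:
            return j
    return None


def prune_probes(matrix: Matrix) -> Matrix:
    """Greedily drop probes (rows) while single-fault localization still holds.
    A simple heuristic stand-in, not an optimal or published method."""
    rows = [list(r) for r in matrix]
    i = 0
    while i < len(rows):
        candidate = rows[:i] + rows[i + 1:]
        if can_localize_single_fault(candidate):
            rows = candidate  # probe i was redundant; retry the same index
        else:
            i += 1
    return rows


if __name__ == "__main__":
    # Five probes over three links (toy example, not data from the paper).
    probes = [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [0, 1, 1],
    ]
    reduced = prune_probes(probes)
    print(reduced)  # [[1, 1, 0], [0, 1, 1]]
    print(localize(reduced, [1, 1]))  # 1: both probes failed, so link 1 is down
```

      In this toy case, two probes suffice where five were sent originally. A real system would also need to handle multiple simultaneous failures, packet loss noise, and routing constraints, which this sketch ignores.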
