高级检索

    胖树拓扑中高效实用的定制多播路由算法

    Practical and Efficient Customized Multicast Routing in Fat-Tree Topology

    • 摘要: 在高性能计算领域,多播路由算法对硬件集合操作的性能具有至关重要的影响.随着系统规模的不断扩大,多播组的个数急剧增加,可能会超过硬件支持的多播表条目数,而现有的多播路由算法要么没有给出解决方案,要么存在时间开销大、多播路由经常变化等问题.为此,首先对胖树中的无冲突多播生成树数量进行了量化研究,并以此为基础提出了一种适用于胖树的高效实用的定制多播路由算法(customized multicast routing for limited multicast forwarding table size, C-MR4LMS).C-MR4LMS在构建多播树时,根据多播组的MGID(multicast global identification)静态地将多播组映射到1棵生成树中,从而快速完成多播树的构建;而在合并多播树时,仅需合并使用同一生成树的多播组,且不会改变被合并多播组的路由.然后提出了2种减少多播树冲突的方法:一是分层的MGID分配策略,以避免出现同一终端节点使用同一颜色加入多个多播组的情况;二是相互无干扰的作业节点分配策略,保证2个作业的多播组互不干扰.最后,在ibsim模拟器及神威E级原型机上对C-MR4LMS进行了测试,该多播路由算法计算多播路由的时间比现有的多播路由算法有了显著下降,最大下降了94%.

       

      Abstract: In high performance computing, multicast routing algorithms have important impact on the performance of collective operations supported by hardware. As the supercomputers become larger and larger, the number of MCGs (multicast groups) increases rapidly, and it may exceed the number of MFT (multicast forwarding table) entries supported by hardware. However, the existing multicast routing algorithms either do not provide solutions to this problem, or have problems such as heavy time overhead and variability for multicast routing. To address this problem, we first quantitatively study the number of conflict-free multicast spanning trees in fat-tree, and propose a practical and efficient customized multicast routing algorithm called C-MR4LMS (customized multicast routing for limited multicast forwarding table size). When constructing multicast tree, the MCG is statically mapped to a spanning tree according to the MGID (multicast global identification); thus we can construct the multicast tree quickly. When merging multicast trees, only the ones using the same spanning tree need to be merged, and the route of the merged MCGs will not be changed. Then we propose two methods to reduce the confliction of multicast routes. One is the layered MGID allocation strategy to avoid the situation that the terminal node uses the same color to join multiple MCGs. The other one is the interference-free node allocation strategy for different jobs to ensure that the MCGs of two jobs do not interfere with each other. Finally, we test the performance of C-MR4LMS in ibsim simulator and Sunway exascale prototype system and obtain the satisfying results. In particular, the running time of the algorithm compared with the existing algorithms is significantly reduced, and we get up to 94% reduction in runtime.

       

    /

    返回文章
    返回