ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (11): 2534-2546. doi: 10.7544/issn1000-1239.2017.20151069


A Sliced Multi-Rail Interconnection Network for Large-Scale Clusters

Shao En1,2, Yuan Guojun1,2, Huan Zhixuan1,2, Cao Zheng1, Sun Ninghui1   

  1. State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049
  • Online: 2017-11-01

Abstract: The design of interconnection networks for large-scale clusters faces growing challenges. First, the increasing computing capacity of a single node requires the network to provide higher bandwidth and lower latency. Second, the increasing number of nodes requires the network to scale much better. Third, the increasing system scale degrades the performance of collective communication, which harms the performance and scalability of applications. Fourth, the increasing number of devices requires the network to be more reliable. As the performance of computing nodes keeps increasing, the interconnection network has gradually become the bottleneck of large-scale computing systems. However, the switch chip, the core component of the interconnection network, can offer only limited aggregate bandwidth because of constraints imposed by physical processes and packaging technologies. Through the co-design of network architecture and switch micro-architecture, this paper proposes a sliced multi-rail network architecture for a given aggregate bandwidth. Through mathematical modeling and network simulation, we study the performance boundaries of the sliced multi-rail network. Evaluation results show that the average latency of short messages (less than 128 B) can be improved by more than a factor of 10.
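The trade-off behind bandwidth slicing can be illustrated with a minimal sketch. It is not the paper's model: it assumes that dividing each switch port into several narrower ports multiplies the switch radix while the aggregate bandwidth stays fixed, and it assumes a fat-tree (folded Clos) topology with the standard capacity of 2 * (radix / 2) ** levels endpoints. The function names and the example figures (3200 Gbps aggregate, 100 Gbps ports, 8192 nodes) are hypothetical.

```python
def slice_switch(aggregate_gbps: int, port_gbps: int, rails: int):
    """Hypothetical slicing rule: keep the aggregate bandwidth fixed and
    split every port into `rails` narrower ports.

    Returns (radix, bandwidth of one sliced port in Gbps).
    """
    base_radix = aggregate_gbps // port_gbps   # ports before slicing
    return base_radix * rails, port_gbps / rails


def fat_tree_levels(num_nodes: int, radix: int) -> int:
    """Smallest number of levels whose fat-tree capacity,
    2 * (radix / 2) ** levels, reaches num_nodes endpoints."""
    half = radix // 2
    levels = 1
    while 2 * half ** levels < num_nodes:
        levels += 1
    return levels


if __name__ == "__main__":
    # Assumed example figures, not taken from the paper.
    for rails in (1, 2, 4):
        radix, port = slice_switch(3200, 100, rails)
        levels = fat_tree_levels(8192, radix)
        print(f"rails={rails} radix={radix} port={port} Gbps levels={levels}")
```

Under these assumptions, a higher radix lets the same number of nodes be reached through fewer switch levels, which is one plausible reason short messages could see lower latency; the narrower ports cost per-rail bandwidth, which a multi-rail design would recover by attaching each node to several rails.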

Key words: large-scale clusters, multi-rail network, bandwidth division, data center network, large-scale network simulation
