Traditional single-stage switch architectures do not scale well, so multi-stage architectures are widely considered for large-scale switching fabric designs. The topology and the packet routing scheme of a multi-stage switching fabric strongly influence its performance. Based on a comparison of several popular k-ary n-cube structures used in MPP systems, we argue that the 3D torus network is the most suitable for implementing large switching fabrics. We then propose a novel routing algorithm, DMR. It achieves high throughput and high availability by balancing traffic load across multiple paths, while preserving packet order within each flow. The performance of DMR is studied by simulation and compared with two other routing algorithms, e-cube routing and random routing. The results show that DMR performs almost as well as random routing and much better than e-cube routing. Unlike random routing, DMR also preserves packet order within each flow.
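For reference, the e-cube baseline mentioned above can be illustrated with a minimal sketch of dimension-order routing on a k x k x k torus. This is only an illustration of the generic e-cube technique, not the DMR algorithm and not the simulator used in this work; the names `ring_step` and `ecube_route`, the coordinate-tuple node representation, and the tie-breaking rule (prefer the +1 direction) are assumptions made for the example.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]  # (x, y, z) coordinates; hypothetical representation


def ring_step(src: int, dst: int, k: int) -> int:
    """Return +1, -1, or 0: the direction of the shorter way around a k-node ring."""
    if src == dst:
        return 0
    forward = (dst - src) % k  # hop count when moving in the +1 direction
    # Assumption: ties are broken in favor of the +1 direction.
    return 1 if forward <= k - forward else -1


def ecube_route(src: Node, dst: Node, k: int) -> List[Node]:
    """Dimension-order (e-cube) path in a k x k x k torus, including both endpoints."""
    path = [src]
    cur = list(src)
    # Correct the x offset completely, then y, then z.
    for dim in range(3):
        while cur[dim] != dst[dim]:
            cur[dim] = (cur[dim] + ring_step(cur[dim], dst[dim], k)) % k
            path.append((cur[0], cur[1], cur[2]))
    return path
```

Because the path depends only on the source and destination, every packet of a flow follows the same route and arrives in order, but traffic is never spread over the alternative paths that the torus offers.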