Optimizing MPI Alltoall Communications in Multicore Clusters

Li Qiang; Sun Ninghui; Huo Zhigang; Ma Jie

Abstract: MPI Alltoall is an important collective communication operation. In multicore clusters, multiple processes within a node take part in an Alltoall operation at the same time. On the one hand, these processes can use shared memory to optimize communication. Existing leader-based schemes use shared memory to improve Alltoall performance for small messages, but because they use a fixed number of leader processes, they cannot achieve optimal performance across all small-message lengths. On the other hand, these processes must contend for the limited network resources of the node. Alltoall communication of large messages involves many synchronization messages, and contention increases their latency by tens of times, so the synchronization overhead cannot be ignored. To address these problems, two optimizations are presented. For small messages, the PLP method chooses the number of leader processes according to the message length. For large messages, the LSS method reduces the total number of synchronization messages from 3N to 2N. Experimental results validate both methods. For small messages, the PLP method always obtains the optimal performance. For large messages, the performance improvement from the LSS method is a nearly constant percentage that is independent of system scale; performance improves by 25% for 32KB and 64KB messages.
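
As a point of reference for the operation being optimized, the following is a minimal single-process Python sketch of Alltoall data movement, together with the arithmetic behind the 3N to 2N reduction. It is not the PLP or LSS implementation, which the abstract does not describe in detail. The function names `alltoall` and `sync_savings` are hypothetical, and N is used exactly as in the abstract, which does not define it.

```python
from fractions import Fraction


def alltoall(send_buffers):
    """Simulate Alltoall semantics in one process.

    send_buffers[i][j] is the block that rank i sends to rank j.
    The result recv[j][i] is the block that rank j receives from rank i,
    so the operation is a transpose of the send matrix.
    """
    n = len(send_buffers)
    if any(len(row) != n for row in send_buffers):
        raise ValueError("each rank must supply exactly one block per rank")
    return [[send_buffers[src][dst] for src in range(n)] for dst in range(n)]


def sync_savings(n):
    """Fraction of synchronization messages removed when 3N becomes 2N.

    Assumption: N is the quantity named in the abstract; only the
    ratio between the two counts is computed here.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return Fraction(3 * n - 2 * n, 3 * n)


if __name__ == "__main__":
    # Two ranks: rank 0 sends a0, a1; rank 1 sends b0, b1.
    print(alltoall([["a0", "a1"], ["b0", "b1"]]))
    # The saved fraction does not depend on N.
    print(sync_savings(16))
```

Because the saved fraction is one third for every N, the sketch is at least consistent with the abstract's observation that the improvement from LSS is nearly constant across system sizes, although the measured gain also depends on how much of the total time synchronization accounts for.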
