Abstract:
In recent years, large models have made unprecedented progress in a variety of domains, such as natural language processing and computer vision. Mixture of Experts (MoE) has emerged as one of the most popular architectures for large models due to its distinct advantages in parameter scalability, computational cost control, and complex task processing. However, as parameter scales continue to grow, the execution efficiency and scalability of the underlying systems struggle to keep pace with demand, and these issues must be addressed urgently. System-level optimization is an effective way to tackle this problem and has become an active research area. In light of this, this paper reviews the current state of research on MoE system optimization techniques in the era of large models. We first describe the current state of development of MoE large models and analyze the performance bottlenecks they face on the system side. We then comprehensively survey and analyze the most recent research progress along four core system dimensions, namely memory occupation, communication latency, computational efficiency, and parallel scaling, comparing and elaborating on the key technologies, application scenarios, and optimization directions. Finally, we summarize the current state of MoE system optimization research and outline several future research directions.