Survey on System Optimization for Mixture of Experts in the Era of Large Models

史宏志; 赵健; 赵雅倩; 李茹杨; 魏辉; 胡克坤; 温东超; 金良

doi:10.7544/issn1000-1239.202440016


Abstract

In recent years, large models have driven unprecedented progress in natural language processing, machine vision, and many other domains. Mixture of experts (MoE) has become one of the mainstream architectures for large models owing to its distinct advantages in scaling model parameters, controlling computational cost, and handling complex tasks. However, as parameter counts continue to grow, the execution efficiency and scalability of the underlying systems increasingly fall short of demand, and this must be addressed urgently. System optimization is an effective way to meet this challenge and has become an active research area. This paper therefore reviews the state of research on MoE system optimization techniques in the era of large models. We first describe the current development of MoE large models and analyze the performance bottlenecks they face on the system side. We then comprehensively survey and analyze the latest research progress along four core system dimensions: memory footprint, communication latency, computational efficiency, and parallel scaling, and we compare in detail the key technologies involved, their application scenarios, and the directions that remain to be optimized. Finally, we summarize the current state of research on MoE system optimization and outline future research directions.
