Active Communication Masked Reconstruction Method for Partial Observability in Multi-Agent Reinforcement Learning

贾月洋; 刘文彬; 王恩

doi:10.7544/issn1000-1239.202550701

Abstract

In multi-agent reinforcement learning (MARL), partial observability severely limits cooperation among agents. The centralized training with decentralized execution (CTDE) paradigm alleviates this problem by introducing the global state during training, but during execution agents can still rely only on their local observations. Two approaches currently address this limitation. The first introduces communication mechanisms so that agents can share information, but it suffers from high communication overhead and redundant information. The second uses masked reconstruction to infer global information, but because samples are sparse, the reconstructed information comes only from a single agent's local observation, so reconstruction accuracy is low and the result may interfere with decision-making. To overcome these limitations, we propose an active communication masked reconstruction method. During centralized training, the method dynamically groups agents according to global observation trajectories and selects, for each agent, the most relevant communication target from each group. During decentralized execution, agents communicate actively according to their sets of communication targets, which enables accurate reconstruction of global information and ultimately supports their decision-making. A series of experiments on the StarCraft II Multi-Agent Challenge (SMAC) demonstrates that the method improves multi-agent cooperation and overall policy performance in partially observable scenarios.
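The grouping and target-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the grouping criterion or how relevance is measured, so everything concrete here is an assumption made for illustration: each agent is summarized by a hypothetical trajectory feature vector, grouping is a greedy cosine-similarity threshold, and the "most relevant" target in a group is the member with the highest cosine similarity. The names `group_agents`, `select_targets`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def group_agents(features: List[Vector], threshold: float = 0.9) -> List[List[int]]:
    """Assumed grouping rule: an agent joins the first group whose first
    member (its representative) is similar enough; otherwise it opens a
    new group."""
    groups: List[List[int]] = []
    for i, f in enumerate(features):
        for g in groups:
            if cosine(f, features[g[0]]) >= threshold:
                g.append(i)
                break
        else:  # no existing group was similar enough
            groups.append([i])
    return groups


def select_targets(features: List[Vector], groups: List[List[int]]) -> Dict[int, List[int]]:
    """For each agent, pick one target per group: the most similar member
    other than the agent itself (assumed relevance measure)."""
    targets: Dict[int, List[int]] = {}
    for i, f in enumerate(features):
        chosen: List[int] = []
        for g in groups:
            candidates = [j for j in g if j != i]
            if candidates:
                chosen.append(max(candidates, key=lambda j: cosine(f, features[j])))
        targets[i] = chosen
    return targets


# Hypothetical trajectory summaries for four agents (toy data).
features = [
    [1.0, 0.0],  # agent 0
    [0.9, 0.1],  # agent 1
    [0.0, 1.0],  # agent 2
    [0.2, 0.9],  # agent 3
]
groups = group_agents(features)
targets = select_targets(features, groups)
print(groups)
print(targets)
```

In this toy setup the target sets would be computed during centralized training, when global trajectories are available, and then fixed for use during decentralized execution. Limiting each agent to one target per group is one way to bound the number of messages, which matches the abstract's concern about communication overhead, but the actual selection rule in the paper may differ.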
