Abstract:
In multi-agent reinforcement learning (MARL), partial observability poses a significant challenge to effective collaboration among agents. Although the centralized training with decentralized execution (CTDE) paradigm alleviates this challenge by exploiting global state information during training, agents must still rely solely on local observations during execution. Existing research addresses this limitation along two main directions. The first introduces communication mechanisms that enable information sharing among agents; however, these methods often suffer from high communication overhead and redundant information. The second applies masked reconstruction techniques to infer the missing global information; however, owing to sample sparsity, the inference is conditioned only on local observations, which yields low reconstruction accuracy and can interfere with decision-making. To overcome these limitations, this paper proposes an active communication masked reconstruction method. During centralized training, the method dynamically groups agents based on their global observation trajectories and lets each agent select the most relevant communication targets within these groups. During decentralized execution, agents communicate proactively according to their selected target sets, enabling accurate reconstruction of global information that in turn supports decision-making. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) demonstrate that the proposed approach improves coordination and overall performance in partially observable multi-agent scenarios.