高级检索

    图多智能体任务建模视角下的协作子任务行为发现

    Discovering Coordinated Subtask Patterns from a Graphical Multi-Agent Task Modeling Perspective

    • 摘要: 大量多智能体任务都表现出近似可分解结构,其中相同交互集合中智能体间交互强度大,而不同交互集合中智能体间交互强度小. 有效建模该结构并利用其来协调智能体动作选择可以提升合作型多智能体任务中多智能体强化学习算法的学习效率. 然而,目前已有工作通常忽视并且无法有效实现这一目标. 为解决该问题,使用动态图来建模多智能体任务中的近似可分解结构,并由此提出一种名叫协作子任务行为(coordinated subtask pattern,CSP)的新算法来增强智能体间局部以及全局协作. 具体而言,CSP算法使用子任务来识别智能体间的交互集合,并利用双层策略结构来将所有智能体周期性地分配到多个子任务中. 这种分配方式可以准确刻画动态图上智能体间的交互关系. 基于这种子任务分配,CSP算法提出子任务内和子任务间行为约束来提升智能体间局部以及全局协作. 这2种行为约束确保相同子任务内的部分智能体间可以预知彼此动作选择,同时所有智能体选择优异的联合动作来最大化整体任务性能. 在星际争霸环境的多个地图上开展实验,实验结果表明CSP算法明显优于多种对比算法,验证了所提算法可以实现智能体间的高效协作.

       

      Abstract: Numerous multi-agent tasks exhibit a nearly decomposable structure, wherein interactions among agents within the same interaction set are strong while interactions between different sets are weak. Efficiently modeling this structure and leveraging it to coordinate agents can enhance the learning efficiency of multi-agent reinforcement learning algorithms for cooperative multi-agent tasks, while existing work typically neglects and fails. To address this limitation, we model the nearly decomposable structure using a dynamic graph and accordingly propose a novel algorithm named coordinated subtask pattern (CSP) that enhances both local and global coordination among agents. Specifically, CSP identifies agents’ interaction sets as subtasks and utilizes a bi-level structure to periodically distribute agents into multiple subtasks, which ensures accurate characterizations regarding their interactions on the dynamic graph. Based on the subtask assignment, CSP proposes intra-subtask and inter-subtask pattern constraints to facilitate both local and global coordination among agents. These two constraints ensure that partial agents within the same subtask are aware of their action selections and all agents select superior joint actions that maximize the overall task performance. Experimentally, we evaluate CSP across multiple maps of SMAC benchmark, and its superior performance against multiple baseline algorithms demonstrates its effectiveness on efficiently coordinating agents.

       

    /

    返回文章
    返回