
图多智能体任务建模视角下的协作子任务行为发现

李超, 李文斌, 高阳

李超, 李文斌, 高阳. 图多智能体任务建模视角下的协作子任务行为发现[J]. 计算机研究与发展, 2024, 61(8): 1904-1916. DOI: 10.7544/issn1000-1239.202440189. CSTR: 32373.14.issn1000-1239.202440189
Li Chao, Li Wenbin, Gao Yang. Discovering Coordinated Subtask Patterns from a Graphical Multi-Agent Task Modeling Perspective[J]. Journal of Computer Research and Development, 2024, 61(8): 1904-1916. DOI: 10.7544/issn1000-1239.202440189. CSTR: 32373.14.issn1000-1239.202440189

图多智能体任务建模视角下的协作子任务行为发现

基金项目: 国家自然科学基金项目(62192783,62106100,62276142);江苏省自然科学基金项目(BK20221441);江苏省产业前瞻与关键核心技术竞争项目(BE2021028);深圳市中央引导地方科技发展资金项目(2021Szvup056)
详细信息
    作者简介:

    李超: 1996年生. 博士研究生. CCF学生会员. 主要研究方向为强化学习、多智能体系统、任务建模

    李文斌: 1991年生. 博士,副研究员. CCF会员. 主要研究方向为机器学习、元学习、持续学习

    高阳: 1972年生. 博士,教授. CCF委员会委员. 主要研究方向为多智能体强化学习、博弈论、机器学习

    通讯作者:

    李文斌(liwenbin@nju.edu.cn)

  • 中图分类号: TP18

Discovering Coordinated Subtask Patterns from a Graphical Multi-Agent Task Modeling Perspective

Funds: This work was supported by the National Natural Science Foundation of China (62192783, 62106100, 62276142), the Natural Science Foundation of Jiangsu Province (BK20221441), the Primary Research and Development Plan of Jiangsu Province (BE2021028), and the Shenzhen Fundamental Research Program (2021Szvup056).
More Information
    Author Bio:

    Li Chao: born in 1996. PhD candidate. Student member of CCF. His main research interests include reinforcement learning, multi-agent systems, and task modeling

    Li Wenbin: born in 1991. PhD, associate researcher. Member of CCF. His main research interests include machine learning, meta-learning, and continual learning

    Gao Yang: born in 1972. PhD, professor. Committee member of CCF. His main research interests include multi-agent reinforcement learning, game theory, and machine learning

  • 摘要:

    大量多智能体任务都表现出近似可分解结构,其中相同交互集合中智能体间交互强度大,而不同交互集合中智能体间交互强度小. 有效建模该结构并利用其来协调智能体动作选择可以提升合作型多智能体任务中多智能体强化学习算法的学习效率. 然而,目前已有工作通常忽视并且无法有效实现这一目标. 为解决该问题,使用动态图来建模多智能体任务中的近似可分解结构,并由此提出一种名叫协作子任务行为(coordinated subtask pattern,CSP)的新算法来增强智能体间局部以及全局协作. 具体而言,CSP算法使用子任务来识别智能体间的交互集合,并利用双层策略结构来将所有智能体周期性地分配到多个子任务中. 这种分配方式可以准确刻画动态图上智能体间的交互关系. 基于这种子任务分配,CSP算法提出子任务内和子任务间行为约束来提升智能体间局部以及全局协作. 这2种行为约束确保相同子任务内的部分智能体间可以预知彼此动作选择,同时所有智能体选择优异的联合动作来最大化整体任务性能. 在星际争霸环境的多个地图上开展实验,实验结果表明CSP算法明显优于多种对比算法,验证了所提算法可以实现智能体间的高效协作.

    Abstract:

    Numerous multi-agent tasks exhibit a nearly decomposable structure, in which interactions among agents within the same interaction set are strong while interactions between different sets are weak. Effectively modeling this structure and leveraging it to coordinate agents' action selections can improve the learning efficiency of multi-agent reinforcement learning algorithms on cooperative multi-agent tasks. However, existing work typically neglects this structure and fails to exploit it effectively. To address this limitation, we model the nearly decomposable structure with a dynamic graph and accordingly propose a novel algorithm named coordinated subtask pattern (CSP) that enhances both local and global coordination among agents. Specifically, CSP identifies agents' interaction sets as subtasks and utilizes a bi-level policy structure to periodically distribute all agents into multiple subtasks, which accurately characterizes their interactions on the dynamic graph. Based on this subtask assignment, CSP introduces intra-subtask and inter-subtask pattern constraints to facilitate both local and global coordination among agents. These two constraints ensure that agents within the same subtask can anticipate each other's action selections and that all agents select superior joint actions that maximize the overall task performance. We evaluate CSP on multiple maps of the SMAC benchmark, and its superior performance over multiple baseline algorithms demonstrates its effectiveness in efficiently coordinating agents.
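
To make the bi-level mechanism described in the abstract more concrete, the following Python sketch periodically assigns agents to subtasks and derives the induced dynamic coordination graph (agents sharing a subtask are connected). It is a minimal illustration only: the class name BiLevelSubtaskController, the random placeholder selector, and the specific reassignment period are assumptions for exposition, not the authors' actual CSP implementation or its learned policies.

```python
import numpy as np

class BiLevelSubtaskController:
    """Minimal sketch of a bi-level, periodic subtask assignment.

    Every `period` timesteps a high-level selector re-assigns each of the
    `num_agents` agents to one of `num_subtasks` subtasks; between
    re-assignments the assignment stays fixed. Agents that share a subtask
    form an interaction set, i.e. a fully connected group in the dynamic
    coordination graph sketched in Fig. 1 and Fig. 2.
    """

    def __init__(self, num_agents, num_subtasks, period, seed=0):
        self.num_agents = num_agents
        self.num_subtasks = num_subtasks
        self.period = period
        self.rng = np.random.default_rng(seed)
        self.assignment = np.zeros(num_agents, dtype=int)

    def maybe_reassign(self, t, observations):
        # Placeholder selector: in CSP this would be a learned high-level
        # policy conditioned on agent observations, not random scores.
        if t % self.period == 0:
            scores = self.rng.random((self.num_agents, self.num_subtasks))
            self.assignment = scores.argmax(axis=1)
        return self.assignment

    def coordination_edges(self):
        # Connect every pair of agents assigned to the same subtask.
        return [(i, j)
                for i in range(self.num_agents)
                for j in range(i + 1, self.num_agents)
                if self.assignment[i] == self.assignment[j]]


# Toy usage: 6 agents, 3 subtasks, re-assignment every 5 steps (cf. Fig. 2).
controller = BiLevelSubtaskController(num_agents=6, num_subtasks=3, period=5)
for t in range(10):
    observations = [None] * 6  # real observations omitted in this sketch
    controller.maybe_reassign(t, observations)
print(controller.assignment, controller.coordination_edges())
```

In CSP itself the high-level selector and the low-level action policies are trained jointly under the intra-subtask and inter-subtask pattern constraints; the sketch only indicates where the periodic assignment and the graph construction would sit.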

  • 图  1   对于多智能体任务中近似可分解结构的动态协作图

    Figure  1.   Dynamic coordination graph of the nearly decomposable structure in multi-agent tasks

    图  2   一个6智能体任务在某一时间步的智能体动态协作图

    Figure  2.   Agent dynamic coordination graph of a 6-agent task at a certain timestep

    图  3   CSP算法结构

    Figure  3.   Architecture of CSP algorithm

    图  4   星际争霸环境6个地图上本文算法与多个基准算法的对比结果

    Note: solid lines denote the average test win rate of each algorithm over 5 random seeds, and shaded regions denote the standard deviation.

    Figure  4.   Comparison results of our algorithm against multiple baselines on six maps in SMAC

    图  5   地图3s5z和2c_vs_64zg下CSP算法主要模块的消融实验

    Note: solid lines denote the average test win rate of each algorithm over 5 random seeds, and shaded regions denote the standard deviation.

    Figure  5.   Ablation studies regarding major modules of CSP algorithm on maps of 3s5z and 2c_vs_64zg

    表  1   星际争霸环境下选用地图的描述

    Table  1   Descriptions of Selected Maps in SMAC

    Map         Difficulty   Unit composition
    3s5z        Easy         Allies: 3 Stalkers, 5 Zealots; Enemies: 3 Stalkers, 5 Zealots
    1c3s5z      Easy         Allies: 1 Colossus, 3 Stalkers, 5 Zealots; Enemies: 1 Colossus, 3 Stalkers, 5 Zealots
    3s_vs_5z    Hard         Allies: 3 Stalkers; Enemies: 5 Zealots
    5m_vs_6m    Hard         Allies: 5 Marines; Enemies: 6 Marines
    2c_vs_64zg  Hard         Allies: 2 Colossi; Enemies: 64 Zerglings
    MMM2        Super hard   Allies: 1 Medivac, 2 Marauders, 7 Marines; Enemies: 1 Medivac, 3 Marauders, 8 Marines

    表  2   所有地图下CSP算法超参数设置

    Table  2   Hyper-Parameter Settings for CSP Algorithm on All Maps

    Map         m   k   α      λ
    3s5z        5   5   0.001  1.0
    1c3s5z      3   5   0.001  1.0
    2c_vs_64zg  3   5   0.001  1.0
    3s_vs_5z    3   5   0.001  1.0
    5m_vs_6m    3   5   0.001  1.0
    MMM2        3   5   0.001  0.1
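
For bookkeeping, the Table 2 settings can be kept in a small configuration table such as the Python sketch below. The dataclass name CSPConfig and the field spellings (alpha for α, lam for λ) are illustrative assumptions; only the numeric values are transcribed from Table 2, and the roles of m, k, α, λ are defined in the paper body rather than restated here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CSPConfig:
    # Hyper-parameter names follow the Table 2 header (alpha for α, lam for λ);
    # their roles are defined in the paper body and are not restated here.
    m: int
    k: int
    alpha: float
    lam: float

# Values transcribed from Table 2, one entry per SMAC map.
CSP_CONFIGS = {
    "3s5z":       CSPConfig(m=5, k=5, alpha=0.001, lam=1.0),
    "1c3s5z":     CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "2c_vs_64zg": CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "3s_vs_5z":   CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "5m_vs_6m":   CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "MMM2":       CSPConfig(m=3, k=5, alpha=0.001, lam=0.1),
}

print(CSP_CONFIGS["MMM2"])  # CSPConfig(m=3, k=5, alpha=0.001, lam=0.1)
```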


Publication history
  • Received: 2024-03-15
  • Revised: 2024-05-09
  • Available online: 2024-07-04
  • Issue publication date: 2024-07-31

目录

    /

    返回文章
    返回