
图多智能体任务建模视角下的协作子任务行为发现

李超, 李文斌, 高阳

李超, 李文斌, 高阳. 图多智能体任务建模视角下的协作子任务行为发现[J]. 计算机研究与发展, 2024, 61(8): 1904-1916. DOI: 10.7544/issn1000-1239.202440189. CSTR: 32373.14.issn1000-1239.202440189
Li Chao, Li Wenbin, Gao Yang. Discovering Coordinated Subtask Patterns from a Graphical Multi-Agent Task Modeling Perspective[J]. Journal of Computer Research and Development, 2024, 61(8): 1904-1916. DOI: 10.7544/issn1000-1239.202440189. CSTR: 32373.14.issn1000-1239.202440189

图多智能体任务建模视角下的协作子任务行为发现

基金项目: 国家自然科学基金项目(62192783,62106100,62276142);江苏省自然科学基金项目(BK20221441);江苏省产业前瞻与关键核心技术竞争项目(BE2021028);深圳市中央引导地方科技发展资金项目(2021Szvup056)
详细信息
    作者简介:

    李超: 1996年生. 博士研究生. CCF学生会员. 主要研究方向为强化学习、多智能体系统、任务建模

    李文斌: 1991年生. 博士,副研究员. CCF会员. 主要研究方向为机器学习、元学习、持续学习

    高阳: 1972年生. 博士,教授. CCF委员会委员. 主要研究方向为多智能体强化学习、博弈论、机器学习

    通讯作者:

    李文斌(liwenbin@nju.edu.cn)

  • 中图分类号: TP18

Discovering Coordinated Subtask Patterns from a Graphical Multi-Agent Task Modeling Perspective

Funds: This work was supported by the National Natural Science Foundation of China (62192783, 62106100, 62276142), the Natural Science Foundation of Jiangsu Province (BK20221441), the Primary Research and Development Plan of Jiangsu Province (BE2021028), and the Shenzhen Fundamental Research Program (2021Szvup056).
More Information
    Author Bio:

    Li Chao: born in 1996. PhD candidate. Student member of CCF. His main research interests include reinforcement learning, multi-agent systems, and task modeling

    Li Wenbin: born in 1991. PhD, associate researcher. Member of CCF. His main research interests include machine learning, meta-learning, and continual learning

    Gao Yang: born in 1972. PhD, professor. Committee member of CCF. His main research interests include multi-agent reinforcement learning, game theory, and machine learning

  • 摘要:

    大量多智能体任务都表现出近似可分解结构,其中相同交互集合中智能体间交互强度大,而不同交互集合中智能体间交互强度小. 有效建模该结构并利用其来协调智能体动作选择可以提升合作型多智能体任务中多智能体强化学习算法的学习效率. 然而,目前已有工作通常忽视并且无法有效实现这一目标. 为解决该问题,使用动态图来建模多智能体任务中的近似可分解结构,并由此提出一种名叫协作子任务行为(coordinated subtask pattern,CSP)的新算法来增强智能体间局部以及全局协作. 具体而言,CSP算法使用子任务来识别智能体间的交互集合,并利用双层策略结构来将所有智能体周期性地分配到多个子任务中. 这种分配方式可以准确刻画动态图上智能体间的交互关系. 基于这种子任务分配,CSP算法提出子任务内和子任务间行为约束来提升智能体间局部以及全局协作. 这2种行为约束确保相同子任务内的部分智能体间可以预知彼此动作选择,同时所有智能体选择优异的联合动作来最大化整体任务性能. 在星际争霸环境的多个地图上开展实验,实验结果表明CSP算法明显优于多种对比算法,验证了所提算法可以实现智能体间的高效协作.

    Abstract:

    Numerous multi-agent tasks exhibit a nearly decomposable structure, in which interactions among agents within the same interaction set are strong while interactions between different sets are weak. Effectively modeling this structure and leveraging it to coordinate agents' action selections can improve the learning efficiency of multi-agent reinforcement learning algorithms on cooperative multi-agent tasks. However, existing work typically neglects this structure and fails to exploit it effectively. To address this limitation, we model the nearly decomposable structure with a dynamic graph and accordingly propose a novel algorithm named coordinated subtask pattern (CSP) that enhances both local and global coordination among agents. Specifically, CSP identifies agents' interaction sets as subtasks and utilizes a bi-level policy structure to periodically distribute all agents into multiple subtasks, which accurately characterizes their interactions on the dynamic graph. Based on this subtask assignment, CSP introduces intra-subtask and inter-subtask pattern constraints to facilitate both local and global coordination among agents. These two constraints ensure that agents within the same subtask can anticipate each other's action selections and that all agents select superior joint actions that maximize the overall task performance. We evaluate CSP on multiple maps of the SMAC benchmark, and its superior performance over multiple baseline algorithms demonstrates its effectiveness in efficiently coordinating agents.
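
To make the bi-level mechanism described in the abstract more concrete, the following Python sketch periodically assigns agents to subtasks and derives the induced dynamic coordination graph (agents sharing a subtask are connected). It is a minimal illustration only: the class name BiLevelSubtaskController, the random placeholder selector, and the specific reassignment period are assumptions for exposition, not the authors' actual CSP implementation or its learned policies.

```python
import numpy as np

class BiLevelSubtaskController:
    """Minimal sketch of a bi-level, periodic subtask assignment.

    Every `period` timesteps a high-level selector re-assigns each of the
    `num_agents` agents to one of `num_subtasks` subtasks; between
    re-assignments the assignment stays fixed. Agents that share a subtask
    form an interaction set, i.e. a fully connected group in the dynamic
    coordination graph sketched in Fig. 1 and Fig. 2.
    """

    def __init__(self, num_agents, num_subtasks, period, seed=0):
        self.num_agents = num_agents
        self.num_subtasks = num_subtasks
        self.period = period
        self.rng = np.random.default_rng(seed)
        self.assignment = np.zeros(num_agents, dtype=int)

    def maybe_reassign(self, t, observations):
        # Placeholder selector: in CSP this would be a learned high-level
        # policy conditioned on agent observations, not random scores.
        if t % self.period == 0:
            scores = self.rng.random((self.num_agents, self.num_subtasks))
            self.assignment = scores.argmax(axis=1)
        return self.assignment

    def coordination_edges(self):
        # Connect every pair of agents assigned to the same subtask.
        return [(i, j)
                for i in range(self.num_agents)
                for j in range(i + 1, self.num_agents)
                if self.assignment[i] == self.assignment[j]]


# Toy usage: 6 agents, 3 subtasks, re-assignment every 5 steps (cf. Fig. 2).
controller = BiLevelSubtaskController(num_agents=6, num_subtasks=3, period=5)
for t in range(10):
    observations = [None] * 6  # real observations omitted in this sketch
    controller.maybe_reassign(t, observations)
print(controller.assignment, controller.coordination_edges())
```

In CSP itself the high-level selector and the low-level action policies are trained jointly under the intra-subtask and inter-subtask pattern constraints; the sketch only indicates where the periodic assignment and the graph construction would sit.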

  • 图  1   对于多智能体任务中近似可分解结构的动态协作图

    Figure  1.   Dynamic coordination graph of the nearly decomposable structure in multi-agent tasks

    图  2   一个6智能体任务在某一时间步的智能体动态协作图

    Figure  2.   Agent dynamic coordination graph of a 6-agent task at a certain timestep

    图  3   CSP算法结构

    Figure  3.   Architecture of CSP algorithm

    图  4   星际争霸环境6个地图上本文算法与多个基准算法的对比结果

    Note: solid lines denote the average test win rate of each algorithm over 5 random seeds, and shaded regions denote the standard deviation.

    Figure  4.   Comparison results of our algorithm against multiple baselines on six maps in SMAC

    图  5   地图3s5z和2c_vs_64zg下CSP算法主要模块的消融实验

    Note: solid lines denote the average test win rate of each algorithm over 5 random seeds, and shaded regions denote the standard deviation.

    Figure  5.   Ablation studies regarding major modules of CSP algorithm on maps of 3s5z and 2c_vs_64zg

    表  1   星际争霸环境下选用地图的描述

    Table  1   Descriptions of Selected Maps in SMAC

    Map         Difficulty   Unit composition
    3s5z        Easy         Allies: 3 Stalkers, 5 Zealots; Enemies: 3 Stalkers, 5 Zealots
    1c3s5z      Easy         Allies: 1 Colossus, 3 Stalkers, 5 Zealots; Enemies: 1 Colossus, 3 Stalkers, 5 Zealots
    3s_vs_5z    Hard         Allies: 3 Stalkers; Enemies: 5 Zealots
    5m_vs_6m    Hard         Allies: 5 Marines; Enemies: 6 Marines
    2c_vs_64zg  Hard         Allies: 2 Colossi; Enemies: 64 Zerglings
    MMM2        Super hard   Allies: 1 Medivac, 2 Marauders, 7 Marines; Enemies: 1 Medivac, 3 Marauders, 8 Marines

    表  2   所有地图下CSP算法超参数设置

    Table  2   Hyper-Parameter Settings for CSP Algorithm on All Maps

    Map         m   k   α      λ
    3s5z        5   5   0.001  1.0
    1c3s5z      3   5   0.001  1.0
    2c_vs_64zg  3   5   0.001  1.0
    3s_vs_5z    3   5   0.001  1.0
    5m_vs_6m    3   5   0.001  1.0
    MMM2        3   5   0.001  0.1
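
For bookkeeping, the Table 2 settings can be kept in a small configuration table such as the Python sketch below. The dataclass name CSPConfig and the field spellings (alpha for α, lam for λ) are illustrative assumptions; only the numeric values are transcribed from Table 2, and the roles of m, k, α, λ are defined in the paper body rather than restated here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CSPConfig:
    # Hyper-parameter names follow the Table 2 header (alpha for α, lam for λ);
    # their roles are defined in the paper body and are not restated here.
    m: int
    k: int
    alpha: float
    lam: float

# Values transcribed from Table 2, one entry per SMAC map.
CSP_CONFIGS = {
    "3s5z":       CSPConfig(m=5, k=5, alpha=0.001, lam=1.0),
    "1c3s5z":     CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "2c_vs_64zg": CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "3s_vs_5z":   CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "5m_vs_6m":   CSPConfig(m=3, k=5, alpha=0.001, lam=1.0),
    "MMM2":       CSPConfig(m=3, k=5, alpha=0.001, lam=0.1),
}

print(CSP_CONFIGS["MMM2"])  # CSPConfig(m=3, k=5, alpha=0.001, lam=0.1)
```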


Publication history
  • Received: 2024-03-15
  • Revised: 2024-05-09
  • Available online: 2024-07-04
  • Issue publication date: 2024-07-31

目录

    /

    返回文章
    返回