Discovering Coordinated Subtask Patterns from a Graphical Multi-Agent Task Modeling Perspective
Abstract: Numerous multi-agent tasks exhibit a nearly decomposable structure, in which interactions among agents within the same interaction set are strong while interactions between agents in different sets are weak. Efficiently modeling this structure and leveraging it to coordinate agents' action selections can improve the learning efficiency of multi-agent reinforcement learning algorithms on cooperative multi-agent tasks, yet existing work typically neglects this structure and fails to exploit it effectively. To address this limitation, we model the nearly decomposable structure with a dynamic graph and accordingly propose a novel algorithm named coordinated subtask pattern (CSP) that enhances both local and global coordination among agents. Specifically, CSP treats agents' interaction sets as subtasks and uses a bi-level policy structure to periodically assign all agents to multiple subtasks, which accurately characterizes their interactions on the dynamic graph. Based on this subtask assignment, CSP imposes intra-subtask and inter-subtask pattern constraints to facilitate both local and global coordination. These two constraints ensure that agents within the same subtask can anticipate one another's action selections, while all agents jointly choose superior actions that maximize overall task performance. We evaluate CSP on multiple maps of the SMAC benchmark, and its clear advantage over multiple baseline algorithms demonstrates that CSP coordinates agents effectively and efficiently.
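To make the bi-level decision flow described in the abstract concrete, the following is a minimal Python sketch of a high-level policy that periodically re-assigns agents to subtasks and low-level policies that act under the current assignment. This is an illustrative sketch only: the class and parameter names (BiLevelController, assign_interval, etc.) are our own placeholders rather than the paper's implementation, the learned policies are replaced by random choices, and the assumed correspondence of n_subtasks and assign_interval to the hyper-parameters m and k in Table 2 is not confirmed by the source.

```python
import numpy as np


class BiLevelController:
    """Illustrative two-level controller (hypothetical sketch, not the paper's code):
    a high-level policy periodically assigns each agent to one of several subtasks;
    low-level policies then pick actions conditioned on the current assignment."""

    def __init__(self, n_agents, n_subtasks, n_actions, assign_interval, seed=0):
        self.n_agents = n_agents
        self.n_subtasks = n_subtasks            # possibly the m of Table 2 (assumed)
        self.n_actions = n_actions
        self.assign_interval = assign_interval  # possibly the k of Table 2 (assumed)
        self.rng = np.random.default_rng(seed)
        self.assignment = np.zeros(n_agents, dtype=int)

    def high_level_assign(self, observations):
        # Placeholder for the learned high-level policy: a real implementation
        # would score (agent, subtask) pairs from observations; here we sample.
        self.assignment = self.rng.integers(0, self.n_subtasks, size=self.n_agents)
        return self.assignment

    def low_level_act(self, observations):
        # Placeholder for the learned low-level policies: agents sharing a subtask
        # could coordinate their choices (intra-subtask pattern constraint);
        # here we simply sample one action per agent.
        return self.rng.integers(0, self.n_actions, size=self.n_agents)

    def step(self, t, observations):
        # Re-assign subtasks every `assign_interval` steps, then act.
        if t % self.assign_interval == 0:
            self.high_level_assign(observations)
        return self.assignment.copy(), self.low_level_act(observations)


if __name__ == "__main__":
    controller = BiLevelController(n_agents=8, n_subtasks=3, n_actions=10, assign_interval=5)
    for t in range(10):
        obs = np.zeros((8, 16))  # dummy observations
        subtasks, actions = controller.step(t, obs)
        print(t, subtasks, actions)
```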
Table 1 Descriptions of Selected Maps in SMAC

Map          Difficulty   Unit Configuration
3s5z         Easy         Allies: 3 Stalkers, 5 Zealots; Enemies: 3 Stalkers, 5 Zealots
1c3s5z       Easy         Allies: 1 Colossus, 3 Stalkers, 5 Zealots; Enemies: 1 Colossus, 3 Stalkers, 5 Zealots
3s_vs_5z     Hard         Allies: 3 Stalkers; Enemies: 5 Zealots
5m_vs_6m     Hard         Allies: 5 Marines; Enemies: 6 Marines
2c_vs_64zg   Hard         Allies: 2 Colossi; Enemies: 64 Zerglings
MMM2         Super Hard   Allies: 1 Medivac, 2 Marauders, 7 Marines; Enemies: 1 Medivac, 3 Marauders, 8 Marines

Table 2 Hyper-Parameter Settings for CSP Algorithm on All Maps

Map          m   k   α      λ
3s5z         5   5   0.001  1.0
1c3s5z       3   5   0.001  1.0
2c_vs_64zg   3   5   0.001  1.0
3s_vs_5z     3   5   0.001  1.0
5m_vs_6m     3   5   0.001  1.0
MMM2         3   5   0.001  0.1
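For reference, the per-map settings in Table 2 can be kept in a small configuration table. The sketch below is only one assumed way to organize them: the key names (alpha, lam) merely mirror the table columns, and the semantics of m, k, α, and λ follow the paper's definitions rather than anything shown here.

```python
# Hypothetical per-map hyper-parameter table mirroring Table 2 (organization assumed).
CSP_HYPERPARAMS = {
    "3s5z":       {"m": 5, "k": 5, "alpha": 0.001, "lam": 1.0},
    "1c3s5z":     {"m": 3, "k": 5, "alpha": 0.001, "lam": 1.0},
    "2c_vs_64zg": {"m": 3, "k": 5, "alpha": 0.001, "lam": 1.0},
    "3s_vs_5z":   {"m": 3, "k": 5, "alpha": 0.001, "lam": 1.0},
    "5m_vs_6m":   {"m": 3, "k": 5, "alpha": 0.001, "lam": 1.0},
    "MMM2":       {"m": 3, "k": 5, "alpha": 0.001, "lam": 0.1},
}


def get_config(map_name: str) -> dict:
    """Look up the hyper-parameters recorded in Table 2 for a given SMAC map."""
    return CSP_HYPERPARAMS[map_name]


# Example usage: get_config("MMM2") -> {'m': 3, 'k': 5, 'alpha': 0.001, 'lam': 0.1}
```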