Citation: Hao Shaopu, Liu Quan, Xu Ping’an, Zhang Lihua, Huang Zhigang. Multi-Modal Imitation Learning Method with Cosine Similarity[J]. Journal of Computer Research and Development, 2023, 60(6): 1358-1372. DOI: 10.7544/issn1000-1239.202220119
Generative adversarial imitation learning (GAIL) is an inverse reinforcement learning (IRL) method built on the generative adversarial framework that imitates expert policies from expert demonstrations. In practical tasks, expert demonstrations are often generated by multi-modal policies. However, most existing GAIL methods assume that the demonstrations come from a single-modal policy. This assumption leads to mode collapse, in which GAIL learns only some of the modal policies, and it greatly limits the method's applicability to multi-modal tasks. To address mode collapse, we propose a multi-modal imitation learning method with cosine similarity (MCS-GAIL). The method introduces an encoder and a policy group. The encoder extracts modal features from the expert demonstrations, the cosine similarity between the features of samples generated by the policies and those of the expert demonstrations is computed, and this similarity is added to the loss function of the policy group to help each policy learn the expert policy of its corresponding modality. In addition, MCS-GAIL uses a new min-max game formulation so that the policies in the group learn different modal policies in a complementary way. Under certain assumptions, we prove the convergence of MCS-GAIL by theoretical analysis. To verify its effectiveness, MCS-GAIL is implemented on the Grid World and MuJoCo platforms and compared with existing methods for addressing mode collapse. The experimental results show that MCS-GAIL effectively learns multiple modal policies in all tested environments with high accuracy and stability.
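The cosine-similarity term described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact loss, so the function names, the `weight` coefficient, and the sign convention (subtracting the mean similarity so that higher similarity lowers the loss) are all assumptions, and the feature arrays stand in for the output of a hypothetical encoder.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (n, d) feature arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    # eps guards against division by zero for all-zero feature vectors.
    return num / np.maximum(den, eps)


def policy_loss_with_similarity(adv_loss, policy_feats, expert_feats, weight=1.0):
    """Combine an adversarial loss with a similarity bonus (assumed form).

    adv_loss     : scalar adversarial (GAIL-style) loss for one policy
    policy_feats : (n, d) encoder features of samples from that policy
    expert_feats : (n, d) encoder features of expert demonstrations
                   from the modality the policy should imitate
    weight       : hypothetical trade-off coefficient
    """
    sim = cosine_similarity(policy_feats, expert_feats)
    # Higher similarity to the target modality reduces the loss.
    return adv_loss - weight * float(np.mean(sim))


if __name__ == "__main__":
    expert = np.array([[1.0, 0.0], [1.0, 0.0]])
    aligned = np.array([[2.0, 0.0], [3.0, 0.0]])      # same direction as expert
    orthogonal = np.array([[0.0, 1.0], [0.0, 5.0]])   # unrelated modality
    print(policy_loss_with_similarity(2.0, aligned, expert, weight=0.5))
    print(policy_loss_with_similarity(2.0, orthogonal, expert, weight=0.5))
```

In this sketch, a policy whose sample features point in the same direction as the expert features of its modality receives a lower loss than one whose features are orthogonal, which is the kind of pressure the abstract describes for steering each policy in the group toward its own modality.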