    Hao Shaopu, Liu Quan, Xu Ping’an, Zhang Lihua, Huang Zhigang. Multi-Modal Imitation Learning Method with Cosine Similarity[J]. Journal of Computer Research and Development, 2023, 60(6): 1358-1372. DOI: 10.7544/issn1000-1239.202220119

    Multi-Modal Imitation Learning Method with Cosine Similarity

    • Generative adversarial imitation learning (GAIL) is an inverse reinforcement learning (IRL) method that uses a generative adversarial framework to imitate expert policies from expert demonstrations. In practical tasks, expert demonstrations are often generated by multi-modal policies. However, most existing GAIL methods assume that the demonstrations come from a single-modal policy, which leads to mode collapse: the learner recovers only some of the modal policies. This greatly limits the applicability of GAIL to multi-modal tasks. To address mode collapse, we propose a multi-modal imitation learning method with cosine similarity (MCS-GAIL). The method introduces an encoder and a policy group. The encoder extracts modal features from the expert demonstrations; the cosine similarity between the features of policy-generated samples and those of the expert demonstrations is then computed and added to the loss function of the policy group, to help the policy group learn the expert policies of the corresponding modalities. In addition, MCS-GAIL uses a new min-max game formulation so that the policy group learns the different modal policies in a complementary way. Under the stated assumptions, we prove the convergence of MCS-GAIL by theoretical analysis. To verify its effectiveness, we implement MCS-GAIL on the Grid World and MuJoCo platforms and compare it with existing methods that address mode collapse. The experimental results show that MCS-GAIL effectively learns multiple modal policies in all tested environments with high accuracy and stability.
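The cosine-similarity term mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how features are paired, how the term is weighted, or with what sign it enters the loss. The function names (`similarity_bonus`, `policy_loss`), the one-to-one pairing of policy and expert features, the `weight` parameter, and the choice to subtract the similarity from an adversarial loss are all assumptions made here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero when either vector is all zeros.
    return dot / max(norm_u * norm_v, eps)


def similarity_bonus(policy_feats, expert_feats) -> float:
    """Mean cosine similarity over paired policy/expert feature vectors.

    Assumption: features are paired one-to-one; the paper may pair or
    aggregate them differently.
    """
    sims = [cosine_similarity(p, e) for p, e in zip(policy_feats, expert_feats)]
    return sum(sims) / len(sims)


def policy_loss(adversarial_loss: float, policy_feats, expert_feats, weight: float = 1.0) -> float:
    """Hypothetical combined loss: higher similarity lowers the loss.

    The sign and the `weight` hyperparameter are illustrative assumptions.
    """
    return adversarial_loss - weight * similarity_bonus(policy_feats, expert_feats)
```

In this sketch, a policy whose encoded samples point in the same direction as the encoded expert demonstrations of its modality receives a lower loss, which is one plausible way a similarity term could steer each member of a policy group toward a distinct modality.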