    Liu Xin, Wang Rui, Zhong Bineng, Wang Nannan. Cross Face-Voice Matching via Double-Stream Networks and Bi-Quintuple Loss[J]. Journal of Computer Research and Development, 2022, 59(3): 694-705. DOI: 10.7544/issn1000-1239.20200547

    Cross Face-Voice Matching via Double-Stream Networks and Bi-Quintuple Loss

    Face and voice are among the most natural and flexible cues in human-computer interaction, and recent research has paid increasing attention to intelligent cross-modal perception between the two modalities. Nevertheless, most existing methods perform poorly on challenging cross-modal face-voice matching tasks, mainly because of the combined effect of the semantic gap and modality heterogeneity. In this paper, we propose an efficient cross-modal face-voice matching network based on double-stream networks and a bi-quintuple loss; the learned feature representations can be applied to four challenging cross-modal matching tasks between faces and voices. First, we introduce a novel modality-shared multi-modal weighted residual network, embedded at the top layer of the double-stream network, to model the face-voice association. Then, we propose a bi-quintuple loss that significantly improves data utilization while enhancing the generalization ability of the network model. Further, we learn to predict the identity (ID) of each person during training, which supervises the discriminative feature learning process. As a result, discriminative cross-modal representations can be learned for the different matching tasks. Extensive experiments on the four cross-modal matching tasks show that the proposed approach outperforms state-of-the-art methods by a large margin, reaching up to 5%.