Liu Xin, Wang Rui, Zhong Bineng, Wang Nannan. Cross Face-Voice Matching via Double-Stream Networks and Bi-Quintuple Loss[J]. Journal of Computer Research and Development, 2022, 59(3): 694-705. DOI: 10.7544/issn1000-1239.20200547

Cross Face-Voice Matching via Double-Stream Networks and Bi-Quintuple Loss

Funds: This work was supported by the National Natural Science Foundation of China (61673185, 61922066, 61972167), the Project of State Key Laboratory of Integrated Services Networks (ISN20-11), the Natural Science Foundation of Fujian Province (2020J01084), and the Zhejiang Laboratory (2021KH0AB01).
  • Published Date: February 28, 2022
  • Facial information and voice cues are among the most natural and flexible channels for human-computer interaction, and recent research has paid increasing attention to intelligent cross-modal perception between the face and voice modalities. Nevertheless, most existing methods perform poorly on challenging cross-modal face-voice matching tasks, mainly because of the combined effect of the semantic gap and modality heterogeneity. In this paper, we propose an efficient cross-modal face-voice matching network built on double-stream networks and a bi-quintuple loss; the learned feature representations can be applied to four challenging cross-modal matching tasks between faces and voices. First, we introduce a novel modality-shared multi-modal weighted residual network, embedded at the top layer of the double-stream network, to model the face-voice association. Then, we propose a bi-quintuple loss that significantly improves data utilization while enhancing the generalization ability of the network model. Further, the network learns to predict the identity (ID) of each person during training, which supervises the discriminative feature learning process. As a result, discriminative cross-modal representations can be learned for different matching tasks. Extensive experiments on four different cross-modal matching tasks show that the proposed approach outperforms state-of-the-art methods by a margin of up to 5%.
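The abstract does not spell out the bi-quintuple loss, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that a quintuple consists of one anchor embedding, one positive embedding from the other modality, and three negative embeddings from the other modality; that the loss is a margin-based hinge on Euclidean distances between L2-normalized embeddings; and that "bi" means the loss is summed over the voice-to-face and face-to-voice directions. The function names, the margin value, and the number of negatives are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def quintuple_loss(anchor, positive, negatives, margin=0.5):
    """Hinge loss over one quintuple (ASSUMED form, not taken from the paper).

    anchor:    (d,)   embedding from one modality
    positive:  (d,)   embedding of the same identity, other modality
    negatives: (3, d) embeddings of other identities, other modality
    """
    anchor = l2_normalize(np.asarray(anchor, dtype=float))
    positive = l2_normalize(np.asarray(positive, dtype=float))
    negatives = l2_normalize(np.asarray(negatives, dtype=float))

    d_pos = np.linalg.norm(anchor - positive)            # scalar distance
    d_neg = np.linalg.norm(anchor - negatives, axis=-1)  # one distance per negative

    # Each negative must be farther from the anchor than the positive by `margin`.
    return float(np.maximum(0.0, margin + d_pos - d_neg).sum())


def bi_quintuple_loss(voice_quintuple, face_quintuple, margin=0.5):
    """Sum the loss over both matching directions (ASSUMED meaning of "bi").

    voice_quintuple: (voice anchor, positive face, negative faces)
    face_quintuple:  (face anchor, positive voice, negative voices)
    """
    return (quintuple_loss(*voice_quintuple, margin=margin)
            + quintuple_loss(*face_quintuple, margin=margin))
```

Under these assumptions, each anchor is compared against several negatives at once, which is one way a quintuple-style loss could use training data more fully than a single-negative triplet loss. In practice the embeddings would come from the two network streams rather than from hand-written arrays.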