
Cross Face-Voice Matching via Double-Stream Networks and Bi-Quintuple Loss

Liu Xin, Wang Rui, Zhong Bineng, Wang Nannan

Citation: Liu Xin, Wang Rui, Zhong Bineng, Wang Nannan. Cross Face-Voice Matching via Double-Stream Networks and Bi-Quintuple Loss[J]. Journal of Computer Research and Development, 2022, 59(3): 694-705. DOI: 10.7544/issn1000-1239.20200547. CSTR: 32373.14.issn1000-1239.20200547

Funds: This work was supported by the National Natural Science Foundation of China (61673185, 61922066, 61972167), the Fund of the State Key Laboratory of Integrated Services Networks (ISN20-11), the Natural Science Foundation of Fujian Province (2020J01084), and the Open Project of Zhejiang Laboratory (2021KH0AB01).
  • CLC number: TP18; TP391
  • Abstract: Facial information and voice cues are among the most direct and flexible channels in human-computer interaction, and intelligent cross-modal perception between the face and voice modalities has attracted wide research attention. Nevertheless, because face and voice samples are heterogeneous and separated by a semantic gap, existing methods often perform poorly on the more challenging cross-modal face-voice matching tasks. In this paper, we propose a cross-modal face-voice feature learning framework that combines double-stream networks with a bi-quintuple loss; the learned representations can be applied directly to four different cross-modal face-voice matching tasks. First, we place a novel weight-sharing multi-modal weighted residual network on top of the double-stream network to mine the semantic association between the face and voice modalities. Then, we design a bi-quintuple loss that integrates multiple sample-pair construction strategies, which substantially improves data utilization and the generalization ability of the model. Finally, we add identity (ID) classification during training to keep the cross-modal representations separable. Experiments on four different cross-modal face-voice matching tasks show that the proposed approach outperforms existing methods across the board, with gains of nearly 5% on some evaluation metrics.
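The abstract names its components without giving formulas, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes that a "quintuple" is one anchor, one matching sample from the other modality, and three non-matching samples, and that the shared layer has the form `x + alpha * relu(x @ W)`. All function names, the margin, and `alpha` are hypothetical; the ID classification branch is omitted.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so distances are comparable."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def shared_weighted_residual(x, W, alpha=0.5):
    """Assumed form of a weighted residual layer.

    The same matrix W is applied to face and voice features,
    which is what "weight sharing" means in this sketch.
    """
    return x + alpha * np.maximum(x @ W, 0.0)

def quintuple_hinge(anchor, positive, negatives, margin=0.2):
    """Hinge loss for one anchor, one positive and several negatives.

    Assumption: anchor + positive + three negatives form the quintuple.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negatives) ** 2, axis=1)
    return float(np.sum(np.maximum(0.0, margin + d_pos - d_neg)))

def bi_quintuple_loss(face, voice, labels, margin=0.2, n_neg=3):
    """Average the hinge over both directions: face->voice and voice->face.

    Row i of `face` and row i of `voice` belong to identity labels[i].
    """
    total, count = 0.0, 0
    for src, dst in ((face, voice), (voice, face)):
        for i in range(len(labels)):
            # Pick the first n_neg samples with a different identity.
            neg_idx = np.flatnonzero(labels != labels[i])[:n_neg]
            if len(neg_idx) < n_neg:
                continue  # not enough negatives to build a quintuple
            total += quintuple_hinge(src[i], dst[i], dst[neg_idx], margin)
            count += 1
    return total / max(count, 1)

# Toy usage with random stand-ins for the two stream outputs.
rng = np.random.default_rng(0)
labels = np.arange(4)
W = rng.normal(scale=0.1, size=(8, 8))  # shared by both modalities
face = l2_normalize(shared_weighted_residual(rng.normal(size=(4, 8)), W))
voice = l2_normalize(shared_weighted_residual(rng.normal(size=(4, 8)), W))
loss = bi_quintuple_loss(face, voice, labels)
```

Summing the hinge in both retrieval directions means each face-voice pair is used as an anchor twice, which is one plausible reading of how a bidirectional loss could raise data utilization.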
Publication history
  • Published: 2022-02-28