Cross Face-Voice Matching via Double-Stream Networks and Bi-Quintuple Loss
Abstract: Facial appearance and voice are the most direct and flexible channels in human-computer interaction, so intelligent cross-modal perception between faces and voices has attracted wide attention from researchers. However, the heterogeneity of face and voice samples and the semantic gap between the two modalities mean that existing methods still perform poorly on some of the more challenging cross-modal face-voice matching tasks. This paper proposes a cross-modal face-voice feature learning framework that combines double-stream networks with a bi-quintuple loss. The learned representations can be used directly for four different cross-modal face-voice matching tasks. First, a novel weight-shared multi-modal weighted residual network is placed on top of the double-stream network to mine the semantic association between the face and voice modalities. Second, a bi-quintuple loss that integrates several sample-pair construction strategies is designed, which substantially improves data utilization and the generalization ability of the model. Finally, the model learns to predict the identity (ID) of each person during training, which keeps the cross-modal representations discriminative. Experiments show that, compared with existing methods, the proposed approach improves results on all four cross-modal face-voice matching tasks, with gains of nearly 5% on some evaluation metrics.
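The abstract does not give the exact form of the bi-quintuple loss, so the sketch below is not the authors' method. It shows a generic stand-in under stated assumptions: a bidirectional hinge ranking loss (face anchor against voices, and voice anchor against faces) plus an auxiliary identity cross-entropy term, mirroring the "bidirectional" and "ID classification" ideas described above. The function names, the cosine similarity, the `margin` and `id_weight` values, and the use of every other item in the batch as a negative are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)  # small epsilon avoids division by zero


def bidirectional_ranking_loss(faces: List[Vector], voices: List[Vector],
                               margin: float = 0.2) -> float:
    """Hypothetical stand-in for a bidirectional cross-modal loss.

    Assumes faces[i] and voices[i] belong to the same identity; every other
    index in the batch is treated as a negative (an assumed pair strategy).
    """
    n = len(faces)
    total, count = 0.0, 0
    for i in range(n):
        pos = cosine(faces[i], voices[i])
        for j in range(n):
            if j == i:
                continue
            # Face anchor: the matching voice should beat a mismatched voice by `margin`.
            total += max(0.0, margin - pos + cosine(faces[i], voices[j]))
            # Voice anchor: the matching face should beat a mismatched face by `margin`.
            total += max(0.0, margin - pos + cosine(voices[i], faces[j]))
            count += 2
    return total / count if count else 0.0


def id_cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean softmax cross-entropy for identity (ID) classification."""
    loss = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        loss += log_z - row[y]
    return loss / len(labels)


def total_loss(faces, voices, logits, labels, margin=0.2, id_weight=1.0):
    """Ranking term plus a weighted ID term (hypothetical weighting)."""
    return (bidirectional_ranking_loss(faces, voices, margin)
            + id_weight * id_cross_entropy(logits, labels))
```

With `faces = [[1.0, 0.0], [0.0, 1.0]]`, aligned voices `[[1.0, 0.0], [0.0, 1.0]]` give a ranking loss of 0, while swapped voices `[[0.0, 1.0], [1.0, 0.0]]` give about 1.2, because each positive pair now scores lower than its negatives. Summing both directions is what makes the loss symmetric, so neither modality is treated only as the query side.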