Abstract:
Multimodal emotion recognition aims to integrate data from multiple modalities to infer emotional states accurately. Existing research either treats all modalities equally or adopts fixed fusion strategies based on a single modality, failing to adequately address the imbalance of modality contributions. To tackle this, we propose MoMFE, a dynamic multimodal fusion method based on the Mixture of Experts (MoE) framework that incorporates an adaptive router module and a multimodal fusion expert module. The method dynamically evaluates the contribution of each modality and selects appropriate fusion strategies accordingly. The adaptive router assesses modality contributions through inter-modal correlation analysis and dynamic weighting to guide the selection of fusion experts, and an expert-guided loss function is integrated to further optimize the expert selection process. The multimodal fusion experts perform complementary fusion for different combinations of modality contributions, and a shared expert is introduced to mitigate the loss of global information and reduce parameter redundancy. Comparative experiments on three multimodal emotion recognition benchmarks (MER2024, CMU-MOSEI, and CH-SIMS) demonstrate that MoMFE outperforms state-of-the-art (SOTA) multimodal fusion methods on core metrics, including binary emotion recognition accuracy (Acc-2) and F1 score. Notably, it achieves an average improvement of approximately 2% on the CH-SIMS dataset.
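To make the routed-fusion idea in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of MoE-style multimodal fusion: a router scores fusion experts from the combined modality features, a few top-ranked experts are applied per sample, and a shared expert is always active. All names, layer sizes, the top-k scheme, and the concatenation-based router input are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFusion(nn.Module):
    """Sketch of MoE multimodal fusion: adaptive router + fusion experts + shared expert.
    Expert count, top-k routing, and layer sizes are illustrative guesses."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each expert from the concatenated modality features.
        self.router = nn.Linear(3 * dim, num_experts)
        # Routed experts: each fuses the concatenated features in its own way.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        # Shared expert applied to every sample to preserve global information.
        self.shared_expert = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, text, audio, vision):
        x = torch.cat([text, audio, vision], dim=-1)   # (batch, 3*dim)
        logits = self.router(x)                        # (batch, num_experts)
        # Keep only the top-k experts per sample and renormalize their weights.
        top_w, top_idx = logits.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)
        fused = self.shared_expert(x)                  # always-on shared path
        for k in range(self.top_k):
            idx = top_idx[:, k]                        # chosen expert per sample
            expert_out = torch.stack(
                [self.experts[i](x[b:b + 1]).squeeze(0)
                 for b, i in enumerate(idx.tolist())]
            )
            fused = fused + top_w[:, k:k + 1] * expert_out
        # The router logits could additionally feed an expert-guided routing loss.
        return fused, logits


# Example: fuse 768-d text/audio/vision features for a batch of 8 samples.
model = MoEFusion(dim=768)
t, a, v = (torch.randn(8, 768) for _ in range(3))
fused, router_logits = model(t, a, v)
print(fused.shape)  # torch.Size([8, 768])
```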