Luo Yuanyi, Wu Rui, Liu Jiafeng, Tang Xianglong. Multimodal Sentiment Analysis Method for Sentimental Semantic Inconsistency[J]. Journal of Computer Research and Development, 2025, 62(2): 374-382. DOI: 10.7544/issn1000-1239.202330199

Multimodal Sentiment Analysis Method for Sentimental Semantic Inconsistency

Funds: This work was supported by the National Natural Science Foundation of China (61672190).
  • Author Bio:

    Luo Yuanyi: born in 1996. PhD candidate. His main research interest is multimodal learning

    Wu Rui: born in 1976. PhD, associate professor, PhD supervisor. His main research interests include pattern recognition and multimodal learning

    Liu Jiafeng: born in 1968. PhD, associate professor. His main research interests include pattern recognition and machine learning

    Tang Xianglong: born in 1960. PhD, professor, PhD supervisor. His main research interests include pattern recognition and computer vision

  • Received Date: March 26, 2023
  • Revised Date: January 22, 2024
  • Available Online: December 11, 2024
  • Abstract: Multimodal sentiment analysis uses subjective information from multiple modalities to analyze sentiment. In some scenarios, the sentimental expressions of different modalities are inconsistent, or even contradictory, which weakens multimodal collaborative decision-making. In this paper, a multimodal learning method is proposed to learn modal feature representations with consistent sentimental semantics. To strengthen the common feature representations of the different modalities and to model the dynamic interaction between modalities without affecting the original information, we first learn a common feature representation for each modality, and then use cross attention so that each modality can effectively obtain auxiliary information from the common feature representations of the other modalities. For multimodal fusion, we propose a multimodal attention that produces a weighted concatenation of the modal feature representations, strengthening the contribution of informative modalities and suppressing the influence of weak ones. On the sentiment analysis datasets MOSI, MOSEI, and CH-SIMS, the proposed method outperforms the compared models, indicating the necessity and rationality of addressing sentimental semantic inconsistency in multimodal sentiment analysis.
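The interaction and fusion steps summarized in the abstract can be illustrated with a short sketch. The following minimal PyTorch code follows only the described flow: per-modality common representations, cross attention that lets one modality draw auxiliary information from the other modalities, and an attention-weighted concatenation before a prediction head. It is not the authors' implementation; the module names (CrossModalAuxiliary, ModalityWeightedFusion), the shared cross-attention weights, the softmax modality scoring, and all dimensions are illustrative assumptions.

# Minimal sketch of the pipeline described in the abstract (assumed details, not the paper's code).
import torch
import torch.nn as nn

class CrossModalAuxiliary(nn.Module):
    """One modality queries the common representations of the other modalities (hypothetical module)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_mod, other_mods):
        # Concatenate the other modalities along the sequence axis and attend over them.
        context = torch.cat(other_mods, dim=1)
        aux, _ = self.attn(query_mod, context, context)
        # Residual connection keeps the querying modality's original information intact.
        return query_mod + aux

class ModalityWeightedFusion(nn.Module):
    """Scores each modality vector and concatenates the weighted vectors (assumed form of the multimodal attention)."""
    def __init__(self, dim: int, num_modalities: int = 3):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.head = nn.Linear(dim * num_modalities, 1)  # regression head for sentiment intensity

    def forward(self, mod_vectors):
        stacked = torch.stack(mod_vectors, dim=1)             # (batch, M, dim)
        weights = torch.softmax(self.score(stacked), dim=1)   # per-sample weight for each modality
        weighted = weights * stacked                          # emphasize strong, suppress weak modalities
        fused = weighted.flatten(start_dim=1)                 # weighted concatenation
        return self.head(fused)

# Toy usage with text/audio/vision features already projected to a 128-d common space.
batch, seq, dim = 8, 20, 128
text, audio, vision = (torch.randn(batch, seq, dim) for _ in range(3))
cross = CrossModalAuxiliary(dim)
text_aux = cross(text, [audio, vision])
audio_aux = cross(audio, [text, vision])
vision_aux = cross(vision, [text, audio])
fusion = ModalityWeightedFusion(dim)
score = fusion([text_aux.mean(dim=1), audio_aux.mean(dim=1), vision_aux.mean(dim=1)])
print(score.shape)  # torch.Size([8, 1])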

  • [1]
    陈龙,管子玉,何金红,等. 情感分类研究进展[J]. 计算机研究与发展,2017,54(6):1150−1170

    Chen Long, Guan Ziyu, He Jinhong, et al. A survey on sentiment classification[J]. Journal of Computer Research and Development, 2017, 54(6): 1150−1170 (in Chinese)
    [2]
    Yu Wenmeng, Xu Hua, Meng Fanyang, et al. CH-SIMS: A Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality[C]//Proc of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 3718−3727
    [3]
    Tsai Y, Bai S, Liang P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proc of the 57th Int Conf on Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 6558−6569
    [4]
    Bertasius G, Wang Heng, Torresani L. Is space-time attention all you need for video understanding[J]. arXiv preprint, arXiv: 2102.05095, 2021
    [5]
    Zellinger W, Grubinger T, Lughofer E, et al. Central moment discrepancy (CMD) for domain-invariant representation learning[J]. arXiv preprint, arXiv: 1702.08811, 2017
    [6]
    Sahay S, Kumar S H, Xia Rui, et al. Multimodal relational tensor network for sentiment and emotion classification[C]//Proc of the 1st Int Conf on Association for Computational Linguistics. Stroudsburg, PA: ACL, 2018: 20−27
    [7]
    Zadeh A, Chen Minghai, Poria S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proc of the Int Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2017: 1103−1114
    [8]
    Liu Zhun, Shen Ying, Lakshminarasimhan V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]//Proc of the Int Conf on Association for Computational Linguistics. Stroudsburg, PA: ACL, 2018: 2247−2256
    [9]
    Hazarika D, Zimmermann R, Poria S. MISA: Modality-invariant and -specific representations for multimodal sentiment analysis[J]. arXiv preprint, arXiv: 2005.03545, 2020
    [10]
    Rahman W, Hasan M K, Lee S, et al. Integrating multimodal information in large pretrained transformers[C]//Proc of the 58th Conf on Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 2359−2369
    [11]
    Yu Wenmeng, Xu Hua, Yuan Ziqi, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proc of the Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 10790−10797
    [12]
    程艳,尧磊波,张光河,等. 基于注意力机制的多通道CNN和BiGRU的文本情感倾向性分析[J]. 计算机研究与发展,2020,57(12):2583−2595

    Cheng Yan, Yao Leibo, Zhang Guanghe, et al. Text sentiment orientation analysis of multi-channels CNN and BiGRU based on attention mechanism[J]. Journal of Computer Research and Development, 2020, 57(12): 2583−2595 (in Chinese)
    [13]
    Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint, arXiv: 1409.0473, 2014
    [14]
    Mnih V, Heess N, Graves A, et al. Recurrent models of visual attention[C]//Proc of the 27th Int Conf on Neural Information Processing Systems. Cambridge, MA: NeurIPS, 2014: 2204–2212
    [15]
    Zhang Zhizheng, Lan Cuiling, Zeng Wenjun, et al. Relation-aware global attention for person re-identification[C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 3183–3192
    [16]
    Xu K, Ba J L, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//Proc of the Int Conf on Machine Learning. New York: ACM, 2015: 2048−2057
    [17]
    Guo Menghao, Xu Tianxing, Liu Jiangjiang, et al. Attention mechanisms in computer vision: A survey[J]. Computational Visual Media, 2022, 8(3): 331−368 doi: 10.1007/s41095-022-0271-y
    [18]
    Li Jianing, Wang Jingdong, Gao Wen, et al. Global–local temporal representations for video person re-identification[C]//Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 3957–3966
    [19]
    Xiang Long, Gan Chuang, Melo G, et al. Multimodal keyless attention fusion for video classification[C]//Proc of the 32nd Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2018: 7202−7209
    [20]
    Ghosal D, Akhtar M S, Chauhan D, et al. Contextual inter-modal attention for multi-modal sentiment analysis[C]//Proc of the 2018 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2018: 3454−3466
    [21]
    Ye Junjie, Zhou Jie, Tian Junfeng, et al. Sentiment-aware multimodal pre-training for multimodal sentiment analysis[J]. Knowledge-Based Systems, 2022, 258: 110021 doi: 10.1016/j.knosys.2022.110021
    [22]
    Sun Zhongkai, Sarma P, Sethares W, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis[C]//Proc of the Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 8992−8999
    [23]
    Devlin J, Chang Mingwei, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proc of the Conf on Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4171–4186
    [24]
    Zadeh A, Zellers R, Pincus E, et al. MOSI: Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv preprint, arXiv: 1606.06259, 2016
    [25]
    Zadeh A, Liang P P, Vanbriesen J, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proc of the 56th Conf on Association for Computational Linguistics. Stroudsburg, PA: ACL, 2018: 2236−2246
    [26]
    Sun Hao, Wang Hongyi, Liu Jiaqing, et al. CubeMLP: An MLP-based model for multimodal sentiment analysis and depression estimation[C]//Proc of the 30th ACM Int Conf on Multimedia. New York: ACM, 2022: 3722−3729
