ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (6): 1312-1324. doi: 10.7544/issn1000-1239.2019.20180341

• Artificial Intelligence •




A Hierarchical Deep Correlative Fusion Network for Sentiment Classification in Social Media

Cai Guoyong, Lü Guangrui, Xu Zhi   

  1. (Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004)
  • Online: 2019-06-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61763007, 66162014), the Natural Science Foundation of Guangxi Province of China (2017JJD160017), and the Project of the Guangxi Key Laboratory of Trusted Software (201503).


Abstract: Most existing research on sentiment analysis is based on either textual or visual data alone and cannot achieve satisfactory results. Because multi-modal data provide richer information, multi-modal sentiment analysis is attracting growing attention and has become a hot research topic. Owing to the strong semantic correlation between visual data and the co-occurring textual data in social media, mixed text-image data provide a new perspective for learning better classifiers for social media sentiment classification. A hierarchical deep correlative fusion network framework is proposed to jointly learn textual and visual sentiment representations from training samples for sentiment classification. To alleviate the problem of fine-grained semantic matching between image and text, both the mid-level semantic features of images and deep multi-modal discriminative correlation analysis are applied to learn the most correlated visual feature representation and semantic feature representation, while keeping both representations linearly discriminable. Motivated by the successful use of attention mechanisms, a multi-modal attention fusion network is further proposed that incorporates the visual and semantic feature representations to train the sentiment classifier. Experiments on real-world datasets collected from social networks show that the proposed method achieves more accurate sentiment prediction for multi-media data by hierarchically capturing the internal relations between text and image.
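To illustrate the fusion step described above, the following is a minimal NumPy sketch of attention-weighted fusion of two modality representations followed by a linear sentiment classifier. It is not the paper's actual model: the dimensionality `d`, the random projections, the single-vector attention scorer `w`, and the three sentiment classes are all hypothetical stand-ins for the learned components (the correlation-analysis projections and trained attention/classifier weights).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

d = 8  # hypothetical shared feature dimension after correlation analysis
text_feat = rng.standard_normal(d)   # stand-in semantic (text) representation
image_feat = rng.standard_normal(d)  # stand-in visual (image) representation

# Attention score per modality from a (here random) scoring vector;
# in the real model these weights would be learned jointly.
w = rng.standard_normal(d)
alpha = softmax(np.array([w @ text_feat, w @ image_feat]))

# Attention-weighted fusion of the two modality representations.
fused = alpha[0] * text_feat + alpha[1] * image_feat

# Linear classifier over the fused representation (3 sentiment classes assumed).
W_cls = rng.standard_normal((3, d))
logits = W_cls @ fused
pred = int(np.argmax(logits))
```

The design point the sketch captures is that the attention weights `alpha` always sum to 1, so the classifier sees a convex combination of the modalities: when one modality is uninformative for a given post, its weight can shrink toward zero rather than corrupting the fused representation.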

Key words: social media, sentiment analysis, deep correlation, discriminant correlation analysis, multi-modal attention fusion