ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (5): 945-957. doi: 10.7544/issn1000-1239.2018.20170049

• Artificial Intelligence •

Chinese Micro-Blog Sentiment Analysis Based on Multi-Channels Convolutional Neural Networks

Chen Ke1, Liang Bin2, Ke Wende1, Xu Bo1, Zeng Guochao1

  1. 1(Department of Computer Science and Technology, Guangdong University of Petrochemical Technology, Maoming, Guangdong 525000); 2(School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000) (chenke2001@163.com)
  • Published: 2018-05-01
  • Online: 2018-05-01
  • Supported by: 
    National Natural Science Foundation of China (61272382, 61672174); Natural Science Foundation of Guangdong Province (2016A030307049, 2016A030307028); Science and Technology Planning Project of Guangdong Province (2014A010104016, 2015B090903084)

Abstract: In recent years, neural network-based architectures have been widely applied to sentiment analysis with considerable success. However, most previous approaches classify text using word embeddings alone, which ignores features specific to sentiment classification and makes it hard to represent how important each word is within a sentence. Because Chinese micro-blog texts are short, making effective use of sentiment resources remains a challenge. To address these problems, we propose a sentiment classification method for Chinese micro-blogs based on multi-channels convolutional neural networks (MCCNN). The model builds its input matrix from sentiment-specific information: a part-of-speech vector lets the model exploit sentiment features through the part-of-speech tags of words, and a position vector indicates the importance of each word in the sentence, so that the model focuses on important words during training. These vectors are combined with the original word embeddings to form different input channels, so that the network learns the sentiment of the input sentence from several feature representations and extracts more hidden information. Experiments on the COAE2014 dataset and a micro-blog corpus show better performance than an ordinary convolutional neural network, a convolutional neural network combined with sentiment information, and traditional classifiers.

Key words: sentiment analysis, deep learning, convolutional neural networks (CNN), multi-channels, natural language processing
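
The multi-channel input described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the channel combinations, the vector dimensions, the toy sentence, its part-of-speech tags, and the helper names `build_channels` and `conv_max_pool` are all assumptions made for illustration, and the embeddings and filters are random rather than trained. The sketch only shows how word, part-of-speech, and position vectors might be combined into separate channels, each passed through a convolution with max-over-time pooling before the pooled features are concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sentence and POS tags; not taken from the paper's data.
words = ["这", "部", "电影", "很", "好看"]
pos_tags = ["r", "q", "n", "d", "a"]

# Assumed dimensions, chosen only to keep the example small.
WORD_DIM, POS_DIM, LOC_DIM, MAX_LEN = 8, 4, 4, 6

# Random lookup tables standing in for trained embeddings.
word_table = {w: rng.normal(size=WORD_DIM) for w in words}
pos_table = {t: rng.normal(size=POS_DIM) for t in pos_tags}
loc_table = rng.normal(size=(MAX_LEN, LOC_DIM))  # one vector per position index


def build_channels(words, pos_tags):
    """Return one input matrix per channel, zero-padded to MAX_LEN rows."""
    n = len(words)
    w = np.stack([word_table[x] for x in words])
    p = np.stack([pos_table[t] for t in pos_tags])
    loc = loc_table[:n]
    channels = {
        "word+pos": np.hstack([w, p]),
        "word+position": np.hstack([w, loc]),
        "word+pos+position": np.hstack([w, p, loc]),
    }
    return {k: np.pad(m, ((0, MAX_LEN - n), (0, 0))) for k, m in channels.items()}


def conv_max_pool(x, filters):
    """Slide each (h x d) filter over the rows of x, apply ReLU, max-pool over time."""
    h = filters.shape[1]
    windows = np.stack([x[i:i + h] for i in range(x.shape[0] - h + 1)])
    feature_maps = np.maximum(np.einsum("thd,khd->tk", windows, filters), 0.0)
    return feature_maps.max(axis=0)  # one pooled value per filter


channels = build_channels(words, pos_tags)
# Three random filters of height 2 per channel; a real model would learn these.
features = np.concatenate([
    conv_max_pool(x, rng.normal(size=(3, 2, x.shape[1])))
    for x in channels.values()
])
print({k: v.shape for k, v in channels.items()}, features.shape)
```

In a full model the concatenated `features` vector would feed a classification layer; that layer and the training procedure are omitted here.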
