ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2018, Vol. 55, Issue (11): 2439-2451. doi: 10.7544/issn1000-1239.2018.20170496

• Artificial Intelligence •

多视角特征共享的空间对齐跨领域情感分类

贾熹滨1,2,靳亚1,2,陈军成1   

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124; 2. Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology (Beijing University of Technology), Beijing 100124
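  • Published: 2018-11-01
  • Supported by: 
    National Key Research and Development Program of China (2017YFC0803300); National Natural Science Foundation of China (91546111, 91646201, 61672071); Key Project of the Beijing Municipal Education Commission (KZ201610005009)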

Domain Alignment Based on Multi-Viewpoint Domain-Shared Feature for Cross-Domain Sentiment Classification

Jia Xibin1,2, Jin Ya1,2, Chen Juncheng1   

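  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124; 2. Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology (Beijing University of Technology), Beijing 100124 (jiaxibin@bjut.edu.cn)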
  • Online: 2018-11-01

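Abstract: A large number of accurately labeled samples is an important guarantee of supervised-learning performance, but labeling is time-consuming and labor-intensive. Moreover, in practice it is difficult to obtain enough labeled samples in every application domain to train a classifier. Applying a model trained on the source domain directly to the target domain lowers classification accuracy, because the information distributions of the two domains differ. To address these problems, this paper proposes an algorithm named domain alignment based on multi-viewpoint domain-shared feature for cross-domain sentiment classification (DAMF). The algorithm first fuses multiple sentiment lexicons to eliminate polarity divergence among the sentiment words in the domain-shared features selected by mutual information. On this basis, it uses the unambiguous shared features as a bridge between domains. Combining same-polarity sentiment word pairs extracted within each domain by syntax rules with strongly associated feature word pairs learned within each domain by association rules, it extracts cross-domain pairs of domain-specific sentiment words with the same polarity and cross-domain pairs of strongly associated feature words. These pairs are used to construct a unified feature representation space for the source-domain and target-domain data, which reduces the differences caused by polarity divergence and differing feature distributions and aligns the domain spaces. Cross-domain experiments on public data sets show that the proposed algorithm improves the accuracy of cross-domain sentiment classification to some extent.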

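Key words: sentiment classification, cross-domain, polarity divergence, association rules, unified feature representation space, domain space alignment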
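
The lexicon-fusion step described in the abstract can be illustrated with a small sketch. The abstract does not say how the lexicons are combined, so the majority vote below is an assumption, as are the function name `fuse_polarity` and the toy lexicons; it is not the authors' implementation.

```python
def fuse_polarity(word, lexicons):
    """Fuse several sentiment lexicons by majority vote (an assumed rule).

    Each lexicon maps a word to +1 (positive) or -1 (negative).
    Returns +1 or -1 when the votes have a clear majority, and None when
    the word is unknown or the lexicons are tied (polarity divergence).
    """
    score = sum(lex.get(word, 0) for lex in lexicons)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return None


def unambiguous_shared_words(shared_words, lexicons):
    """Keep only the shared words whose fused polarity is well defined."""
    fused = {}
    for word in shared_words:
        polarity = fuse_polarity(word, lexicons)
        if polarity is not None:
            fused[word] = polarity
    return fused


# Toy lexicons, invented for illustration only.
lexicon_a = {"good": 1, "cheap": -1, "bad": -1, "thin": 1}
lexicon_b = {"good": 1, "cheap": 1, "thin": -1}
lexicon_c = {"good": 1, "cheap": 1, "bad": -1}

fused = unambiguous_shared_words(
    ["good", "cheap", "bad", "thin"], [lexicon_a, lexicon_b, lexicon_c]
)
```

Here `thin` is dropped because the two lexicons that list it disagree, while `cheap` is kept with the polarity that two of the three lexicons assign.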

Abstract: A large set of well-labeled training samples is essential for good supervised-learning performance, but labeling is time-consuming and labor-intensive. Moreover, it is often infeasible to obtain enough well-labeled data in every application to train a classifier. Directly applying a model trained on the source domain to the target domain usually degrades accuracy, because the information distributions of the source and target domains differ. To address these problems, we propose an algorithm named domain alignment based on multi-viewpoint domain-shared feature for cross-domain sentiment classification (DAMF). First, we fuse three sentiment lexicons to eliminate the polarity divergence of the domain-shared feature words, which are selected by mutual information. Next, within each domain, we extract word pairs that have the same sentiment polarity using four syntax rules, and word pairs that are strongly associated using an association rule algorithm. We then use the domain-shared words that have no polarity divergence as a bridge to establish an indirect mapping between domain-specific words in different domains. Constructing a unified feature representation space for the different domains in this way achieves domain alignment. Experiments on four public data sets from Amazon product review corpora show the effectiveness of the proposed algorithm for cross-domain sentiment classification.

Key words: sentiment classification, cross-domain, polarity divergence, association rules, unified feature representation space, domain space alignment
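
The bridging idea in the abstract, linking domain-specific words across domains through a shared word, might be sketched as follows. The function name `bridge_pairs`, the pair format, and the example words are all hypothetical, and the sketch ignores how the within-domain pairs are found (syntax rules and association rules in the paper); it only shows the indirect mapping under those assumptions.

```python
from collections import defaultdict


def bridge_pairs(source_pairs, target_pairs, shared_words):
    """Align domain-specific words across domains via shared bridge words.

    source_pairs / target_pairs: iterables of (word, word) pairs found
    within a single domain. A pair is used only when exactly one of its
    words is a shared (bridge) word. Returns a set of
    (source_specific_word, target_specific_word) pairs that are linked
    to the same bridge word.
    """
    def index_by_bridge(pairs):
        index = defaultdict(set)
        for a, b in pairs:
            if a in shared_words and b not in shared_words:
                index[a].add(b)
            elif b in shared_words and a not in shared_words:
                index[b].add(a)
        return index

    source_index = index_by_bridge(source_pairs)
    target_index = index_by_bridge(target_pairs)

    aligned = set()
    # Only bridge words seen in both domains can connect the two sides.
    for bridge in source_index.keys() & target_index.keys():
        for source_word in source_index[bridge]:
            for target_word in target_index[bridge]:
                aligned.add((source_word, target_word))
    return aligned


# Toy example: book reviews as the source, kitchen reviews as the target.
shared = {"good", "bad"}
book_pairs = [("good", "readable"), ("boring", "bad")]
kitchen_pairs = [("good", "durable"), ("bad", "blunt"), ("sharp", "sturdy")]

aligned_pairs = bridge_pairs(book_pairs, kitchen_pairs, shared)
```

Each aligned pair could then be treated as one feature in a unified representation, so that `readable` in a book review and `durable` in a kitchen review activate the same dimension.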

CLC number: