Domain Alignment Based on Multi-Viewpoint Domain-Shared Feature for Cross-Domain Sentiment Classification

Jia Xibin (贾熹滨); Jin Ya (靳亚); Chen Juncheng (陈军成)

doi:10.7544/issn1000-1239.2018.20170496

Abstract

Large numbers of well-labeled training samples underpin the performance of supervised learning, but labeling them is time-consuming and labor-intensive. Moreover, in practice it is rarely feasible to obtain enough labeled data in every application domain to train a classifier. Applying a model trained on the source domain directly to the target domain, in turn, usually degrades accuracy, because the data distributions of the two domains differ. To address these problems, we propose an algorithm named domain alignment based on multi-viewpoint domain-shared feature for cross-domain sentiment classification (DAMF). First, we fuse three sentiment lexicons to eliminate polarity divergence among the domain-shared feature words selected by mutual information. On this basis, within each domain we extract word pairs that share the same sentiment polarity using four syntax rules, and word pairs that are strongly associated using association rule learning. We then use the domain-shared words without polarity divergence as a bridge to establish an indirect mapping between domain-specific words in different domains, which yields cross-domain pairs of domain-specific sentiment words with the same polarity and cross-domain pairs of strongly associated feature words. Constructing a unified feature representation space for the source and target domains reduces the differences caused by polarity divergence and by differing feature distributions, and thereby aligns the domains. Experiments on four public data sets drawn from Amazon product review corpora show that the proposed algorithm improves the accuracy of cross-domain sentiment classification to some extent.
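The bridging idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the unanimity rule used to fuse lexicons, the toy lexicons, the word pairs, and the function names `fuse_polarity`, `unambiguous_shared`, and `align_via_bridge` are all assumptions made for illustration, and the pair extraction by syntax rules and association rules is taken as given input.

```python
from collections import defaultdict


def fuse_polarity(word, lexicons):
    """Return the polarity (+1 or -1) of `word` if every lexicon that
    lists it agrees; return None if lexicons disagree or none lists it.
    Unanimity is an assumed fusion rule, not taken from the paper."""
    votes = {lex[word] for lex in lexicons if word in lex}
    return votes.pop() if len(votes) == 1 else None


def unambiguous_shared(shared_words, lexicons):
    """Keep only domain-shared words without polarity divergence."""
    kept = {}
    for word in shared_words:
        polarity = fuse_polarity(word, lexicons)
        if polarity is not None:
            kept[word] = polarity
    return kept


def align_via_bridge(src_pairs, tgt_pairs, bridge):
    """Link domain-specific words across domains through a common bridge word.

    src_pairs / tgt_pairs: (domain_specific_word, shared_word) pairs that
    were extracted inside each domain (hypothetical input format).
    bridge: the unambiguous shared words returned by unambiguous_shared.
    """
    tgt_by_bridge = defaultdict(set)
    for specific, shared in tgt_pairs:
        if shared in bridge:
            tgt_by_bridge[shared].add(specific)

    aligned = set()
    for specific, shared in src_pairs:
        for tgt_specific in tgt_by_bridge.get(shared, ()):
            aligned.add((specific, tgt_specific))
    return aligned


# Toy data (hypothetical): three lexicons that disagree on "cheap".
lexicons = [
    {"good": 1, "cheap": 1, "bad": -1},
    {"good": 1, "cheap": -1},
    {"good": 1, "bad": -1},
]
bridge = unambiguous_shared(["good", "cheap", "bad", "unknown"], lexicons)

books_pairs = [("gripping", "good"), ("boring", "bad"), ("hardcover", "cheap")]
electronics_pairs = [("durable", "good"), ("flimsy", "bad"), ("plastic", "cheap")]
aligned = align_via_bridge(books_pairs, electronics_pairs, bridge)
```

In this toy run, "cheap" is dropped from the bridge because the lexicons disagree on its polarity, so "hardcover" and "plastic" are never linked, while "gripping" and "durable" are linked through "good". Aligned pairs of this kind could then be mapped to shared dimensions of a unified feature space.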
