Abstract:
Abundant, well-labeled training samples are an essential foundation for good performance in supervised learning, but obtaining them is labor-intensive and time-consuming. Moreover, it is not always feasible to obtain enough well-labeled data in every application to support classifier training. Meanwhile, directly applying a model trained on the source domain to the target domain usually degrades accuracy because of the distribution discrepancy between the two domains. To address these problems, we propose an algorithm named domain alignment based on multi-viewpoint domain-shared feature for cross-domain sentiment classification (DAMF). First, we fuse three sentiment lexicons to eliminate the polarity divergence of domain-shared feature words, which are selected by their mutual information values. On this basis, we extract word pairs that have the same sentiment polarity within the same domain using four syntax rules, and word pairs that are strongly associated within the same domain using an association rule algorithm. Then, we use the domain-shared words that have no polarity divergence as a bridge to establish an indirect mapping between domain-specific words in different domains. Domain alignment is achieved by constructing a unified feature representation space for the different domains. Experiments on four public data sets from the Amazon product reviews corpora show the effectiveness of the proposed algorithm for cross-domain sentiment classification.
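
The bridging idea can be illustrated with a minimal sketch. It is not the DAMF implementation: the majority-vote fusion rule, the function names `fuse_polarity` and `bridge`, and all toy words and lexicon entries are assumptions made for illustration only. The sketch also assumes that the domain-shared words have already been selected (for example by mutual information) and that the (domain-specific word, shared word) pairs have already been extracted by the syntax and association rules.

```python
from collections import Counter, defaultdict


def fuse_polarity(word, lexicons):
    """Majority vote over the lexicons that contain `word`.

    Returns +1 or -1, or None when the word is missing everywhere or the
    vote is tied (treated here as polarity divergence). This voting rule
    is an assumption, not the fusion rule of the paper.
    """
    votes = Counter(lex[word] for lex in lexicons if word in lex)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: lexicons disagree
    return ranked[0][0]


def bridge(source_pairs, target_pairs, shared_polarity):
    """Map source-specific words to target-specific words via shared pivots.

    Each pair is (domain_specific_word, shared_word). Only shared words
    with an agreed polarity are allowed to act as a bridge.
    """
    pivot_to_target = defaultdict(set)
    for specific, pivot in target_pairs:
        if pivot in shared_polarity:
            pivot_to_target[pivot].add(specific)

    mapping = defaultdict(set)
    for specific, pivot in source_pairs:
        if pivot in shared_polarity:
            mapping[specific] |= pivot_to_target[pivot]
    return {word: sorted(targets) for word, targets in mapping.items() if targets}


# Hypothetical toy lexicons (+1 positive, -1 negative).
lexicons = [
    {"great": 1, "poor": -1, "cheap": 1},
    {"great": 1, "poor": -1, "cheap": -1},
    {"great": 1},
]
shared_words = ["great", "poor", "cheap"]

# Keep only shared words whose polarity the lexicons agree on.
shared_polarity = {}
for word in shared_words:
    polarity = fuse_polarity(word, lexicons)
    if polarity is not None:
        shared_polarity[word] = polarity

# Hypothetical word pairs from a book domain and an electronics domain.
source_pairs = [("gripping", "great"), ("boring", "poor"), ("pricey", "cheap")]
target_pairs = [("durable", "great"), ("glitchy", "poor"), ("flimsy", "cheap")]

mapping = bridge(source_pairs, target_pairs, shared_polarity)
print(shared_polarity)
print(mapping)
```

In this toy setting "cheap" is dropped as a bridge because the lexicons disagree on its polarity, so "pricey" and "flimsy" are left unaligned, while "gripping" and "boring" are linked to "durable" and "glitchy" through "great" and "poor".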