Abstract:
People increasingly express their opinions on products, books, movies, and other topics on review sites, forums, discussion groups, and blogs. Determining the opinion expressed in a given Web document (that is, opinion analysis) has therefore drawn much attention. To achieve high accuracy, many opinion analysis methods require abundant labeled data, but the availability of labeled data is very imbalanced across domains. In recent years, some studies have therefore addressed the cross-domain opinion analysis problem. However, most of these attempts rely on only the labeled documents or only the labeled sentiment words, so they fail to uncover the full knowledge shared between documents and sentiment words. This paper proposes an approach to cross-domain opinion analysis based on a random-walk model that simultaneously utilizes documents and words from both the source domain and the target domain. The approach makes full use of the mutual reinforcement between documents and words by fusing four kinds of relationships: those between documents, those between words, those from words to documents, and those from documents to words. Experimental results indicate that the proposed algorithm improves the performance of cross-domain opinion analysis dramatically: its average accuracy is about 15% higher than that of traditional classifiers and about 7% higher than that of the state-of-the-art method.
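To make the idea of mutual reinforcement concrete, the following is a minimal sketch of how sentiment scores might be propagated over the four kinds of relationships. It is an illustration under stated assumptions, not the algorithm evaluated in the paper: the update rule, the mixing weight `alpha`, the way the relationship matrices are derived from a document-word count matrix, and all function names (`propagate`, `row_normalize`, `cooccurrence`) are hypothetical choices made for this example.

```python
def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def cooccurrence(m):
    """Return m * m^T with a zeroed diagonal (no self-links)."""
    n = len(m)
    return [
        [0.0 if i == j else float(sum(a * b for a, b in zip(m[i], m[j])))
         for j in range(n)]
        for i in range(n)
    ]


def propagate(counts, labels, alpha=0.5, iterations=20):
    """Propagate sentiment between documents and words.

    counts: document-by-word count matrix (source and target documents).
    labels: {doc_index: +1.0 or -1.0} for labeled source documents.
    alpha:  assumed weight of same-type links versus cross-type links.
    Returns (doc_scores, word_scores); the sign is the predicted polarity.
    """
    word_counts = transpose(counts)
    dd = row_normalize(cooccurrence(counts))       # document-to-document
    ww = row_normalize(cooccurrence(word_counts))  # word-to-word
    dw = row_normalize(counts)                     # document-to-word
    wd = row_normalize(word_counts)                # word-to-document

    d = [labels.get(i, 0.0) for i in range(len(counts))]
    w = [0.0] * len(word_counts)

    for _ in range(iterations):
        # Each document is scored by its neighboring documents and its words.
        d_next = [alpha * x + (1 - alpha) * y
                  for x, y in zip(matvec(dd, d), matvec(dw, w))]
        # Each word is scored by its neighboring words and its documents.
        w_next = [alpha * x + (1 - alpha) * y
                  for x, y in zip(matvec(ww, w), matvec(wd, d))]
        # Labeled source documents keep their known polarity.
        for i, y in labels.items():
            d_next[i] = y
        d, w = d_next, w_next
    return d, w


# Toy data: documents 0 and 1 are labeled source documents, 2 and 3 are
# unlabeled target documents; word 0 occurs in documents 0 and 2,
# word 1 occurs in documents 1 and 3.
toy_counts = [[1, 0], [0, 1], [1, 0], [0, 1]]
doc_scores, word_scores = propagate(toy_counts, {0: 1.0, 1: -1.0})
```

In this toy setting, the unlabeled target documents receive scores whose sign follows the labeled source document with which they share vocabulary, and the words acquire polarity from the documents that contain them.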