Abstract:
Identifying deceptive reviews has important theoretical meaning and practical value. Previous work has focused on heuristic rules or traditional supervised methods. However, recent research has shown that humans cannot readily identify deceptive reviews using prior knowledge alone, so human-annotated datasets inevitably contain some mislabeled examples. Because supervised learning depends on such human labels, which are difficult to obtain, the problem remains highly challenging. Some reviews are ambiguous and easily mislabeled; we call them spy examples. The key to identifying deceptive reviews is how these spy examples are handled. Starting from a set of truthful reviews and a large amount of unlabeled reviews, we propose a novel approach called mixing population and individual nature PU learning. First, reliable negative examples are identified from the unlabeled dataset. Second, representative positive and negative examples are generated by integrating latent Dirichlet allocation and K-means. Third, all spy examples are clustered into groups with a Dirichlet process mixture model, and two schemes (population nature and individual nature) are combined to determine the category labels of the spy examples. Finally, multiple kernel learning is used to build the final classifier. Experimental results demonstrate that our proposed method effectively identifies deceptive reviews and outperforms the current baselines.