ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (12): 2858-2866. doi: 10.7544/issn1000-1239.2016.20150614

MTruths: An Approach of Multiple Truths Finding from Web Information

Ma Ruxia1,2, Meng Xiaofeng1, Wang Lu1, Shi Yingjie3   

  1(School of Information, Renmin University of China, Beijing 100872); 2(Department of Education Technology, Capital Normal University, Beijing 100048); 3(School of Information Engineering, Beijing Institute of Fashion Technology, Beijing 100029)
  • Online: 2016-12-01

Abstract: The Web has become a massive information repository in which information is scattered across different data sources. Different data sources commonly provide conflicting information about the same entity. The problem of identifying the true values among such conflicting information is called the truth finding problem. According to the number of values an attribute can take, object attributes fall into two categories: single-valued attributes and multi-valued attributes. Most existing truth finding work is designed for single-valued attributes. In this paper, a method called MTruths is proposed to solve the truth finding problem for multi-valued attributes. We model the task as an optimization problem whose objective is to maximize the total weighted similarity between the truths and the observations provided by the data sources. Two methods are proposed to find the optimal solution during truth finding: an enumeration algorithm and a greedy algorithm. Experiments on two real data sets show that our approach outperforms existing state-of-the-art techniques in correctness and that the greedy algorithm outperforms them in efficiency.
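The optimization view described in the abstract can be illustrated with a minimal sketch. It is not the paper's MTruths implementation: the abstract does not specify the similarity measure or how source weights are obtained, so the sketch assumes Jaccard similarity between value sets and fixed, hand-picked source weights. The function names (`jaccard`, `objective`, `enumerate_truths`, `greedy_truths`) and the sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def objective(candidate, observations, weights):
    """Total weighted similarity between a candidate truth set and all observations."""
    return sum(weights[src] * jaccard(candidate, obs)
               for src, obs in observations.items())


def enumerate_truths(observations, weights):
    """Exhaustively score every subset of the claimed values (exponential cost)."""
    values = sorted(set().union(*observations.values()))
    best = frozenset()
    best_score = objective(best, observations, weights)
    for k in range(1, len(values) + 1):
        for combo in combinations(values, k):
            cand = frozenset(combo)
            score = objective(cand, observations, weights)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score


def greedy_truths(observations, weights):
    """Add the value with the largest objective gain until no value improves it."""
    remaining = set().union(*observations.values())
    chosen = set()
    score = objective(chosen, observations, weights)
    while remaining:
        gains = {v: objective(chosen | {v}, observations, weights)
                 for v in remaining}
        # Iterating in sorted order makes tie-breaking deterministic.
        v = max(sorted(gains), key=gains.get)
        if gains[v] <= score:
            break
        chosen.add(v)
        remaining.remove(v)
        score = gains[v]
    return frozenset(chosen), score


# Hypothetical example: three sources list the authors of one book.
observations = {
    "s1": {"A", "B"},
    "s2": {"A", "B", "C"},
    "s3": {"A"},
}
weights = {"s1": 0.9, "s2": 0.5, "s3": 0.6}  # assumed source reliabilities

print(sorted(enumerate_truths(observations, weights)[0]))  # ['A', 'B']
print(sorted(greedy_truths(observations, weights)[0]))     # ['A', 'B']
```

On this toy input both strategies select the same value set. In general, a greedy search of this kind scores far fewer candidates than enumeration but is not guaranteed to reach the optimum under these assumptions.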

Key words: truth finding, data conflicting, single-valued attributes, multi-valued attributes, quality of data sources
