ISSN 1000-1239 CN 11-1777/TP

### An Approach for Reconciling Inconsistent Pairs Based on Factor Graph

Xu Yaoli, Li Zhanhuai, Chen Qun, Wang Yanyan, Fan Fengfeng

1. (School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710072) (Key Laboratory of Big Data Storage and Management (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi’an 710129)
• Online: 2020-01-01
• Supported by:
This work was supported by the National Key Research and Development Program (2018YFB1003403), the National Natural Science Foundation of China (61732014, 61672432), and the Natural Science Basic Research Plan in Shaanxi Province of China (2018JM6086).

Abstract: Entity resolution (ER) is a critical and fundamental problem in data integration and data cleaning systems. Although numerous ER methods have been proposed, they explicitly or implicitly depend on ad-hoc assumptions or employ different strategies. For a given ER task, these approaches therefore produce conflicting results, which yields many inconsistent pairs. Reconciling these pairs without any labeled data poses two challenges: 1) without labeled data, it is impractical to estimate the performance of the existing approaches and pick the best one; 2) although an alternative is to reconcile the conflicting results into a better, consistent labeling solution, an effective reconciliation mechanism that combines all hints remains to be investigated. To this end, an approach for reconciling inconsistent pairs based on a factor graph is proposed. First, it obtains inconsistent and consistent pairs by running existing ER approaches on the given task. Second, features that indicate the matching status of inconsistent pairs are extracted using techniques such as kernel density estimation and matching information transfer. These features are then modeled as factor functions of the factor graph, which represents a joint probability distribution parameterized by factor weights. Finally, the weight of each factor is estimated by maximum likelihood estimation, and the inconsistent pairs are reconciled according to the distribution represented by the factor graph. Experimental results on real-world datasets show that the method is effective and can outperform the state-of-the-art approach.
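The four steps in the abstract can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy votes, the single hypothetical similarity feature (standing in for the kernel density estimation and matching information transfer features), the use of consistent pairs as pseudo-evidence, and the restriction to unary factors, under which the joint distribution factorizes into independent logistic terms.

```python
import math

def split_pairs(votes):
    """Split pairs by whether all ER approaches agree on the label (1 = match)."""
    consistent, inconsistent = {}, []
    for pair, labels in votes.items():
        if len(set(labels)) == 1:
            consistent[pair] = labels[0]
        else:
            inconsistent.append(pair)
    return consistent, inconsistent

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, x):
    """Weighted sum of factor (feature) values for one pair."""
    return sum(w * xi for w, xi in zip(weights, x))

def fit_weights(features, evidence, steps=2000, lr=0.5):
    """Maximum likelihood estimate of factor weights by gradient ascent.

    Assumption: only unary factors are used, so the log-likelihood is the
    logistic log-likelihood over the evidence (consistent) pairs.
    """
    dim = len(next(iter(features.values())))
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for pair, y in evidence.items():
            x = features[pair]
            p = sigmoid(score(weights, x))
            for k in range(dim):
                grad[k] += (y - p) * x[k]
        weights = [w + lr * g / len(evidence) for w, g in zip(weights, grad)]
    return weights

def reconcile(features, weights, pairs):
    """Marginal match probability for each inconsistent pair."""
    return {p: sigmoid(score(weights, features[p])) for p in pairs}

# Hypothetical toy data: labels from three ER approaches per record pair.
votes = {
    "a": [1, 1, 1], "b": [0, 0, 0], "e": [1, 1, 1], "f": [0, 0, 0],
    "c": [1, 0, 1], "d": [0, 1, 0],
}
# Hypothetical features per pair: [bias term, similarity score].
features = {
    "a": [1.0, 0.90], "e": [1.0, 0.80], "b": [1.0, 0.10], "f": [1.0, 0.20],
    "c": [1.0, 0.85], "d": [1.0, 0.15],
}

consistent, inconsistent = split_pairs(votes)
weights = fit_weights(features, consistent)
probabilities = reconcile(features, weights, inconsistent)
labels = {pair: int(p > 0.5) for pair, p in probabilities.items()}
```

In this toy setting the pairs on which all approaches agree supply the evidence for weight estimation, and each inconsistent pair is then labeled by thresholding its match probability at 0.5. A model that also includes factors coupling several pairs would need approximate inference rather than the closed-form marginals used here.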
