Incremental and Interactive Data Integration Approach for Hierarchical Data in Domain of Intelligent Livelihood

Xia Ding (1, 2), Wang Yasha (1, 3), Zhao Zipeng (1, 2), Cui Da (1, 2)

(1) Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871
(2) School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
(3) National Engineering & Research Center of Software Engineering (Peking University), Beijing 100871
Online: 2017-03-01

Abstract: Intelligent livelihood is an important domain of the smart city. In this domain, many application systems have accumulated a large volume of multi-source hierarchical data. To form an overall, unified view of the multi-source data across the whole city, the various data schemas must be integrated. Existing schema integration approaches are not suitable, given the distinct characteristics of data in the intelligent livelihood domain: there is little support for semantic matching of Chinese labels, the number of schemas is large, and element labels are often missing. Under these circumstances, we propose an incremental and iterative approach that reduces the massive computation workload caused by the large number of schemas. In each iteration, both meta information and instance data are used to produce more precise results, and a criterion based on similarity entropy is introduced to control human intervention. Experiments are conducted on real second-hand housing data for Beijing, collected from multiple second-hand housing Web applications. The results show that our approach achieves high matching accuracy with little human intervention.
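The abstract does not define the similarity entropy criterion. The sketch below shows one plausible reading, not the authors' method: for each source element, the similarity scores of its candidate matches are normalized into a distribution, and its Shannon entropy measures how ambiguous the match is. Low entropy means one candidate clearly dominates and the match can be accepted automatically; high entropy routes the element to a human. The function names `similarity_entropy` and `route_matches`, the threshold value, and the example labels and scores are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def similarity_entropy(scores: List[float]) -> float:
    """Shannon entropy (in bits) of similarity scores normalized to sum to 1.

    Assumption: scores are non-negative; zero scores contribute nothing.
    """
    positive = [s for s in scores if s > 0]
    total = sum(positive)
    if total == 0:
        # No evidence for any candidate: treat as maximally uncertain.
        return math.log2(len(scores)) if len(scores) > 1 else 0.0
    probs = [s / total for s in positive]
    return -sum(p * math.log2(p) for p in probs)


def route_matches(
    candidates: Dict[str, Dict[str, float]], threshold: float
) -> Tuple[Dict[str, str], List[str]]:
    """Split source elements into auto-accepted matches and elements needing review.

    candidates maps each source element label to {target label: similarity score}.
    Hypothetical rule: accept the best-scoring target when entropy <= threshold.
    """
    auto: Dict[str, str] = {}
    review: List[str] = []
    for element, scores in candidates.items():
        if not scores:
            review.append(element)  # no candidates at all: ask a human
            continue
        if similarity_entropy(list(scores.values())) <= threshold:
            auto[element] = max(scores, key=scores.get)
        else:
            review.append(element)
    return auto, review


# Hypothetical scores for two Chinese source labels against target labels.
candidates = {
    "小区名称": {"community": 0.9, "district": 0.05, "price": 0.05},  # entropy ~0.57
    "价格": {"price": 0.5, "total_price": 0.5},  # entropy = 1.0
}
auto, review = route_matches(candidates, threshold=0.8)
```

With a threshold of 0.8, "小区名称" is matched to "community" automatically, while "价格" has two equally likely candidates and is sent for human review. Raising the threshold trades accuracy for fewer human interventions.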

Key words: schema matching, schema integration, data integration, smart city, intelligent livelihood

