Abstract:
Intelligent livelihood is an important domain of the smart city. In this domain, there are many application systems that have accumulated a large number of multi-source hierarchical data. In order to form an overall and unified view of the multi-source data in the whole city, variant data schemas should be integrated. Considering the distinct characteristics of the data from intelligent livelihood domain, such as lacking support for semantic matching of Chinese labels, numerous quantities of schemas, missing element labels, the existing schema integration approaches are not suitable. Under such circumstances, we propose an incremental and iterative approach which can deduce the massive computation workload due to the big number of schemas. In each iteration, both meta information and instance data are used to create more precise results, and a similarity entropy based criteria is carefully introduced to control the human intervention. Experiments are also conducted based on real data of second-hand housing in Beijing fetched from multiple second-hand Web applications. The results show that our approach can get high matching accuracy with only little human interventions.