Incremental and Interactive Data Integration Approach for Hierarchical Data in Domain of Intelligent Livelihood

Xia Ding (夏丁); Wang Yasha (王亚沙); Zhao Zipeng (赵梓棚); Cui Da (崔达)

doi:10.7544/issn1000-1239.2017.20151048


Abstract

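Abstract: Intelligent livelihood, a key domain of the smart city, comprises many application systems that have accumulated large amounts of hierarchically structured data. To form a complete city-wide dataset, the heterogeneous data schemas must be integrated and unified so that users are offered a single, unified view of the data. The intelligent livelihood domain has several distinctive characteristics: its domain knowledge is broad, support for semantic matching of Chinese is lacking, the number of schemas is large, and element labels may be missing while instance data is abundant. To address these characteristics, we propose an incremental and interactive schema integration approach. The approach completes the multi-schema integration task step by step through incremental iterations, which greatly reduces the computation required. In the schema matching phase, it combines schema information and instance data to build several complementary matchers suited to Chinese, uses similarity entropy to measure the confidence of the machine's decisions, and introduces human intervention in moderation. In the mediated schema generation phase, it resolves the various conflicts that may arise between schemas and finally outputs a globally unified mediated schema. Experiments on multi-source second-hand housing data crawled from the Internet show that the approach achieves good schema matching accuracy while keeping human intervention sufficiently small.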
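The abstract says that similarity entropy measures how confident the machine is in a matching decision, but it does not give the formula. The following is a minimal illustration under stated assumptions, not the authors' definition: the similarity scores between one source element and its candidate target elements are normalized into a probability distribution, Shannon entropy divided by the log of the candidate count serves as the uncertainty measure, and a match whose entropy exceeds a threshold is flagged for a human. The function names, the 0.8 threshold, and the example labels are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def similarity_entropy(scores: Dict[str, float]) -> float:
    """Normalized Shannon entropy of one source element's similarity scores
    against its candidate target elements (hypothetical definition).

    Returns a value in [0, 1]: near 0 when one candidate dominates,
    near 1 when all candidates look equally similar.
    """
    positive = [s for s in scores.values() if s > 0]
    if not positive:
        return 1.0  # no evidence for any candidate: maximally uncertain
    if len(scores) == 1:
        return 0.0  # a single candidate cannot be confused with another
    total = sum(positive)
    probs = [s / total for s in positive]
    h = sum(-p * math.log(p) for p in probs)
    return h / math.log(len(scores))  # log(n) is the entropy of a uniform distribution


def decide_match(scores: Dict[str, float],
                 threshold: float = 0.8) -> Tuple[Optional[str], bool]:
    """Return (best candidate, needs_human).

    The machine proposes the highest-scoring candidate; if the entropy of the
    score distribution exceeds the hypothetical threshold, the decision is
    flagged for human confirmation instead of being accepted automatically.
    """
    if not scores:
        return None, True
    best = max(scores, key=scores.get)
    return best, similarity_entropy(scores) > threshold


# One dominant candidate: accepted automatically.
print(decide_match({"总价": 0.9, "单价": 0.05, "面积": 0.05}))  # ('总价', False)
# Near-uniform scores: flagged for a human.
print(decide_match({"总价": 0.34, "单价": 0.33, "面积": 0.33}))  # ('总价', True)
```

In this sketch, lowering the threshold sends more decisions to a human reviewer, trading extra manual work for fewer automatic mistakes; raising it trusts the machine more.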

Abstract: Intelligent livelihood is an important domain of the smart city. It contains many application systems that have accumulated a large amount of multi-source hierarchical data. To form an overall, unified view of the multi-source data across the whole city, the heterogeneous data schemas must be integrated. Existing schema integration approaches are ill-suited to the distinctive characteristics of data from the intelligent livelihood domain, such as the lack of support for semantic matching of Chinese labels, the large number of schemas, and missing element labels. We therefore propose an incremental and iterative approach that reduces the heavy computational workload caused by the large number of schemas. In each iteration, both schema meta-information and instance data are used to produce more precise matching results, and a criterion based on similarity entropy decides when human intervention is introduced. Experiments were conducted on real second-hand housing data for Beijing, collected from multiple second-hand housing Web applications. The results show that the approach achieves high matching accuracy with little human intervention.
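The incremental strategy can be illustrated with a minimal sketch that rests on several assumptions beyond the abstract: schemas are flattened into lists of (label, sample values) elements rather than kept hierarchical; each incoming schema is matched only against the current mediated schema, which is one way incremental integration can avoid comparing every pair of source schemas; the two matchers (character-bigram Jaccard similarity on labels, which needs no Chinese word segmentation, and Jaccard overlap of instance values) are stand-ins, not the paper's matchers; and the weights, the 0.3 threshold, and all names are hypothetical. Conflict handling during mediated schema generation is not modeled beyond a one-to-one constraint within each schema.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical flat element model: (label, sample instance values); "" = missing label.
Element = Tuple[str, Set[str]]


@dataclass
class MediatedAttribute:
    label: str
    values: Set[str]
    sources: List[Tuple[int, int]] = field(default_factory=list)  # (schema index, element index)


def bigrams(text: str) -> Set[str]:
    """Character bigrams; works on Chinese labels without word segmentation."""
    grams = {text[i:i + 2] for i in range(len(text) - 1)}
    return grams or ({text} if text else set())


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_sim(label1: str, values1: Set[str], label2: str, values2: Set[str],
                 w_label: float = 0.5) -> float:
    """Weighted mix of a label matcher and an instance matcher."""
    value_sim = jaccard(values1, values2)
    if not label1 or not label2:
        return value_sim  # missing label: rely entirely on instance data
    return w_label * jaccard(bigrams(label1), bigrams(label2)) + (1 - w_label) * value_sim


def integrate_incrementally(schemas: List[List[Element]],
                            min_sim: float = 0.3) -> List[MediatedAttribute]:
    """Fold source schemas one at a time into a growing mediated schema."""
    mediated: List[MediatedAttribute] = []
    for s_idx, schema in enumerate(schemas):
        used: Set[int] = set()  # mediated attributes already claimed by this schema
        for e_idx, (label, values) in enumerate(schema):
            best_idx, best_sim = -1, 0.0
            for m_idx, attr in enumerate(mediated):
                if m_idx in used:
                    continue
                sim = combined_sim(label, values, attr.label, attr.values)
                if sim > best_sim:
                    best_idx, best_sim = m_idx, sim
            if best_idx >= 0 and best_sim >= min_sim:
                attr = mediated[best_idx]
                attr.values |= values
                attr.sources.append((s_idx, e_idx))
                attr.label = attr.label or label  # fill in a label that was missing
                used.add(best_idx)
            else:
                mediated.append(MediatedAttribute(label, set(values), [(s_idx, e_idx)]))
                used.add(len(mediated) - 1)
    return mediated


schemas = [
    [("总价", {"350万", "420万"}), ("面积", {"89平米", "120平米"})],
    [("总价格", {"350万", "510万"}), ("", {"89平米", "75平米"})],  # second label missing
]
for attr in integrate_incrementally(schemas):
    print(attr.label, attr.sources)
# 总价 [(0, 0), (1, 0)]
# 面积 [(0, 1), (1, 1)]
```

In this sketch the unlabeled element of the second schema is still matched to 面积 through instance overlap alone, and each new schema is compared with the mediated attributes rather than with every earlier source schema.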
