Non-Cooperative Structured Deep Web Selection Based on Hybrid Type Keyword Retrieval

Wan Changxuan, Deng Song, Liu Dexi, Jiang Tengjiao, and Liu Xiping

Abstract

Deep Web integration emerged to make effective use of the resources in the deep Web, and data source selection is one of its key technologies because it improves both the efficiency of integration and the quality of the returned results. Most deep Web data sources are structured and non-cooperative. Existing work on non-cooperative structured deep Web data source selection falls into two categories: source selection for discrete-type keyword queries and source selection for character-type keyword queries. To the best of our knowledge, no prior work addresses structured data source selection for queries that mix both keyword types. In this paper, user query keywords are divided into retrieval-type keywords and constraint-type keywords. Using the association features between subject headings, between subject headings and feature words, and between histograms, we construct a hierarchical data source summary for hybrid queries composed of retrieval-type and constraint-type keywords. The summary reflects the search intent of the retrieval-type keywords and the constraint relevance of the constraint-type keywords, and we give a corresponding data source selection strategy based on it. Experimental results show that the method achieves good record recall and precision when selecting non-cooperative structured deep Web data sources for hybrid-type keyword queries.
