Abstract:
To make efficient use of the resources in the deep Web, deep Web data integration has emerged to meet this need. Data source selection is one of the key technologies in deep Web data integration, because it helps improve both the efficiency of integration and the quality of the returned results. Most deep Web data sources are structured and non-cooperative. Existing research on source selection for non-cooperative structured deep Web sources falls into two categories: one is based on discrete keyword retrieval, and the other is based on character keyword retrieval. To the best of our knowledge, no existing data source selection method considers both types of keywords. In this paper, user query keywords are divided into retrieval-type keywords and constraint-type keywords. We use the associations between subject headings, between subject headings and feature words, and between histograms to construct a hierarchical data source summary. The summary supports hybrid keyword queries composed of retrieval-type and constraint-type keywords. It reflects both the search intent of the retrieval-type keywords and the restrictive (binding) nature of the constraint-type keywords. Finally, we present a data source selection strategy based on this summary. Experimental results show that our method achieves good record recall and precision.