    Tian Jianwei and Li Shijun. Retrieving Deep Web Data Based on Hierarchy Tree Model[J]. Journal of Computer Research and Development, 2011, 48(1): 94-102.

    Retrieving Deep Web Data Based on Hierarchy Tree Model

    • While the Web provides a platform for information search and dissemination, a massive amount of information is hidden behind query-restricted Web databases, which makes these high-quality data records difficult to obtain. Current research on Deep Web search has focused on crawling Deep Web data through Web interfaces using keyword queries. However, keyword-based methods have inherent limitations because Deep Web interfaces are multi-attribute and return only the top-k results, which poses a great challenge for Web information search and retrieval. To address this problem, we propose an approach for siphoning structured data based on a hierarchy tree, which can retrieve all the data in a hidden database without repetition. First, we model the hidden database as a hierarchy tree; under this theoretical framework, data retrieval is transformed into a tree traversal problem. Second, we propose techniques to narrow the query space and obtain the attribute values by sorting the attributes in ascending order. Third, we use mutual information to measure the dependency between attribute values and, based on this dependency, narrow the traversal space with a heuristic rule that guides the traversal process. Finally, we conduct extensive experiments on real Deep Web sites and controlled databases to illustrate the coverage and efficiency of our techniques.
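
The traversal idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a simulated top-k interface that reports whether a query overflowed, and it reads "ascending order" as ascending attribute domain size; both are assumptions. All names (`make_interface`, `crawl`, `domains`) are hypothetical. The mutual-information heuristic is left out.

```python
from itertools import product

def make_interface(records, k):
    """Simulate a query-restricted database that returns at most k rows.

    Hypothetical stand-in for a real Deep Web form; the overflow flag is an
    assumption (real sites may only hint at truncation).
    """
    def query(constraints):
        matches = [r for r in records
                   if all(r[a] == v for a, v in constraints.items())]
        return matches[:k], len(matches) > k
    return query

def crawl(query, domains):
    """Depth-first traversal of the attribute hierarchy tree.

    Each tree level fixes one more attribute. A node whose query does not
    overflow is harvested and its subtree is skipped; an overflowing node is
    expanded into one child per value of the next attribute.
    """
    # Assumption: attributes are ordered by ascending domain size.
    order = sorted(domains, key=lambda a: len(domains[a]))
    found, issued = [], 0

    def visit(constraints, depth):
        nonlocal issued
        rows, overflow = query(constraints)
        issued += 1
        if not overflow or depth == len(order):
            # Either everything matching was returned, or no attribute is
            # left to refine with (the remainder is then unreachable).
            found.extend(rows)
            return
        attr = order[depth]
        for value in domains[attr]:
            visit({**constraints, attr: value}, depth + 1)

    visit({}, 0)
    return found, issued

# Toy hidden database: one record per attribute combination (24 in total).
domains = {
    "make": ["audi", "bmw", "fiat"],
    "color": ["red", "blue"],
    "year": [2008, 2009, 2010, 2011],
}
records = [
    {"id": i, "make": m, "color": c, "year": y}
    for i, (m, c, y) in enumerate(
        product(domains["make"], domains["color"], domains["year"]))
]

found, issued = crawl(make_interface(records, k=5), domains)
print(len(found), "records retrieved with", issued, "queries")
```

Because the children of a node partition its records by the value of one attribute, every record is harvested at exactly one non-overflowing node, which is what makes the retrieval non-repeating in this sketch.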