Abstract:
XML documents combine content and structure, and can be retrieved not only by content-only (CO) queries but also by content-and-structure (CAS) queries. This paper proposes a novel approach to CAS retrieval. The approach proceeds in three steps: it first decomposes a CAS query into a set of query fragments, then processes each query fragment, and finally combines the results obtained for the individual fragments. On the one hand, this approach removes the adverse effects of structural vagueness on answer node selection; on the other hand, it properly incorporates the effect of structural constraints into scoring. These features make the approach applicable to a variety of homogeneous and heterogeneous data environments. To measure the relevance of query results to a given CAS query, a novel scoring scheme is presented. In accordance with the query processing approach, the scoring method first computes the score of a query result with respect to each query fragment, and then combines these partial scores into an overall score. The proposed scoring method considers the relevance of both content and structure in the retrieval results, and thus reflects the user's query intention and conforms to query semantics. Comprehensive experimental studies demonstrate the effectiveness of the proposed methods.