Abstract:
First, formal definitions of the XML tree data model, element extension, name path, and related concepts are given in context. Second, an improved index structure comprising a type index set, a name index set, and an extension index is proposed for retrieving XML data, building on the ideas of the numbering scheme, the path index, and name extension. The index structure addresses the poor performance of XML queries under conventional index techniques. It quickly determines ancestor/descendant relationships by supporting the structural join algorithm, quickly determines parent/child relationships by supporting a path join algorithm based on name extension, and effectively processes twig queries that include holding relationships. Finally, an extension join algorithm based on this index structure is proposed to process XPath path queries efficiently, and the different processing approaches for complicated XPath path expressions with parent/child and holding relationships are compared and analyzed. For an absolute XPath path query with n nodes, the extension join needs to be executed at most n/2-1 times, because the extension index uses structure information to avoid scanning nodes that do not participate in the join. Experimental results show that the new index structure effectively improves query performance for XML data.
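
To illustrate how a numbering scheme lets ancestor/descendant and parent/child relationships be decided without traversing the tree, the following minimal sketch uses a common region encoding (start, end, level). The abstract does not state which numbering scheme the index uses, so this encoding is an assumption, and the names `Label`, `label_tree`, `is_ancestor`, and `is_parent` are hypothetical helpers introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Region label for one element (assumed encoding, not the paper's)."""
    name: str
    start: int   # counter value when the element is entered
    end: int     # counter value when the element is left
    level: int   # depth in the tree, root = 0


def label_tree(root):
    """Assign (start, end, level) labels in document order via a DFS."""
    labels = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        slot = len(labels)
        labels.append(None)          # reserve the slot to keep document order
        for child in elem:
            visit(child, level + 1)
        counter += 1
        labels[slot] = Label(elem.tag, start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """a is an ancestor of d iff a's interval strictly contains d's."""
    return a.start < d.start and d.end < a.end


def is_parent(p, c):
    """p is the parent of c iff p is an ancestor exactly one level above."""
    return is_ancestor(p, c) and c.level == p.level + 1


doc = ET.fromstring(
    "<book><chapter><title/><section><title/></section></chapter></book>"
)
book, chapter, title1, section, title2 = label_tree(doc)
```

With these labels, each relationship check is a constant-time comparison of integers, which is the property that structural join algorithms rely on when merging sorted lists of candidate elements.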