高级检索
    王洪强 李建中 王宏志. 基于F&B索引的XML查询处理算法[J]. 计算机研究与发展, 2010, 47(5): 866-877.
    引用本文: 王洪强 李建中 王宏志. 基于F&B索引的XML查询处理算法[J]. 计算机研究与发展, 2010, 47(5): 866-877.
    Wang Hongqiang, Li Jianzhong, and Wang Hongzhi. Processing XPath over F&B-Index[J]. Journal of Computer Research and Development, 2010, 47(5): 866-877.
    Citation: Wang Hongqiang, Li Jianzhong, and Wang Hongzhi. Processing XPath over F&B-Index[J]. Journal of Computer Research and Development, 2010, 47(5): 866-877.

    基于F&B索引的XML查询处理算法

    Processing XPath over F&B-Index

    • 摘要: XML已成为信息交换和表示的标准.对XML数据的查询将返回满足特定约束的XML节点子集.对于大文件的XML数据的查询处理通常分为两步:1.为该XML数据建立一个索引;2.在索引上完成查询处理无需访问源文档.XML索引为查询处理提供了高效的帮助,其中F&B索引是已知的处理分枝查询最小的索引,但快速创建F&B索引和利用F&B索引完成查询处理的算法却很少有人研究.提出了一种素数序列标记法,这种标记法不仅有助于快速地建立F&B索引,更可以高效地完成F&B索引上的查询处理.此外,还给出了F&B索引上的区间标记法与CCPI的创建过程,这两种编码创建过程无需在建立F&B索引后二次创建,仅需与F&B索引创建过程一起对文档使用SAX解析器分析一次即可得到.这样,可以在F&B索引的区间标记法上使用TwigStack算法执行查询处理,在F&B索引的CCPI标记法上使用关联路径连接算法执行查询处理.还给出了基于素数序列标记法的查询处理算法,即素数整除匹配算法,该算法可以高效地判定某节点是否有某分枝子结构.实验表明基于素数序列标记法的F&B索引创建方法比SAM算法快,在多个数据集F&B索引上素数整除匹配算法优于关联路径连接算法和TwigStack算法.

       

      Abstract: XML is widely used as a standard of information exchange and representation. Queries on XML can retrieve a subset of XML data nodes satisfying certain constraints. Queries on large XML data are usually processed in two steps: 1. An index of XML nodes is created; 2. Queries are processed on that index without accessing XML data. XML index provides high efficiency for XML query processing. Particularly, F&B-index is the smallest index that supports twig query processing. However, few researches are proposed on how to efficiently create F&B-Index and how to process queries based on F&B-Index. Proposed in this paper is a new labeling scheme called prime sequence. This labeling scheme helps not only on creating an F&B-Index but also on efficient query processing. With prime sequence labeling, an F&B-index can be created by parsing the XML document only once with a SAX parser. Further, region encoding and CCPI on F&B-Index can be created during the creation of F&B-Index. Thus, TwigStack algorithm and related-path-join algorithm can be used to process queries on created F&B-Index. Also proposed is an efficient algorithm named division match over F&B-Index. The algorithm can efficiently judge relationship between two nodes based on a property of prime sequence labeling. Experiments show that prime sequence labeling provides high efficiency on creating F&B-Index and high efficiency on query processing on different datasets.

       

    /

    返回文章
    返回