    Wang Yu, Meng Xiaofeng, Wang Shan. Using Histograms to Estimate the Selectivity of XPath Expression with Value Predicates[J]. Journal of Computer Research and Development, 2006, 43(2): 288-294.


    Using Histograms to Estimate the Selectivity of XPath Expression with Value Predicates

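    • Summary: Path selectivity estimation is the foundation of XML query optimization and an active research topic. Existing methods rely heavily on normal-distribution and independence assumptions, which are the root cause of their estimation errors. This paper defines a novel value-position histogram that records how both structure and values are distributed in XML data, and proposes six histogram operations. On this basis, it gives a method that uses histograms to estimate the selectivity of any node in a path. Experiments show that the method needs no independence assumption and estimates path selectivity accurately even when the structure and values of the data are unevenly distributed.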


      Abstract: Selectivity estimation of path expressions is the basis of XML query optimization and an area of intense research interest. A path expression may contain multiple branches with value predicates, and some of the values and nodes in the XML data are highly correlated. Previous selectivity estimation methods rarely take this correlation into account; instead, they assume that the selectivities of attribute values on different nodes and structures are independent and uniformly distributed. In this paper, a novel value histogram is proposed that captures the correlation between the structures and the values in the XML data. Six operations are also defined, both on the value histograms and on the traditional histograms that capture the positional distribution of nodes. Based on these operations, the selectivity of any node (or branch) in a path expression can be estimated. Experimental results indicate that the method is accurate without any independence assumption, especially when the values and the structure of the data are correlated.
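
The effect of the independence assumption criticized in the abstract can be illustrated with a small sketch. This is not the paper's value histogram or any of its six operations, which the abstract does not specify; the toy data, the bucket labels, and the function names below are all hypothetical. The sketch only contrasts an estimate that multiplies two one-dimensional histograms with one read from a joint (position, value) histogram.

```python
from collections import Counter

# Hypothetical toy data: one (position_bucket, value_bucket) pair per node.
# Values are correlated with position: "low" values cluster in bucket 0,
# "high" values cluster in bucket 1.
nodes = (
    [(0, "low")] * 40 + [(0, "high")] * 10
    + [(1, "low")] * 5 + [(1, "high")] * 45
)


def independent_estimate(nodes, pos, val):
    """Estimate the match count from two separate 1-D histograms,
    assuming position and value are independent."""
    n = len(nodes)
    pos_hist = Counter(p for p, _ in nodes)
    val_hist = Counter(v for _, v in nodes)
    # n * P(pos) * P(val), rearranged to a single division
    return pos_hist[pos] * val_hist[val] / n


def joint_estimate(nodes, pos, val):
    """Read the match count from a joint (position, value) histogram,
    which keeps the correlation between the two dimensions."""
    joint_hist = Counter(nodes)
    return joint_hist[(pos, val)]


if __name__ == "__main__":
    print(independent_estimate(nodes, 1, "high"))  # 27.5
    print(joint_estimate(nodes, 1, "high"))        # 45
```

On this toy data, 45 nodes actually fall in position bucket 1 with a "high" value, while the independence-based estimate is 27.5. The gap comes entirely from the correlation between position and value that the one-dimensional histograms discard.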


