    Join Strategy Optimization in Column Storage Based Query

    • Abstract: Optimizing the join strategy between columns is the central problem in query optimization for column-store data. Existing column-oriented systems simplify column joins by changing the storage structure. As a result, joins receive no query optimization, only a single strategy is available, and complex queries cannot be served well. After analyzing existing join-selection strategies, this paper proposes a new join strategy optimization method that combines rule-based and cost-based optimization. First, rule-based optimization (RBO) defines optimization rules for column-store queries and filters out candidate plans that cannot yield the optimal plan. Second, a cost-based optimization (CBO) algorithm improves the query execution order using a dynamic Huffman tree and the left-deep join tree principle, which further reduces the number of candidate plans. Third, based on the characteristics of column-store data, the execution strategy of each join node in a candidate plan is classified as either a serial (pipeline) join or a parallel join, and a cost estimation model is proposed on this basis to estimate the cost of the two strategies and select the better one. The method has small time and space complexity. Experiments on the large-scale data warehouse benchmark data set SSB verify its effectiveness for column-store queries.
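The two cost-based steps named in the abstract, ordering joins with a Huffman tree and choosing between a serial (pipeline) and a parallel strategy per join node, can be illustrated with a minimal sketch. It is not the paper's algorithm or cost model: the function names, the `shrink` factor, the `merge_overhead` constant, and both cost formulas are hypothetical placeholders chosen only to make the idea concrete. The column names are borrowed from the SSB schema, and the row counts are invented.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    rows: int  # assumed estimate of qualifying rows (hypothetical statistic)


def huffman_join_order(columns, shrink=10):
    """Greedy Huffman-style ordering: always join the two smallest inputs.

    `shrink` is a made-up factor modelling how much a join reduces its
    larger input; a real optimizer would use selectivity estimates.
    Returns (nested plan tuple, toy total cost).
    """
    # The index breaks ties so that plan objects are never compared.
    heap = [(c.rows, i, c.name) for i, c in enumerate(columns)]
    heapq.heapify(heap)
    next_id = len(heap)
    total_cost = 0
    while len(heap) > 1:
        rows_a, _, plan_a = heapq.heappop(heap)
        rows_b, _, plan_b = heapq.heappop(heap)
        total_cost += rows_a + rows_b  # toy cost: rows read by this join
        out_rows = max(1, max(rows_a, rows_b) // shrink)
        heapq.heappush(heap, (out_rows, next_id, (plan_a, plan_b)))
        next_id += 1
    return heap[0][2], total_cost


def choose_strategy(left_rows, right_rows, merge_overhead=100):
    """Pick the cheaper strategy for one join node under toy cost formulas."""
    serial_cost = left_rows + right_rows  # inputs processed one after another
    parallel_cost = max(left_rows, right_rows) + merge_overhead  # overlapped scans
    return "serial" if serial_cost <= parallel_cost else "parallel"


columns = [
    Column("lo_discount", 600),
    Column("lo_quantity", 300),
    Column("d_year", 50),
]
plan, cost = huffman_join_order(columns)
print(plan, cost)
print(choose_strategy(50, 300), choose_strategy(300, 600))
```

In this toy input the greedy merge happens to produce a left-deep plan, because each intermediate result is smaller than every remaining column. In general, a Huffman-style merge can yield a bushy tree, which is presumably why the abstract pairs it with the left-deep join tree principle.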

       
