高级检索

    列存储系统面向列的连接顺序优化研究

    Column-Oriented Join Order Optimization in Column Store Systems

    • 摘要: 连接操作是影响列存储数据查询效率的重要操作之一.对于列存储系统中的连接操作优化,以往的研究工作大多专注于对数据组织结构的优化以及辅助物理结构的建立上,极少涉及逻辑层特别是早期的连接策略优化.为此,根据列存储数据的特点和分析型查询需求的特征,提出了一种新的列存储连接优化方法.该方法采用提早优化的策略,使用“事实表下推”的优化规则,并在多事实表查询条件下引入浓密树进行连接顺序决策,以较小的时空复杂度获得“最优”的连接执行顺序.使用代价估计模型对提出的连接策略优化方法进行了理论验证.同时,在大规模数据仓库基准数据集SSB上通过实验验证了提早优化机制及下推规则的有效性.

       

      Abstract: Join is one of the most important operations which can largely affect the efficiency of column store based queries. Most work on column-stores is focused on the improving of storage structure and the building of physical auxiliary structures, while the logical plan optimization, especially early join strategy optimization, has seldom been considered. On the basis of this problem, this paper presents a new join strategy optimization method according to the characteristic of column-oriented storage structure and analytical query. We adopt the early optimization strategy in our method and propose a “fact table push-down” rule. In particular, the bushy tree structure will be considered in the multi-fact-table case to receive a “best” join path with small time and space complexity. Then we provide a cost estimation to verify the correctness of the proposed join strategy optimization method. Finally, experimental results on the large-scale data warehouse benchmark data sets SSB also verify the effectiveness of the early optimization strategy and the proposed push-down rule.

       

    /

    返回文章
    返回