Scheduling Algorithm for the Reuse Buffers in Column-Store Data Warehouse Query Execution

张  琦; 王  梅; 乐嘉锦; 刘国华

Abstract: Reusing intermediate results is an important way to improve query performance. Existing column-store systems focus mainly on reusing intermediate results across multiple query plans and overlook the large number of repeatedly accessed intermediate results produced while a single query plan executes. Within a single query, intermediate results are highly deterministic and their sizes can be estimated, which makes them well suited to reuse. To address intermediate-result reuse during the execution of a single query plan in a column-store data warehouse, this paper proposes a scheduling algorithm for the reuse buffer space. First, a reusability estimation model is defined for operator nodes, based on each node's relative position in the given physical execution plan tree and the size of the intermediate result its operation produces. Second, a buffer scheduling algorithm is designed on top of the model's estimates. During the execution of each query plan, the scheduling algorithm runs according to the model's estimates, so that the more important intermediate results stay in memory longer, which improves query performance. Experimental results on the data warehouse benchmark data set SSB verify the effectiveness of the method.
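The abstract names two components, a reusability estimate per operator node and a buffer scheduler driven by that estimate, without giving formulas. The following is only a minimal sketch of how such a pair could fit together. The `reuse_score` function, the `Intermediate` and `ReuseBuffer` classes, and the choice of `depth / size` as the score are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Intermediate:
    """An intermediate result produced by one operator node."""
    node_id: str
    depth: int  # position of the operator in the plan tree (assumed input)
    size: int   # estimated result size in bytes (assumed input)


def reuse_score(item: Intermediate) -> float:
    """Hypothetical reusability score, NOT the paper's model.

    Assumption for illustration only: a larger depth and a smaller
    size both make a result more worth keeping.
    """
    return item.depth / item.size


@dataclass
class ReuseBuffer:
    capacity: int  # total buffer space in bytes
    items: dict = field(default_factory=dict)

    def used(self) -> int:
        return sum(i.size for i in self.items.values())

    def admit(self, item: Intermediate) -> bool:
        """Cache item, evicting only lower-scored entries if space is short."""
        if item.size > self.capacity:
            return False
        new_score = reuse_score(item)
        # Eviction candidates: cached entries scored below the newcomer,
        # lowest score first.
        victims = sorted(
            (i for i in self.items.values() if reuse_score(i) < new_score),
            key=reuse_score,
        )
        free = self.capacity - self.used()
        chosen = []
        for victim in victims:
            if free >= item.size:
                break
            chosen.append(victim)
            free += victim.size
        if free < item.size:
            return False  # not enough low-value data to evict
        for victim in chosen:
            del self.items[victim.node_id]
        self.items[item.node_id] = item
        return True
```

In this sketch a new result is admitted only if enough lower-scored results can be evicted to make room, so higher-scored intermediates tend to stay resident longer, which is the behavior the abstract describes in general terms.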
