Abstract:
The optimization of join strategies between columns is an important problem in column-store query processing. Current column-oriented systems simplify the join strategy by changing the storage structure, which leaves the join strategy without query optimization and prevents satisfactory performance. To address these problems, this paper presents a new join strategy optimization method that combines rule-based and cost-based optimization. First, we apply rule-based optimization (RBO), setting optimization rules that remove candidate plans with excessive cost. Then we design the cost-based optimization (CBO): we reorder join execution using the Huffman tree and left-deep tree principles, and we classify the execution strategy of each join node in the column-oriented query plan as either a pipeline strategy or a parallel strategy. Based on this classification, we propose a cost model to select the better strategy. By focusing on estimating the cost of the pipeline and parallel strategies, the method improves the efficiency of query execution in column-oriented systems with low time and space complexity. Experimental results on the large-scale data warehouse benchmark SSB (Star Schema Benchmark) verify the effectiveness of the proposed method.
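To make the two cost-based steps more concrete, the following is a minimal Python sketch, not the method of this paper. It assumes a Huffman-style greedy ordering that always joins the two smallest inputs first, and a toy cost comparison between a pipeline and a parallel strategy. The function names (`huffman_join_order`, `choose_strategy`), the `selectivity`, `workers`, and `startup` parameters, the output-size estimate, and both cost formulas are illustrative assumptions only.

```python
import heapq
from itertools import count


def huffman_join_order(cardinalities, selectivity=0.5):
    """Greedy, Huffman-style ordering: repeatedly join the two smallest inputs.

    `cardinalities` maps a column name to its estimated row count.
    The output-size estimate below is a placeholder assumption, not a real model.
    Returns (nested tuple describing the plan, estimated final row count).
    """
    tie = count()  # tie-breaker so the heap never compares plan objects
    heap = [(rows, next(tie), name) for name, rows in cardinalities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        rows_a, _, plan_a = heapq.heappop(heap)
        rows_b, _, plan_b = heapq.heappop(heap)
        # Assumed estimate: the join keeps a fraction of the smaller input.
        est = max(1, int(selectivity * min(rows_a, rows_b)))
        heapq.heappush(heap, (est, next(tie), (plan_a, plan_b)))
    rows, _, plan = heap[0]
    return plan, rows


def choose_strategy(left_rows, right_rows, workers=4, startup=1000):
    """Pick the cheaper of two hypothetical per-node cost estimates."""
    pipeline_cost = left_rows + right_rows                        # one sequential pass
    parallel_cost = startup + (left_rows + right_rows) / workers  # split work plus fixed overhead
    return "pipeline" if pipeline_cost <= parallel_cost else "parallel"


plan, rows = huffman_join_order({"a": 1000, "b": 10, "c": 100})
print(plan, rows)
print(choose_strategy(100, 100), choose_strategy(10000, 10000))
```

Under these assumptions each intermediate result is smaller than its inputs, so the greedy ordering keeps extending the same subtree and the plan comes out left-deep. A real cost model would replace both placeholder formulas with estimates derived from column statistics.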