Research on Concurrent In-Memory OLAP Query Optimization Techniques

张延松; 焦敏; 张宇; 王珊

doi:10.7544/issn1000-1239.2016.20150613

Concurrent In-Memory OLAP Query Optimization Techniques

Abstract

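Abstract (translated from the Chinese): Driven by multicore processor hardware and the demands of highly concurrent query workloads, recent research has addressed not only query-at-a-time optimization but also group-at-a-time optimization. By converting concurrent queries into a shared workload, high-latency operations such as disk I/O and cache access can be shared among multiple concurrent queries. Current work is usually based on sharing query operators such as scan, join, and predicate processing, and optimizes concurrent queries by generating a global execution plan. For complex analytical workloads, creating an optimized execution plan is a challenging problem. Building on the widely used star schema, this paper proposes a template OLAP query execution plan that simplifies the execution plan so as to maximize the utilization of query operators. 1) A surrogate-key-based join index is proposed that converts the traditional value-probing join into in-memory array index referencing (AIR), making the join more CPU-efficient and supporting late materialization for aggregation; 2) predicate processing for concurrent queries is simplified into cache-line-conscious predicate vectors, maximizing concurrent predicate evaluation within a single cache line access; 3) the approach is tested on the SSB benchmark with a multicore parallel implementation. The experimental results show that shared scan and shared predicate processing can double the performance of concurrent OLAP query processing.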

Abstract: Driven by multicore hardware architectures and highly concurrent workload requirements, recent research has focused not only on query-at-a-time optimization but also on group-at-a-time optimization. By grouping concurrent queries into a shared workload, high-latency operations such as disk I/O and cache line access can be shared among multiple queries. Existing approaches commonly rely on sharing query operators such as scan, join, or predicate processing, and try to generate an optimized global execution plan for all queries. For complex analytical workloads, generating an optimized shared execution plan is a challenging issue. In this paper, we present a template OLAP (on-line analytical processing) execution plan for the widely adopted star schema, which simplifies the execution plan in order to maximize operator utilization. First, we present a surrogate-key-oriented join index that transforms the traditional key-probing-based join operation into an array index referencing (AIR) lookup, making the join CPU-efficient and supporting lazy aggregation. Second, the predicate processing of concurrent queries is simplified into a cache-line-conscious predicate vector to maximize concurrent predicate processing within a single cache line access. Finally, we evaluate concurrent template OLAP processing with a multicore parallel implementation under the star schema benchmark (SSB); the results show that shared scan and shared predicate processing can double concurrent OLAP query performance.
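The two core ideas in the abstract, array index referencing and a shared predicate vector, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each dimension row's surrogate key equals its position in a list, and that each predicate vector entry is a bitmask with one bit per concurrent query. All names and data (`build_predicate_vector`, `shared_scan`, the toy tables) are hypothetical, and Python integers do not model the cache-line layout described in the paper.

```python
# Hypothetical sketch: function names and sample data are illustrative only.

def build_predicate_vector(dim_rows, predicates):
    """Return one bitmask per dimension row.

    Bit q of entry i is set when row i satisfies the predicate of query q.
    Row i is assumed to have surrogate key i, so the key is the list index.
    """
    vector = []
    for row in dim_rows:
        mask = 0
        for q, pred in enumerate(predicates):
            if pred(row):
                mask |= 1 << q
        vector.append(mask)
    return vector


def shared_scan(fact_rows, vectors, num_queries):
    """Scan the fact table once and compute SUM(measure) for every query.

    fact_rows: iterable of (foreign_keys, measure) pairs.
    vectors:   one predicate vector per dimension, in foreign-key order.
    """
    totals = [0] * num_queries
    for fks, measure in fact_rows:
        mask = (1 << num_queries) - 1          # start with all queries alive
        for fk, vec in zip(fks, vectors):
            mask &= vec[fk]                    # array lookup, no hash probe
            if mask == 0:                      # no query needs this row
                break
        q = 0
        while mask:                            # aggregate for surviving queries
            if mask & 1:
                totals[q] += measure
            mask >>= 1
            q += 1
    return totals


# Toy star schema: two dimensions and one fact table.
date_dim = [{"year": 1997}, {"year": 1998}, {"year": 1998}]
cust_dim = [{"region": "ASIA"}, {"region": "EUROPE"}]

# Query 0: year = 1998 AND region = ASIA; query 1: year >= 1997 AND region = EUROPE.
date_vec = build_predicate_vector(
    date_dim, [lambda r: r["year"] == 1998, lambda r: r["year"] >= 1997])
cust_vec = build_predicate_vector(
    cust_dim, [lambda r: r["region"] == "ASIA", lambda r: r["region"] == "EUROPE"])

# Each fact row: ((date_key, customer_key), revenue).
fact = [((0, 0), 10), ((1, 0), 20), ((2, 1), 30), ((0, 1), 40)]
totals = shared_scan(fact, [date_vec, cust_vec], num_queries=2)
print(totals)  # [20, 70]
```

In this sketch a single bitwise AND per dimension filters a fact row for all queries at once, which is the sharing effect the abstract attributes to the predicate vector. A real implementation would presumably pack the per-query bits into fixed-width words sized to the cache line and split the fact scan across cores.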
