ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (12): 2836-2846. doi: 10.7544/issn1000-1239.2016.20150613

• Software Technology •

Concurrent In-Memory OLAP Query Optimization Techniques

Zhang Yansong1,2,3, Jiao Min1,2, Zhang Yu4, Wang Shan1,2

  1(Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100872); 2(School of Information, Renmin University of China, Beijing 100872); 3(National Survey Research Center (Renmin University of China), Beijing 100872); 4(National Satellite Meteorological Center, China Meteorological Administration, Beijing 100081) (zhangys_ruc@hotmail.com)
  • Published online: 2016-12-01
  • Funding:
    National High Technology Research and Development Program of China (863 Program) (2015AA015307); Research Funds of Renmin University of China, supported by the Fundamental Research Funds for the Central Universities (16XNLQ02)

Abstract: Driven by multicore hardware and highly concurrent workloads, recent research has focused on both query-at-a-time and group-at-a-time optimization. When concurrent queries are grouped into a shared workload, multiple queries can share high-latency operations such as disk I/O and cache line accesses. Existing approaches commonly share query operators such as scan, join, or predicate processing, and try to generate an optimized global execution plan for all queries. For complex analytical workloads, generating an optimized shared execution plan is a challenging problem. In this paper, we present a template OLAP (on-line analytical processing) execution plan for the widely adopted star schema. It simplifies the execution plan so as to maximize operator utilization. First, we present a surrogate-key-oriented join index that transforms traditional key-probing joins into array index referencing (AIR) lookups. This makes joins CPU-efficient and supports lazy aggregation. Second, predicate processing for concurrent queries is reduced to a cache-line-conscious predicate vector, so that the predicates of many concurrent queries are evaluated within a single cache line access. Finally, we evaluate concurrent template OLAP processing with a multicore parallel implementation on the star schema benchmark (SSB). The results show that shared scan and shared predicate processing can double concurrent OLAP query performance.

Key words: concurrent OLAP query processing, array index referencing (AIR), template OLAP query processing, join index, filtering vector
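The mechanisms summarized in the abstract are AIR lookups on surrogate keys, per-query predicate bits packed together, and one shared fact-table scan. They can be illustrated with the minimal Python sketch below. It is a toy model under stated assumptions, not the authors' implementation.

The sketch makes the following assumptions:
- Dimension surrogate keys are dense integers starting at 1, so key k lives at array slot k-1.
- Each dimension row carries one integer bitmask, with bit q set when the row satisfies query q's predicate. A query with no predicate on that dimension passes every row.
- All queries group by the same dimension.

The function names `build_predicate_vector`, `shared_scan` and `materialize` are hypothetical. Python integers stand in for packed machine words, and the actual cache-line layout is not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical toy model, not the paper's code. Assumption: dimension
# surrogate keys are dense integers 1..N, so the row for key k sits at
# array slot k - 1 and a foreign key can be used directly as an index (AIR).

Predicate = Optional[Callable[[dict], bool]]


def build_predicate_vector(dim_rows: Sequence[dict],
                           preds: Sequence[Predicate]) -> List[int]:
    """One bitmask per dimension row: bit q is set when the row satisfies
    query q's predicate on this dimension (None = no predicate, always set)."""
    vector = []
    for row in dim_rows:
        mask = 0
        for q, pred in enumerate(preds):
            if pred is None or pred(row):
                mask |= 1 << q
        vector.append(mask)
    return vector


def shared_scan(fact_fks: Dict[str, Sequence[int]],
                measure: Sequence[float],
                pred_vectors: Dict[str, List[int]],
                group_dim: str,
                num_queries: int) -> List[Dict[int, float]]:
    """Scan the fact table once on behalf of all concurrent queries."""
    all_queries = (1 << num_queries) - 1
    results: List[Dict[int, float]] = [defaultdict(float) for _ in range(num_queries)]
    for i, value in enumerate(measure):
        mask = all_queries
        for dim, vector in pred_vectors.items():
            # AIR lookup: the surrogate key is an array position, no hash probe.
            mask &= vector[fact_fks[dim][i] - 1]
            if mask == 0:
                break  # no query wants this fact row
        # Aggregate on the surrogate key; attribute values are resolved later.
        group_key = fact_fks[group_dim][i]
        q = 0
        while mask:
            if mask & 1:
                results[q][group_key] += value
            mask >>= 1
            q += 1
    return results


def materialize(result: Dict[int, float], dim_rows: Sequence[dict],
                attr: str) -> Dict[object, float]:
    """Late materialization: map surrogate keys to attribute values and
    merge totals when several keys share one value."""
    out: Dict[object, float] = defaultdict(float)
    for key, total in result.items():
        out[dim_rows[key - 1][attr]] += total
    return dict(out)


if __name__ == "__main__":
    date_dim = [{"year": 1997}, {"year": 1998}]                   # keys 1, 2
    part_dim = [{"brand": "A"}, {"brand": "B"}, {"brand": "A"}]   # keys 1, 2, 3
    # Q0: brand = 'A'; Q1: year = 1998. Both group by year.
    vectors = {
        "date": build_predicate_vector(date_dim, [None, lambda r: r["year"] == 1998]),
        "part": build_predicate_vector(part_dim, [lambda r: r["brand"] == "A", None]),
    }
    fact_fks = {"date": [1, 2, 2, 1, 2], "part": [1, 2, 3, 3, 1]}
    revenue = [10, 20, 30, 40, 50]
    res = shared_scan(fact_fks, revenue, vectors, "date", num_queries=2)
    print(materialize(res[0], date_dim, "year"))  # {1997: 50.0, 1998: 80.0}
    print(materialize(res[1], date_dim, "year"))  # {1998: 100.0}
```

During the scan, the sketch keeps the surrogate key as the group key and resolves it to `year` only in `materialize`. This is one plausible reading of the abstract's "lazy aggregation". If several date keys mapped to the same year, their totals would be merged only at that final step. Because a single mask read per dimension row answers every query at once, the per-row predicate work is shared rather than repeated per query.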