ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2015, Vol. 52, Issue (2): 362-376. doi: 10.7544/issn1000-1239.2015.20140254

Special topic: 2015 Big Data Management

• Software Technology •

基于GPU加速的超精简型编码数据库系统

骆歆远,陈刚,伍赛   

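  1. (College of Computer Science, Zhejiang University, Hangzhou 310027) (wisp@zju.edu.cn)
  • Published: 2015-02-01
  • Funding: National Key Technology R&D Program of China (2013BAG06B01); National High Technology Research and Development Program of China (863 Program) (SS2013AA040601); National Natural Science Foundation of China (61472348)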

A GPU-Accelerated Highly Compact and Encoding Based Database System

Luo Xinyuan, Chen Gang, Wu Sai   

  1. (College of Computer Science, Zhejiang University, Hangzhou 310027)
  • Online: 2015-02-01

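Abstract (translated from the Chinese): With today's explosive growth of data, the large-scale data generated in fields such as telecommunications, finance, and the Internet places unprecedented pressure on the industry in terms of storage and querying. Against this background, current database and data warehouse systems compress and encode data, which saves space and reduces the I/O required by table queries, thereby improving performance. However, when faced with real large-scale enterprise data applications, most systems still cannot fully meet enterprise requirements for compression ratio, import time, or query performance. By re-encoding and compacting data according to a set of rules, we implement HEGA-STORE, a new highly compact encoding-based database system. The system adopts a hybrid row-column storage architecture, proposes a data import and storage plan based on intra-column and inter-column rule mining and encoding, and uses the GPU as a coprocessor to run the rule mining and encoding algorithms in parallel for greater efficiency. We developed an encoding/decoding prototype system, tested import and query on large-scale Netease Yixin communication records and Netease back-end log data, and compared it with other compression and encoding algorithms and with database and data warehouse products. The comparative experiments show that, relative to comparable database and data warehouse products, the prototype achieves a very high compression ratio and leads in import speed and full-table-scan query speed; cooperative GPU and CPU data processing further improves system performance. These results demonstrate the practical value of the proposed highly compact encoding-based database system.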

Abstract: In the big data era, business applications generate huge volumes of data, making that data extremely challenging to store and manage. One solution adopted in previous database systems is to employ encoding techniques, which can effectively reduce the size of the data and consequently improve query performance. However, existing encoding approaches still cannot strike a good tradeoff among compression ratio, import time, and query performance. To address this problem, we propose a new encoding-based database system, HEGA-STORE, which adopts a hybrid row-oriented and column-oriented storage model. In HEGA-STORE, we design a GPU-assisted encoding scheme that combines rule-based encoding with conventional compression algorithms. By exploiting the computational power of the GPU, we improve the performance of the encoding and decoding algorithms. To evaluate HEGA-STORE, we deployed it at Netease to support log analysis. We compare HEGA-STORE with other database systems, and the results show that HEGA-STORE provides better performance for data import and query processing. It is a highly compact encoding database for big data applications.

Key words: database system, hybrid row-column storage, encoding, rule mining, GPU, CUDA
