高级检索

    适用于范围查询的列存储数据桶划分算法

    A Column-Store Based Bucket Partition Algorithm for Range Queries

    • 摘要: 范围查询是数据库中一项重要的操作.列存储数据库中,能否有效查找一个范围内的属性值,获取对应的行号集合,将极大影响元组重构的效率.与树型结构相比,Hash表对数据的精确查找具有更高的效率,但是范围查找的效率比较低.针对这种情况,提出了一种改进的可用于范围查询的数据桶划分算法.为了能够更好地对算法进行描述,首先提出了可用于范围查询的Hash存储模型(ranged Hash, RH),并给出了桶的值域和序列化的定义.其次针对列存储等“读优先”特性,在RH模型的基础上,提出一种改进的桶划分算法.该算法生成可序列化的哈希函数把属性值划分到桶中,能够同时提高属性值的范围查询效率和存储效率.最后,通过实验结果验证算法的有效性.

       

      Abstract: Range query is significant to databases. In a column-store database, using range queries on attribute values to obtain the resulting row-id set, would affect the performance of tuple reconstruction. Compared with tree structure, Hash tables are more effective in exact queries but less effective in range queries. With this situation, a bucket partition algorithm for range queries is proposed. Firstly, In order to give a good introduction to the algorithm, a Hash storage model used for range queries (ranged hash, RH) is proposed, along with the definition of the bucket range and the serialization. Then, according to the “read-optimized” feature of column store databases, an improved bucket partition algorithm used for range queries is proposed based on the RH model. The algorithm could generate serializable Hash functions to partition attribute values into buckets, and could improve not only the efficiency of range queries but also the storage efficiency. Finally, the experimental results prove the efficiency of the algorithm.

       

    /

    返回文章
    返回