Abstract:
Range query is significant to databases. In a column-store database, using range queries on attribute values to obtain the resulting row-id set, would affect the performance of tuple reconstruction. Compared with tree structure, Hash tables are more effective in exact queries but less effective in range queries. With this situation, a bucket partition algorithm for range queries is proposed. Firstly, In order to give a good introduction to the algorithm, a Hash storage model used for range queries (ranged hash, RH) is proposed, along with the definition of the bucket range and the serialization. Then, according to the “read-optimized” feature of column store databases, an improved bucket partition algorithm used for range queries is proposed based on the RH model. The algorithm could generate serializable Hash functions to partition attribute values into buckets, and could improve not only the efficiency of range queries but also the storage efficiency. Finally, the experimental results prove the efficiency of the algorithm.