Abstract:
In many applications, top-k query is an important operation to return a set of interesting points among potentially huge data space. Usually, a multi-dimensional selection is specified in top-k query. It is found that the existing algorithms cannot process the query on massive data efficiently. A sorted-list based algorithm TMS (top-k with multi-dimensional selection) is proposed to process top-k query with selection condition on massive data efficiently. TMS employs selection attribute lattice (SAL) of hierarchical structure to distribute tuples and obtains a horizontal partitioning of the table. Each partition is stored in column-oriented model and the lists of ranking attributes are arranged in descending order of attribute values. Given multi-dimensional selection, the relevant lattice cells are determined by SAL and this reduces the number of the retrieved tuples significantly. Double-sorting operation is devised to perform progressive selection evaluation. The efficient pruning is developed to discard the candidates which do not satisfy selection condition or score requirement. The extensive experimental results show that TMS has a significant performance advantage over the existing algorithms.