高级检索

    TMS:一种新的海量数据多维选择Top-k查询算法

    TMS: An Novel Algorithm for Top-k Queries with Multi-Dimensional Selections on Massive Data

    • 摘要: 在许多应用中,Top-k是一种十分重要的查询类型,它在潜在的巨大数据空间中返回用户感兴趣的少量数据.Top-k查询通常具有指定的多维选择条件.分析发现:现有算法无法有效处理海量数据的多维选择Top-k查询.提出了一个基于有序列表的TMS(top-k with multi-dimensional selection)算法,有效计算海量数据上的具有多维选择的Top-k结果.TMS算法利用层次化结构的选择属性网格对原数据表执行水平划分,每一个分片的元组以面向列的模式存储,并且度量属性的列表根据其属性值降序排列.给定多维选择条件,TMS算法利用选择属性网格确定相关网格单元,有效减少需要读取的元组数量,提出双排序方法执行多维选择的渐进评价,并提出有效剪切操作来剪切不满足多维选择条件和分数要求的候选元组.实验结果表明:TMS算法性能优于现有算法.

       

      Abstract: In many applications, top-k query is an important operation to return a set of interesting points among potentially huge data space. Usually, a multi-dimensional selection is specified in top-k query. It is found that the existing algorithms cannot process the query on massive data efficiently. A sorted-list based algorithm TMS (top-k with multi-dimensional selection) is proposed to process top-k query with selection condition on massive data efficiently. TMS employs selection attribute lattice (SAL) of hierarchical structure to distribute tuples and obtains a horizontal partitioning of the table. Each partition is stored in column-oriented model and the lists of ranking attributes are arranged in descending order of attribute values. Given multi-dimensional selection, the relevant lattice cells are determined by SAL and this reduces the number of the retrieved tuples significantly. Double-sorting operation is devised to perform progressive selection evaluation. The efficient pruning is developed to discard the candidates which do not satisfy selection condition or score requirement. The extensive experimental results show that TMS has a significant performance advantage over the existing algorithms.

       

    /

    返回文章
    返回