ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2017, Vol. 54 ›› Issue (3): 570-585.doi: 10.7544/issn1000-1239.2017.20150615

• 软件技术 • 上一篇    下一篇

TMS:一种新的海量数据多维选择Top-k查询算法

韩希先, 刘显敏, 李建中, 高宏   

  1. (哈尔滨工业大学计算机科学与技术学院 哈尔滨 150001) (xxhan1981@163.com)
  • 出版日期: 2017-03-01
  • 基金资助: 
    国家“九七三”重点基础研究发展计划基金项目(2012CB316200);国家自然科学基金项目(61502121,61402130,61272046,61190115,61173022,61033015);山东省自然科学基金项目(ZR2013FQ028);山东省科技重大专项基金项目(2015ZDXX0210B02)

TMS: An Novel Algorithm for Top-k Queries with Multi-Dimensional Selections on Massive Data

Han Xixian, Liu Xianmin, Li Jianzhong,Gao Hong   

  1. (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)
  • Online: 2017-03-01

摘要: 在许多应用中,Top-k是一种十分重要的查询类型,它在潜在的巨大数据空间中返回用户感兴趣的少量数据.Top-k查询通常具有指定的多维选择条件.分析发现:现有算法无法有效处理海量数据的多维选择Top-k查询.提出了一个基于有序列表的TMS(top-k with multi-dimensional selection)算法,有效计算海量数据上的具有多维选择的Top-k结果.TMS算法利用层次化结构的选择属性网格对原数据表执行水平划分,每一个分片的元组以面向列的模式存储,并且度量属性的列表根据其属性值降序排列.给定多维选择条件,TMS算法利用选择属性网格确定相关网格单元,有效减少需要读取的元组数量,提出双排序方法执行多维选择的渐进评价,并提出有效剪切操作来剪切不满足多维选择条件和分数要求的候选元组.实验结果表明:TMS算法性能优于现有算法.

关键词: TMS算法, 有序列表, 选择属性网格, 渐进选择评价, 剪切操作

Abstract: In many applications, top-k query is an important operation to return a set of interesting points among potentially huge data space. Usually, a multi-dimensional selection is specified in top-k query. It is found that the existing algorithms cannot process the query on massive data efficiently. A sorted-list based algorithm TMS (top-k with multi-dimensional selection) is proposed to process top-k query with selection condition on massive data efficiently. TMS employs selection attribute lattice (SAL) of hierarchical structure to distribute tuples and obtains a horizontal partitioning of the table. Each partition is stored in column-oriented model and the lists of ranking attributes are arranged in descending order of attribute values. Given multi-dimensional selection, the relevant lattice cells are determined by SAL and this reduces the number of the retrieved tuples significantly. Double-sorting operation is devised to perform progressive selection evaluation. The efficient pruning is developed to discard the candidates which do not satisfy selection condition or score requirement. The extensive experimental results show that TMS has a significant performance advantage over the existing algorithms.

Key words: top-k with multi-dimensional selection (TMS) algorithm, sorted list, selection attribute lattice (SAL), progressive selection evaluation, pruning

中图分类号: