高级检索

    高效的Top-k相互Skyline查询算法

    Efficient Top-k Query Processing on Mutual Skyline

    • 摘要: Top-k相互Skyline查询返回相互Skyline查询中的前k个对象.这种查询是数据分析者寻找有意义对象进行决策支持的一种重要直觉工具.然而,这种查询还没有引起研究社区足够的注意力.介绍了几种新颖的算法,包括Topk-TBBS,Topk-dMBBS,Topk-wMBBS.主要的思想是信息重用和高效的修剪策略.特别地,Topk-wMBBS算法由于完全重用了搜索中的节点信息,并利用了最好优先BF搜索策略.因而它获得了最好的性能.同时证明了该算法有最优的I/O访问效率.最后,使用了2个真实数据集和4个服从不同分布的合成数据集进行了集中实验.实验结果表明,提出的算法无论是变化参数k的大小、数据集的尺寸和Cache尺寸都是有效的,且具有很高的效率,尤其Topk-wMBBS具有最小的I/O访问次数.

       

      Abstract: The top-k mutual skyline query returns k data objects among mutual skyline. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. However, it has not received adequate attention from the research community. In this paper, several new algorithms are introduced, including Topk-TBBS (Topk two step branch and bound skyline), Topk-dMBBS (Topk mutual branch and bound skyline by dynamic skyline), and Topk-wMBBS (Topk mutual branch and bound skyline by Window Query). The main ideas are information reuse and some efficient pruning policies. Especially, Topk-wMBBS has the least number of node accesses as it fully reuses the information during the search and takes advantage of the best first (BF) search policy. Therefore, it obtains the best performance among all algorithms. Meanwhile, we also show that Topk-wMBBS has the optimal efficiency on I/O cost. Finally, we carry out the extensive experiments using two real datasets and four synthetic datasets which follow different distributions. The results show that the proposed algorithms are effective and have a higher efficiency, especially Topk-wMBBS which has at least I/O accesses, varying the number of parameter k, the cardinality of different datasets, and the cache size.

       

    /

    返回文章
    返回