Abstract:
The top-k mutual skyline query returns k data objects among mutual skyline. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. However, it has not received adequate attention from the research community. In this paper, several new algorithms are introduced, including Topk-TBBS (Topk two step branch and bound skyline), Topk-dMBBS (Topk mutual branch and bound skyline by dynamic skyline), and Topk-wMBBS (Topk mutual branch and bound skyline by Window Query). The main ideas are information reuse and some efficient pruning policies. Especially, Topk-wMBBS has the least number of node accesses as it fully reuses the information during the search and takes advantage of the best first (BF) search policy. Therefore, it obtains the best performance among all algorithms. Meanwhile, we also show that Topk-wMBBS has the optimal efficiency on I/O cost. Finally, we carry out the extensive experiments using two real datasets and four synthetic datasets which follow different distributions. The results show that the proposed algorithms are effective and have a higher efficiency, especially Topk-wMBBS which has at least I/O accesses, varying the number of parameter k, the cardinality of different datasets, and the cache size.