Skyline Computing on MapReduce with Hyperplane-Projections-Based Partition
-
Graphical Abstract
-
Abstract
Recently, Skyline computing has been playing a more and more important role in decision-making applications. Centralized processing has become relatively mature. Today with explosion of big data, Skyline computing faces the same problem of big data processing. MapReduce is a parallel model and it is widely used in data-intensive processing. As we all know, processing on MapReduce requires the task be decomposable. There are some partition methods for Skyline computing on MapReduce, such as grid partition, angle-based partition and so on. Grid partition can only get good performance on low dimensional dataset. Angle-based partition applies to both low dimensional and high dimensional dataset. But it needs a complex and time-consuming coordinates conversion process before partitioning. In this paper, we employ a method similar to angle-based partition method called hyperplane-projections-based partition to break down our dataset. It applies to both low dimensional and high dimensional dataset and at the same time the coordinates conversion process before partitioning is very simple. We propose an algorithm to process Skyline computing on MapReduce called MR-HPP(MapReduce with hyperplane-projections-based partition) based on hyperplane-projections partition. Moreover, we propose an effective filter method called PSF(presorting filter) in the filter period of MR-HPP. Extensive comparative experiments based on Hadoop have proved that our method is accurate, efficient and stable.
-
-