Abstract:
In recent years, many advanced technologies have been developed to collect and analyze massive data. In many cases, the data may contain errors or may be unreliable. Traditional methods often lead to unacceptable complexity when managing and mining these uncertain data, thus a lot of attention has been paid to the query evaluation on uncertain data. The probabilistic Skyline computation is to find the set of objects whose probabilities in skyline exceed a given threshold, which has a great value in multi-criteria optimization problem. Indices should be built in advance in existing algorithms. Building indices is frequently impracticable or not profitable when data has a high volume or large dimensionality or high update frequency, thus a non-indexed algorithm is necessary. In this paper, we propose the first non-indexing algorithm of probabilistic Skyline on existentially uncertain data. This algorithm dynamically maintains a probabilistic constrained space using objects scanned before. Future objects can be safely pruned if fall into that space. The algorithm is evaluated through extensive experiments on both real and benchmark synthetic data sets. It is shown that when the dimensionality is no more than 4 the pruning rate exceeds 99.8%, and the computation time is saved more than 50% over the nave algorithm.