存在级不确定数据上的概率Skyline计算

王晓伟; 贾  焰; 杨树强; 田  李

存在级不确定数据上的概率Skyline计算

Probabilistic Skyline Computation on Existentially Uncertain Data

摘要

摘要: 概率Skyline计算是在不确定对象集合中找出Skyline概率大于给定阈值的对象，在多目标决策应用中有重要价值.现有的存在级不确定数据上的概率Skyline算法均需要预先建立索引，在数据量很大、维度很高或数据频繁更新时，建立索引往往不可行或者不会带来性能的提升，因此有必要设计通用的非索引算法.提出了存在级不确定数据上概率Skyline的首个非索引算法，用已扫描的数据动态地维护一个概率约束空间，未来落入该空间的对象可以被安全地裁剪.在标准的模拟数据集上维度不超过4时裁剪比率超过99.8%，相比不用裁剪规则的基本算法，查询时间节省50%以上.

Abstract: In recent years, many advanced technologies have been developed to collect and analyze massive data. In many cases, the data may contain errors or may be unreliable. Traditional methods often lead to unacceptable complexity when managing and mining these uncertain data, thus a lot of attention has been paid to the query evaluation on uncertain data. The probabilistic Skyline computation is to find the set of objects whose probabilities in skyline exceed a given threshold, which has a great value in multi-criteria optimization problem. Indices should be built in advance in existing algorithms. Building indices is frequently impracticable or not profitable when data has a high volume or large dimensionality or high update frequency, thus a non-indexed algorithm is necessary. In this paper, we propose the first non-indexing algorithm of probabilistic Skyline on existentially uncertain data. This algorithm dynamically maintains a probabilistic constrained space using objects scanned before. Future objects can be safely pruned if fall into that space. The algorithm is evaluated through extensive experiments on both real and benchmark synthetic data sets. It is shown that when the dimensionality is no more than 4 the pruning rate exceeds 99.8%, and the computation time is saved more than 50% over the nave algorithm.

HTML全文

参考文献(0)

施引文献

资源附件(0)