一种局部相关不确定数据库快照集合上的概率频繁最近邻算法
An Algorithm on Probabilistic Frequent Nearest Neighbor Query over Snapshots of Uncertain Database with Locally Correlation
-
摘要: 局部相关空间不确定数据越来越受到许多实际应用的关注.提出了一种新颖的定义在不确定数据库的多个快照上的概率频繁近邻查询,目的是在多个快照数据上找到以一定概率频繁成为查询点最近邻的那些对象.应用现有的基于传统数据和基于不确定数据上的近邻查询算法直接处理这种查询会产生昂贵的开销.为了很好地解决这一问题,提出了一般的处理框架,其中包括相应的基于切尔诺夫界的过滤方法,以及对于概率质量函数的动态规划算法.给出了分别作用于两个阶段的两个过滤方法.在第1阶段,利用切尔诺夫界的上界推广形式可以过滤大量的候选目标,之后在第2阶段,利用切尔诺夫界的标准形式来进一步过滤候选目标.还讨论了用于处理扩展查询的动态规划算法以及相应的过滤条件.最后,在人工的和真实的数据上都进行了充分的实验,并验证了给出算法的有效性,为进一步的研究工作奠定了基础.Abstract: Research on locally correlated spatial uncertain data recently gets more and more attentions in many practical applications. This paper proposes a novel probabilistic frequent nearest neighbor query over snapshots of uncertain database with locally correlation, which retrievals those objects which can be seen as the nearest neighbor of query point with a certain probability frequently. The existing methods of processing nearest neighbor query over traditional data or uncertain data cannot be used to process this kind of query because of the expensive time consuming. In order to solve this problem, we propose a generic framework, including several filters based on chernoff bound and the dynamic programme used to compute the probability mass function. We provide two filtering methods for two phases. The extended form of chernoff upper bound is used to filter lots of candidates objects at the first phase; while at the second phase, we use the standard form of chernoff bound. Then we also discuss the filters and dynamic programme used to process the extended LC-PFNN query. At last, we conduct experiments both on real and synthetic data set, and the results demonstrate that our method is effective enough, which lays a solid foundation for further research.