高级检索

    不确定数据上两种查询的分布式聚集算法

    Distributed Aggregations for Two Queries over Uncertain Data

    • 摘要: 不确定数据查询技术在军事、金融、电信等领域中起到了越来越重要的作用.不确定性数据在传感器网络、分布式Web Server及P2P系统等分布式系统中广泛存在.从这些系统中收集所有数据进行集中式查询将带来巨大的通信开销、时间延迟和存储代价.同时,由于不确定数据的特点,大多数集中式不确定查询算法在分布式环境下并不适用.给出不确定数据的最大值和Top-k聚集查询定义,并分别提出了基于过滤策略的分布式聚集算法.算法根据给出的3个过滤策略,利用数据的分布区间和概率进行筛选概率上限的计算,尽可能将不影响查询结果的数据抛弃.同时,算法以相对较小的代价归并保存并传输了计算最终查询结果所需要的“不可丢弃”数据.实验结果表明,在各类系统和数据条件下,过滤算法都能够正确地得到查询结果并显著降低系统的数据通信开销.

       

      Abstract: The technique of querying uncertain data is playing an increasingly important role in the fields of military affairs, finance, and telecom. Actually, a lot of uncertain data is generated in distributed systems such as wireless sensor networks, distributed Web servers and P2P systems. Collecting all the uncertain data from such a system to perform centralized queries will lead to immense communication cost, time delay and storage cost. Also, most centralized query methods are not applicable to distributed systems due to the features of uncertain data. In this paper, distributed uncertain Max (Min) and Top-k queries are defined, and designed distributed aggregation algorithms are designed based on several filtering strategies. The algorithms compute the upper bound of the candidate probability for each data entry using the uncertain ranges and probability distributions of the data, and filter the data entries as much as possible, which have no chance to influence the query result. Also, the algorithms merge and keep a relatively small amount of necessary data in the transmission, to compute the final results. The correctness of the filtering strategies has been proved. Simulations show that the algorithms can process the queries correctly and reduce communication cost efficiently with various data and system circumstances.

       

    /

    返回文章
    返回