    Algorithms for Incremental Aggregation over Distributed Data Stream

    • Abstract: Many stream-oriented systems are inherently geographically distributed, which makes distributed processing the mainstream approach to data stream management. Aggregation is an important class of continuous query in distributed data stream systems. Because aggregate values must be computed continuously and transmitted continuously across the network, the communication overhead is very high. To reduce the volume of stream data sent over the network, an approximate incremental aggregation algorithm, AIADDS (approximately incremental aggregate over distributed data stream), is proposed, with provable guarantees on the approximation error. The algorithm computes the aggregate value at each station incrementally and sends the change in the aggregate to other stations only when that change exceeds a given threshold, which significantly reduces the amount of data transmitted. At the core of the algorithm is a new structure, the VSB-tree, which merges and stores the aggregate values received from child stations and incrementally transmits aggregate changes to its parent station. Theoretical analysis and experimental results show that the algorithm is feasible and effective.
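The threshold rule summarized in the abstract can be illustrated with a minimal sketch. It assumes a SUM aggregate and a plain tree of stations. The `Station` class, its fields, and the `threshold` parameter are hypothetical names chosen for this illustration; the VSB-tree itself is not modeled, and this is not the paper's AIADDS implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """Hypothetical node in an aggregation tree keeping a running SUM."""
    name: str
    threshold: float                   # largest unreported change tolerated here
    parent: Optional["Station"] = None
    exact: float = 0.0                 # aggregate of everything this node has seen
    reported: float = 0.0              # aggregate value the parent knows about
    messages_sent: int = 0

    def update(self, delta: float) -> None:
        """Fold a local or child-supplied change into the aggregate."""
        self.exact += delta
        pending = self.exact - self.reported
        # Send only the accumulated change, and only once it exceeds the threshold.
        if self.parent is not None and abs(pending) > self.threshold:
            self.reported = self.exact
            self.messages_sent += 1
            self.parent.update(pending)


# Two leaf stations feeding one root.
root = Station("root", threshold=0.0)
a = Station("a", threshold=5.0, parent=root)
b = Station("b", threshold=5.0, parent=root)

for value in [2, 2, 2, 2]:
    a.update(value)
for value in [3, 3]:
    b.update(value)

true_sum = a.exact + b.exact
error = abs(true_sum - root.exact)
```

In this sketch each leaf withholds at most `threshold` of unreported change, so the root's error in a two-level tree is bounded by the sum of the leaf thresholds. That bound applies to this illustration only and is not the error guarantee proved in the paper.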

       
