    Research on Load Shedding Strategies for Sliding Window Joins over Data Streams

    Load Shedding Strategies on Sliding Window Joins over Data Streams

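    • Summary: With the rapid development of data stream applications, data stream management systems pose great challenges to database technology. For sliding window join operations over data streams, several new load shedding techniques are proposed that, when the system is overloaded, shed the tuples that yield the fewest join results, thereby maximizing the output. A dual-window model and statistics built on auxiliary windows make the estimated join results reliable, and segment trees make the shedding decision more efficient. When streams arrive faster than the system can process them, front-shedding and rear-shedding are used together to achieve ideal semantic shedding and obtain a maximum subset of the join results. Experiments show that this load shedding strategy outperforms existing methods.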

       

      Abstract: With the development of data stream applications, data stream management systems (DSMSs) pose tremendous challenges to database techniques. Because a data stream is continuous and time-varying, a DSMS must be adaptive. When the data arrival rate exceeds the system's resource limits, performance degrades and the system may even break down. Load shedding is one of the most promising ways to solve this problem. In this paper, several load shedding techniques for sliding window joins are proposed. Firstly, a dual-window architectural model comprising join-windows and aux-windows is proposed. The join-windows are used to join the two streams, while the aux-windows are used to build statistics on the estimated join results. With these statistics, an effective load shedding strategy can produce a maximum subset of the join outputs. To accelerate load shedding, segment trees are used to reduce the cost of shedding evaluation. Secondly, front-shedding cooperates with rear-shedding when streams have high arrival rates: front-shedding adopts random shedding and rear-shedding adopts semantic shedding. Lastly, extensive experiments with synthetic and real-life data show that the new load shedding methods produce more join outputs than existing strategies.

       
