Abstract:
With the development of data stream applications, data stream management systems (DSMSs) pose tremendous challenges for database techniques. Because a data stream is continuous and time-varying, a DSMS must be adaptive. When the data arrival rate exceeds the system's resource limits, performance degrades and the system may even break down. Load shedding is one of the most promising ways to solve this problem. This paper addresses several load shedding techniques for sliding window joins. First, a dual-window architectural model comprising join-windows and aux-windows is proposed. The former are used to join the two streams, while the latter are used to build statistics on the estimated join results. With these statistics, an effective load shedding strategy can produce a maximum subset of the join outputs. To accelerate load shedding, segment trees are used to reduce the cost of shedding evaluation. Second, when streams have high arrival rates, front-shedding is combined with rear-shedding: front-shedding adopts random shedding and rear-shedding adopts semantic shedding. Lastly, extensive experiments with synthetic and real-life data show that the new load shedding methods outperform the existing strategies in terms of join outputs.
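
As a rough illustration of how random front-shedding and semantic rear-shedding can cooperate, the following minimal Python sketch may help. It is not the paper's algorithm: the names `front_shed`, `SemanticWindow`, and `shed_join` are hypothetical, the window is bounded by a tuple count rather than by time, the key-frequency statistics are simply passed in instead of being built from aux-windows, and no segment tree is used.

```python
import random
from collections import Counter


def front_shed(stream, keep_prob, rng):
    """Random (front) shedding: keep each arriving tuple with probability keep_prob."""
    return [t for t in stream if rng.random() < keep_prob]


class SemanticWindow:
    """Capacity-bounded window over the join keys of one stream (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []

    def insert(self, key, partner_stats):
        """Add a key; if over capacity, evict the key expected to join least often."""
        self.keys.append(key)
        if len(self.keys) > self.capacity:
            # Semantic (rear) shedding: the victim is the stored key with the
            # lowest estimated frequency in the partner stream; ties go to the oldest.
            victim = min(range(len(self.keys)),
                         key=lambda i: partner_stats[self.keys[i]])
            self.keys.pop(victim)

    def matches(self, key):
        """Number of stored tuples that join with the given key."""
        return sum(1 for k in self.keys if k == key)


def shed_join(r_keys, s_keys, capacity, r_stats, s_stats):
    """Symmetric equi-join over alternating arrivals; returns the number of outputs."""
    r_win, s_win = SemanticWindow(capacity), SemanticWindow(capacity)
    outputs = 0
    for r, s in zip(r_keys, s_keys):
        outputs += s_win.matches(r)   # probe the opposite window first
        r_win.insert(r, s_stats)      # then store, shedding if needed
        outputs += r_win.matches(s)
        s_win.insert(s, r_stats)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    r = [rng.randint(0, 9) for _ in range(1000)]
    s = [rng.randint(0, 9) for _ in range(1000)]
    # Front-shed half of each stream at random, then join with small windows.
    r_kept, s_kept = front_shed(r, 0.5, rng), front_shed(s, 0.5, rng)
    print(shed_join(r_kept, s_kept, 20, Counter(r), Counter(s)))
```

In this sketch the frequency tables stand in for the statistics on estimated join results; in a real system they would be maintained incrementally as tuples arrive and expire.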