Abstract:
Skew is one of the most important problems in parallel systems and has a great impact on their performance. An event stream system is the back-end data processing and analysis component of a data stream management system (DSMS). It differs from a traditional database system because of its distinct workload characteristics: it receives continuous, fast-arriving, high-volume event stream data on one side and must respond quickly to users' queries on the other. Under these conditions, the common data redistribution solutions to data skew are no longer suitable. This paper presents a periodical counting based capability aware (PCCA) loading strategy built on DBroker, a shared-nothing parallel event stream database system for backbone network monitoring applications. The strategy not only loads event stream data quickly and correctly, but also automatically recognizes and prevents skew by adapting to the loading capability of each node. Moreover, it lays a good data distribution foundation for the query service. Finally, the PCCA loading strategy is shown to perform much better than three other methods in both simulation model analysis and real system testing.