Abstract:
Data stream is a new data model that has attracted attentions in numerous applications such as traffic monitoring, telephone records management, sensor networks, stock-market analysis, Web click streams, etc. The importance of quantile estimation of data streams has been highlighted by more and more researchers in recent years. Due to the characteristics of continuity and boundlessness of streaming data, it is unfeasible to memorize the entire information of data streams and thus difficult to get the exact answer to the query on streaming data. In this paper, a novel synopsis data structure—Nord_Histogram for storing streaming data summary is designed to get a balance between the memory cost and the query accuracy, and a one-pass online approximate algorithm for quantile computation is presented. The algorithm implements the approximate quantile queries over data stream with the time and space requirements being linear with the number of the buckets, which has no business with the length of data streams, and therefore has great scalability. This method has very good performance on data with uniform distribution. The correlation between the computation accuracy and main memory requirement is also analyzed. Experimental results show that the algorithm brings about quite small relative errors and works well over data streams.