ISSN 1000-1239 CN 11-1777/TP

• Paper • Previous Articles     Next Articles

Research on an Algorithm for Approximate Quantile Computation over Data Streams

Yang Bei1,2 and Huang Houkuan1   

  1. 1(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044) 2(School of Information Engineering, Zhengzhou University, Zhengzhou 450001)
  • Online:2008-02-15

Abstract: Data stream is a new data model that has attracted attentions in numerous applications such as traffic monitoring, telephone records management, sensor networks, stock-market analysis, Web click streams, etc. The importance of quantile estimation of data streams has been highlighted by more and more researchers in recent years. Due to the characteristics of continuity and boundlessness of streaming data, it is unfeasible to memorize the entire information of data streams and thus difficult to get the exact answer to the query on streaming data. In this paper, a novel synopsis data structure—Nord_Histogram for storing streaming data summary is designed to get a balance between the memory cost and the query accuracy, and a one-pass online approximate algorithm for quantile computation is presented. The algorithm implements the approximate quantile queries over data stream with the time and space requirements being linear with the number of the buckets, which has no business with the length of data streams, and therefore has great scalability. This method has very good performance on data with uniform distribution. The correlation between the computation accuracy and main memory requirement is also analyzed. Experimental results show that the algorithm brings about quite small relative errors and works well over data streams.

Key words: data stream, synopsis data structure, histogram, quantile, approximate algorithm