    Wavelet Synopses with the Data-Amnesic Feature over Data Streams

    Wavelet-Based Amnesic Synopses for Data Streams

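    • Abstract (translated from the Chinese): Dynamically maintaining a synopsis structure for a data stream is the basis of stream processing tasks such as querying and mining. In many data stream applications, the influence of data decays over time and the data in the stream are gradually forgotten; this is called the amnesic feature of data streams, and the construction of a stream synopsis should reflect it. The discrete wavelet transform is a widely used method for constructing data stream synopses. This paper introduces the amnesic feature of data streams into the construction of wavelet synopses and proposes a wavelet synopsis structure that reflects it: the wavelet-based hierarchical amnesic synopsis. Construction methods for this synopsis are discussed under two error metrics, sum of squared errors and maximum absolute error. Experiments confirm the effectiveness of the synopsis.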

       

      Abstract: Dynamically maintaining a synopsis data structure over a data stream is vital for a variety of streaming applications, such as approximate querying and data mining. In many cases, the significance of a data item in a stream decays with age: the item may convey critical information at first, but as time goes by it becomes less and less important until it is eventually useless. This feature is termed amnesic. The discrete wavelet transform is often used to construct synopses for streaming data. This paper proposes a wavelet-based hierarchical amnesic synopsis (W-HAS), which incorporates the amnesic feature of data streams into the generation of wavelet synopses. W-HAS can provide a better approximate representation of data streams with the amnesic feature than conventional wavelet synopses. To maintain W-HAS online for evolving data streams, the authors first explore the process of merging two wavelet decompositions, and then implement the addition of data nodes to the W-HAS structure on top of this merger process. By adding data nodes, W-HAS grows dynamically and hierarchically. Construction methods for W-HAS under the sum of squared errors (SSE) and maximum absolute error metrics are discussed. W-HAS with error control is also explored. Finally, experiments on real and synthetic datasets validate the proposed methods.
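      The abstract does not spell out the merger process, so the following is only a minimal sketch of the general idea, assuming an unnormalized Haar wavelet and two decompositions of equal power-of-two length. The function names `haar` and `merge` are hypothetical and are not taken from the paper; the amnesic weighting and node structure of W-HAS are not modeled here.

```python
def haar(data):
    """Unnormalized Haar decomposition of a sequence whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    averages = [float(x) for x in data]
    details = []
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1]) for i in range(0, len(averages), 2)]
        # Detail coefficients of coarser levels are placed in front of finer ones.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def merge(left, right):
    """Merge two equal-length Haar decompositions without reconstructing the data.

    The result equals the decomposition of the concatenated original sequences.
    """
    if len(left) != len(right):
        raise ValueError("this sketch assumes decompositions of equal length")
    # New overall average and new coarsest detail come from the two old averages.
    merged = [(left[0] + right[0]) / 2, (left[0] - right[0]) / 2]
    start = 1
    while start < len(left):
        # At each level, the left half's details precede the right half's.
        merged += left[start:2 * start] + right[start:2 * start]
        start *= 2
    return merged


older = haar([2, 2, 0, 2])
newer = haar([3, 5, 4, 4])
combined = merge(older, newer)
```

      In this sketch only the two overall averages are combined arithmetically; every existing detail coefficient is reused unchanged, which is what makes incremental, hierarchical growth of a wavelet synopsis cheap.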

       
