    Cheng Wencong, Zou Peng, Jia Yan, and Yang Yin. Anomaly Detection over Pseudo Period Data Streams Based on DTW Distance[J]. Journal of Computer Research and Development, 2010, 47(5): 893-902.

    Anomaly Detection over Pseudo Period Data Streams Based on DTW Distance

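    • Summary: Pseudo period data streams are a common class of data stream that arises widely in monitoring applications. Anomalies in such streams may carry domain knowledge of interest, so they need to be detected as a basis for further in-depth analysis. DTW distance is more robust than Euclidean distance; using it as the similarity measure between wave sections of a pseudo period data stream makes it possible to effectively detect anomaly wave sections that have few similar historical wave sections. On this basis, a fast approximate anomaly wave section detection method based on a cluster index is proposed to accelerate detection. Experiments on real datasets demonstrate the effectiveness of the proposed method.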
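The robustness claim can be illustrated with a minimal sketch. It assumes the textbook DTW recurrence with a squared-difference local cost and no warping window; the function names and the toy wave sections are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Point-by-point Euclidean distance; requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance.

    Assumption: squared-difference local cost, no warping-window
    constraint. The paper may use a different variant.
    """
    inf = float("inf")
    # prev[j] holds the cumulative cost of aligning the prefix of `a`
    # processed so far with the first j points of `b`.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[len(b)])


# Hypothetical wave section and the same pulse shifted by one sample.
wave = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 3, 1, 0, 0]

print("Euclidean:", euclidean(wave, shifted))
print("DTW:      ", dtw_distance(wave, shifted))
```

For this toy pair the Euclidean distance is sqrt(10), about 3.16, while the DTW distance is 0, because the warping path absorbs the one-sample shift.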

       

      Abstract: Pseudo period data streams appear in many applications, especially in monitoring domains. Anomalies detected in pseudo period data streams may carry significant domain knowledge that is worth further analysis. Because the Euclidean distance between two time series can change greatly when one of them shifts slightly along the time axis, DTW (dynamic time warping) distance is suggested as a more robust alternative. In this paper, DTW distance is adopted as the similarity measure between wave sections of a pseudo period data stream, and anomaly wave sections are defined as those that have few similar historical counterparts under that measure. A naive algorithm detects anomaly wave sections by directly computing the DTW distance between the current wave section and every wave section in the historical dataset. However, the naive algorithm is very inefficient, which limits its application. A fast approximate algorithm based on a cluster index is therefore proposed to speed up the naive method. Compared with the naive algorithm, the new method is much faster with no large loss of accuracy. Extensive experiments on a real dataset demonstrate the effectiveness of the proposed methods.
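The two detection strategies can be sketched as follows. The abstract does not give the exact anomaly definition or the structure of the cluster index, so everything here is an assumption for illustration: a wave section is treated as anomalous when fewer than `min_neighbors` historical sections lie within DTW distance `radius`, and the index is a simple leader clustering that keeps one representative and a member count per cluster. All names and parameters are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW with squared-difference cost (assumed variant)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = (x - y) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[len(b)])


def is_anomaly_naive(section, history, radius, min_neighbors):
    """Naive check: compare against every historical wave section."""
    neighbors = 0
    for h in history:
        if dtw_distance(section, h) <= radius:
            neighbors += 1
            if neighbors >= min_neighbors:
                return False  # enough similar counterparts found
    return True


def build_cluster_index(history, cluster_radius):
    """Leader clustering (an assumed stand-in for the paper's index).

    Each section joins the first cluster whose representative is within
    cluster_radius; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: [representative, member_count]
    for h in history:
        for c in clusters:
            if dtw_distance(h, c[0]) <= cluster_radius:
                c[1] += 1
                break
        else:
            clusters.append([h, 1])
    return clusters


def is_anomaly_approx(section, clusters, radius, min_neighbors):
    """Approximate check: compare only against cluster representatives
    and credit a whole cluster when its representative is close."""
    neighbors = 0
    for rep, count in clusters:
        if dtw_distance(section, rep) <= radius:
            neighbors += count
            if neighbors >= min_neighbors:
                return False
    return True


# Hypothetical history: 20 slightly stretched versions of one pulse.
history = [
    [0, 1, 3, 1, 0],
    [0, 0, 1, 3, 1, 0],
    [0, 1, 3, 1, 0, 0],
    [0, 1, 3, 3, 1, 0],
] * 5
clusters = build_cluster_index(history, cluster_radius=0.5)

normal = [0, 1, 1, 3, 1, 0]
spike = [0, 1, 9, 1, 0]
for name, section in (("normal", normal), ("spike", spike)):
    print(name,
          is_anomaly_naive(section, history, 1.0, 3),
          is_anomaly_approx(section, clusters, 1.0, 3))
```

The approximate check costs one DTW computation per cluster rather than one per historical section, which is where the speedup would come from under these assumptions. Its answer can differ from the naive one when a section lies near a cluster boundary, which is consistent with the small accuracy loss the abstract mentions.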

       
