    Citation: Cheng Wencong, Zou Peng, and Jia Yan. Similar Sub-Sequences Search over Multi-Dimensional Time Series Data[J]. Journal of Computer Research and Development, 2010, 47(3): 416-425.

    Similar Sub-Sequences Search over Multi-Dimensional Time Series Data

      Abstract: Because the Euclidean distance between two time series can change greatly when one of them is shifted slightly along the time axis, dynamic time warping (DTW) distance has been proposed as a more robust alternative, and it is widely used as the similarity measure in similar sub-sequence search over time series data. A similarity search in a single dimension may not return enough similar sub-sequences to support further analysis and decision making. In this paper the problem is extended to the multi-dimensional scenario by introducing the data cube model, which is well studied in multi-dimensional data analysis. Based on the data cube model, the authors define similar sub-sequences in multi-dimensional time series data and propose a naive algorithm that returns search results across multiple dimensions, yielding additional valuable information. However, the naive algorithm is very inefficient, which limits its application. Its efficiency is therefore improved by exploiting the correlation between cells at neighboring levels of the data cube, while preserving the accuracy of the search results. Extensive experiments on a real network security dataset demonstrate the effectiveness of the proposed methods.
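The robustness claim in the abstract can be illustrated with a minimal sketch. It uses the textbook dynamic-programming formulation of DTW with a squared-difference local cost, which is an assumption and not necessarily the exact variant used in the paper. The names `euclidean`, `dtw`, and `sliding_window_matches` are hypothetical helpers introduced here for illustration. The fixed-length sliding-window search is only a single-dimension baseline; it is not the authors' data cube algorithm.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length sequences")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance.

    Assumption: the local cost is the squared difference, and the result
    is the square root of the minimal cumulative cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


def sliding_window_matches(series, query, threshold):
    """Start offsets of len(query)-sized windows within `threshold` of `query`.

    Hypothetical single-dimension baseline, not the paper's algorithm.
    """
    w = len(query)
    return [s for s in range(len(series) - w + 1)
            if dtw(series[s:s + w], query) <= threshold]


# The same bump, shifted one step along the time axis.
base = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]

print(euclidean(base, shifted))  # 2.0: the shift is penalized point by point
print(dtw(base, shifted))        # 0.0: warping absorbs the shift
print(sliding_window_matches([0, 0, 1, 2, 1, 0, 0, 5, 5, 5], [1, 2, 1], 0.0))  # [2]
```

In this toy case the one-step shift costs 2.0 under Euclidean distance but 0.0 under DTW, which is the behavior the abstract describes.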

       
