ISSN 1000-1239 CN 11-1777/TP

• Paper •

Similar Sub-Sequences Search over Multi-Dimensional Time Series Data

Cheng Wencong, Zou Peng, and Jia Yan

  1. (College of Computer, National University of Defense Technology, Changsha 410073) (emailtocheng@yahoo.com.cn)
  • Published: 2010-03-15
  • Online: 2010-03-15

Abstract: The Euclidean distance between two time series can change greatly when one of them shifts slightly along the time axis. Dynamic time warping distance is more robust to such shifts and is therefore widely used as the similarity measure in similar sub-sequence search over time series data. A similarity search in a single dimension may not return enough similar sub-sequences to support further analysis and decision making. In this paper the problem is extended to the multi-dimensional scenario by introducing the data cube model, which is well studied in multi-dimensional data analysis. Based on the data cube model, the authors define similar sub-sequences in multi-dimensional time series data and propose a naive algorithm that returns search results across multiple dimensions, yielding more valuable knowledge. However, the naive algorithm is inefficient, which limits its application. Its efficiency is therefore improved by exploiting the correlation between cells at neighboring levels of the data cube, while preserving the accuracy of the search results. Experiments on a real network security dataset demonstrate the effectiveness of the proposed methods.

Key words: time series, similar sub-sequences search, multi-dimensional, data cube, dynamic time warping
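
The abstract's claim that dynamic time warping is more robust than Euclidean distance to small shifts along the time axis can be illustrated with a minimal sketch. This is the textbook dynamic-programming formulation with an absolute-difference local cost, not the search algorithm proposed in the paper; the function names `euclidean` and `dtw` and the two sample series are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook dynamic time warping distance, O(len(a) * len(b)).

    Uses |x - y| as the local cost; this choice is an assumption,
    not something specified by the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical sample data: the same spike, shifted by one time step.
base = [0, 0, 1, 5, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 5, 1, 0, 0]

print(euclidean(base, shifted))  # sqrt(34), roughly 5.83
print(dtw(base, shifted))        # 0.0
```

A one-step shift gives a large Euclidean distance because the spike is compared point by point against flat values, while the warping path realigns the two spikes at no cost.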