ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (3): 594-610. doi: 10.7544/issn1000-1239.2019.20170741

• Artificial Intelligence •

Time Series Shapelets Extraction via Similarity Join

Zhang Zhenguo (1,2), Wang Chao (2), Wen Yanlong (2), Yuan Xiaojie (3)

  1. Department of Computer Science and Technology, Yanbian University, Yanji, Jilin 133002
  2. College of Computer Science, Nankai University, Tianjin 300350
  3. College of Cyber Science, Nankai University, Tianjin 300350
  • E-mail: zhangzhenguo@dbis.nankai.edu.cn
  • Published online: 2019-03-01
  • Supported by:
    National Natural Science Foundation of China (61772289); Science and Technology Project of the Education Department of Jilin Province during the 13th Five-Year Plan (JJKH20191125KJ)

Abstract: For time series classification, classifiers built on Shapelets achieve high accuracy and produce easily interpretable results. Consequently, the extraction of highly discriminative Shapelets has attracted considerable attention in time series data mining. Research on Shapelets extraction has produced many promising results, but problems remain, chiefly that traversing all time series subsequences to find discriminative Shapelets is extremely time consuming. Pruning techniques can accelerate the extraction process, but they usually reduce classification accuracy. In this paper, we propose a novel Shapelets extraction method based on similarity join, which abandons the strategy of evaluating the discriminative power of each subsequence one by one. Instead, each subsequence is treated as a basic computing unit, and the similarity vector of two time series is obtained by a similarity join over their subsequences. For time series with different class labels, we compute the difference vector of each time series pair and merge these vectors into a candidate matrix that represents the differences between the classes in the dataset. Shapelets with high discriminative power can then be selected quickly according to the numerical differences in the candidate matrix. Extensive experiments on real time series datasets show that, compared with existing Shapelets extraction methods, the proposed method is time efficient while maintaining high classification accuracy.
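The pipeline summarized in the abstract (similarity join, difference vectors, candidate matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact definitions, so the following are assumptions made purely for illustration: the similarity vector is taken to be the z-normalized Euclidean distance from each subsequence to its nearest neighbor in the other series; the difference vector is taken to be the join against an other-class series minus the join against a same-class series; and the candidate matrix is scored by its column-wise mean. All function names are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std < 1e-12:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def subsequences(series, m):
    """All z-normalized sliding windows of length m."""
    return [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]


def similarity_vector(a, b, m):
    """Brute-force similarity join (assumed definition): for every
    length-m subsequence of a, the distance to its nearest neighbor
    among the length-m subsequences of b."""
    subs_b = subsequences(b, m)
    return [min(dist(sa, sb) for sb in subs_b) for sa in subsequences(a, m)]


def difference_vector(target, same_class, other_class, m):
    """Assumed definition: distance to the other class minus distance to
    the same class. Large values mark subsequences of target that recur
    within their own class but have no close match in the other class."""
    far = similarity_vector(target, other_class, m)
    near = similarity_vector(target, same_class, m)
    return [f - n for f, n in zip(far, near)]


def best_candidate(candidate_matrix):
    """Score each subsequence position by the column-wise mean of the
    candidate matrix and return (start index, score) of the maximum."""
    n_rows = len(candidate_matrix)
    scores = [sum(col) / n_rows for col in zip(*candidate_matrix)]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Toy data: one class contains a bump, the other class is flat.
target = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
same = [[0, 0, 0, 0, 0, 1, 3, 1, 0, 0], [0, 1, 3, 1, 0, 0, 0, 0, 0, 0]]
other = [[0.0] * 10, [2.0] * 10]
m = 4

# One row per (same-class, other-class) pair.
matrix = [difference_vector(target, s, o, m) for s in same for o in other]
start, score = best_candidate(matrix)
print("candidate Shapelet:", target[start:start + m], "score:", score)
```

On this toy data the selected window covers the peak of the bump, which is the only shape that separates the two classes. The brute-force join is quadratic in the series length for each pair and is used here only for readability.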

Key words: time series, Shapelets, similarity join, difference vector, candidate matrix
