ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (3): 594-610. doi: 10.7544/issn1000-1239.2019.20170741


Time Series Shapelets Extraction via Similarity Join

Zhang Zhenguo1,2, Wang Chao2, Wen Yanlong2, Yuan Xiaojie3   

  1(Department of Computer Science and Technology, Yanbian University, Yanji, Jilin 133002); 2(College of Computer Science, Nankai University, Tianjin 300350); 3(College of Cyber Science, Nankai University, Tianjin 300350)
  • Online:2019-03-01

Abstract: In time series classification, classifiers built on Shapelets achieve high accuracy while producing results that are easy to interpret. The extraction of discriminative Shapelets has therefore attracted considerable attention in time series data mining. Research on Shapelets extraction has made promising progress, but problems remain, chiefly that traversing all time series subsequences to find the discriminative Shapelets is extraordinarily time-consuming. Pruning techniques can accelerate the extraction process, but they usually reduce classification accuracy. In this paper, we propose a novel Shapelets extraction method based on similarity join, which abandons the idea of computing the discriminative power of each subsequence. In the proposed method, each subsequence is treated as a basic computing unit, and the similarity vector of two time series is obtained by a similarity join over their subsequences. For time series with different class labels, we compute the difference vector of each time series pair and merge these vectors into a candidate matrix that represents the differences between time series classes. Eligible Shapelets can then be obtained easily from the candidate matrix. Extensive experiments on real time series datasets show that, compared with existing Shapelets extraction methods, the proposed method is highly time-efficient while maintaining excellent classification accuracy.
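
To make the similarity join step concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain Euclidean distance between equal-length subsequences and a fixed window length `m`; the function names (`subsequences`, `euclidean`, `similarity_join`) and the toy series are hypothetical. The abstract does not specify how the difference vector or the candidate matrix is built, so the sketch stops at the similarity vector and only points at the window farthest from the other series as an illustration.

```python
import math

def subsequences(series, m):
    """Return every contiguous window of length m in the series."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]

def euclidean(u, v):
    """Euclidean distance between two equal-length windows (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def similarity_join(a, b, m):
    """For each length-m window of a, the distance to its nearest window in b."""
    windows_b = subsequences(b, m)
    return [min(euclidean(w, wb) for wb in windows_b)
            for w in subsequences(a, m)]

# Toy series standing in for two time series with different class labels.
a = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]   # contains a spike
b = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # flat

profile = similarity_join(a, b, 3)

# Illustration only: the window of a that is least similar to anything in b.
best_start = max(range(len(profile)), key=profile.__getitem__)
```

Here `profile` holds one value per window of `a`, and `best_start` indexes the window centered on the spike, the part of `a` that has no close match in `b`. This brute-force join costs time proportional to the product of the two series lengths and the window length; it is meant only to show the shape of the computation.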

Key words: time series, Shapelets, similarity join, difference vector, candidate matrix
