    Citation: Zhang Zhenguo, Wang Chao, Wen Yanlong, Yuan Xiaojie. Time Series Shapelets Extraction via Similarity Join[J]. Journal of Computer Research and Development, 2019, 56(3): 594-610. DOI: 10.7544/issn1000-1239.2019.20170741

    Time Series Shapelets Extraction via Similarity Join

    • Abstract: In time series classification, classifiers built on Shapelets achieve high accuracy and produce easily interpretable results, so the extraction of highly discriminative Shapelets has become an active topic in time series data mining. Existing work on Shapelets extraction has produced promising results, but problems remain, chiefly that finding Shapelets by traversing all subsequences of the time series is extremely time-consuming. Pruning techniques can accelerate the extraction process, but they usually reduce classification accuracy. This paper proposes a Shapelets extraction method based on similarity join. Instead of evaluating the discriminative power of each subsequence one by one, the method treats each subsequence as a basic computing unit and builds a similarity vector for each pair of time series through a similarity join over their subsequences. For time series with different class labels, it computes the difference vector of each pair and merges these vectors into a candidate matrix that represents the differences between classes in the dataset. Shapelets with high discriminative power are then selected quickly according to the numerical differences in the candidate matrix. Extensive experiments on real-world time series datasets show that, compared with existing Shapelets extraction methods, the proposed method is time-efficient while maintaining high classification accuracy.
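
    The pipeline described in the abstract (subsequence similarity join, one vector per cross-class pair, a stacked candidate matrix, selection by magnitude) can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not define the distance measure, the difference vector, or the selection rule. The sketch assumes plain Euclidean distance between raw subsequences, uses the cross-class nearest-neighbour distance itself as a stand-in for the difference vector, assumes equal-length series, and picks the single largest entry of the matrix. The function names `similarity_vector`, `candidate_matrix`, and `top_candidate` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def similarity_vector(a, b, m):
    """For each length-m subsequence of a, the distance to its nearest
    length-m subsequence in b (assumption: plain Euclidean distance)."""
    sub_a = sliding_window_view(np.asarray(a, dtype=float), m)
    sub_b = sliding_window_view(np.asarray(b, dtype=float), m)
    # All pairwise subsequence distances, then the minimum over b.
    dists = np.linalg.norm(sub_a[:, None, :] - sub_b[None, :, :], axis=2)
    return dists.min(axis=1)


def candidate_matrix(series, labels, m):
    """Stack one vector per ordered pair of series with different labels.
    Assumption: this vector stands in for the paper's difference vector.
    Series are assumed to have equal length so the rows line up."""
    rows, owners = [], []
    for i, (a, label_a) in enumerate(zip(series, labels)):
        for b, label_b in zip(series, labels):
            if label_a != label_b:
                rows.append(similarity_vector(a, b, m))
                owners.append(i)  # remember which series each row came from
    return np.vstack(rows), owners


def top_candidate(series, labels, m):
    """Return (source series index, start offset, subsequence) for the
    largest entry of the candidate matrix (assumed selection rule)."""
    matrix, owners = candidate_matrix(series, labels, m)
    row, start = np.unravel_index(np.argmax(matrix), matrix.shape)
    src = owners[row]
    return src, int(start), np.asarray(series[src][start:start + m])


if __name__ == "__main__":
    bump = [0, 0, 0, 5, 5, 0, 0, 0]   # class 0: contains a bump
    flat = [0, 0, 0, 0, 0, 0, 0, 0]   # class 1: flat
    print(top_candidate([bump, flat], [0, 1], m=2))
```

    In this toy input the bump `[5, 5]` has no close match in the flat series, so it receives the largest entry in the matrix and is returned as the candidate. The pairwise distance computation here is quadratic in the number of subsequences; it only shows the data flow and says nothing about the efficiency reported in the paper.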

       
