Loading [MathJax]/jax/output/SVG/jax.js
  • 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
高级检索

基于自适应k近邻的时间序列异常模式识别

王玲, 周南, 申鹏

王玲, 周南, 申鹏. 基于自适应k近邻的时间序列异常模式识别[J]. 计算机研究与发展, 2023, 60(1): 125-139. DOI: 10.7544/issn1000-1239.202111062
引用本文: 王玲, 周南, 申鹏. 基于自适应k近邻的时间序列异常模式识别[J]. 计算机研究与发展, 2023, 60(1): 125-139. DOI: 10.7544/issn1000-1239.202111062
Wang Ling, Zhou Nan, Shen Peng. Time Series Anomaly Pattern Recognition Based on Adaptive k Nearest Neighbor[J]. Journal of Computer Research and Development, 2023, 60(1): 125-139. DOI: 10.7544/issn1000-1239.202111062
Citation: Wang Ling, Zhou Nan, Shen Peng. Time Series Anomaly Pattern Recognition Based on Adaptive k Nearest Neighbor[J]. Journal of Computer Research and Development, 2023, 60(1): 125-139. DOI: 10.7544/issn1000-1239.202111062
王玲, 周南, 申鹏. 基于自适应k近邻的时间序列异常模式识别[J]. 计算机研究与发展, 2023, 60(1): 125-139. CSTR: 32373.14.issn1000-1239.202111062
引用本文: 王玲, 周南, 申鹏. 基于自适应k近邻的时间序列异常模式识别[J]. 计算机研究与发展, 2023, 60(1): 125-139. CSTR: 32373.14.issn1000-1239.202111062
Wang Ling, Zhou Nan, Shen Peng. Time Series Anomaly Pattern Recognition Based on Adaptive k Nearest Neighbor[J]. Journal of Computer Research and Development, 2023, 60(1): 125-139. CSTR: 32373.14.issn1000-1239.202111062
Citation: Wang Ling, Zhou Nan, Shen Peng. Time Series Anomaly Pattern Recognition Based on Adaptive k Nearest Neighbor[J]. Journal of Computer Research and Development, 2023, 60(1): 125-139. CSTR: 32373.14.issn1000-1239.202111062

基于自适应k近邻的时间序列异常模式识别

基金项目: 国家自然科学基金项目(62076025, 61572073)
详细信息
    作者简介:

    王玲: 1974年生.博士,教授.主要研究方向为数据挖掘和机器学习

    周南: 1997年生.硕士.主要研究方向为数据挖掘

    申鹏: 2000年生.硕士研究生.主要研究方向为数据挖掘

  • 中图分类号: TP391

Time Series Anomaly Pattern Recognition Based on Adaptive k Nearest Neighbor

Funds: This work was supported by the National Natural Science Foundation of China (62076025, 61572073).
  • 摘要:

    时间序列作为数据的典型代表,被广泛应用于许多研究领域. 时间序列异常模式代表了一种特殊情况的出现,在许多领域都具有重要意义. 现有的时间序列异常模式识别算法大多只是单纯检测异常子序列,忽略了异常子序列的类别区分问题,且许多参数都需要人为设置.为此提出了一种基于自适应k近邻的异常模式识别算法(anomaly pattern recognition algorithm based on adaptive k nearest neighbor, APAKN). 首先,确定各子序列的自适应k近邻值,引入自适应距离比计算子序列的相对密度,确定异常分数;然后提出一种基于最小方差的自适应阈值方法确定异常阈值,检测出所有异常子序列;最后,对异常子序列进行聚类,所得聚类中心即为具有不同变化趋势的异常模式. 整个算法过程在无需设置任何参数的情况下,不仅解决了密度不平衡问题,还精简了传统基于密度异常子序列检测算法的步骤,实现良好的异常模式识别效果. 在时间序列数据集合UCR的10个数据集上的实验结果表明, 提出算法在无需设置参数的情况下,在异常子序列检测和异常子序列聚类问题中都表现良好.

    Abstract:

    As a typical representative of data, time series is widely used in many research fields. The time series anomaly pattern represents the emergence of a special situation, and is of great significance in many fields. Most of the existing time series anomaly pattern recognition algorithms simply detect anomaly subsequences, ignoring the problem of distinguishing the types of anomaly subsequences, and many parameters need to be set manually. In this paper, an anomaly pattern recognition algorithm based on adaptive k nearest neighbor(APAKN) is proposed. Firstly, the adaptive neighbor value k of each subsequence is determined, and an adaptive distance ratio is introduced to calculate the relative density of the subsequence to determine the anomaly score. Then, an adaptive threshold method based on minimum variance is proposed to determine the anomaly threshold and detect all anomaly subsequences. Finally, the anomaly subsequences are clustered, and the obtained cluster centers are anomaly patterns with different changing trends. The whole algorithm process not only solves the density imbalance problem without setting any parameters, but also simplifies the steps of the traditional density-based anomaly subsequence detection algorithm to achieve a good anomaly pattern recognition effect. Experimental results on the 10 data sets of UCR show that the proposed algorithm performs well in detecting anomaly subsequences and clustering anomaly subsequences without setting parameters.

  • 图  1   子序列分布示意图

    Figure  1.   Schematic diagram of subsequence distribution

    图  2   APAKN的总体工作流程

    Figure  2.   The overall workflow of the APAKN

    图  3   子序列在不同r值时的近邻情况

    Figure  3.   Neighbors of subsequences at different values of r

    图  4   密度不平衡数据

    Figure  4.   Density imbalance data

    图  5   异常子序列分布示意图

    Figure  5.   Schematic diagram of anomaly subsequence distribution

    图  6   UCR部分时间子序列及其异常得分

    Figure  6.   UCR partial time subsequences and anomaly scores

    图  7   50words数据集在不同类别下的异常子序列示例和异常模式

    Figure  7.   Examples of anomaly subsequences and anomaly patterns in different categories of the 50words dataset

    图  8   CBF数据集在不同类别下的异常子序列示例和异常模式

    Figure  8.   Examples of anomaly subsequences and anomaly patterns in different categories of the CBF dataset

    表  1   k = 4时子序列的k近邻域与反向k近邻域

    Table  1   k Near Neighbors Domain and Reverse k Near Neighbors Domain of Subsequences when k = 4

    子序列k近邻域反向k近邻域反向k近邻个数
    s1{s3,s2,s5,s4}0
    s2{s4,s3,s5,s6}{s1,s3,s4,s6}4
    s3{s2,s5,s4,s7}{s1,s2,s4,s5,s7}5
    s4{s2,s5,s6,s3}{s1,s2,s3,s5,s6,s7}6
    s5{s4,s7,s3,s6}{s1,s2,s3,s4,s6,s7}6
    s6{s4,s5,s2,s7}{s2,s4,s5,s7}4
    s7{s5,s6,s4,s3}{s3,s5,s6}3
    下载: 导出CSV

    表  2   s8, s11, s14相对于其第8近邻的自适应距离比

    Table  2   Adaptive Distance Ratio of s8, s11, s14 Relative to Its 8th Nearest Neighbor

    子序列sisi第8近邻s8is8i自适应k近邻值k8is8i的第k8i近邻自适应距离比
    s8s96s10adr(s8,s9)=ddtw(s8,s9)ddtw(s9,s10)=1.833
    s11s123s13adr(s11,s12)=ddtw(s11,s12)ddtw(s12,s13)=4.266
    s14s156s16adr(s14,s15)=ddtw(s14,s15)ddtw(s15,s16)=1.722
    下载: 导出CSV

    表  3   实验数据集

    Table  3   Experimental Data Sets

    数据集子序列
    长度
    类别
    个数
    正常类
    别号
    异常类
    别号
    子序列
    个数
    异常率/%
    50words2705024,10,4390520
    ECG5000140512,5500010
    FaceAll1311412,3,5225015
    Lighting7319753,4,614335
    Trace275412,3,420030
    Two_Patterns128431,2,4111413
    CBF128312,393018
    Synthetic_Control60621,3,4,5,660025
    StarLightCurves1024331,292367
    ElectricDevices96751,2,3,4166372
    下载: 导出CSV

    表  4   APAKN的km值和对比算法的最佳参数值

    Table  4   km Value of APAKN and the Optimal Parameter Values of Comparison Algorithms

    数据集APAKNDBADSFODLDE
    kmtoptkopthopt
    50words1825924
    ECG500039312930
    FaceAll2520915
    Lighting717151615
    Trace15111016
    Two_Patterns54271418
    CBF66331730
    Synthetic_Control24161110
    StarLightCurves74867859
    ElectricDevices1569115388
    下载: 导出CSV

    表  5   各算法在 10 个数据集上的精确率和召回率对比

    Table  5   Comparison of Accuracy and Recall of Each Algorithm on 10 Data Sets

    数据集APAKNDBADSFODLDE
    精确率召回率精确率召回率精确率召回率精确率召回率
    50words1.0001.0000.8450.7420.4621.0001.0000.673
    ECG50000.8960.8750.6630.6190.6501.0000.8520.642
    FaceAll0.9121.0000.7960.8420.8571.0000.7430.685
    Lighting71.0000.9170.8740.9030.5790.9170.8450.683
    Trace0.9231.0000.9111.0000.7270.6671.0000.967
    Two_Patterns0.9540.9910.8520.8000.4440.5330.8450.667
    CBF1.0001.0000.8600.9140.5581.0001.0000.772
    Synthetic_Control1.0001.0000.7570.6370.7351.0001.0000.700
    StarLightCurves0.9120.8960.9120.8450.7740.6850.8750.689
    ElectricDevices0.9230.8510.8450.7980.8550.7620.7960.744
    注:黑体数值表示最佳值.
    下载: 导出CSV

    表  6   各算法在 10 个数据集上的准确率和F值对比

    Table  6   Comparison of Accuracy and F Value of Each Algorithm on 10 Data Sets

    数据集APAKNDBADSFODLDE
    准确率F准确率F准确率F准确率F
    50words1.0001.0000.9140.7900.8510.6320.9010.805
    ECG50000.9960.8850.9230.6400.9730.7880.8650.732
    FaceAll0.9800.9530.8020.8180.9610.9230.9930.713
    Lighting70.9740.9570.8260.8880.7690.7100.8750.755
    Trace0.9710.9600.8710.9530.8000.6960.9630.983
    Two_Patterns1.0000.9720.8360.8250.8760.4850.8960.746
    CBF1.0001.0000.9310.8860.9400.7160.8750.871
    Synthetic_Control1.0001.0000.8600.6910.8800.8470.6980.823
    StarLightCurves0.8950.9040.8890.8770.8520.7270.8570.771
    ElectricDevices0.9240.8860.8520.8210.8240.8060.8690.769
    注:黑体数值表示最佳值.
    下载: 导出CSV

    表  7   各算法在 10 个数据集上的ARI和NMI对比

    Table  7   Comparison of ARI and NMI of Each Algorithm on 10 Data Sets

    数据集APAKNkmlShapeDDCKShape
    EARIENMIEARIENMIEARIENMIEARIENMI
    50words0.8140.8250.5620.5280.5240.6580.7520.635
    ECG50000.8270.7680.6620.6120.5860.6930.8250.698
    FaceAll1.0001.0000.8140.8020.7250.5470.4080.662
    Lighting70.8360.9050.6350.6200.4330.6230.6890.814
    Trace0.5220.7340.5860.5320.4250.3890.3680.647
    Two_Patterns1.0001.0000.6330.3960.3620.4120.4530.504
    CBF0.9250.8140.7550.8520.6150.5780.8470.689
    Synthetic_Control0.4570.6670.6430.6140.3030.3390.4780.713
    StarLightCurves0.9560.7560.7890.8120.5460.6320.8740.684
    ElectricDevices0.8570.8590.4780.5320.7220.5660.6470.684
    注:黑体数值表示最佳值.
    下载: 导出CSV
  • [1]

    O'Leary B, Reiners J J, Xu Xiaohong, et al. Identification and influence of spatio-temporal outliers in urban air quality measurements[J]. Science of The Total Environment, 2016, 573: 55−65

    [2]

    Qin Yu, Lou Yuansheng. Hydrological time series anomaly pattern detection based on isolation forest[C] //Proc of the 3rd Information Technology, Networking, Electronic and Automation Control Conf (ITNEC). Piscataway, NJ: IEEE, 2019: 1706−1710

    [3]

    Wang Jibin, Wang Ping, Wang Suping. Automated detection of atrial fibrillation in ECG signals based on wavelet packet transform and correlation function of random process[J]. Biomedical Signal Processing and Control, 2020, 55: 101662

    [4]

    Liang Haoran, Song Lei, Wang Jianxing, et al. Robust unsupervised anomaly detection via multi-time scale DCGANs with forgetting mechanism for industrial multivariate time series[J]. Neurocomputing, 2021, 423: 444−462

    [5]

    Wang Xing, Lin J, Patel N, et al. Exact variable-length anomaly detection algorithm for univariate and multivariate time series[J]. Data Mining and Knowledge Discovery, 2018, 32(6): 1806−1844 doi: 10.1007/s10618-018-0569-7

    [6] 苏卫星,朱云龙,刘芳,等. 时间序列异常点及突变点的检测算法[J]. 计算机研究与发展,2014,51(4):781−788

    Su Weixing, Zhu Yunlong, Liu Fang, et al. A detection algorithm of outlier and mutation points in time series[J]. Journal of Computer Research and Development, 2014, 51(4): 781−788 (in Chinese)

    [7] 刘芳,齐建鹏,于彦伟,等. 基于密度的Top-n局部异常点快速检测算法[J]. 自动化学报,2019,45(9):1756−1771 doi: 10.16383/j.aas.c180425

    Liu Fang, Qi Jianpeng, Yu Yanwei, et al. A fast algorithm for density-based Top-n local outlier detection[J]. Acta Automatica Sinica, 2019, 45(9): 1756−1771 (in Chinese) doi: 10.16383/j.aas.c180425

    [8]

    Hyndman R J, Wang E, Laptev N. Large-scale unusual time series detection[C] //Proc of the 15th IEEE Int Conf on Data Mining Workshop (ICDMW). Piscataway, NJ: IEEE, 2015: 1616−1619

    [9]

    Senin P, Lin J, Wang Xing, et al. Time series anomaly discovery with grammar-based compression[C] //Proc of the 18th Int Conf on Extending Database Technology(EDBT). New York: ACM, 2015: 481−492

    [10]

    Boniol P, Palpanas T. Series2graph: Graph-based subsequence anomaly detection for time series[J]. Proceedings of the VLDB Endowment, 2020, 13(12): 1821−1834 doi: 10.14778/3407790.3407792

    [11]

    Cheng Kaiwen, Chen Yietarng, Fang Wenhsien. An efficient subsequence search for video anomaly detection and localization[J]. Multimedia Tools and Applications, 2016, 75: 15101−15122

    [12]

    Hawkins D. Identification of outliers[M] //Monographs on Applied Probability and Statistics. Berlin: Springer, 1980: 51−73

    [13]

    Huo Wunjun, Wang Wei, Li Wen. Anomalydetect: An online distance-based anomaly detection algorithm[C] //Proc of the Int Conf on Web Services. Berlin: Springer, 2019: 63−79

    [14]

    Zhang Pengcheng, Xiao Yan, Zhu Yuelong, et al. A new symbolization and distance measure based anomaly mining approach for hydrological time series[J]. International Journal of Web Services Research, 2016, 13(3): 26−45 doi: 10.4018/IJWSR.2016070102

    [15]

    Zheng Dequan, Li Fenghuan, Zhao Tiejun. Self-adaptive statistical process control for anomaly detection in time series[J]. Expert Systems with Applications, 2016, 57: 324−336

    [16]

    Izakian H, Pedrycz W. Anomaly detection in time series data using a fuzzy c-means clustering[C] //Proc of the Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS). Piscataway, NJ: IEEE, 2013: 1513−1518

    [17]

    Medico R, Spina D, VandeGinste D, et al. Autoencoding density-based anomaly detection for signal integrity applications[C] //Proc of the 27th IEEE Conf on Electrical Performance of Electronic Packaging and Systems (EPEPS). Piscataway, NJ: IEEE, 2018: 47−49

    [18]

    Zhang Chunkai, Chen Yingyang, Yin Ao. Anomaly subsequence detection with dynamic local density for time series[G] //LNCS 11707: Proc of the 30th Int Conf on Database and Expert Systems Applications. Berlin: Springer, 2019: 291−305

    [19]

    Dunn J C. A fuzzy relative of the ISODATA process, and its use in detecting compact well-separated clusters[J]. Journal of Cybernetics, 1973, 3(3): 32−57 doi: 10.1080/01969727308546046

    [20]

    Pedrycz W, Oliveira V. A development of fuzzy encoding and decoding through fuzzy clustering[J]. IEEE Transactions on Instrumentation & Measurement, 2008, 57(4): 829−837

    [21]

    Genolini C, Ecochard R, Benghezal M, et al. kmlShape: An efficient method to cluster longitudinal data (time-series) according to their shapes[J]. PLoS ONE, 2016, 11(6): 1−24

    [22]

    Likas A, Vlassis N, Verbeek J J. The global k-means clustering algorithm[J]. Pattern Recognition, 2003, 36(2): 451−461 doi: 10.1016/S0031-3203(02)00060-2

    [23]

    Ma Ruizhe, Filali S B, Angryk R. Time series distance density cluster with statistical preprocessing[C] //Proc of the 20th Int Conf on Big Data Analytics and Knowledge Discovery. Berlin: Springer, 2018: 371−381

    [24]

    Paparrizos J, Gravano L. K-shape: Efficient and accurate clustering of time series[C] //Proc of the 42nd ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2015: 1855−1870

    [25]

    Breunig M M , Kriegel H P , Ng R T , et al. LOF: Identifying density-based local outliers[C] //Proc of the 28th ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2000: 93−104

    [26]

    Berndt D J, Clifford J. Using dynamic time warping to find patterns in time series[C] //Proc of the Workshop on Knowledge Discovery in Databases. New York: ACM, 1994: 229−248

    [27]

    Cover T, Hart P. Nearest neighbor pattern classification[J]. IEEE Transactions on Information Theory, 1967, 13(1): 21−27 doi: 10.1109/TIT.1967.1053964

    [28]

    Pukelsheim F. The three sigma rule[J]. The American Statistician, 1994, 48(2): 88−91

    [29]

    Pérez-Ortiz M, Durán-Rosal A M, Gutiérrez P A, et al. On the use of evolutionary time series analysis for segmenting paleoclimate data[J]. Neurocomputing, 2019, 326/327: 3−14

    [30]

    Dau H A, Bagnall A, Kamgar K, et al. The UCR time series archive[J]. IEEE/CAA Journal of Automatica Sinica, 2019, 6(6): 1293−1305 doi: 10.1109/JAS.2019.1911747

    [31]

    Qian Pengjiang, Chung Fulai, Wang Shitong, et al. Fast graph-based relaxed clustering for large data sets using minimal enclosing ball[J]. IEEE Transactions on Systems, Man, and Cybernetics, 2012, 42(3): 672−687 doi: 10.1109/TSMCB.2011.2172604

    [32]

    Wang Yangtao, Chen Lihui. K-MEAP: Multiple exemplars affinity propagation with specified K clusters[J]. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(12): 2670−2682 doi: 10.1109/TNNLS.2015.2495268

  • 期刊类型引用(2)

    1. 刘阳,鲁圆圆,郭成城. 基于优先级的数据中心任务优化调度算法设计. 计算机仿真. 2025(01): 497-500+507 . 百度学术
    2. 骆海霞. 基于递推估计的Web前端偶发任务能耗感知方法. 黑龙江工业学院学报(综合版). 2023(10): 115-120 . 百度学术

    其他类型引用(1)

图(8)  /  表(7)
计量
  • 文章访问数:  221
  • HTML全文浏览量:  37
  • PDF下载量:  147
  • 被引次数: 3
出版历程
  • 收稿日期:  2021-10-26
  • 修回日期:  2022-06-06
  • 网络出版日期:  2023-02-10
  • 刊出日期:  2022-12-31

目录

    /

    返回文章
    返回