Anomaly Detection for Multivariate Time Series Based on Multi-Window Segmentation Ensemble Learning

Wang Zenan; Wang Yijie; Zhou Xiaohui; Xiong Xudong

doi:10.7544/issn1000-1239.202440573

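Abstract

In the era of large language models (LLMs), LLM training and inference depend on computing resources, and anomaly detection on computing-resource metric data helps keep LLM training and inference running normally. As LLMs grow in parameter count, the computing resources they use keep expanding in scale, and the many types of metrics that reflect the operating state of these resources show increasingly complex periodic variations over time. Existing multivariate time series anomaly detection methods typically slice the data with a sliding window of a preset size. However, a single uniform window that ignores the periodic characteristics of individual dimensions truncates the complete periodic patterns of some dimensions, which hinders the model from learning the normal patterns of the multivariate series and degrades detection performance. To address this problem, we propose SELAD, an unsupervised multivariate time series anomaly detection method based on multi-window segmentation ensemble learning. Specifically, SELAD first uses the Fourier transform to extract the periodic pattern of each dimension and, based on this information, segments the data with multiple windows so that each dimension keeps its complete periodic pattern. During training, the large parameter count of LLMs helps overcome the memory bottleneck that traditional models hit as the sliding window grows, which would otherwise degrade their learning. A mixture-of-experts (MoE) model feeds the period-preserving segments into an ensemble learning framework that combines an LLM with an LSTM model, which is trained to learn and reconstruct the normal temporal patterns of each dimension. Finally, anomalies in the multivariate time series are detected from the reconstruction error. In experiments on four multivariate time series datasets, SELAD improves the average F1 score over existing methods by 17.87 to 90.77 percentage points.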
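
The abstract says only that SELAD uses the Fourier transform to find each dimension's periodic pattern. A minimal sketch of one common way to do this, assuming the period is read off the strongest non-zero frequency bin as round(n / k*) (the paper's exact rule is not given here): center each dimension, compute its DFT amplitudes, and keep the dominant bin. The function names are hypothetical, and a naive O(n²) DFT is used only to keep the sketch dependency-free; in practice `numpy.fft.rfft` would compute the same spectrum far faster.

```python
import cmath
import math


def dominant_period(series: list[float]) -> int:
    """Estimate one dimension's period from its strongest DFT frequency bin.

    Assumption: period = round(n / k*), where k* is the non-zero bin with
    the largest amplitude. A naive O(n^2) DFT keeps the sketch dependency-free.
    """
    n = len(series)
    if n < 4:
        raise ValueError("series too short to estimate a period")
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the DC (k = 0) component

    best_k, best_amp = 1, -1.0
    # For a real-valued signal, bins 1..n//2 carry all distinct frequencies.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coeff) > best_amp:
            best_k, best_amp = k, abs(coeff)
    return round(n / best_k)


def periods_per_dimension(data: list[list[float]]) -> list[int]:
    """data[t][d] holds dimension d at time step t; returns one period per dimension."""
    num_dims = len(data[0])
    return [dominant_period([row[d] for row in data]) for d in range(num_dims)]


if __name__ == "__main__":
    # Two synthetic metrics with periods 8 and 24 over 96 time steps.
    data = [
        [math.sin(2 * math.pi * t / 8), math.cos(2 * math.pi * t / 24)]
        for t in range(96)
    ]
    print(periods_per_dimension(data))  # prints [8, 24]
```

Taking only the single strongest bin is the simplest choice; metrics with several overlapping cycles may call for keeping the top few bins instead.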

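
Once each dimension has its own period, the multi-window step can be sketched as slicing every dimension with a window that spans whole periods, instead of one shared preset size that may cut a cycle in half. The helper names, the `cycles` multiplier, and the default stride of 1 are illustrative assumptions, not SELAD's published settings, and how the paper aligns windows of different lengths across dimensions is not described in the abstract.

```python
def segment_dimension(series: list[float], window: int, stride: int = 1) -> list[list[float]]:
    """Slide a window of length `window` over one dimension, moving `stride` steps at a time."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [series[s:s + window] for s in range(0, len(series) - window + 1, stride)]


def multi_window_segment(
    data: list[list[float]], periods: list[int], cycles: int = 1, stride: int = 1
) -> dict[int, list[list[float]]]:
    """Segment each dimension with its own window covering `cycles` full periods.

    data[t][d] holds dimension d at time step t; periods[d] is that dimension's
    estimated period. Returns {dimension index: list of windows}.
    """
    segments: dict[int, list[list[float]]] = {}
    for d, period in enumerate(periods):
        column = [row[d] for row in data]
        # Window length is a whole number of periods, so no cycle is truncated.
        segments[d] = segment_dimension(column, cycles * period, stride)
    return segments


if __name__ == "__main__":
    data = [[float(t), 10.0 * t] for t in range(12)]
    segs = multi_window_segment(data, periods=[3, 4], stride=3)
    print(segs[1])  # prints [[0.0, 10.0, 20.0, 30.0], [30.0, 40.0, 50.0, 60.0], [60.0, 70.0, 80.0, 90.0]]
```

Keeping the windows per dimension (rather than one shared tensor) makes the differing window lengths explicit; a real model would still need padding, resampling, or separate encoders to batch them.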
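
The MoE ensemble and reconstruction-based scoring can be sketched as follows, under heavy assumptions: the experts are plain callables standing in for the trained LLM and LSTM reconstructors, a hypothetical gate returns one logit per expert that a softmax turns into mixture weights, the anomaly score is the mean squared reconstruction error of a window, and the threshold is a fixed illustrative value. None of these names or choices come from the paper.

```python
import math
from typing import Callable, Sequence

Expert = Callable[[Sequence[float]], list[float]]
Gate = Callable[[Sequence[float]], list[float]]


def softmax(logits: Sequence[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_reconstruct(window: Sequence[float], experts: list[Expert], gate: Gate) -> list[float]:
    """Blend the experts' reconstructions of `window` with softmax gate weights."""
    weights = softmax(gate(window))
    outputs = [expert(window) for expert in experts]
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(window))]


def reconstruction_error(window: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between a window and its reconstruction."""
    return sum((x - r) ** 2 for x, r in zip(window, reconstruction)) / len(window)


def detect(
    windows: list[list[float]], experts: list[Expert], gate: Gate, threshold: float
) -> tuple[list[float], list[bool]]:
    """Score every window and flag those whose error exceeds `threshold`."""
    scores = [reconstruction_error(w, moe_reconstruct(w, experts, gate)) for w in windows]
    return scores, [s > threshold for s in scores]


if __name__ == "__main__":
    # Hypothetical stand-ins for trained reconstructors (not real LLM/LSTM models).
    experts = [lambda w: [0.0] * len(w), lambda w: [1.0] * len(w)]
    uniform_gate = lambda w: [0.0, 0.0]  # equal weight for both experts
    windows = [[0.5] * 4, [0.5, 0.5, 0.5, 2.5]]
    scores, flags = detect(windows, experts, uniform_gate, threshold=0.5)
    print(scores, flags)  # prints [0.0, 1.0] [False, True]
```

Making the gate a function of the window lets the mixture lean on whichever expert reconstructs a given pattern better; how SELAD parameterizes its gate and chooses its threshold is not stated in the abstract.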
