Survey of Machine Learning-Based KPI Anomaly Detection on Internet-Based Services

Shang Shuyi; Li Hongjia; Song Chen; Lu Zhitong; Wang Liming; Xu Zhen

doi:10.7544/issn1000-1239.202330577

Shang Shuyi, Li Hongjia, Song Chen, Lu Zhitong, Wang Liming, Xu Zhen. Survey of Machine Learning-Based KPI Anomaly Detection on Internet-Based Services[J]. Journal of Computer Research and Development, 2025, 62(1): 207-231. DOI: 10.7544/issn1000-1239.202330577


Abstract

Key performance indicator (KPI) anomaly detection is a fundamental technology in artificial intelligence for IT operations (AIOps) for Internet-based services. To improve the efficiency and accuracy of KPI anomaly detection, machine learning-based approaches have recently become a research hotspot in both academia and industry. Based on a systematic analysis of prior work in this field, we first present a technical framework for KPI anomaly detection in Internet-based services. Then, from the perspective of mining KPI dependency patterns in different domains (the time domain, the metric domain, and the entity domain), we examine the motivation for model selection across three KPI types (univariate KPI, multivariate KPIs, and matrix-variate KPIs). Furthermore, guided by detection performance objectives, we elaborate on KPI anomaly detection techniques from two perspectives: accuracy-centric techniques, which focus on improving the accuracy of KPI anomaly detection models, and multi-objective balancing-centric techniques, which focus on balancing theoretical performance against practical application objectives. Finally, we identify five challenges in machine learning-based KPI anomaly detection: KPI monitoring and pre-processing, model generality, model interpretability, alarm management of anomalies, and the limitations of KPI anomaly detection itself. We also point out corresponding potential research directions.
