Abstract:
Key performance indicator (KPI) anomaly detection is a fundamental technology for artificial intelligence for IT operations (AIOps) of Internet-based services. To improve the efficiency and accuracy of KPI anomaly detection, machine learning-based KPI anomaly detection has become a hotspot in both academia and industry recently. Through synthetically analyzing prior arts in this field, we first provide a technical framework of KPI anomaly detection for Internet-based services. Then, from the perspective of mining KPI’s dependency patterns in different domains (including time domain, metric domain and entity domain), we explore the motivation for model selection of KPI anomaly detection on three KPI types (including univariate KPI, multivariate KPIs and matrix-variate KPIs) respectively. Furthermore, guided by the detection performance objectives, we elaborate on KPI anomaly detection techniques from two perspectives: accuracy-centric anomaly detection techniques which focus on how to improve the accuracy of KPI anomaly detection models and multi-objective balancing-centric anomaly detection techniques which focus on how to balance theoretical performance with actual application objectives. Finally, we sort out five challenges on machine learning-based KPI anomaly detection, including KPI monitoring and KPI pre-processing, generality of the model, interpretability of the model, alarm management of anomalies and limitations of KPI anomaly detection; and we also point out the corresponding potential research directions.