A Method of Microservice Performance Anomaly Detection Based on Deep Learning

方浩天; 李春花; 王清; 周可

doi:10.7544/issn1000-1239.202330543

Abstract

Microservice architecture is increasingly favored for cloud applications because of its scalability and maintainability. At the same time, the complex interactions among microservices make performance anomalies harder to detect. Existing microservice performance anomaly detection methods do not adequately model the complex relationship between microservices across different call paths and their corresponding response times, which leads to low detection accuracy and inaccurate root cause localization. This paper proposes TTEDA (Transformer trace explore data analysis), a Transformer-based method for microservice performance anomaly detection and root cause localization. TTEDA first converts each call chain into a microservice call sequence and a corresponding response time sequence. It then uses self-attention to capture the call relationships among microservices, and an encoder-decoder architecture to relate each microservice's response time to its call path, thereby obtaining the normal response time distribution of each microservice on different call chains. Based on the learned normal pattern, TTEDA detects anomalous call chains and pinpoints anomalies at the microservice level. Further, using the call relationships among microservices and the way anomalies propagate, TTEDA performs a reverse topological sort on the microservices that exhibit performance anomalies to localize root causes quickly and accurately. TTEDA is evaluated on a dataset from the open-source benchmark microservice system Train-Ticket and on the AIOps Challenge dataset. Compared with the anomaly detection methods AEVB, Multi-LSTM, and TraceAnomaly, TTEDA improves precision by an average of 48.6%, 30.2%, and 3.5%, and recall by an average of 34.7%, 11.1% (1.1% in the Chinese abstract), and 4.1%, respectively. Compared with the root cause localization algorithms MonitorRank and TraceAnomaly, TTEDA improves root cause localization accuracy by 35.4 and 6.1 percentage points, respectively.
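The reverse topological sort used for root cause localization can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so the following is an assumption rather than TTEDA's actual procedure: it supposes that a slow callee delays its callers, so among the anomalous microservices the ones deepest in the call direction are the most likely root causes. The function name `rank_root_causes` and the service names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def rank_root_causes(calls, anomalous):
    """Order anomalous services so that callees come before their callers.

    calls:     dict mapping a caller to an iterable of the services it calls
    anomalous: set of services flagged as anomalous by the detector
    Assumption (not from the paper): anomalies propagate from callee to caller,
    so earlier entries in the result are stronger root cause candidates.
    """
    # Keep only the call edges whose two endpoints are both anomalous.
    subgraph = {
        svc: {callee for callee in calls.get(svc, ()) if callee in anomalous}
        for svc in anomalous
    }
    # TopologicalSorter treats the mapped values as predecessors, so passing
    # caller -> callees yields callees first, i.e. the reverse of the
    # caller -> callee topological order. A cyclic call graph raises
    # graphlib.CycleError.
    return list(TopologicalSorter(subgraph).static_order())


# Hypothetical call graph: gateway -> order -> payment -> db, gateway -> user
calls = {
    "gateway": ["order", "user"],
    "order": ["payment"],
    "payment": ["db"],
}
ranking = rank_root_causes(calls, {"gateway", "order", "payment"})
print(ranking)  # ['payment', 'order', 'gateway']
```

In this example `payment` is ranked first because it is anomalous while none of the services it calls are, which makes it the most plausible origin under the stated propagation assumption. Services with no anomalous call relationship between them have no defined relative order in this sketch.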
