Abstract:
Microservice architecture is increasingly favored by cloud applications because of its scalability and maintainability. However, the complex interactions among microservices make performance anomalies in the system harder to detect. Existing methods cannot adequately model the relationship between microservices across different call paths and their corresponding response times, which results in low anomaly detection accuracy and inaccurate root cause localization. In this paper, we propose TTEDA (Transformer trace explore data analysis), a Transformer-based method for microservice performance anomaly detection and root cause localization. TTEDA constructs a call chain from the microservice call sequence and its response time series, captures the call relationships among microservices via a self-attention mechanism, and establishes the correlation between the response time of a microservice and its call path through an encoder-decoder architecture. In this way, it learns the normal response time distribution of each microservice across different call chains. Based on the learned normal pattern, TTEDA can accurately detect anomalous call chains and pinpoint the anomalies at the microservice level. Further, TTEDA uses the relationships among microservices and the propagation of anomalies to perform reverse topological sorting on the abnormal microservices, achieving accurate and fast root cause localization. The effectiveness of TTEDA is evaluated on a dataset from the open-source benchmark microservice system Train-Ticket and on the AIOps Challenge dataset. Compared with the similar methods AEVB, Multi-LSTM, and TraceAnomaly, TTEDA improves average precision by 48.6%, 30.2%, and 3.5%, and average recall by 34.7%, 11.1%, and 4.1%, respectively. Compared with the root cause localization algorithms MonitorRank and TraceAnomaly, TTEDA improves root cause localization accuracy by 35.4% and 6.1%, respectively.
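
The reverse topological sorting step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_root_causes`, the dictionary layout of the call graph, and the example service names are all assumptions made for illustration. The sketch assumes that anomalies propagate from a callee up to its callers, so that among the abnormal microservices the ones furthest downstream are the most likely root causes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def rank_root_causes(calls, anomalous):
    """Order abnormal services so that callees come before their callers.

    calls:     dict mapping a caller to the services it calls (hypothetical layout)
    anomalous: set of services flagged as abnormal by the detector
    """
    # Keep only the part of the call graph induced by the abnormal services.
    sub = {
        svc: sorted(c for c in calls.get(svc, ()) if c in anomalous)
        for svc in sorted(anomalous)
    }
    # TopologicalSorter treats the mapped values as predecessors, so passing
    # caller -> callees yields callees first, i.e. the reverse topological
    # order of the call graph. A cyclic call graph raises graphlib.CycleError.
    return list(TopologicalSorter(sub).static_order())


# Hypothetical call graph: gateway calls order and user, order calls payment.
calls = {"gateway": ["order", "user"], "order": ["payment"], "payment": []}
anomalous = {"gateway", "order", "payment"}
ranking = rank_root_causes(calls, anomalous)
print(ranking)  # ['payment', 'order', 'gateway']
```

In this toy example `payment` is ranked first because it is abnormal and calls no other abnormal service, which matches the intuition that its slowdown propagated up to `order` and `gateway`.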