Citation: Fang Haotian, Li Chunhua, Wang Qing, Zhou Ke. A Method of Microservice Performance Anomaly Detection Based on Deep Learning[J]. Journal of Computer Research and Development, 2024, 61(3): 600-613. DOI: 10.7544/issn1000-1239.202330543
Microservice architecture is increasingly favored for cloud applications because of its scalability and maintainability. However, the complex interactions among microservices make performance anomalies in the system harder to detect. Existing methods cannot adequately model the relationship between microservices across different call paths and their corresponding response times, which leads to low anomaly detection accuracy and imprecise root cause localization. In this paper, we propose TTEDA (Transformer trace explore data analysis), a Transformer-based method for microservice performance anomaly detection and root cause localization. TTEDA constructs a call chain from the microservice call sequence and its response time series, captures the call relationships among microservices with a self-attention mechanism, and uses an encoder-decoder architecture to relate each microservice's response time to its call path, thereby learning the normal response time distribution of microservices across different call chains. Based on the learned normal pattern, TTEDA detects anomalous call chains accurately and pinpoints anomalies at the microservice level. Furthermore, TTEDA uses the relationships among microservices and the propagation of anomalies to perform reverse topological sorting on abnormal microservices, achieving accurate and fast root cause localization. We evaluate TTEDA on a dataset from the open-source benchmark microservice system Train-Ticket and on the AIOps Challenge dataset. Compared with the similar methods AEVB, Multi-LSTM, and TraceAnomaly, TTEDA improves average precision by 48.6%, 30.2%, and 3.5%, and average recall by 34.7%, 11.1%, and 4.1%, respectively. Compared with the root cause localization algorithms MonitorRank and TraceAnomaly, it improves root cause localization accuracy by 35.4% and 6.1%, respectively.
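The reverse topological sorting step can be illustrated with a short sketch. The abstract does not give the exact procedure, so the following is only one plausible reading, under the assumption that an anomaly in a callee propagates upward to its callers, so that the deepest abnormal service in the call graph is the most likely root cause. The function name `rank_root_causes`, the edge-list input format, and the service names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


def rank_root_causes(calls, abnormal):
    """Order abnormal services so that callees come before their callers.

    calls:    iterable of (caller, callee) pairs observed in traces.
    abnormal: set of service names already flagged as abnormal.
    Assumption (not from the paper): anomalies propagate from callee to
    caller, so services earlier in the result are likelier root causes.
    """
    callers_of = defaultdict(set)       # callee -> abnormal callers
    pending = {s: 0 for s in abnormal}  # abnormal callees not yet ranked
    seen_edges = set()

    # Keep only edges whose two endpoints are both abnormal.
    for caller, callee in calls:
        edge = (caller, callee)
        if caller == callee or edge in seen_edges:
            continue
        if caller in abnormal and callee in abnormal:
            seen_edges.add(edge)
            callers_of[callee].add(caller)
            pending[caller] += 1

    # Kahn's algorithm on the reversed graph: start from abnormal services
    # that call no other abnormal service.
    queue = deque(sorted(s for s in abnormal if pending[s] == 0))
    order, ranked = [], set()
    while queue:
        service = queue.popleft()
        order.append(service)
        ranked.add(service)
        for caller in sorted(callers_of[service]):
            pending[caller] -= 1
            if pending[caller] == 0:
                queue.append(caller)

    # Services caught in a call cycle cannot be ordered; append them last.
    order.extend(sorted(s for s in abnormal if s not in ranked))
    return order


if __name__ == "__main__":
    calls = [("gateway", "order"), ("order", "payment"),
             ("order", "inventory"), ("gateway", "user")]
    print(rank_root_causes(calls, {"gateway", "order", "payment"}))
```

In this toy call graph, `payment` is ranked first because it is abnormal and calls no other abnormal service, while `gateway` and `order` are treated as affected by propagation. A real system would also need a policy for call cycles and for several independent abnormal leaves, which this sketch handles only by alphabetical ordering.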