Fang Haotian, Li Chunhua, Wang Qing, Zhou Ke. A Method of Microservice Performance Anomaly Detection Based on Deep Learning[J]. Journal of Computer Research and Development, 2024, 61(3): 600-613. DOI: 10.7544/issn1000-1239.202330543

A Method of Microservice Performance Anomaly Detection Based on Deep Learning

Funds: This work was supported by the Key Program of the National Natural Science Foundation of China (62232007) and the Innovation Group Project of the National Natural Science Foundation of China (61821003).
  • Author Bio:

    Fang Haotian: born in 1999. Master. His main research interests include AIOps and AI for storage.

    Li Chunhua: born in 1971. PhD, associate professor. Member of CCF. Her main research interests include edge storage, KV storage systems, intelligent data management, and AIOps.

    Wang Qing: born in 2001. Master. His main research interests include KV storage systems and intelligent data management.

    Zhou Ke: born in 1974. PhD, professor. His main research interests include AI for storage, big data processing, and AIOps.

  • Received Date: June 20, 2023
  • Revised Date: December 27, 2023
  • Available Online: January 15, 2024
  • Microservice architecture is increasingly favored by cloud applications for its scalability and maintainability. However, the complex interactions among microservices make performance anomalies in the system harder to detect. Existing methods cannot adequately model the relationship between microservices across different call paths and their corresponding response times, which leads to low anomaly detection accuracy and inaccurate root cause localization. In this paper, we propose TTEDA (Transformer trace explore data analysis), a Transformer-based method for microservice performance anomaly detection and root cause localization. TTEDA constructs a call chain from the microservice call sequence and its response time series, captures the call relationships among microservices via the self-attention mechanism, and uses an encoder-decoder architecture to relate each microservice's response time to its call path, thereby learning the normal response time distribution of microservices across different call chains. Based on the learned normal pattern, TTEDA detects call chain anomalies accurately and pinpoints them at the microservice level. Furthermore, TTEDA uses the relationships among microservices and the propagation of anomalies to perform reverse topological sorting on the abnormal microservices, achieving accurate and fast root cause localization. We evaluate TTEDA on a dataset from the open-source benchmark microservice system Train-Ticket and on the AIOps Challenge dataset. Compared with the similar methods AEVB, Multi-LSTM, and TraceAnomaly, TTEDA improves average precision by 48.6%, 30.2%, and 3.5%, and average recall by 34.7%, 11.1%, and 4.1%, respectively. Compared with the root cause localization algorithms MonitorRank and TraceAnomaly, it improves root cause localization accuracy by 35.4% and 6.1%, respectively.
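
The reverse topological sorting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the sketch assumes that the call graph is acyclic, that anomalies propagate from a callee up to its callers, and that the set of abnormal microservices has already been produced by the detection stage. The function name `rank_root_causes` and the service names are hypothetical.

```python
from collections import deque


def rank_root_causes(calls, abnormal):
    """Order abnormal services so that callees come before their callers.

    calls:    dict mapping a caller to the list of services it calls
              (assumed to form a DAG; services on a cycle are never emitted).
    abnormal: set of services already flagged as abnormal.
    Services at the front of the result have no abnormal callee, so under
    the upward-propagation assumption they are the likeliest root causes.
    """
    # Restrict the call graph to edges between abnormal services.
    sub = {s: {c for c in calls.get(s, []) if c in abnormal} for s in abnormal}

    # pending[s] counts the abnormal callees of s that are not yet emitted.
    pending = {s: len(callees) for s, callees in sub.items()}

    # Reverse index: for each abnormal service, which abnormal services call it.
    callers = {s: [] for s in abnormal}
    for s, callees in sub.items():
        for c in callees:
            callers[c].append(s)

    # Kahn's algorithm on the reversed edges; sorting keeps the order stable.
    queue = deque(sorted(s for s, n in pending.items() if n == 0))
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for p in sorted(callers[s]):
            pending[p] -= 1
            if pending[p] == 0:
                queue.append(p)
    return order


# Hypothetical call graph: gateway -> order -> payment -> db
calls = {
    "gateway": ["order", "user"],
    "order": ["payment", "inventory"],
    "payment": ["db"],
}
abnormal = {"gateway", "order", "payment"}
print(rank_root_causes(calls, abnormal))  # ['payment', 'order', 'gateway']
```

In this example `payment` is ranked first because it is abnormal while none of the services it calls are, which is consistent with the anomaly having originated there and spread to `order` and `gateway`.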
