Unified Anomaly Detection for Syntactically Diverse Logs in Cloud Datacenter

Zhang Shenglin, Li Dongwen, Sun Yongqian, Meng Weibin, Zhang Yuzhe, Zhang Yuzhi, Liu Ying, Pei Dan

Citation: Zhang Shenglin, Li Dongwen, Sun Yongqian, Meng Weibin, Zhang Yuzhe, Zhang Yuzhi, Liu Ying, Pei Dan. Unified Anomaly Detection for Syntactically Diverse Logs in Cloud Datacenter[J]. Journal of Computer Research and Development, 2020, 57(4): 778-790. DOI: 10.7544/issn1000-1239.2020.20190875. CSTR: 32373.14.issn1000-1239.2020.20190875

  • CLC number: TP391

Funds: This work was supported by the National Key Research and Development Plan of China (2018YFB0204304).
  • Abstract: Benefiting from the rapid development of natural language processing and machine learning methods, log-based automatic anomaly detection is becoming increasingly popular for the software and hardware systems in cloud datacenters. Current unsupervised learning methods require no labelled anomalies, but they still need a large number of labelled normal logs and generally suffer from low accuracy. Current supervised learning methods are accurate but require substantial labelling effort: because the syntax of logs generated by different software and hardware systems varies greatly, a supervised method needs enough anomaly labels for each type of log to train the corresponding anomaly detection model. Meanwhile, different types of logs usually have the same or similar semantics when anomalies occur. In this paper, we propose LogMerge, a unified anomaly detection mechanism that learns the semantic similarity among syntactically diverse logs and transfers anomaly patterns across log types, which significantly reduces labelling effort. LogMerge employs a word embedding method to construct the vectors of words and then of templates, and uses a clustering technique to group templates with the same or similar semantics, addressing the challenge that different types of logs differ in syntax. In addition, LogMerge combines CNN and LSTM to build an anomaly detection model, which both effectively extracts the sequential dependencies of log sequences and substantially reduces the impact of noise in them. Experiments on publicly available log datasets demonstrate that LogMerge achieves higher accuracy than current supervised and unsupervised learning methods. The experiments also show that LogMerge significantly reduces labelling effort: it still achieves high accuracy when the target type of logs has few anomaly labels.
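
The abstract describes building template vectors from word vectors and then clustering templates that are semantically close. The following is a minimal sketch of that general idea only, not the LogMerge implementation: the two-dimensional word vectors are hand-picked and hypothetical, averaging word vectors and the greedy cosine-threshold clustering are assumed stand-ins for whatever embedding and clustering methods the paper actually uses, and the names `WORD_VECTORS`, `template_vector`, and `cluster_templates` are illustrative.

```python
import math

# Hypothetical 2-D word vectors, hand-picked for illustration only.
# A real system would learn embeddings from log text.
WORD_VECTORS = {
    "connection": (1.0, 0.1),
    "link": (0.9, 0.2),
    "lost": (0.8, 0.3),
    "down": (0.9, 0.1),
    "session": (0.1, 1.0),
    "opened": (0.2, 0.9),
    "started": (0.1, 0.8),
}


def template_vector(template):
    """Average the vectors of the known words in a log template (assumed scheme)."""
    vecs = [WORD_VECTORS[w] for w in template.lower().split() if w in WORD_VECTORS]
    if not vecs:
        raise ValueError(f"no known words in template: {template!r}")
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_templates(templates, threshold=0.9):
    """Greedy clustering: a template joins the first cluster whose
    representative vector is at least `threshold` similar, else starts a new one."""
    clusters = []  # list of (representative_vector, member_templates)
    for t in templates:
        v = template_vector(t)
        for rep, members in clusters:
            if cosine(rep, v) >= threshold:
                members.append(t)
                break
        else:
            clusters.append((v, [t]))
    return [members for _, members in clusters]


# Templates from two hypothetical log types with different wording.
templates = ["connection lost", "link down", "session opened", "session started"]
clusters = cluster_templates(templates)
print(clusters)
```

With these toy vectors, templates that use different words for a similar event end up in the same cluster, which is the property that would let anomaly labels on one log type carry over to another.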
Publication history
  • Published: 2020-03-31
