A Log Anomaly Detection Algorithm for Debugging Based on Grammar-Based Codes

Wang Nan (王楠); Han Jizhong (韩冀中); Fang Jinyun (方金云)


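1(High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) 2(University of Chinese Academy of Sciences, Beijing 100190) 3(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100195) (wangnan06@ict.ac.cn)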

Publication history
- Published: 2013-04-14

A Log Anomaly Detection Algorithm for Debugging Based on Grammar-Based Codes

1(High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) 2(Graduate University of Chinese Academy of Sciences, Beijing 100090) 3(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100195)

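Abstract

Debugging non-deterministic bugs is important for software development. In recent years, with the rapid growth of cloud computing systems and deeper research into record-replay debugging, using anomaly detection to find anomalous information in large volumes of data such as text logs or control-flow logs has become increasingly important for debugging. Most traditional anomaly detection algorithms were designed to detect and defend against attacks; many of them rely on the Markov assumption and are sensitive to drastic changes in the event stream. The new problems, however, require anomaly detection that can catch semantic-level misbehavior, and experiments show that existing Markov-based anomaly detection algorithms perform poorly in this respect. We propose a new anomaly detection algorithm based on grammar-based codes. It does not depend on statistical models, probabilistic models, machine learning, or the Markov assumption, and it is extremely simple to design and implement. Experiments show that, in detecting high-level semantic anomalies, the algorithm has an advantage over traditional methods.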
- debugging
- anomaly detection
- grammar-based codes
- data mining
- record replay
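The abstract does not say how the grammar-based code is built or how a log is scored. Purely as an illustration of the general idea, the sketch below assumes a simplified Re-Pair-style compressor as the grammar-based code and scores a test log by how many extra grammar symbols (per event) it costs to append it to a corpus of normal logs. The names `repair_grammar_size`, `_join` and `anomaly_score`, the grammar-size measure, the separator scheme and the example events are all hypothetical choices, not the authors' algorithm.

```python
from collections import Counter
from collections.abc import Hashable, Sequence

# Hypothetical sketch: grammar-size-based anomaly scoring, with a simplified
# Re-Pair compressor standing in for "grammar-based codes" (an assumption).
# Event symbols are assumed to be strings, so they cannot collide with the
# tuple-valued nonterminals ("N", i) and separators ("SEP", k) used below.


def repair_grammar_size(seq: Sequence[Hashable]) -> int:
    """Total right-hand-side length of a Re-Pair-style grammar for seq."""
    s = list(seq)
    rules = 0
    while True:
        # Count adjacent pairs (overlaps in runs like a a a are counted,
        # which is a simplification of real Re-Pair).
        counts = Counter(zip(s, s[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no repeated pair left: the grammar is final
        nonterminal = ("N", rules)
        rules += 1
        # Replace occurrences of the pair left to right, without overlap.
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(s[i])
                i += 1
        s = out
    # Length of the start rule plus two symbols per binary rule.
    return len(s) + 2 * rules


def _join(logs: Sequence[Sequence[Hashable]]) -> list:
    """Concatenate logs with a unique separator after each one, so a pair
    spanning two logs occurs only once and never becomes a rule."""
    out: list = []
    for k, log in enumerate(logs):
        out.extend(log)
        out.append(("SEP", k))
    return out


def anomaly_score(normal: Sequence[Sequence[Hashable]],
                  test: Sequence[Hashable]) -> float:
    """Extra grammar symbols per test event when test is appended to the
    normal corpus; larger means the test log shares less structure."""
    base = repair_grammar_size(_join(normal))
    extended = repair_grammar_size(_join(list(normal) + [test]))
    return (extended - base) / max(len(test), 1)


normal_logs = [["open", "read", "read", "close"]] * 5
print(anomaly_score(normal_logs, ["open", "read", "read", "close"]))   # 0.5
print(anomaly_score(normal_logs, ["close", "read", "open", "close"]))  # 1.25
```

Because both grammars are rebuilt greedily from scratch, the score difference is only a rough proxy for the code length of the test log given the normal corpus; an incremental grammar builder would avoid recompressing the whole corpus for every test log.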
Abstract: Debugging non-deterministic bugs has long been an important research area in software development. In recent years, with the rapid emergence of large cloud computing systems and the development of record-replay debugging, the key to such debugging problems has become mining anomalous information from text console logs and/or execution-flow logs. Anomaly detection algorithms can therefore be applied in this area. However, although many approaches have been proposed, traditional anomaly detection algorithms were designed to detect network attacks and are not suitable for these new problems. One important reason is the Markov assumption on which many traditional anomaly detection methods are based: Markov-based methods are sensitive to drastic changes in event transitions. In contrast, the new problems in system diagnosis require the ability to detect semantic misbehavior. Experimental results show that Markov-based methods perform poorly on these problems. This paper presents a novel anomaly detection algorithm based on grammar-based codes. Unlike previous approaches, our algorithm does not rely on the Markov assumption, nor on statistical modeling, probabilistic modeling, or machine learning. Its principle is simple, and the algorithm is easy to implement. The new algorithm was tested on both generated sequences and real logs, and all test results were positive. Compared with traditional methods, it is more sensitive to semantic misbehavior.
- debugging
- anomaly detection
- grammar-based codes
- data mining
- record replay
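To make the contrast with Markov-based detectors concrete, here is a minimal first-order Markov sketch. It is not any specific published detector, and the names `train_markov` and `unseen_transition_rate` and the events `lock_a`, `work`, `unlock_b` are hypothetical. It flags transitions never seen in normal logs, so it reacts strongly to abrupt reordering, yet it gives a zero score to a log whose every adjacent pair is familiar even when the log as a whole is semantically wrong.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence

# Hypothetical first-order Markov baseline: it only looks at adjacent pairs
# of events, which is the locality the Markov assumption imposes.


def train_markov(logs: Sequence[Sequence[Hashable]]) -> dict:
    """Count event-to-event transitions observed in normal logs."""
    model: defaultdict = defaultdict(Counter)
    for log in logs:
        for prev, cur in zip(log, log[1:]):
            model[prev][cur] += 1
    return dict(model)


def unseen_transition_rate(model: dict, test: Sequence[Hashable]) -> float:
    """Fraction of adjacent pairs in test never observed during training."""
    pairs = list(zip(test, test[1:]))
    if not pairs:
        return 0.0
    unseen = sum(1 for prev, cur in pairs
                 if model.get(prev, Counter())[cur] == 0)
    return unseen / len(pairs)


normal_logs = ([["lock_a", "work", "unlock_a"]] * 3
               + [["lock_b", "work", "unlock_b"]] * 3)
model = train_markov(normal_logs)

# Abrupt reordering: every transition is new, so the score is maximal.
print(unseen_transition_rate(model, ["work", "lock_a", "unlock_a"]))  # 1.0
# Semantic mismatch (lock_a released by unlock_b): each pair was seen in
# some normal log, so the first-order model scores it as normal.
print(unseen_transition_rate(model, ["lock_a", "work", "unlock_b"]))  # 0.0
```

A second-order model would catch this particular case, since it conditions on the two previous events, but any fixed order k can be defeated in the same way by a mismatch spanning more than k events.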