

    A Log Anomaly Detection Algorithm for Debugging Based on Grammar-Based Codes

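    • Abstract (Chinese version): Debugging non-deterministic bugs is of great importance to software development. In recent years, with the rapid growth of cloud computing systems and deepening research on record-and-replay debugging, using anomaly detection to find abnormal information in large volumes of data, such as text logs or control-flow logs, has become increasingly important for debugging. Most traditional anomaly detection algorithms were designed to detect and prevent attacks; many of them rely on the Markov assumption and are sensitive to drastic changes in the event stream. The new problems, however, require anomaly detection to identify anomalous behavior at the semantic level. Experiments show that existing anomaly detection algorithms based on the Markov assumption perform poorly in this respect. This paper proposes a new anomaly detection algorithm based on grammar-based codes. The algorithm does not rely on statistical models, probabilistic models, machine learning, or the Markov assumption, and it is very simple to design and implement. Experiments show that it has an advantage over traditional methods in detecting high-level semantic anomalies.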


      Abstract: Debugging non-deterministic bugs has long been an important research area in software development. In recent years, with the rapid emergence of large cloud computing systems and the development of record-and-replay debugging, the key to such debugging problems has become mining anomaly information from text console logs and/or execution flow logs. Anomaly detection algorithms can therefore be applied in this area. However, although many approaches have been proposed, traditional anomaly detection algorithms were designed for detecting network attacks and are not suitable for the new problems. One important reason is the Markov assumption on which many traditional anomaly detection methods are based. Markov-based methods are sensitive to abrupt changes in event transitions. In contrast, the new problems in system diagnosis require the ability to detect semantic misbehaviors. Experimental results show that Markov-based methods perform poorly on those problems. This paper presents a novel anomaly detection algorithm based on grammar-based codes. Unlike previous approaches, our algorithm is a non-Markov approach. It does not rely on statistical modeling, probability modeling, or machine learning. Its principle is simple, and the algorithm is easy to implement. The new algorithm is tested on both generated sequences and real logs, and all test results are positive. Compared with traditional methods, it is more sensitive to semantic misbehaviors.


