Abstract:
Debugging non-deterministic bugs has long been an important research area in software development. In recent years, with the rapid emerging of large cloud computing systems and the development of record replay debugging, the key of such debugging problem becomes mining anomaly information from text console logs andor execution flow logs. Anomaly detection algorithms can therefore be used in this area. However, although many approaches have been proposed, traditional anomaly detection algorithms are designed for detecting network attacking and not suitable for the new problems. One important reason is the Markov assumption on which many traditional anomaly detection methods are based. Markov-based methods are sensitive to harshly trashing in event transitions. In contrast, the new problems in system diagnosing require the abilities of detecting semantic misbehaviors. Experiment results show the powerless of Markov-based methods on those problems. This paper presents a novel anomaly detection algorithm which is based on grammar-based codes. Different from previous approaches, our algorithm is a non-Markov approach. It doesnt rely on statistic modeling, probability modeling or machine learning. Its principle is simple, and the algorithm is easy to implement. The new algorithm is tested on both generated sequences and real logs, and all tests results are positive. Compared with traditional methods, it is more sensitive to semantic misbehaviors.