Protocol Reverse Engineering Using Grammatical Inference
-
Graphical Abstract
-
Abstract
To deeply understand procedures of various network applications, and to automatically classify, recognize, trace and control them, protocol state machine that represents the application sessions have to be obtained in advance. A novel approach is presented to reversely infer protocol state machine from collected application layer data. Protocol state machine is derived with a method of error-correcting grammatical inference based on the state sequences that appear in the application sessions. To richly mine and bring into play the performance of error-collecting, a criterion of best-matching path is presented to solve the difficulty of path selection during the error-correcting process. A method with regard to abnormal indegree discrimination and pruning on the basis of statistical probability is proposed. Moreover, negative example sets with similar tokens are adopted to reinforce the error-collecting performance. In order to solve the state expansion during the reconstruction of the state machine, a simplifying measure to obtain a compact protocol state machine that expresses the internal operating mechanism of the protocol accurately is used based on state merging with removal of the identical token and model reduction with a similar behavioral semantic. The experiments conducted in a real network, containing a number of real applications with several application layer protocols, validate this method.
-
-