ISSN 1000-1239 CN 11-1777/TP


Protocol Reverse Engineering Using Grammatical Inference

Xiao Mingming (1, 2) and Yu Shunzheng (1)

  1. School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006
  2. School of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou 510225
  • Online: 2013-10-15

Abstract: To understand the procedures of network applications in depth, and to automatically classify, recognize, trace, and control them, the protocol state machine that represents the application sessions must be obtained in advance. A novel approach is presented that infers the protocol state machine from collected application-layer data. The state machine is derived by error-correcting grammatical inference from the state sequences that appear in the application sessions. To make full use of the error-correcting capability, a best-matching-path criterion is presented to resolve the difficulty of path selection during the error-correcting process. A method based on statistical probability is proposed for discriminating and pruning abnormal in-degrees. Moreover, negative example sets with similar tokens are adopted to reinforce the error-correcting performance. To curb state expansion during reconstruction of the state machine, a simplification step is applied, based on state merging with removal of identical tokens and on model reduction by similar behavioral semantics; it yields a compact protocol state machine that accurately expresses the internal operating mechanism of the protocol. Experiments conducted in a real network, covering a number of real applications that use several application-layer protocols, validate the method.

Key words: protocol reverse engineering, protocol state machine inference, protocol analysis, error-correcting grammatical inference, network security
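
As a rough illustration of the general idea in the abstract (building a state machine from observed session sequences, then pruning statistically rare transitions), the following minimal sketch may help. It is not the authors' algorithm: it builds a plain prefix-tree acceptor and prunes by outgoing-edge frequency, whereas the paper uses error-correcting grammatical inference with a best-matching-path criterion and in-degree-based pruning. The function names, the `min_prob` threshold, and the SMTP-like tokens are all hypothetical.

```python
from collections import defaultdict


def build_prefix_tree(sessions):
    """Build a prefix-tree acceptor from token sequences (state 0 is the start)."""
    transitions = {}           # (state, token) -> next state
    counts = defaultdict(int)  # (state, token) -> number of traversals
    next_state = 1
    for session in sessions:
        state = 0
        for token in session:
            key = (state, token)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            counts[key] += 1
            state = transitions[key]
    return transitions, counts


def prune_rare_edges(transitions, counts, min_prob):
    """Keep edges whose share of the source state's outgoing traffic is at
    least min_prob, then drop edges no longer reachable from state 0.
    (Assumption: a stand-in for the paper's probability-based pruning.)"""
    out_total = defaultdict(int)
    for (state, _token), n in counts.items():
        out_total[state] += n
    kept = {k: dst for k, dst in transitions.items()
            if counts[k] / out_total[k[0]] >= min_prob}
    # Reachability pass: follow surviving edges from the start state.
    reachable, frontier = {0}, [0]
    while frontier:
        src = frontier.pop()
        for (s, _token), dst in kept.items():
            if s == src and dst not in reachable:
                reachable.add(dst)
                frontier.append(dst)
    return {k: dst for k, dst in kept.items() if k[0] in reachable}


# Hypothetical sessions: nine well-formed dialogues and one noisy one.
sessions = [["HELO", "MAIL", "RCPT", "DATA", "QUIT"]] * 9 + [["HELO", "XYZZ"]]
transitions, counts = build_prefix_tree(sessions)
kept = prune_rare_edges(transitions, counts, min_prob=0.2)
```

In this toy input the noisy `XYZZ` edge accounts for one tenth of the traffic leaving its source state, so it falls below the assumed threshold and is removed, while the five edges of the common dialogue survive.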