Abstract:
To understand the behavior of network applications in depth, and to classify, recognize, trace, and control them automatically, the protocol state machine that represents their application sessions has to be obtained in advance. A novel approach is presented for reverse-inferring the protocol state machine from collected application-layer data. The protocol state machine is derived by error-correcting grammatical inference from the state sequences that appear in the application sessions. To fully exploit the error-correcting capability, a best-matching-path criterion is presented to resolve the difficulty of path selection during the error-correcting process. A method for discriminating and pruning abnormal indegrees on the basis of statistical probability is proposed. Moreover, negative example sets with similar tokens are adopted to reinforce the error-correcting performance. To counter state expansion during reconstruction of the state machine, a simplification step is applied, based on state merging with removal of identical tokens and on model reduction by similar behavioral semantics, yielding a compact protocol state machine that accurately expresses the internal operating mechanism of the protocol. Experiments conducted in a real network, covering a number of real applications with several application-layer protocols, validate the method.
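
As a rough illustration of the general idea of inferring a state machine from session state sequences and pruning statistically unlikely transitions, the following minimal sketch builds a prefix-tree acceptor and drops rare edges. It is not the error-correcting grammatical inference or the indegree-based pruning described above. The function names, the `min_prob` threshold, and the SMTP-like tokens are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_prefix_tree(sessions):
    """Build a prefix-tree acceptor from token sequences.

    Returns (transitions, counts), where transitions maps
    (state, token) -> next state and counts records how many
    sessions traversed each transition. State 0 is the start state.
    """
    transitions = {}
    counts = defaultdict(int)
    next_id = 1
    for session in sessions:
        state = 0
        for token in session:
            key = (state, token)
            if key not in transitions:
                transitions[key] = next_id
                next_id += 1
            counts[key] += 1
            state = transitions[key]
    return transitions, dict(counts)


def prune_rare(transitions, counts, min_prob):
    """Drop transitions whose relative frequency among the outgoing
    transitions of their source state is below min_prob (a hypothetical
    threshold), then discard anything no longer reachable from state 0.
    """
    out_total = defaultdict(int)
    for (state, _token), n in counts.items():
        out_total[state] += n
    kept = {
        key: dst
        for key, dst in transitions.items()
        if counts[key] / out_total[key[0]] >= min_prob
    }
    # Keep only transitions whose source is still reachable from the start.
    reachable = {0}
    frontier = [0]
    while frontier:
        current = frontier.pop()
        for (src, _token), dst in kept.items():
            if src == current and dst not in reachable:
                reachable.add(dst)
                frontier.append(dst)
    return {key: dst for key, dst in kept.items() if key[0] in reachable}


# Hypothetical sessions: nine normal dialogues and one noisy one.
sessions = [["HELO", "MAIL", "DATA", "QUIT"]] * 9 + [["HELO", "NOOP"]]
transitions, counts = build_prefix_tree(sessions)
pruned = prune_rare(transitions, counts, min_prob=0.2)
```

In this toy input, the `NOOP` transition carries one tenth of the traffic leaving its source state, so a threshold of 0.2 removes it and leaves the four-transition main path. A real inference procedure would also have to merge equivalent states, which this sketch omits.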