Abstract:
Format parsing for unknown security protocols is a critical open problem in the information security field. However, previous network-trace-based format parsing methods consider only the plaintext format of payload data and are not suitable for security protocols, whose messages contain a large amount of ciphertext. In this paper, to infer the message format of unknown security protocols from a large number of network traces, we propose a novel format parsing approach named SPFPA (security protocols format parsing approach). SPFPA presents, for the first time, a hierarchical method that extracts protocol keyword sequences using sequential pattern mining, which provides a new idea for plaintext format parsing. On this basis, SPFPA introduces a set of heuristics to search for candidate ciphertext length fields, and then identifies the ciphertext length fields and their corresponding ciphertext fields by exploiting the randomness of ciphertext data. Finally, we evaluate SPFPA on four classical security protocols: the SSL protocol, the SSH protocol, the Needham-Schroeder (NS) public key protocol, and the sof protocol. Our experimental results show that, without using dynamic binary analysis, SPFPA can effectively parse the true protocol format (invariant fields, variable fields, ciphertext length fields, and ciphertext fields) purely from network traces, and that the inferred formats are highly accurate in identifying the protocols.
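
To make the idea of confirming a candidate ciphertext length field through randomness more concrete, the following is a minimal sketch in Python. It is not SPFPA's actual procedure: the abstract does not specify which randomness test or search heuristics are used, so normalized Shannon byte entropy is swapped in here as a stand-in, and the function names, the 2-byte big-endian length encoding, the 64-byte minimum, and the 0.85 threshold are all illustrative assumptions.

```python
import hashlib
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_random(data: bytes, threshold: float = 0.85, min_len: int = 64) -> bool:
    """Assumed randomness test: entropy close to the maximum reachable for this length.

    A sample of n bytes cannot exceed log2(n) bits of entropy, so the measured
    value is normalized by min(8, log2(n)). Very short fields are rejected
    because the estimate is unreliable for them.
    """
    if len(data) < min_len:
        return False
    max_entropy = min(8.0, math.log2(len(data)))
    return byte_entropy(data) / max_entropy >= threshold


def check_length_field(message: bytes, offset: int, width: int = 2) -> bool:
    """Hypothetical check of one candidate length field.

    Reads `width` bytes at `offset` as a big-endian length, then tests whether
    exactly that many following bytes exist and look random.
    """
    length = int.from_bytes(message[offset:offset + width], "big")
    start = offset + width
    end = start + length
    if length == 0 or end > len(message):
        return False
    return looks_random(message[start:end])


# Synthetic demo data (not taken from any real trace): SHA-256 digests stand in
# for ciphertext, and repeated HTTP-like text stands in for plaintext.
ciphertext = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(32))
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 30

encrypted_msg = b"HDR\x01" + len(ciphertext).to_bytes(2, "big") + ciphertext
plain_msg = b"HDR\x01" + len(plaintext).to_bytes(2, "big") + plaintext

print(check_length_field(encrypted_msg, 4))
print(check_length_field(plain_msg, 4))
```

In this sketch a candidate is accepted only when the length value is consistent with the message boundary and the bytes it covers pass the randomness test. Compressed or encoded plaintext can also have high entropy, so a practical system would presumably combine such a test with evidence gathered across many traces.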