Abstract:
Text steganography has recently become a research hotspot in secure communication, and text steganalysis, which aims to detect the presence of a secret message in text, has attracted increasing attention. Existing text steganography methods can be roughly divided into three categories: those based on invisible characters, those based on format, and those based on natural language processing. The main technique shared by most natural language processing based steganographic methods is synonym substitution, which embeds secret information by selectively replacing words with their synonyms. Because this technique offers good imperceptibility and robustness, information hidden with it is much more difficult for steganalysis to detect. Nevertheless, it is observed that synonym substitution based steganography can lead to an obvious increase in the probability of synonym pairs in the carrier text. In light of this observation, a steganalysis algorithm is proposed that uses the number of synonym pairs to decide whether hidden information exists in a text. Experimental results show that the proposed algorithm can effectively break text steganography based on synonym substitution. The achieved false negative rate is approximately 4% and the false positive rate is approximately 9.8%.
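The idea of deciding from the number of synonym pairs can be illustrated with a minimal sketch. It is not the algorithm evaluated in the experiments: the abstract does not specify the exact statistic, the synonym dictionary, or the decision threshold. The toy `SYNONYM_SETS`, the normalization in `synonym_pair_rate`, and the `threshold` value of 0.2 are all hypothetical choices made only for illustration.

```python
import re

# Hypothetical toy synonym dictionary; a real detector would load a full thesaurus.
SYNONYM_SETS = [
    {"big", "large", "huge"},
    {"quick", "fast", "rapid"},
    {"begin", "start", "commence"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_synonym_pairs(text, synonym_sets=SYNONYM_SETS):
    """Count unordered pairs of distinct words in the text that share a synonym set."""
    vocab = set(tokenize(text))
    total = 0
    for syn_set in synonym_sets:
        n = len(vocab & syn_set)
        total += n * (n - 1) // 2  # number of pairs among n co-occurring synonyms
    return total


def synonym_pair_rate(text, synonym_sets=SYNONYM_SETS):
    """Synonym pairs per distinct substitutable word (assumed normalization)."""
    vocab = set(tokenize(text))
    substitutable = sum(len(vocab & syn_set) for syn_set in synonym_sets)
    if substitutable == 0:
        return 0.0
    return count_synonym_pairs(text, synonym_sets) / substitutable


def is_suspicious(text, threshold=0.2, synonym_sets=SYNONYM_SETS):
    """Flag the text when its synonym-pair rate exceeds the assumed threshold."""
    return synonym_pair_rate(text, synonym_sets) > threshold


cover = "The big dog made a quick start and ran home"
stego = "The big dog and the large cat made a quick start then a fast start"
print(is_suspicious(cover), is_suspicious(stego))
```

In this sketch a text that uses one word per synonym set consistently yields no pairs, while a text in which several members of the same set co-occur yields a higher rate. In practice the threshold would have to be chosen from labelled cover and stego texts, since it trades off the false negative rate against the false positive rate.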