Abstract:
As the number of officially reported vulnerabilities increases exponentially, research on vulnerability detection techniques is growing accordingly. The diversity of vulnerability types, combined with the narrow scope of individual detection methods, limits what vulnerability detection has achieved so far. The two main streams of research on vulnerability detection methods are static detection and dynamic detection; static detection includes document analysis, cross validation, and program analysis, among others. With the rise of natural language processing and the rapid growth of available knowledge, researchers have begun to explore vulnerability detection over multiple data sources with the help of natural language processing techniques. In this paper, we classify the literature into four categories according to the source of information: official documents, code, code comments, and vulnerability-related information. First, we survey the research of the past 10 years on vulnerability detection methods based on natural language processing, extract the technical details, and classify the research achievements. We then summarize the relative merits of these achievements by comparing and analyzing work originating from the different data sources. Finally, through cross comparison and in-depth analysis, we identify eight types of limitations of vulnerability detection methods based on natural language processing, discuss solutions at the levels of data, technique, and effect, and propose future research trends.