高级检索

    基于自然语言处理的漏洞检测方法综述

    Vulnerability Detection Methods Based on Natural Language Processing

    • 摘要: 随着官方发布的漏洞数量呈现指数的增长趋势,针对漏洞检测技术的研究应运而生.漏洞种类的多样性以及检测方法的单一性导致漏洞检测结果呈现一定的局限性.当前漏洞检测技术主要集中在静态检测和动态检测2方面.其中静态检测分析又分为文档分析法、交叉验证法以及程序分析方法等3类.随着自然语言处理技术的兴起和专家知识的不断扩展,研究人员探索了在多个数据源上利用自然语言处理技术辅助进行漏洞检测研究的可行性.根据信息类型的不同,分别从官方文档、代码、代码注释以及漏洞相关信息4部分内容出发,对基于自然语言处理的漏洞检测相关研究成果进行调研.首先,通过对近10年来基于自然语言处理技术的漏洞检测相关文献进行梳理,对相关成果进行分类并提取技术细节;接着,对不同数据源下的研究成果进行横向对比,总结当前基于自然语言处理技术的漏洞检测成果的优缺点;最后,通过交叉对比并深入分析,总结当前基于自然语言处理的漏洞检测方法中存在的8类问题,从数据、技术以及效果3方面进行解决方案的讨论,同时提出了未来研究方向.

       

      Abstract: With the number of the official reported vulnerabilities is exponentially increasing, the researches aiming at the techniques of vulnerability detection is arising. The diversity of vulnerability types and the unicity of detection methods result in the limitation of the vulnerability detection achievement. The main streams of the research on vulnerability detection methods are static detection and dynamic detection. Static detection includes document analysis, cross validation, and program analysis, etc. With the natural language processing is rising and the knowledge is booming, the researchers explore the possibility of vulnerability detection on multiple data resources with the help of natural language processing technique. In this paper, the literatures are classified into four parts which are official document, code, code comment and the vulnerability-related information based on the sources of information. Firstly, we extract the technical details and classify the research achievement by conducting an investigation on the related researches of the vulnerability detection methods based on natural language processing in recent 10 years, and then we summarize the relative merits of the research achievement by comparing and analyzing the researches originated from various data sources. Finally, through conducting cross comparison and in-depth exploration researches, we conclude eight types of limitations of vulnerability detection methods based on natural language processing and then discuss the solutions on the level of data, technique and effect, and meanwhile propose the future research trends.

       

    /

    返回文章
    返回