A Survey of Vulnerability Detection Methods Based on Natural Language Processing

Yang Yi; Li Ying; Chen Kai

doi:10.7544/issn1000-1239.20210627




CLC number: TP391
Publication history
- Publication date: 2022-11-30

Vulnerability Detection Methods Based on Natural Language Processing

Yang Yi¹,², Li Ying¹,², Chen Kai¹,²,³

¹(State Key Laboratory of Information Security (Institute of Information Engineering, Chinese Academy of Sciences), Beijing 100093)
²(School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049)
³(Beijing Academy of Artificial Intelligence, Beijing 100084)

Funds: This work was supported by the National Key Research and Development Program of China (2020AAA0105200), the National Natural Science Foundation of China (U1836211), the Beijing Natural Science Foundation (JQ18011), the Youth Innovation Promotion Association CAS, and the Project of Beijing Academy of Artificial Intelligence (BAAI2020ZJ0402).

Abstract: As the number of officially reported vulnerabilities grows exponentially, research on vulnerability detection techniques has emerged in response. The diversity of vulnerability types and the narrowness of individual detection methods limit what vulnerability detection can achieve. Current research on vulnerability detection falls mainly into static detection and dynamic detection, and static detection is further divided into three categories: document analysis, cross validation, and program analysis. With the rise of natural language processing (NLP) and the continued growth of expert knowledge, researchers have explored the feasibility of using NLP to assist vulnerability detection across multiple data sources. In this paper, we survey NLP-based vulnerability detection research according to the type of information involved, covering four sources: official documents, code, code comments, and vulnerability-related information. First, we review the literature on NLP-based vulnerability detection from the past 10 years, classify the results, and extract their technical details. Next, we compare the work built on different data sources and summarize the strengths and weaknesses of current NLP-based vulnerability detection. Finally, through cross comparison and in-depth analysis, we identify eight types of problems in current NLP-based vulnerability detection methods, discuss solutions from the perspectives of data, technique, and effect, and propose directions for future research.
Keywords: vulnerability detection; natural language processing; static detection; security; survey