ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2021, Vol. 58, Issue (8): 1668-1685. doi: 10.7544/issn1000-1239.2021.20210297

Special topic: 2021 Frontier Advances in Artificial Intelligence

Section: Artificial Intelligence

Software Vulnerability Detection Method Based on Code Property Graph and Bi-GRU

Xiao Tianming, Guan Jianbo, Jian Songlei, Ren Yi, Zhang Jianfeng, Li Bao

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073) (xiaotianm19@nudt.edu.cn)
  • Published online: 2021-08-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61872444, U19A2060) and the National Key Research and Development Program of China (2018YFB0204301).

Abstract: As software grows ever larger and more complex, vulnerabilities take increasingly diverse forms. Traditional vulnerability detection methods depend heavily on manual effort and are weak at detecting unknown vulnerabilities, so they can no longer keep up with this diversity. To improve the detection of unknown vulnerabilities, many machine learning methods have been applied to software vulnerability detection. However, existing methods lose a considerable amount of syntactic and semantic information when representing code, which leads to high false positive and false negative rates. To address this problem, we propose a software vulnerability detection method based on the code property graph and Bi-GRU (bidirectional gated recurrent unit). The method extracts an abstract syntax tree (AST) sequence and a control flow graph (CFG) sequence from each function's code property graph and uses them together as the function's representation, which reduces the information lost during code representation. It then builds the feature extraction model with a Bi-GRU to strengthen feature extraction from vulnerable code. Experimental results show that, compared with a method that uses the AST as its code representation, the proposed method improves precision by up to 35% and recall by up to 22%. It also improves vulnerability detection on a real-world dataset that mixes source code from multiple software projects, and it effectively reduces both the false positive rate and the false negative rate.

Key words: vulnerability detection, code property graph, code representation, machine learning, Bi-GRU
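The abstract does not say how the AST and CFG sequences are serialized from the code property graph. The following is a minimal sketch under stated assumptions: a pre-order depth-first walk over AST edges, and a breadth-first walk over CFG edges starting at the function entry, both applied to a hand-built toy graph. These traversal orders are assumptions and are not necessarily the ones the authors use. `ToyCPG`, `ast_sequence`, and `cfg_sequence` are hypothetical names. A real pipeline would obtain the graph from a CPG generator rather than construct it by hand.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyCPG:
    """Hypothetical stand-in for a code property graph: one node set, typed edges."""
    labels: dict[int, str]  # node id -> token or statement label
    edges: dict[str, dict[int, list[int]]] = field(default_factory=dict)

    def add_edge(self, kind: str, src: int, dst: int) -> None:
        self.edges.setdefault(kind, {}).setdefault(src, []).append(dst)

    def successors(self, kind: str, node: int) -> list[int]:
        return self.edges.get(kind, {}).get(node, [])


def ast_sequence(cpg: ToyCPG, root: int) -> list[str]:
    """Pre-order depth-first walk that follows AST edges only."""
    seq, stack = [], [root]
    while stack:
        node = stack.pop()
        seq.append(cpg.labels[node])
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(cpg.successors("AST", node)))
    return seq


def cfg_sequence(cpg: ToyCPG, entry: int) -> list[str]:
    """Breadth-first walk that follows CFG edges, visiting each node once."""
    seq, seen, queue = [], {entry}, deque([entry])
    while queue:
        node = queue.popleft()
        seq.append(cpg.labels[node])
        for nxt in cpg.successors("CFG", node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seq


# Toy graph for: void f(char *buf, const char *s, int n) { if (n > 8) return; strcpy(buf, s); }
cpg = ToyCPG(labels={0: "FUNC f", 1: "IF", 2: "n > 8", 3: "RETURN",
                     4: "CALL strcpy", 5: "buf", 6: "s", 7: "EXIT"})
for src, dst in [(0, 1), (0, 4), (1, 2), (1, 3), (4, 5), (4, 6)]:
    cpg.add_edge("AST", src, dst)
for src, dst in [(0, 2), (2, 3), (2, 4), (3, 7), (4, 7)]:
    cpg.add_edge("CFG", src, dst)

ast_seq = ast_sequence(cpg, 0)
# ['FUNC f', 'IF', 'n > 8', 'RETURN', 'CALL strcpy', 'buf', 's']
cfg_seq = cfg_sequence(cpg, 0)
# ['FUNC f', 'n > 8', 'RETURN', 'CALL strcpy', 'EXIT']
```

Both walks run over the same node set, but they capture different information. The AST sequence keeps syntactic detail, such as the operands `buf` and `s`. The CFG sequence keeps execution order, such as the branch that can skip `strcpy`. This illustrates why combining the two may lose less information than the AST sequence alone.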

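The abstract names Bi-GRU as the feature extractor but gives no architectural detail. The sketch below is an untrained, pure-Python forward pass built on several assumptions:

- It uses one common GRU formulation with update gate z_t and reset gate r_t, where h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t.
- Tokens are encoded as one-hot input vectors.
- The forward and backward final hidden states are concatenated.
- A single logistic output unit produces the prediction.

`GRUCell`, `BiGRUClassifier`, and `one_hot` are hypothetical names. The paper's embedding, layer sizes, training procedure, and the way it combines the AST and CFG sequences are not stated in the abstract, so this example classifies a single token sequence only.

```python
import math
import random


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class GRUCell:
    """One GRU direction with update gate z, reset gate r and candidate state."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        def mat(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # W[g] multiplies the input x_t, U[g] the previous hidden state h_{t-1}.
        self.W = {g: mat(hidden_size, input_size) for g in "zrh"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrh"}
        self.b = {g: [0.0] * hidden_size for g in "zrh"}

    def step(self, x: list[float], h: list[float]) -> list[float]:
        def pre(g: str, hv: list[float]) -> list[float]:
            # Pre-activation W[g] x + U[g] hv + b[g].
            return [a + c + d for a, c, d in
                    zip(matvec(self.W[g], x), matvec(self.U[g], hv), self.b[g])]

        z = [sigmoid(v) for v in pre("z", h)]
        r = [sigmoid(v) for v in pre("r", h)]
        # Candidate state sees the previous state only through the reset gate.
        h_cand = [math.tanh(v) for v in pre("h", [ri * hi for ri, hi in zip(r, h)])]
        # Interpolate between the old state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

    def run(self, xs: list[list[float]]) -> list[float]:
        h = [0.0] * self.hidden_size
        for x in xs:
            h = self.step(x, h)
        return h  # final hidden state


class BiGRUClassifier:
    """Untrained Bi-GRU encoder followed by a logistic output unit."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.fwd = GRUCell(input_size, hidden_size, rng)
        self.bwd = GRUCell(input_size, hidden_size, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_size)]
        self.b_out = 0.0

    def features(self, xs: list[list[float]]) -> list[float]:
        # Forward direction reads left to right, backward direction right to left;
        # the two final hidden states are concatenated.
        return self.fwd.run(xs) + self.bwd.run(xs[::-1])

    def predict_proba(self, xs: list[list[float]]) -> float:
        f = self.features(xs)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, f)) + self.b_out)


def one_hot(index: int, size: int) -> list[float]:
    v = [0.0] * size
    v[index] = 1.0
    return v


# Hypothetical 4-token vocabulary and one serialized token sequence.
seq = [one_hot(t, 4) for t in [0, 2, 1, 3]]
model = BiGRUClassifier(input_size=4, hidden_size=3, seed=0)
p_vulnerable = model.predict_proba(seq)  # meaningless until the weights are trained
```

Concatenating the two directions gives the classifier two summaries of the same sequence, one built left to right and one built right to left. In practice, one would use a framework layer such as PyTorch's `torch.nn.GRU(..., bidirectional=True)` and train the weights rather than hand-roll the recurrence.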
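The abstract reports gains in precision and recall and claims lower false positive and false negative rates, but it gives no formulas. The sketch below assumes the standard confusion-matrix definitions over functions labelled vulnerable or non-vulnerable. `Confusion` is a hypothetical helper, and the counts in the example are made up for illustration, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Function-level confusion counts; all rates assume nonzero denominators."""
    tp: int  # vulnerable functions that were flagged
    fp: int  # non-vulnerable functions that were flagged (false alarms)
    tn: int  # non-vulnerable functions that were not flagged
    fn: int  # vulnerable functions that were missed

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        return self.fn / (self.fn + self.tp)  # identical to 1 - recall


# Made-up counts for illustration only.
c = Confusion(tp=60, fp=20, tn=900, fn=20)
# precision 0.75, recall 0.75, FPR = 20/920 (about 0.022), FNR 0.25
```

Because the false negative rate equals 1 - recall, any recall gain lowers it directly. Precision, by contrast, measures the share of correct alarms among flagged functions. The false positive rate is normalized by the number of non-vulnerable functions instead, so on imbalanced data the two can differ widely: in this example, precision is 0.75 while the FPR is only about 2%.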