ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2022, Vol. 59, Issue (1): 197-208. doi: 10.7544/issn1000-1239.20200492

• Software Technology •

Identify Linux Security Vulnerability Fix Patches Automatically

Zhou Peng1,2, Wu Yanjun1,3, Zhao Chen1,3

  1. 1(Institute of Software, Chinese Academy of Sciences, Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049);3(State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190) (zhoupengwork01@163.com)
  • Published online: 2022-01-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFB0803600), the Strategic Priority Research Program of Chinese Academy of Sciences (Y8XD373105), and the Key Research Program of Frontier Sciences, CAS (ZDBS-LY-JSC038).

Abstract: Obtaining and applying security vulnerability fix patches in time is critical to protecting server users. However, researchers and institutions have found that open-source software maintainers often fix security vulnerabilities silently. For example, in 88% of cases maintainers tell users that a vulnerability was fixed only in the release notes of a new software version, only 9% of vulnerability fix patches explicitly give the corresponding CVE (Common Vulnerabilities and Exposures) ID, and only 3% of fixes proactively notify security monitoring service providers in time. As a result, security engineers often cannot directly distinguish vulnerability fixes, bug fixes, and feature patches from the code and log message of a patch. Vulnerability fix patches therefore cannot be identified and applied by users in time, and identifying them among a large number of patch submissions is costly. Taking the Linux kernel as a representative example, this paper presents a method for identifying vulnerability fix patches automatically. The method defines features separately for the code and the log message of a patch, builds machine learning models, and trains a classifier that can distinguish security vulnerability patches. Experiments show that the method achieves 91.3% precision, 92% accuracy, and 87.53% recall, and reduces the false positive rate to 5.2%, a clear performance improvement.

Key words: identify vulnerability fix patches automatically, security vulnerability fixes, Linux kernel, machine learning, open-source software community
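
The abstract does not list the concrete features or the model, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It splits a git-style patch into its log message and its code changes, derives a few features from each part, and shows how the four reported metrics follow from a confusion matrix. The keyword list, the feature names, and the helper functions `split_patch`, `patch_features`, and `classification_metrics` are all assumptions made for this example.

```python
import re

# Hypothetical keyword list; the paper's real feature set is not given in the abstract.
SECURITY_WORDS = ("overflow", "use-after-free", "leak", "out-of-bounds")


def split_patch(patch_text):
    """Split a git-style patch into (log message, diff) at the first 'diff --git'."""
    head, sep, tail = patch_text.partition("diff --git")
    return head, sep + tail


def patch_features(patch_text):
    """Derive a few illustrative features from the message and code parts of a patch."""
    message, diff = split_patch(patch_text)
    lines = diff.splitlines()
    # Rough heuristic: skip the '+++' / '---' file header lines of the diff.
    added = [l[1:] for l in lines if l.startswith("+") and not l.startswith("+++")]
    removed = [l[1:] for l in lines if l.startswith("-") and not l.startswith("---")]
    lower = message.lower()
    return {
        "msg_security_words": sum(lower.count(w) for w in SECURITY_WORDS),
        "msg_mentions_cve": int(bool(re.search(r"CVE-\d{4}-\d+", message))),
        "added_lines": len(added),
        "removed_lines": len(removed),
        # Newly added condition checks are a common shape of a hardening fix.
        "added_if_checks": sum(1 for l in added if re.search(r"\bif\s*\(", l)),
    }


def classification_metrics(tp, fp, tn, fn):
    """Standard definitions; assumes every denominator is non-zero."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_positive_rate": fp / (fp + tn),
    }
```

Feature dictionaries of this kind could be fed to any standard classifier. The metric definitions are the usual ones, with "vulnerability fix" treated as the positive class.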
