ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (1): 197-208.doi: 10.7544/issn1000-1239.20200492

Previous Articles     Next Articles

Identify Linux Security Vulnerability Fix Patches Automatically

Zhou Peng1,2, Wu Yanjun1,3, Zhao Chen1,3   

  1. 1(Institute of Software, Chinese Academy of Sciences, Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049);3(State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190)
  • Online:2022-01-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFB0803600), the Strategic Priority Research Program of Chinese Academy of Sciences (Y8XD373105), and the Key Research Program of Frontier Sciences, CAS (ZDBS-LY-JSC038).

Abstract: It is critical to catch and apply the vulnerability fix patches in time to ensure the security of information system. However, it is found that open source software maintainers often silently fix security vulnerabilities. For example, 88% of maintainers delay informing users to fix vulnerabilities in the release notes of new software version, and only 9% of the bug fixes clearly give the corresponding CVE ID, and only 3% of the fixes will actively notify the security service provider in time. In many cases, security engineers can’t directly distinguish vulnerability fixes, bug fixes, and feature patches from the code and log message of patches. As a result, vulnerability fixes can’t be identified and applied by users timely. At the same time, it is costly for users to identify vulnerability fixes from a large number of patch submissions. Taking Linux as an example, this paper presents a method of identifying vulnerability patches automatically. This method defines features for the code and log message from patches, builds machine learning model, and trains to learn classifiers that can distinguish vulnerability patches. Experiments indicate that our approach is effective, which can get 91.3% precision, 92% accuracy, 87.53% recall rate, and reduce the false positive rate to 5.2%.

Key words: identify vulnerability fix patches automatically, security vulnerability fixes, Linux kernel, machine learning, open-source software community

CLC Number: