A Survey of Open-Source Software Defect Prediction Methods

田笑; 常继友; 张弛; 荣景峰; 王子昱; 张光华; 王鹤; 伍高飞; 胡敬炉; 张玉清

doi:10.7544/issn1000-1239.202221046

Survey of Open-Source Software Defect Prediction Method

Abstract

Open-source software defect prediction mines data from software history repositories and uses either defect-related metrics or the syntactic and semantic features of the source code itself, together with machine learning or deep learning methods, to find software defects in advance, thereby reducing repair costs and improving product quality. Vulnerability prediction mines software instance repositories to extract and label code modules, then predicts whether new code instances contain vulnerabilities, reducing the cost of vulnerability discovery and repair. We survey the literature on software defect prediction from 2000 to December 2022. Taking machine learning and deep learning as the starting point, we review prediction models of two types: those based on software metrics and those based on syntax and semantics. Based on these two types of models, we analyze the differences and connections between software defect prediction and vulnerability prediction. We then analyze in detail six frontier research topics: dataset sources and processing, code vector representation methods, improvement of pre-trained models, exploration of deep learning models, fine-grained prediction techniques, and model transfer between software defect prediction and vulnerability prediction. Finally, we point out future directions for software defect prediction.
