ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (3): 633-641. doi: 10.7544/issn1000-1239.2017.20151052

• Software Technology •

Software Defect Prediction Model Based on the Combination of Machine Learning Algorithms

Fu Yiqi, Dong Wei, Yin Liangze, Du Yuqing

  1. (College of Computer, National University of Defense Technology, Changsha 410073) (fu_303503@163.com)
  • Published online: 2017-03-01
  • Funding:
    National Key Basic Research Program of China ("973" Program) (2014CB340703); National Natural Science Foundation of China (91318301, 61690203)

Abstract: Software defect prediction uses the metrics extracted from a software product, together with the defects already found in it, to predict as early as possible the defects that may still exist, so that testing and verification resources can be allocated appropriately according to the prediction results. Defect prediction based on machine learning can learn models that find software defects comprehensively and automatically, and it has become one of the main approaches to defect prediction. To improve the efficiency and accuracy of prediction, the selection and study of machine learning algorithms is critical. In this paper, we compare different machine-learning-based defect prediction methods and find that each algorithm has its own strengths and weaknesses across different evaluation metrics. To exploit these strengths, we draw on the stacking ensemble learning method and propose a software defect prediction model based on a combination of machine learning algorithms. The model first performs one round of prediction, adds the prediction results of the different algorithms to the original dataset as new software metrics, and then predicts again. Finally, we conduct experiments on the Eclipse dataset. The experimental results show that the model is technically feasible, reduces the time cost, and improves accuracy.

Key words: software defect prediction, machine learning, ensemble learning, combination, Eclipse prediction dataset
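The two-round scheme in the abstract (predict once, append each algorithm's predictions to the original dataset as new metrics, then predict again) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three base learners (a one-feature threshold stump, a nearest-centroid rule, and 1-nearest-neighbour), the use of a stump as the second-round learner, the k-fold out-of-fold scheme, the toy data, and every function name (`stump`, `nearest_centroid`, `one_nn`, `out_of_fold`, `fit_stacked`) are hypothetical stand-ins. The paper's actual algorithms and the Eclipse metrics are not reproduced here.

```python
"""Minimal stacking sketch: base predictions become extra metric columns.

Hypothetical illustration only: the learners, fold scheme, and function
names are assumptions, not the paper's implementation.
"""
from typing import Callable, List

Row = List[float]
Model = Callable[[Row], int]  # returns 1 = defective, 0 = clean
Learner = Callable[[List[Row], List[int]], Model]


def stump(X: List[Row], y: List[int]) -> Model:
    """Depth-1 decision tree: the single feature/threshold rule with most hits."""
    best = (-1, 0, 0.0, 1)  # (correct count, feature, threshold, label if above)
    for feat in range(len(X[0])):
        for thr in sorted({r[feat] for r in X}):
            for above in (0, 1):
                correct = sum((above if r[feat] > thr else 1 - above) == t
                              for r, t in zip(X, y))
                if correct > best[0]:
                    best = (correct, feat, thr, above)
    _, f, t_best, lab = best
    return lambda row: lab if row[f] > t_best else 1 - lab


def sq_dist(a: Row, b: Row) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(X: List[Row], y: List[int]) -> Model:
    """Predict the class whose mean metric vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))


def one_nn(X: List[Row], y: List[int]) -> Model:
    """1-nearest-neighbour: copy the label of the closest training module."""
    return lambda row: y[min(range(len(X)), key=lambda i: sq_dist(row, X[i]))]


def out_of_fold(learner: Learner, X: List[Row], y: List[int], k: int) -> List[int]:
    """Predict each row with a model that never saw it (k interleaved folds)."""
    preds = [0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = learner([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):  # rows held out in this fold
            preds[i] = model(X[i])
    return preds


def fit_stacked(bases: List[Learner], meta: Learner,
                X: List[Row], y: List[int], k: int = 5) -> Model:
    # Round 1: each base learner's out-of-fold predictions become a new metric.
    cols = [out_of_fold(b, X, y, k) for b in bases]
    X_aug = [row + [c[i] for c in cols] for i, row in enumerate(X)]
    # Round 2: the meta learner sees the original metrics plus the new columns.
    meta_model = meta(X_aug, y)
    # Refit base learners on all data to build the extra columns at test time.
    fitted = [b(X, y) for b in bases]
    return lambda row: meta_model(row + [m(row) for m in fitted])


if __name__ == "__main__":
    # Toy data (hypothetical): [size metric, complexity metric] per module.
    X = [[float(i), float(i % 3)] for i in range(10)]
    y = [1 if i >= 5 else 0 for i in range(10)]
    model = fit_stacked([stump, nearest_centroid, one_nn], stump, X, y)
    print([model([v, 1.0]) for v in (1.0, 8.0)])
```

The training-time metric columns are built from out-of-fold predictions so that the second-round learner never sees a base prediction made on a module that the base learner was trained on; the base learners are then refit on all data for use at prediction time. This is an assumed design choice; scikit-learn's `StackingClassifier` follows the same pattern through its `cv` parameter, and its `passthrough=True` option likewise feeds the original features to the final estimator alongside the base predictions.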
