Software Defect Prediction Model Based on the Combination of Machine Learning Algorithms

Fu Yiqi; Dong Wei; Yin Liangze; Du Yuqing

doi:10.7544/issn1000-1239.2017.20151052

(College of Computer, National University of Defense Technology, Changsha 410073) (fu_303503@163.com)

Funding: National Basic Research Program of China (973 Program) (2014CB340703); National Natural Science Foundation of China (91318301, 61690203)

CLC number: TP311
Publication date: 2017-02-28

Abstract

Software defect prediction uses the metrics extracted from a software product, together with the defects already found in it, to predict as early as possible the defects that may still remain, so that testing and verification resources can be allocated appropriately based on the prediction results. Defect prediction based on machine learning can learn models fairly comprehensively and automatically to find defects in software, and it has become the main approach to defect prediction. To improve the efficiency and accuracy of prediction, the selection and study of machine learning algorithms is critical. This paper compares different machine learning defect prediction methods and finds that each algorithm has its own advantages on different evaluation metrics. Exploiting these advantages and drawing on the stacking ensemble learning method, it proposes a software defect prediction model based on a combination of machine learning algorithms: the model first predicts once, adds the prediction results of the different algorithms to the original dataset as new software metrics, and then predicts again. Finally, experiments on the Eclipse dataset show that the model is technically feasible, reduces time cost, and improves prediction accuracy.

Keywords: software defect prediction; machine learning; ensemble learning; combination; Eclipse prediction dataset
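
The two-pass scheme outlined in the abstract (predict once, append each algorithm's prediction to the dataset as an extra software metric, then predict again) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two toy base learners (a one-metric threshold rule and a nearest-centroid classifier), the function names, and the six-module dataset with hypothetical metrics (lines of code, complexity).

```python
from statistics import mean

def train_stump(X, y, feature):
    """Toy base learner: threshold one metric halfway between class means."""
    pos = mean(row[feature] for row, label in zip(X, y) if label == 1)
    neg = mean(row[feature] for row, label in zip(X, y) if label == 0)
    threshold = (pos + neg) / 2
    sign = 1 if pos >= neg else -1
    return lambda row: 1 if sign * (row[feature] - threshold) > 0 else 0

def train_centroid(X, y):
    """Toy base learner: assign the class whose mean metric vector is closer."""
    def centroid(label):
        rows = [row for row, l in zip(X, y) if l == label]
        return [mean(col) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return lambda row: 1 if dist(row, c1) < dist(row, c0) else 0

def augment(X, models):
    """Append each base model's prediction to every row as a new metric."""
    return [list(row) + [m(row) for m in models] for row in X]

def train_combined(X, y):
    """First pass: fit base learners. Second pass: fit on the augmented data."""
    base = [train_stump(X, y, f) for f in range(len(X[0]))]
    base.append(train_centroid(X, y))
    meta = train_centroid(augment(X, base), y)
    return lambda row: meta(augment([row], base)[0])

# Hypothetical data: [lines_of_code, complexity]; label 1 means defective.
X = [[120, 3], [90, 2], [100, 4], [900, 25], [1100, 30], [950, 22]]
y = [0, 0, 0, 1, 1, 1]

model = train_combined(X, y)
print(model([1000, 28]), model([80, 2]))
```

For brevity, the sketch feeds the base learners' predictions on their own training data into the second pass. Stacking implementations commonly use cross-validated (out-of-fold) predictions for that step to limit overfitting; whether the paper does so is not stated in the abstract.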
