Survey of Open-Source Software Defect Prediction Method

Tian Xiao; Chang Jiyou; Zhang Chi; Rong Jingfeng; Wang Ziyu; Zhang Guanghua; Wang He; Wu Gaofei; Hu Jinglu; Zhang Yuqing

doi:10.7544/issn1000-1239.202221046


Corresponding author:
Zhang Yuqing (zhangyq@nipc.org.cn)

CLC number: TP311
Publication history
- Received: 2023-03-29
- Revised: 2023-06-05
- Published online: 2023-07-04
- Issue date: 2023-06-30

Survey of Open-Source Software Defect Prediction Method

Tian Xiao^{1, 2},
Chang Jiyou³,
Zhang Chi²,
Rong Jingfeng^{2, 6},
Wang Ziyu³,
Zhang Guanghua³,
Wang He^{1, 2},
Wu Gaofei^{1, 2, 4},
Hu Jinglu⁵,
Zhang Yuqing^{1, 2, 6, 7}

1.
School of Cyber Engineering, Xidian University, Xi’an 710126
2.
National Computer Network Intrusion Protection Center (University of Chinese Academy of Sciences), Beijing 101408
3.
School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018
4.
Guangxi Key Laboratory of Cryptography and Information Security (Guilin University of Electronic Technology), Guilin, Guangxi 541000
5.
Graduate School of Information, Production and Systems, Waseda University, Japan 808-0135
6.
College of Cyberspace Security, Hainan University, Haikou 570228
7.
Zhongguancun Laboratory, Beijing 100094

Funds: This work was supported by the Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province (SKLACSS-202205), the Key Research and Development Program of Hainan Province (GHYF2022010, ZDYF202012), the National Natural Science Foundation of China (U1836210), the Natural Science Basic Research Program of Shaanxi Province (2021JQ-192), and the Program of Guangxi Key Laboratory of Cryptography and Information Security (GCIS202123)

More Information

Author Bio:
Tian Xiao: born in 1999. Master candidate. Her main research interest includes network and information security

Chang Jiyou: born in 1999. Master candidate. His main research interest includes network and information security

Zhang Chi: born in 2002. Master candidate. His main research interest includes AI and security

Rong Jingfeng: born in 1986. PhD candidate. His main research interest includes network and information security

Wang Ziyu: born in 1998. Master candidate. His main research interest includes network and information security

Zhang Guanghua: born in 1979. PhD, professor, master supervisor. His main research interest includes network and information security

Wang He: born in 1987. PhD, lecturer, master supervisor. Her main research interests include cryptography, quantum cryptographic protocol

Wu Gaofei: born in 1987. PhD, lecturer, master supervisor. His main research interest includes cryptography

Hu Jinglu: born in 1962. PhD, professor, PhD supervisor. His main research interest includes computational intelligence

Zhang Yuqing: born in 1966. PhD, professor, PhD supervisor. His main research interest includes information security

Abstract:
Open-source software defect prediction mines data from historical software repositories and, using either metrics related to software defects or the syntactic and semantic features of the source code itself, applies machine learning or deep learning methods to find software defects in advance, thereby reducing software repair costs and improving product quality. Vulnerability prediction mines software instance repositories to extract and label code modules and predicts whether new code instances contain vulnerabilities, reducing the cost of discovering and fixing vulnerabilities. We survey the literature on software defect prediction published from 2000 to December 2022. Taking machine learning and deep learning as the starting point, we review two types of prediction models: those based on software metrics and those based on syntax and semantics. Based on these two types of models, we analyze the differences and connections between software defect prediction and vulnerability prediction. We then analyze in detail six frontier issues: dataset sources and processing, code vector representation methods, improvement of pre-trained models, exploration of deep learning models, fine-grained prediction techniques, and the transfer of models between software defect prediction and vulnerability prediction. Finally, we point out future directions for software defect prediction.
Keywords: software defect prediction; vulnerability prediction; machine learning; deep learning; metric; semantic and syntactic analysis


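By expressing high-dimensional sample features nonlinearly and representing data abstractly, deep learning has greatly advanced the industrial application of AI techniques such as speech recognition and computer vision. In 1989, LeCun et al.^[1] proposed the deep convolutional network LeNet, which achieved a breakthrough in handwritten image recognition and laid the foundation for the development of deep learning. To further improve the accuracy of deep neural networks in pattern recognition and image processing and to promote their industrial use, researchers have continually optimized and improved network architectures. As the number of layers grows, model parameters and architectures become ever larger and the demand for storage, computation and other resources keeps increasing, which causes problems such as the failure of large networks^[2]. For example, large networks such as ResNet50 and VGG16 perform excellently on image classification, but their redundant parameters incur high computational cost and memory consumption. Meanwhile, with the rapid development of multimedia, 5G and mobile terminals and the wide deployment of edge computing devices, the demand for network applications keeps growing. Portable edge devices such as mobile phones, tablets and mobile cameras have tens of times less computing and storage capacity than stationary devices, which makes it difficult to migrate and run large networks on them. Improving the computation, recognition and classification capabilities of edge devices so that large deep learning networks can be deployed on them has therefore become a meaningful problem. To this end, Buciluǎ et al.^[3] proposed a neural network model compression method that transfers the information in a large model or an ensemble of models to a small model to be trained, without degrading accuracy. Moreover, the many parameters of large neural networks exhibit a degree of functional sparsity, which leads to over-parameterization: even in large-scale scenarios that are sensitive to network performance, the networks still contain neurons and connections that produce redundant information. Knowledge distillation (KD) uses a high-performing large network as a teacher to guide a small student network^[4], achieving knowledge refinement and structural compression, and has become an important method for model compression, inference acceleration and edge deployment of large networks.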

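However, with growing awareness of privacy protection and mounting legal and data-transmission obstacles, the training data for task-specific deep networks are often difficult to obtain. Compressing neural networks in a Data-Free setting, that is, obtaining a model whose accuracy is close to that of a model compressed with the original data while avoiding leakage of users' private data, has therefore become a research direction of considerable practical significance. Chen et al.^[5] proposed DAFL (data-free learning of student networks), a knowledge distillation framework for the Data-Free setting that builds a generator on the teacher side to produce a pseudo-sample training set, performs knowledge distillation, and obtains a small student network whose performance approximates that of the teacher. However, on complex datasets this method lowers the recognition accuracy of the student network, for three main reasons: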

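1) Inconsistent optimization objectives of the discriminators. The teacher network optimizes the generator that produces the pseudo-data used to distill the student, so the student can hardly obtain optimization information consistent with the teacher's when building its model.

2) The generator is optimized with erroneous information. Building the teacher-side generator over-trusts the teacher's predictions on pseudo-data; optimizing with these errors produces low-quality pseudo training samples, so during distillation the student can hardly make effective use of the latent prior distribution held by the teacher.

3) Low generalization of the student network. The generated data depend only on the teacher's training loss, so they lack feature diversity, which reduces the discriminative ability of the student.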

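As shown in Fig. 1, images of classes 1 and 7 in MNIST differ considerably in their features, yet the feature statistics histograms that the DAFL student network produces for the two classes (right side of Fig. 1) are very similar; the small student network trained with this model therefore struggles to give robust judgments on images with similar features. To improve the accuracy and generalization of the student network in DAFL, we propose a new double-generator architecture, DG-DAFL (double generators-DAFL). The right side of Fig. 1 also shows the feature statistics histogram of the student discriminator trained with DG-DAFL, in which the statistics of classes 1 and 7 differ noticeably, providing a basis for subsequent classification.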

Figure 1. Comparison of normalized statistical results for approximate sample characteristics

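To enable knowledge distillation in the Data-Free setting while ensuring recognition accuracy and generalization, this paper proposes the double-generator architecture DG-DAFL. Assisted by the teacher-side generator, the student-side generator fully exploits the latent prior knowledge of the teacher network and produces pseudo training samples that are better suited to training the student. By exploiting the difference between the sample distributions of the two generators, DG-DAFL prevents the student from depending on the samples of a single teacher-side generator as in DAFL, preserves sample diversity, and improves the generalization of the student discriminator. The contributions of this paper are threefold: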

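1) For Data-Free knowledge distillation, we propose the double-generator architecture DG-DAFL, which builds a teacher generator network and a student generator network to produce pseudo-samples. While the teacher generator is optimized, the student discriminator optimizes the student generator, separating the generator from the teacher discriminator and preventing erroneous predictions from interfering with generator construction. The tasks and optimization objectives of the networks are also made consistent, which improves student performance. The architecture can be extended to Data-Free knowledge distillation for other tasks.

2) By adding a measure of the difference between the sample distributions of the teacher-side and student-side generators, we avoid the low generalization caused in single-generator architectures by the student's over-reliance on the teacher generator's samples. This measure also keeps the student-generated data diverse while their distribution stays close to the prior, further improving the recognition robustness of the student.

3) On popular classification datasets in the Data-Free setting, the proposed framework achieves satisfactory recognition performance even when the student network has only 50% of the teacher's parameters. We further verify and analyze classification on datasets containing similar samples and obtain more robust results.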

1. Related Work

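For the edge deployment and application of large neural networks, model compression and acceleration have become a research focus in AI. Current compression methods include network pruning^[6], parameter sharing^[7], quantization^[8], network decomposition^[9] and compact network design; among them, knowledge distillation has attracted wide attention for its flexible and intuitive knowledge extraction and its compression performance. In 2015, Hinton et al.^[4] proposed the knowledge distillation model, a three-part framework consisting of a teacher network, a student network and a distillation algorithm. It introduces a temperature coefficient T that turns the predictions of the softmax layer of a convolutional neural network from hard labels into soft labels, and uses a large teacher network with many parameters to supervise the training of a smaller student network with fewer parameters whose classification performance is close to the teacher's^[3-4,10-11]. Depending on which part of the network is distilled, methods fall into logits distillation^[12-16] and feature-map distillation^[17-22]. Logits distillation focuses on constructing more effective regularization terms and optimization methods so as to obtain a student that generalizes better under hard-label supervision. Zhang et al.^[16] proposed deep mutual learning (DML), which strengthens the student and teacher simultaneously through alternating learning. However, the performance gap between teacher and student makes distillation hard to converge. To address this, Mirzadeh et al.^[14] proposed teacher assistant knowledge distillation (TAKD), which introduces a medium-sized teacher assistant network to narrow the large gap between teacher and student and distills step by step. Feature-map distillation extracts knowledge either by transferring sample representations directly from the teacher to the student^[17-18,20] or by transferring the structure of the samples used to train the teacher to the student^[19,21-22]. These methods fully exploit the high-dimensional nonlinear feature representations and sample structure of the large teacher network to obtain a more efficient student.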
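The temperature-softened distillation loss described above can be sketched in plain Python. This is a minimal, framework-free illustration for a single sample; the helper names `softmax` and `kd_loss`, the default T = 4 and the mixing weight alpha = 0.9 are illustrative assumptions, not values from the cited works. The soft term uses the cross-entropy between softened distributions, which differs from the KL form only by the teacher's entropy, a constant with respect to the student.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a larger T yields a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.9):
    """Hinton-style distillation loss for one sample (illustrative sketch).

    soft: cross-entropy between softened teacher and student distributions,
          scaled by T**2 so its gradient magnitude stays comparable across T.
    hard: ordinary cross-entropy of the student (T = 1) with the hard label.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student)) * T * T
    hard = -math.log(softmax(student_logits, 1.0)[hard_label])
    return alpha * soft + (1 - alpha) * hard

if __name__ == "__main__":
    teacher = [5.0, 2.0, 0.5]
    print("T=1:", [round(p, 3) for p in softmax(teacher, 1.0)])
    print("T=4:", [round(p, 3) for p in softmax(teacher, 4.0)])
    print("loss:", round(kd_loss([3.0, 1.0, 0.2], teacher, hard_label=0), 4))
```

Raising T spreads probability mass onto the non-maximal classes, which is what lets the student see the teacher's "dark knowledge" about class similarity.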

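In the Data-Free setting, the real data used to train models are often unavailable, which renders knowledge distillation ineffective. Advances in generative adversarial networks (GANs) have spurred progress in knowledge distillation for this setting. In 2014, Goodfellow et al.^[23] proposed the GAN, in which a generator and a discriminator play a minimax game and compete to improve their generation and discrimination abilities^[24]; it is an unsupervised learning method that can generate realistic images^[25], videos^[26] and other content. Data synthesized by a GAN generator can be used directly as a training set, to augment a training set, or to generate hard samples that support student training. Nguyen et al.^[27] used a pre-trained GAN generator as a prior for model inversion to build a pseudo training set. Bhardwaj et al.^[28] generated a synthetic image dataset from 10% of the original data and a pre-trained teacher model and used the synthetic images for knowledge distillation. Liu et al.^[29] and Zhang et al.^[30] both used unlabeled data to improve their models, proposing learning optical flow with unlabeled data distillation (DDFlow)^[29] and reliable data distillation on graph convolutional network (RDDGCN)^[30], respectively. RDDGCN uses the teacher network to annotate generated unlabeled data and builds a training set for the student. Other work leverages large pre-training datasets: the DeepInversion method of Yin et al.^[31] combines an image-update loss with an adversarial loss between teacher and student; the teacher infers inputs from the channel-wise means and variances stored in its Batch Normalization layers, and after the deep network has been pre-trained on the large ImageNet dataset, the synthesized images serve as the training set. Lopes et al.^[32] further exploit the teacher's prior by reconstructing a training set from the teacher's activation layers to distill the student. The methods in Refs. [28-32] all rely on a small amount of training data or on information from common pre-training datasets, and in a Data-Free setting they still cannot directly obtain real prior information usable for training a small student network.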
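The Batch Normalization statistics that DeepInversion^[31] matches can be illustrated with a small sketch. Assumptions: each sample's activations for one BN layer are given as a list of per-channel values with spatial dimensions already averaged away (a simplification; a real BN layer pools over batch and spatial positions), the running mean and variance are those stored by that layer, and `bn_stat_loss` is a hypothetical helper name. In the full method this penalty is summed over layers and minimized with respect to the synthetic images.

```python
import math

def bn_stat_loss(features, running_mean, running_var):
    """Feature-statistics penalty in the spirit of DeepInversion (one layer).

    features: list of samples, each a list of per-channel activations.
    Returns ||mu_batch - running_mean||_2 + ||var_batch - running_var||_2.
    """
    n = len(features)
    channels = range(len(running_mean))
    mean = [sum(f[c] for f in features) / n for c in channels]
    # Biased batch variance, as BatchNorm uses during training.
    var = [sum((f[c] - mean[c]) ** 2 for f in features) / n for c in channels]
    d_mean = math.sqrt(sum((m - rm) ** 2 for m, rm in zip(mean, running_mean)))
    d_var = math.sqrt(sum((v - rv) ** 2 for v, rv in zip(var, running_var)))
    return d_mean + d_var

if __name__ == "__main__":
    batch = [[1.0, 0.0], [-1.0, 2.0]]   # batch mean [0, 1], batch variance [1, 1]
    print(bn_stat_loss(batch, [0.0, 1.0], [1.0, 1.0]))  # statistics match -> 0.0
    print(bn_stat_loss(batch, [0.5, 1.0], [1.0, 1.0]))  # mean mismatch -> 0.5
```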

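On this basis, the DAFL framework adopts the GAN learning paradigm: the pre-trained teacher network serves as the discriminator, and a generator is built and optimized to produce pseudo-data closer to the real sample distribution, providing effective prior information for distilling and compressing into a high-accuracy, small student network; the framework is shown in Fig. 2. First, pseudo-labels are obtained with the one_hot function, and the loss turns the output of the GAN discriminator from binary into multi-class classification, enabling knowledge distillation for multi-class tasks. Second, an information entropy loss, a feature-map activation loss and a class distribution loss are used to optimize the generator, which supplies the data for training the student. Finally, without being driven by the original data, knowledge distillation halves the number of student parameters while keeping classification accuracy close to the teacher's. However, the generator optimization in DAFL fully trusts the discriminator's prior judgments on the initially generated pseudo-samples in the Data-Free setting and ignores the errors introduced by the pseudo-labels constructed for them; this disturbs generator optimization and directly degrades student performance. Moreover, when the teacher and student perform different tasks, the student over-relies on the samples of the teacher-side generator, which reduces the generalization of learning in the Data-Free setting.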

Figure 2. Architecture of DAFL

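To improve the quality of generated samples, Fang et al.^[33] proposed data-free adversarial distillation (DFAD), which trains an external generator to synthesize images that maximize the discrepancy between the outputs of the student and teacher networks. Han et al.^[34] proposed robustness and diversity seeking data-free knowledge distillation (RDSKD), which introduces an exponential penalty function in the generator training stage to increase the diversity of the generated images. Nayak et al.^[35] proposed a zero-shot knowledge distillation model that models the softmax space using only the teacher's parameters to generate training samples. Micaelli et al.^[36] proposed zero-shot adversarial information matching, which generates training samples from the information in the teacher's feature representations. To avoid the loss of student accuracy caused by missing priors in zero-shot learning, Kimura et al.^[37] and Shen et al.^[38] proposed a pseudo-sample training model and a network grafting model, respectively; both use a small number of labeled samples to distill knowledge from the teacher model into the student. To fully exploit the teacher's prior, Storkey et al.^[39] proposed a zero-shot knowledge distillation method that also uses the teacher network as a sample discriminator, and Radosavovic et al.^[40] proposed omni-supervised learning.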
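The adversarial objective of DFAD^[33] reduces to one discrepancy term that the student minimizes and the generator maximizes. The sketch below assumes the discrepancy is the mean absolute error between teacher and student logits; `l1_discrepancy` and `dfad_losses` are hypothetical helpers that only return the two losses each party would minimize, with no networks or optimizers included.

```python
def l1_discrepancy(teacher_out, student_out):
    """Mean absolute difference between one teacher output and one student output."""
    return sum(abs(t - s) for t, s in zip(teacher_out, student_out)) / len(teacher_out)

def dfad_losses(batch_teacher, batch_student):
    """Return (student_loss, generator_loss) for a batch of synthetic samples.

    The student minimizes the discrepancy; the generator maximizes it, which is
    expressed here as minimizing its negative.
    """
    d = sum(l1_discrepancy(t, s) for t, s in zip(batch_teacher, batch_student))
    d /= len(batch_teacher)
    return d, -d

if __name__ == "__main__":
    teacher = [[2.0, -1.0, 0.5], [0.0, 3.0, -2.0]]
    student = [[1.5, -0.5, 0.5], [0.0, 2.0, -1.0]]
    print(dfad_losses(teacher, student))
```

Alternating the two updates pushes the generator toward inputs where the student still disagrees with the teacher, i.e. hard samples.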

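The training data required by the Data-Free knowledge distillation models in Refs. [5, 33-40] are usually generated from the feature representations of a trained teacher model. Such data carry part of the teacher's prior information and have shown great potential when no data are available. Nevertheless, Data-Free knowledge distillation remains very challenging, chiefly in how to generate high-quality, diverse and targeted training data so as to obtain small student networks with higher accuracy and better generalization.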

2. Double-Generator Network

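To improve the effectiveness and generalization of knowledge distillation in the Data-Free setting, and inspired by DAFL, we propose the DG-DAFL architecture shown in Fig. 3. It consists of four networks: the teacher-side generator ${G_{\rm{T}}}$, the student-side generator ${G_{\rm{S}}}$, the teacher-side discriminator ${N_{\rm{T}}}$ and the student-side discriminator ${N_{\rm{S}}}$. DG-DAFL uses both discriminators ${N_{\rm{T}}}$ and ${N_{\rm{S}}}$ to optimize the generators ${G_{\rm{T}}}$ and ${G_{\rm{S}}}$, which keeps the optimization objectives of the student and teacher consistent and prevents the generator, in the absence of real class-label priors, from over-trusting the teacher's predictions and producing low-quality pseudo-samples that reduce the student's discriminative performance. In addition, a pseudo-sample distribution loss between the generators keeps the training samples of the student-side generator diverse and improves the student's generalization. Training DG-DAFL can be summarized in three steps: constructing the teacher-side auxiliary generator ${G_{\rm{T}}}$, constructing the optimized student-side generator ${G_{\rm{S}}}$, and distilling knowledge from the teacher ${N_{\rm{T}}}$ to the student ${N_{\rm{S}}}$.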

Figure 3. Architecture and learning process of DG-DAFL

2.1 Construction of the Teacher-Side Auxiliary Generator ${G_{\rm{T}}}$

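We build a double-generator architecture with ${G_{\rm{T}}}$ and ${G_{\rm{S}}}$. The teacher network extracts prior information about the training samples to train the teacher-side generator ${G_{\rm{T}}}$, so that the generated pseudo-samples are distributed more like real samples. Because real sample labels are missing, ${G_{\rm{T}}}$ can hardly obtain accurate and sufficient prior information on the sample distribution from ${N_{\rm{T}}}$ to reach optimal training. We therefore use ${G_{\rm{T}}}$ only as an auxiliary network for training the student-side generator ${G_{\rm{S}}}$, improving the quality of the generated pseudo-samples and the accuracy of the student network.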

The random input ${{\boldsymbol{Z}}^{({\rm{T}})}}$ is fed to the teacher-side generator ${G_{\rm{T}}}({{\boldsymbol{Z}}^{({\rm{T}})}};{{\boldsymbol{\theta}} _{\rm{g}}})$, which outputs pseudo-samples ${\boldsymbol{x}}_i^{({\rm{T}})},\ i = 1,2,…,N$, where ${{\boldsymbol{\theta}} _{\rm{g}}}$ denotes the parameters of ${G_{\rm{T}}}$. The pseudo-sample set ${{\boldsymbol{X}}^{({\rm{T}})}}$ is then fed to the teacher discriminator ${N_{\rm{T}}}({{\boldsymbol{X}}^{({\rm{T}})}};{{\boldsymbol{\theta}} _{\rm{d}}})$. Its predictions, combined with prior information, are used to build the loss ${\mathcal{L}_{{G_{\rm{T}}}}}$, which is fed back to train ${G_{\rm{T}}}$ so that it produces a pseudo training set closer to the real sample distribution for distilling the student network. To provide this optimization feedback, ${\mathcal{L}_{{G_{\rm{T}}}}}$ consists of three parts:

1) Feeding each pseudo-sample to the teacher yields the output vector ${\boldsymbol{y}}_i^{({\rm{T}})} = {N_{\rm{T}}}({\boldsymbol{x}}_i^{({\rm{T}})};{{\boldsymbol{\theta}} _{\rm{d}}})$. Since the pseudo-sample ${\boldsymbol{x}}_i^{({\rm{T}})}$ has no real label, its pseudo-label is taken as ${{\boldsymbol{t}}_i} = \arg\max_{j} ({\boldsymbol{y}}_i^{({\rm{T}})})_j$, $j = 1,2,…,k$, i.e., ${\boldsymbol{t}}_i$ marks the position of the largest entry of the $k$-class output vector. The empirical loss ${\mathcal{L}_{{\rm{oh}}\text{-}{\rm{T}}}}$ is then

${\mathcal{L}_{{\rm{oh}}{{ {\text{-}} {\rm{T}}}}}} = \frac{1}{n}\sum\limits_i {{H_{{\rm{cross}}}}({\boldsymbol{y}}_i^{\left( {\rm{T}} \right)},{{\boldsymbol{t}}_i})} .$

(1)

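Minimizing the cross-entropy between the predictions and these pseudo-labels lets ${G_{\rm{T}}}$ learn the prior held by the teacher discriminator and generate a pseudo-sample set whose distribution is closer to that of the real samples.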

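2) Following the training procedure of DAFL, the more discriminative neurons in the features extracted by the convolutional layers of ${N_{\rm{T}}}$ are activated: the pseudo-samples ${{\boldsymbol{X}}^{({\rm{T}})}}$ pass through the layer-wise nonlinear computation of the pre-trained ${N_{\rm{T}}}$ to give feature vectors ${\boldsymbol{f}}_i^{({\rm{T}})}$, and larger activation values may carry more prior information about real-sample features. The feature-map activation loss can be written as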

$\mathcal{L}_{\alpha\text{-}{\rm{T}}} = -\frac{1}{N}\sum_{i} \left\| {\boldsymbol{f}}_i^{({\rm{T}})} \right\|_1,$

(2)

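Minimizing this loss during generator optimization favors pseudo-samples whose features produce larger activations after the convolutional filters, yielding feature representations closer to those of real samples.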

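3) To make full use of the sample distribution and class prior of the pre-trained teacher, we construct a class-balance loss ${\mathcal{L}_{{\rm{ie}}\text{-}{\rm{T}}}}$. Let $p = \{ {p_1},{p_2}, … ,{p_k}\}$ be the probability of each class in a $k$-class sample set; the information content is maximal when the classes are uniformly distributed, i.e., ${p_j} = \dfrac{1}{k}$ for every $j$. To keep the teacher's predictions balanced and diverse and to exploit the distribution of the pre-training samples, the teacher drives the generator to produce every class with equal probability, which gives the information entropy loss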

${\mathcal{L}_{{\rm{ie}}\text{-}{\rm{T}}}} = - {H_{{\rm{info}}}}\left(\frac{1}{N}\sum\limits_i {{\boldsymbol{y}}_i^{({\rm{T}})}} \right).$

(3)

Combining Eqs. (1)–(3), the objective function for optimizing the auxiliary generator ${G_{\rm{T}}}$ is

$\mathcal{L}_{G_{\rm{T}}} = \mathcal{L}_{{\rm{oh}}\text{-}{\rm{T}}} + \alpha\,\mathcal{L}_{\alpha\text{-}{\rm{T}}} + \beta\,\mathcal{L}_{{\rm{ie}}\text{-}{\rm{T}}},$

(4)

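where $\alpha$ and $\beta$ are balance factors. Eq. (4) ensures that the optimization of ${G_{\rm{T}}}$ fully exploits priors such as the training-sample distribution stored in the teacher network, yielding a high-quality pseudo-sample set that better approximates the real data.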
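A plain-Python sketch of Eqs. (1)–(4) may make the three terms concrete. Assumptions: the $\boldsymbol{y}_i$ are the teacher's logits (softmax is applied inside the losses, as deep learning frameworks usually do), the $\boldsymbol{f}_i$ are flattened feature vectors, the natural logarithm is used, and the default weights alpha = 0.1 and beta = 5 are illustrative values only, not settings reported in this paper. All function names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_one_hot(logits):
    """Eq. (1): cross-entropy of each prediction against its argmax pseudo-label t_i."""
    total = 0.0
    for z in logits:
        p = softmax(z)
        t = max(range(len(p)), key=p.__getitem__)  # t_i = argmax_j (y_i)_j
        total += -math.log(p[t])
    return total / len(logits)

def loss_activation(features):
    """Eq. (2): negative mean L1 norm of the feature vectors f_i."""
    return -sum(sum(abs(v) for v in f) for f in features) / len(features)

def loss_info_entropy(logits):
    """Eq. (3): negative entropy of the batch-averaged class distribution."""
    k = len(logits[0])
    probs = [softmax(z) for z in logits]
    mean_p = [sum(p[j] for p in probs) / len(probs) for j in range(k)]
    return sum(p * math.log(p) for p in mean_p if p > 0)  # equals -H(mean_p)

def loss_generator_teacher(logits, features, alpha=0.1, beta=5.0):
    """Eq. (4): L_GT = L_oh + alpha * L_act + beta * L_ie."""
    return (loss_one_hot(logits) + alpha * loss_activation(features)
            + beta * loss_info_entropy(logits))

if __name__ == "__main__":
    logits = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3], [0.1, 0.4, 3.9]]
    feats = [[0.9, 0.0, 1.2], [0.3, 0.8, 0.0], [1.1, 0.2, 0.4]]
    print(round(loss_generator_teacher(logits, feats), 4))
```

The entropy term is lowest (most negative) when the batch covers all classes evenly, which is exactly the class-balance behavior described for Eq. (3).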

2.2 Construction of the Optimized Student-Side Generator ${G_{\rm{S}}}$

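As described in Section 2.1, the teacher-side generator ${G_{\rm{T}}}$ is optimized with the help of the real-sample prior contained in the teacher discriminator ${N_{\rm{T}}}$. However, the pseudo-labels constructed by the one_hot function introduce considerable noise. When ${G_{\rm{T}}}$ fully trusts ${N_{\rm{T}}}$, its optimization absorbs this erroneous information, so that during the training of the student discriminator ${N_{\rm{S}}}$ it is hard to generate a pseudo-sample set whose distribution approximates the real one, which hurts the student's accuracy. Moreover, if the training of ${N_{\rm{S}}}$ relies entirely on pseudo-samples generated by ${G_{\rm{T}}}$, the generalization of ${N_{\rm{S}}}$ decreases.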

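To address these problems, we introduce a generator ${G_{\rm{S}}}$ on the student side, as shown in Fig. 3, and use information from ${G_{\rm{T}}}$ to assist its optimization so that it generates training samples that are closer to the real distribution and more diverse. First, the two generators ${G_{\rm{T}}}$ and ${G_{\rm{S}}}$ simultaneously generate pseudo-sample matrices ${{\boldsymbol{X}}^{({\rm{T}})}}$ and ${{\boldsymbol{X}}^{({\rm{S}})}}$ from random initial inputs. ${{\boldsymbol{X}}^{({\rm{T}})}}$ is passed through ${N_{\rm{T}}}$, and the loss of Eq. (4) is fed back to train ${G_{\rm{T}}}$, which then generates a new teacher-side pseudo-sample set ${{\boldsymbol{X}}'^{({\rm{T}})}}$. Next, ${{\boldsymbol{X}}^{({\rm{S}})}}$ is passed through both ${N_{\rm{T}}}$ and ${N_{\rm{S}}}$; to exploit the teacher's prior data distribution when measuring the distribution difference, ${N_{\rm{S}}}$ is optimized with Eq. (5):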

${\mathcal{L}_{{\rm{oh}}{\text{-}}{\rm{S}}}} = \frac{1}{n}\sum\limits_i {{H_{{\rm{cross}}}}\left( {{N_{\rm{T}}}\left( {{{\boldsymbol{X}}^{\left( {\rm{T}} \right)}};{{\boldsymbol{\theta}} _{\rm{d}}}} \right),{N_{\rm{S}}}\left( {{{\boldsymbol{X}}^{\left( {\rm{S}} \right)}};{\boldsymbol{\theta}} _{\rm{d}}^{\left( {\rm{S}} \right)}} \right)} \right)} .$

(5)

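Next, the preliminarily trained ${N_{\rm{S}}}$, the current pseudo-sample set ${{\boldsymbol{X}}^{({\rm{S}})}}$ and the form of Eq. (4) are combined into the feedback loss ${\mathcal{L}'_{{G_{\rm{S}}}}} = {\mathcal{L}_{{\rm{oh}}\text{-}{\rm{S}}}} + \alpha {\mathcal{L}_{\alpha\text{-}{\rm{S}}}} + \beta {\mathcal{L}_{{\rm{ie}}\text{-}{\rm{S}}}}$, which optimizes the current student generator ${G_{\rm{S}}}$. This design makes the teacher and student networks perform the same task, which improves the student's learning ability; at the same time, optimizing through the student network avoids over-trusting predictions made without real labels, which would otherwise degrade generator optimization. Finally, ${G_{\rm{S}}}$ generates a new student-side pseudo-sample set ${{\boldsymbol{X}}'^{({\rm{S}})}}$. To give ${G_{\rm{S}}}$ more prior information about the samples, so that its samples stay consistent with the real distribution while remaining diverse and the student generalizes better, we use the KL divergence to measure the distribution difference between the two optimized pseudo-sample sets ${{\boldsymbol{X}}'^{({\rm{T}})}}$ and ${{\boldsymbol{X}}'^{({\rm{S}})}}$, as shown in Eq. (6):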

${\mathcal{L}_{{\rm{d}}\text{-}{\rm{KL}}}} = \sum\limits_{i = 1}^N {{G_{\rm{S}}}({\boldsymbol{x}}'^{({\rm{S}})}_i)\,{\log_2}\frac{{{G_{\rm{S}}}({\boldsymbol{x}}'^{({\rm{S}})}_i)}}{{{G_{\rm{T}}}({\boldsymbol{x}}'^{({\rm{T}})}_i)}}} .$

(6)

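We only require the sample set ${{\boldsymbol{X}}'^{({\rm{S}})}}$ produced by the student generator ${G_{\rm{S}}}$ to be closer in distribution to the prior sample distribution. The loss for optimizing the student generator is therefore constructed as in Eq. (7), which completes the construction of the optimized generator ${G_{\rm{S}}}$.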

${\mathcal{L}_{{G_{\rm{S}}}}} = {\mathcal{L}'_{{G_{\rm{S}}}}} + \gamma {\mathcal{L}_{{\rm{d}}\text{-}{\rm{KL}}}},$

(7)

where $\gamma$ is a balance factor.
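Eqs. (6) and (7) can be sketched as follows. Because the paper does not spell out how a generator's output set is turned into a probability distribution, the sketch assumes each pseudo-sample set has already been summarized as a normalized discrete distribution (for example, a histogram over shared bins); `to_distribution`, `kl_divergence` and `student_generator_loss` are hypothetical helpers. Base-2 logarithms are used, following our reading of the logarithm in Eq. (6), and the default γ = 10 is the value selected in Section 3.

```python
import math

def to_distribution(values, eps=1e-12):
    """Normalize non-negative values (e.g. histogram counts) into probabilities.

    eps keeps every bin strictly positive so the KL divergence stays finite.
    """
    shifted = [v + eps for v in values]
    total = sum(shifted)
    return [v / total for v in shifted]

def kl_divergence(p_student, p_teacher):
    """Eq. (6)-style KL(P_S || P_T) with base-2 logarithms."""
    return sum(ps * math.log2(ps / pt)
               for ps, pt in zip(p_student, p_teacher) if ps > 0)

def student_generator_loss(l_prime, l_kl, gamma=10.0):
    """Eq. (7): L_GS = L'_GS + gamma * L_d-KL."""
    return l_prime + gamma * l_kl

if __name__ == "__main__":
    hist_s = [12, 30, 58]  # hypothetical histogram of student-side pseudo-samples
    hist_t = [20, 30, 50]  # hypothetical histogram of teacher-side pseudo-samples
    kl = kl_divergence(to_distribution(hist_s), to_distribution(hist_t))
    print(round(kl, 4), round(student_generator_loss(0.8, kl), 4))
```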

2.3 Knowledge Distillation Between the Student and Teacher Networks

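The optimized student-side generator ${G_{\rm{S}}}$ is used to update the pseudo-sample set ${{\boldsymbol{X}}'^{({\rm{S}})}}$, which serves as training data for building the student network.

The teacher ${N_{\rm{T}}}$ and the student ${N_{\rm{S}}}$ both receive the optimized pseudo-sample set ${{\boldsymbol{X}}'^{({\rm{S}})}}$ produced by the student-side generator. Because of the difference in capacity, the teacher, whose structure is more complex, produces better outputs than the simpler student. To improve compression, knowledge distillation computes the cross-entropy between the softmax outputs of the two networks so that the student output ${\boldsymbol{y}}_i^{({\rm{S}})}$ approaches the teacher output ${\boldsymbol{y}}_i^{({\rm{T}})}$, improving the performance of ${N_{\rm{S}}}$. The knowledge distillation loss is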

${\mathcal{L}_{{\rm{KD}}}} = \frac{1}{N}\sum\limits_{i = 1}^N {{H_{{\rm{cross}}}}({\boldsymbol{y}}_i^{\left( {\rm{S}} \right)},{\boldsymbol{y}}_i^{\left( {\rm{T}} \right)})} .$

(8)

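Training on the pseudo-samples under this loss compresses and distills the relatively sparse large network into a compact small network for the same task.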
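Eq. (8), like Eq. (5) which has the same form, compares the softmax outputs of the two networks sample by sample. A minimal sketch, assuming the inputs are logits and reading $H_{\rm{cross}}(\boldsymbol{y}^{({\rm S})}, \boldsymbol{y}^{({\rm T})})$ as the cross-entropy of the student's softmax output against the teacher's softmax output used as the target; `kd_cross_entropy` is a hypothetical helper, and no temperature is applied because Eq. (8) does not mention one.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kd_cross_entropy(student_logits, teacher_logits):
    """Eq. (8): (1/N) * sum_i H(teacher softmax_i, student softmax_i)."""
    total = 0.0
    for zs, zt in zip(student_logits, teacher_logits):
        ps, pt = softmax(zs), softmax(zt)
        total += -sum(t * math.log(s) for t, s in zip(pt, ps))
    return total / len(student_logits)

if __name__ == "__main__":
    teacher = [[3.0, 0.5, 0.2], [0.1, 2.5, 0.4]]
    student = [[2.0, 0.8, 0.3], [0.3, 1.9, 0.6]]
    print(round(kd_cross_entropy(student, teacher), 4))
```

For a fixed teacher output the loss is smallest when the student's distribution equals the teacher's, which is the sense in which the student output "approaches" the teacher output.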

3. Experimental Results and Analysis

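We verify the effectiveness of the proposed method on three popular image datasets and compare it with recent knowledge distillation models for the Data-Free setting, namely DAFL, DFAD and RDSKD, in terms of accuracy, robustness and generalization. We also examine the rationality of the framework design through ablation experiments, and we construct an additional dataset to verify the generalization of DG-DAFL. The experiments were run on an Intel Core i7-8700 CPU and an NVIDIA GeForce RTX 2070 GPU under Windows 10, using Python 3 and the PyTorch deep learning framework.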

To evaluate the models comprehensively, we use four metrics: Accuracy, Precision, Recall and Specificity.

Accuracy is the proportion of correctly classified samples among all samples:

$Accuracy = \frac{{TP + TN}}{{TP + TN + FP + FN}}.$

(9)

Precision is the proportion of samples predicted as positive that are actually positive:

$Precision = \frac{{TP}}{{TP + FP}}.$

(10)

Recall is the proportion of actually positive samples that are correctly predicted as positive:

$Recall = \frac{{TP}}{{TP + FN}}.$

(11)

Specificity is the proportion of actually negative samples that are correctly predicted as negative:

$Specificity = \frac{{TN}}{{TN + FP}}.$

(12)

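In Eqs. (9)–(12), TP is the number of samples correctly predicted as positive, TN the number correctly predicted as negative, FP the number incorrectly predicted as positive, and FN the number incorrectly predicted as negative.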
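Eqs. (9)–(12) reduce to a few lines of arithmetic on the four confusion counts. The sketch is illustrative: the function name `binary_metrics` and the example counts are hypothetical, and an empty denominator returns 0 instead of raising an error.

```python
def _ratio(num, den):
    return num / den if den else 0.0

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall and Specificity as in Eqs. (9)-(12)."""
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }

if __name__ == "__main__":
    # Hypothetical counts: 40 TP, 50 TN, 5 FP, 5 FN.
    for name, value in binary_metrics(tp=40, tn=50, fp=5, fn=5).items():
        print(f"{name:12s}{value:.4f}")
```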

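We introduce the double-generator loss to preserve sample diversity while fully exploiting the teacher's prior sample distribution, as shown in Eq. (7), where $\gamma$ is a balance factor. For a fair comparison, $\gamma$ is selected by traversing the fixed set $\left\{ {0.01},{0.1},1,{10},{100} \right\}$. As shown in Fig. 4, the value of $\gamma$ has a large effect on the recognition results of the student network. With $\gamma = 10$, both MNIST and USPS reach their highest Accuracy; we therefore set $\gamma = 10$ for all datasets in our experiments.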
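The $\gamma$ selection described above is an exhaustive search over a fixed grid. In the sketch, `evaluate` is a hypothetical callback standing in for "train DG-DAFL with this $\gamma$ and return the Accuracy", and the toy scorer in the usage block is a made-up stand-in used only to exercise the loop.

```python
import math
from typing import Callable, Dict, Iterable, Tuple

GAMMA_GRID = (0.01, 0.1, 1, 10, 100)

def select_gamma(evaluate: Callable[[float], float],
                 grid: Iterable[float] = GAMMA_GRID) -> Tuple[float, Dict[float, float]]:
    """Try every gamma in the grid and keep the one with the highest score."""
    scores = {g: evaluate(g) for g in grid}
    best = max(scores, key=scores.get)  # the first grid value wins ties
    return best, scores

if __name__ == "__main__":
    # Made-up stand-in for a full training run: the score peaks at gamma = 10.
    toy_accuracy = lambda g: 0.98 - 0.01 * abs(math.log10(g) - 1)
    best, scores = select_gamma(toy_accuracy)
    print(best, scores)
```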

Figure 4. Effect of $\gamma$ on model performance

3.1 Comparison of Experimental Results

1) MNIST handwritten digit dataset

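MNIST is a 10-class handwritten digit dataset consisting of 70,000 images of 28×28 pixels. We randomly select 60,000 images as the training set and 10,000 images as the test set; some samples are visualized in Fig. 5.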

Figure 5. Sample visualization of MNIST dataset

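In this experiment, LeNet-5 serves as the teacher network for training the classification model. The student network, LeNet-5-half, has the same structure as the teacher but half as many channels in each layer; its computational cost is 50% lower than the teacher's, which achieves network compression. Table 1 reports the Accuracy of the compared algorithms on MNIST.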

Table 1. Classification Results on MNIST Dataset

Algorithm	Accuracy
Teacher network	0.9894
KD^[4]	0.8678
DAFL^[5]	0.9687±0.001
DFAD^[33]	0.9596±0.0021
RDSKD^[34]	0.9755±0.0024
DG-DAFL	0.9809±0.0009
Note: the best results are in bold; the value after "±" is the standard deviation over repeated runs.

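Table 1 reports the mean over 10 runs. The teacher network trained on real data reaches Accuracy = 0.9894. When pseudo-samples generated at random from noise are used as the training set, knowledge distillation under the teacher's guidance yields a student with Accuracy = 0.8678; in this case only the discriminative information learned by the teacher is used, without any sample distribution information, so distillation is unsatisfactory. In DAFL, the loss computed from the teacher's predictions is back-propagated to optimize the generator, which produces pseudo-samples closer to the real distribution for training the student, raising Accuracy to 0.9687. Compared with DAFL, the proposed DG-DAFL avoids the optimization failure caused by a single generator over-trusting the teacher's predictions on unlabeled pseudo-samples; in addition, assisted by the teacher-side generator, the student-side generator produces training samples better suited to the student network, preserving sample diversity and improving generalization. RDSKD, which adds a regularization term to increase sample diversity, outperforms DAFL and DFAD on MNIST, where samples of different classes have fairly similar features. With DG-DAFL, student Accuracy rises to 0.9809, very close to the teacher, and the mean and standard deviation over 10 runs show that DG-DAFL is also more robust.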

2) AR face dataset

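AR is a 100-class face dataset of 2,600 images of size 120×165; the first 50 classes are male and the last 50 female. Each class contains 26 face images with different facial expressions, illumination conditions and occlusions, and the dataset is one of the most widely used standard benchmarks. In our experiments, 20 images per class form the training set and the remaining 6 form the test set used to evaluate network performance. Samples from AR are visualized in Fig. 6.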

Figure 6. Sample visualization results of AR dataset

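In this experiment, ResNet34 is the teacher network and ResNet18 the student network. ResNet34 and ResNet18 share the same five-stage convolutional structure, but ResNet34 has more layers in each stage and therefore a higher computational cost: $3.6 \times {10^9}$ FLOPs versus $1.8 \times {10^9}$ FLOPs for ResNet18. Table 2 reports the Accuracy of the compared methods on AR.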

Table 2. Classification Results on AR Dataset

Algorithm	Accuracy
Teacher network	0.865
DAFL^[5]	0.6767±0.0013
DFAD^[33]	0.52±0.0032
RDSKD^[34]	0.52±0.0026
DG-DAFL	0.7183±0.001
Note: the best results are in bold; the value after "±" is the standard deviation over repeated runs.

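The results are shown in Table 2. The teacher trained on the labeled dataset reaches Accuracy = 0.865. In the Data-Free setting, the DAFL student reaches Accuracy = 0.6767 after distillation. Compared with MNIST, AR has more classes, more complex and detailed images, and more similar feature distributions across classes, which makes its samples harder to distinguish. Because the generator optimization in DAFL depends entirely on the teacher's predictions, it produces many noisy samples for training the student, lowering the student's accuracy and robustness. DFAD ignores the prior information that the teacher provides for sample generation and struggles to generate samples whose distribution is close to the original training samples, which severely affects student accuracy. On this sample set with complex features, RDSKD likewise fails to fully exploit the sample prior of the pre-trained teacher, so distillation deteriorates and the student reaches an Accuracy of only 0.52. By constructing the double-generator model DG-DAFL, we fully exploit the teacher's latent sample prior while building a generator-side loss that prevents over-learning of erroneous sample information, generating more effective pseudo-samples consistent with the real distribution. On the relatively complex AR dataset, DG-DAFL reaches Accuracy = 0.7183.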

3) USPS handwritten digit dataset

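USPS is a 10-class dataset of 9,298 grayscale images of 16×16 pixels. Compared with MNIST, it contains fewer and smaller samples, and its digits are blurrier and more abstract, which makes recognition harder. Samples from USPS are visualized in Fig. 7. In our experiments, 7,291 and 2,007 randomly selected images form the training set and the test set of the teacher network, respectively.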

Figure 7. Sample visualization results of USPS dataset

The teacher uses the same LeNet-5 architecture as on MNIST, and the student uses LeNet-5-half. Table 3 reports the Accuracy of the compared methods on USPS.

Table 3. Classification Results on USPS Dataset

Algorithm	Accuracy
Teacher network	0.96
DAFL^[5]	0.9267±0.0021
DFAD^[33]	0.8899±0.0024
RDSKD^[34]	0.9073±0.0017
DG-DAFL	0.9302±0.0012
Note: the best results are in bold; the value after "±" is the standard deviation over repeated runs.

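As Table 3 shows, the teacher reaches Accuracy = 0.96, and the DAFL student built on it reaches Accuracy = 0.9267. DFAD reaches Accuracy = 0.8899 on USPS; because the teacher over-trusts the noisy samples in the generated set, distillation and robustness suffer. RDSKD also neglects the quality of the generated samples, which lowers student accuracy. By introducing a student-side generator, DG-DAFL resolves the low generalization caused in single-generator architectures by the student's over-reliance on the teacher generator's samples. Moreover, the data generated by the student generator remain diverse while their distribution stays close to the prior, which further improves the student's generalization; as a result, the student achieves higher accuracy and robustness on USPS.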

3.2 Experimental Analysis

1) Ablation analysis of DG-DAFL

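To further examine the rationality of the optimization of the student-side generator ${G_{\rm{S}}}$ in DG-DAFL and the necessity of each part of the loss, we run ablation experiments on MNIST. Table 4 compares the effect of the different loss terms on model accuracy in the Data-Free setting.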

Table 4. Ablation Experiment Results on MNIST Dataset

Pseudo-label loss	Information entropy loss	Feature loss	Pseudo-sample KL divergence loss	Accuracy
				0.8687
√				0.2336
	√			0.1140
		√		0.2167
			√	0.8711
√	√	√		0.9758
√	√	√	√	0.9800
Note: √ indicates that the loss term is used.

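In the ablation study, the teacher trained on real data reaches Accuracy = 0.9839. Without any loss optimizing the student-side generator ${G_S}$, distilling from the teacher with randomly generated samples yields Accuracy = 0.8687. If ${G_S}$ is optimized with only one of the losses built from predictions on random pseudo-samples (pseudo-label loss, information entropy loss or feature loss), the results are unsatisfactory, mainly because the student discriminator has not been trained on real samples, contains no real prior information, and therefore cannot guide generator training. If only the KL divergence between the two generators is used, the teacher-side generator ${G_{\rm{T}}}$, having been optimized by the teacher network, carries some real-sample prior and can exert a degree of prior supervision on the samples generated by ${G_{\rm{S}}}$, helping ${G_{\rm{S}}}$ produce a similar output distribution; with the KL loss alone, student performance improves slightly. When the three losses are combined with the generator-side loss, ${G_{\rm{S}}}$ obtains more sample prior information, keeping the generated samples consistent with the real distribution while preserving their diversity, which improves student accuracy.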

2) Generalization analysis of DG-DAFL

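To verify that DG-DAFL generalizes better, we build an experimental dataset, MNIST-F (training set Tra and test set Te), from MNIST. Class labels range from 0 to 9; pairs such as 1 and 7, 0 and 8, and 6 and 9 share discriminative features, which confuses classifiers and makes recognition harder. We shrink the training sets of easily confused classes: the number of training samples of classes 1, 6 and 8 in the original dataset is halved, while the test data remain unchanged. Details are given in Table 5, where nTra and nTe denote the original training set and the original test set, respectively.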
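The class-halving step can be sketched as below. Assumptions: the dataset is a list of (image, label) pairs, the retained half is drawn at random with a fixed seed, and odd counts are rounded down; this rounding matches the halved counts in Table 5 (6742 → 3371, 5918 → 2959, 5851 → 2925), but the authors' actual sampling procedure is not described. `halve_classes` is a hypothetical helper.

```python
import random
from collections import Counter, defaultdict

def halve_classes(samples, classes=(1, 6, 8), seed=0):
    """Keep all samples of other classes and a random half (rounded down)
    of each class listed in `classes`. `samples` is a list of (x, label)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, label in samples:
        by_label[label].append((x, label))
    reduced = []
    for label, items in sorted(by_label.items()):
        if label in classes:
            items = rng.sample(items, len(items) // 2)
        reduced.extend(items)
    return reduced

if __name__ == "__main__":
    toy = [(i, i % 10) for i in range(1000)]  # 100 toy samples per class
    counts = Counter(label for _, label in halve_classes(toy))
    print(sorted(counts.items()))
```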

Table 5. Description of Generalizability Test Dataset

Class	Tra	nTra	nTe	Te
0	5923	5923	980	980
1	3371	6742	1135	1135
2	5958	5958	1032	1032
3	6131	6131	1010	1010
4	5842	5842	982	982
5	5421	5421	892	892
6	2959	5918	958	968
7	6265	6265	1028	1028
8	2925	5851	974	974
9	5949	5949	1009	1009


On MNIST-F, the teacher is LeNet-5 and the student LeNet-5-half. Table 6 compares the classification Accuracy of DAFL and the proposed DG-DAFL.

Table 6. Classification Results on MNIST-F Dataset

Algorithm	Accuracy
Teacher network	0.9897
DAFL^[5]	0.9425
DG-DAFL	0.9695


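Table 6 shows the generalization test results on MNIST-F. DAFL reaches Accuracy = 0.9425 and DG-DAFL Accuracy = 0.9695. Compared with the results on MNIST, DAFL's Accuracy drops by 0.0262 and DG-DAFL's by only 0.0114, so when easily confused classes are under-trained, DG-DAFL generalizes better and is more robust than DAFL. The training data of the student ${N_{\rm{S}}}$ in DG-DAFL do not depend entirely on the teacher-side generator ${G_{\rm{T}}}$, which avoids the heavy noise that the pseudo-labels built with the one_hot function introduce in DAFL and improves the robustness of ${N_{\rm{S}}}$. For further analysis, Tables 7 and 8 report additional metrics for DAFL and DG-DAFL on MNIST-F.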

Table 7. Statistical Results of DAFL Model for Different Categories

Class	Accuracy	Recall	Specificity
0	0.993	0.957	0.999
1	0.988	0.989	0.998
2	0.928	0.987	0.991
3	0.910	0.988	0.989
4	0.986	0.987	0.998
5	0.908	0.952	0.991
6	0.995	0.956	0.999
7	0.972	0.979	0.997
8	0.989	0.860	0.999
9	0.973	0.966	0.997
Note: the best results are in bold.

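Tables 7 and 8 show that, in the generalization test, DG-DAFL generally improves on DAFL in precision, recall and specificity. Although the training samples of classes 1, 6 and 8 were halved, DG-DAFL improves every reported metric for classes 6 and 8 and improves recall for class 1. The reason is that in DG-DAFL the training data come from two generators and are more diverse, which avoids the low generalization that a single generator tends to cause.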

Table 8. Statistical Results of DG-DAFL Model for Different Categories

Class	Accuracy	Recall	Specificity
0	0.989	0.989	0.999
1	0.982	0.996	0.998
2	0.954	0.995	0.994
3	0.937	0.994	0.993
4	0.989	0.983	0.999
5	0.968	0.964	0.997
6	0.996	0.973	1.000
7	0.982	0.982	0.998
8	0.997	0.878	1.000
9	0.958	0.983	0.995
Note: the best results are in bold.

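Figs. 8–10 show the confusion matrices of correctly classified and misclassified samples per class on MNIST-F, which make it clearer that DG-DAFL behaves closer to the teacher and classifies better. For true labels 0, 5, 6, 8 and 9, DAFL makes more misclassifications than DG-DAFL, because DAFL's training data depend solely on the teacher network, and the noise introduced by the teacher's pseudo-labels degrades the generator and hence the student. In DG-DAFL, the student's training data are shaped by both the teacher-side and the student-side generators, which avoids over-reliance on the teacher-side generator, so the generated training data are closer to real data while remaining diverse. The figures also show that, among the easily confused classes, DAFL misclassifies samples of class 1 as class 7 and, owing to its lower generalization, confuses samples of classes 0, 6 and 8 with one another.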
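Per-class figures like those in Tables 7 and 8 can be derived from a confusion matrix such as those in Figs. 8–10 by treating each class one-vs-rest. The sketch assumes rows are true labels and columns are predictions; `per_class_metrics` is a hypothetical helper and the 2×2 matrix in the example is made up.

```python
def per_class_metrics(cm):
    """One-vs-rest metrics per class; cm[i][j] counts true class i predicted as j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true otherwise
        tn = total - tp - fn - fp
        results.append({
            "accuracy": (tp + tn) / total if total else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    return results

if __name__ == "__main__":
    cm = [[8, 2],
          [1, 9]]
    for label, m in enumerate(per_class_metrics(cm)):
        print(label, {name: round(v, 3) for name, v in m.items()})
```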

Figure 8. Confusion matrix for teacher network generalization test

Figure 9. Confusion matrix for DAFL generalization test

Figure 10. Confusion matrix for DG-DAFL generalization test

4. Conclusion

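For network compression and knowledge distillation in the Data-Free setting, this paper builds on DAFL's approach of obtaining pseudo training samples from a generator and proposes the DG-DAFL framework. The framework uses a double-generator structure that makes the teacher and student networks perform the same learning task and separates the sample generator from the teacher network, avoiding the failed optimization in DAFL where the generator fully trusts the teacher's predictions. In addition, during training of the student generator, a pseudo-sample distribution loss between the two generators is constructed, which fully exploits the teacher's latent prior on the sample distribution while avoiding over-reliance on it, producing a more diverse pseudo-sample set. We verified the effectiveness of the algorithm on three popular datasets and constructed a dataset to further analyze its generalization and robustness. However, the quality of the pseudo training samples generated in the Data-Free setting affects student performance; future work will focus on mining prior knowledge such as the structural features of the teacher's pre-training samples in order to build higher-quality training sets for the student network. The code and models of DG-DAFL are open source: https://github.com/LNNU-computer-research-526/DG-DAFL.git.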

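Author contributions: Zhang Jing was mainly responsible for proposing the model, designing the algorithm and writing the paper; Ju Jialiang was responsible for implementing the algorithm, conducting the experiments and writing the paper; Ren Yonggong was responsible for the conceptual design of the model and for guidance on writing.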

Figure 1. Defect prediction model based on software metrics

Figure 2. Defect prediction model based on syntax and semantics

Figure 3. Number of literatures related to defect prediction and vulnerability prediction

Figure 4. Defect prediction framework

Figure 5. Summary of evaluation indicators

Figure 6. Timeline of metrics development

Figure 7. Code example

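Table 1 Software Defect State Description

State	Description
New	The defect appears for the first time in testing and is flagged by a quality engineer
Pending	The defect has been reported and is awaiting confirmation
Open	Confirmed as a defect; awaiting assignment and fixing
Assigned	After initial screening, assigned to the appropriate team for fixing
Rejected	The defect does not need fixing, or is not a defect
In Progress	The defect has been confirmed and a developer is working on the fix
Fixed	The developer has changed the code or configuration and marked the defect as fixed
Test	The fixed defect is awaiting retesting to verify that the fix works
Re-open	After being fixed and retested, the defect reappears and is flagged again
Resolved	The defect has been fixed and retesting has verified the fix
Closed	The defect is confirmed as resolved and needs no further handling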

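Table 2 Comparison of Defect Detection and Defect Prediction Methods

Method	Category	Accuracy	Scope	Time cost	Limitations
Manual testing	Defect detection	Fairly accurate	Small	High	Prone to human error
Automated analysis	Defect detection	Mostly accurate	Large	Moderate	Struggles with visual and user-experience issues
Static analysis	Defect detection	Mostly accurate	Small	Low	Cannot detect runtime behavior or integration issues
Code review	Defect detection	Fairly accurate	Small	High	Depends on reviewers' experience and skill level
Artificial intelligence	Defect prediction	Mostly accurate	Large	Moderate	Depends on data quality and technique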
Table 3 Public Repository Data Sources for Software Defect Modeling

Dataset	Projects	Metrics	No. of papers	Granularity	Data link
NASA	13	Code metrics	33	Function	http://promise.site.uottawa.ca/SERepository/datasets-page.html
SOFTLAB	5	Code metrics	2	Function	https://github.com/bharlow058/AEEEM-and-other-SDP-datasets/tree/master/dataset/SOFTLAB
PROMISE	38	Code metrics	41	Class	https://zenodo.org/search?page=1&size=20&q=Marian%20Jureckzo&file_type=csv#
Relink	3	Code metrics	7	File	https://github.com/ai-se/HDP_pyjnius/tree/master/dataset/Relink
AEEEM	5	Code metrics, process metrics	14	Class	https://bug.inf.usi.ch/download.php
MORPH	9	Code metrics	2	Class	https://github.com/bharlow058/AEEEM-and-other-SDP-datasets/tree/master/dataset/MORPH

Table 4 Attributes List of the Publicly Available Datasets

Dataset	Repository	Language	Attributes	Instances	Defective instances	Defect rate/%
CM1	NASA	C	22	498	49	9.84
JM1	NASA	C	22	10885	8779	80.65
KC1	NASA	C++	22	2109	326	15.46
KC2	NASA	C++	22	522	105	20.11
KC3	NASA	Java	40	458	43	9.39
KC4	NASA	Perl	40	125	61	48.80
MC1	NASA	C++	39	9466	68	0.72
MC2	NASA	C++	40	161	52	32.30
MW1	NASA	C	40	403	61	15.14
PC1	NASA	C	40	1107	76	6.87
PC2	NASA	C	40	5589	23	0.41
PC3	NASA	C	40	1563	160	10.24
PC4	NASA	C	40	1458	178	12.21
PC5	NASA	C++	39	17186	516	3.00
ant-1.7	PROMISE	Java	21	745	166	22.30
ivy-2.0	PROMISE	Java	21	352	40	11.40
camel-1.6	PROMISE	Java	21	965	188	19.50
jedit-4.0	PROMISE	Java	21	306	75	24.50
log4j-1.2	PROMISE	Java	21	109	37	33.90
Lucene-2.4	PROMISE	Java	21	195	91	46.70
poi-2.0	PROMISE	Java	21	314	37	11.80
Synapse-1.1	PROMISE	Java	21	222	60	27.00
velocity-1.6	PROMISE	Java	21	229	78	34.10
Xerces-1.3	PROMISE	Java	21	453	60	15.20
tomcat	PROMISE	Java	21	858	77	8.90
Xalan-2.4	PROMISE	Java	21	723	110	15.20
EQ	AEEEM	Java	62	324	129	39.81
JDT	AEEEM	Java	62	997	206	20.66
LC	AEEEM	Java	62	691	64	9.26
ML	AEEEM	Java	62	1862	245	13.16
PDE	AEEEM	Java	62	1497	209	13.96
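
The defect-rate column of Table 4 is defective instances divided by instances. The short sketch below recomputes three rows and also reports the imbalance ratio (defect-free instances per defective instance), the property that resampling methods such as RUS in Table 7 target; the function names are hypothetical.

```python
def defect_rate(defective, instances):
    """Percentage of defective instances."""
    return 100.0 * defective / instances

def imbalance_ratio(defective, instances):
    """Defect-free instances per defective instance."""
    return (instances - defective) / defective

ROWS = {"CM1": (49, 498), "KC1": (326, 2109), "PC5": (516, 17186)}  # from Table 4

if __name__ == "__main__":
    for name, (defective, instances) in ROWS.items():
        print(f"{name}: {defect_rate(defective, instances):.2f}% defective, "
              f"imbalance {imbalance_ratio(defective, instances):.1f}:1")
```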


Table 5 Number of Defects in Open-Source Software Projects

Source	Open-source software	Versions	Granularity	Lines of code	Defects
Ref. [12]	Camel	2	File	112367	62.00
Ref. [12]	Flume	2	File	95782	47.00
Ref. [12]	Tika	2	File	85341	16.00
Ref. [12]	Gedit	2	File	60441	18.50
Ref. [12]	Nginx	2	File	80618	18.00
Ref. [12]	Redis	2	File	45991	21.00
Ref. [13]	Gedit	314	Function	2012	58.96
Ref. [13]	Nagios Core	93	Function	1750	4.82
Ref. [13]	Nginx	455	Function	1975	6.17
Ref. [13]	Redis	173	Function	2350	57.31

Table 6 Impact of Code Changes on Defects in Open-Source Software Projects

Open-source software	Code change period	Files	Files per change	Average changed LOC	Defect rate induced by code changes/%
Bugzilla	08/1998−12/2006	4620	2.3	37.5	36
Platform	05/2001−12/2007	64250	4.3	72.2	14
Mozilla	01/2000−12/2006	98275	5.3	106.5	5
JDT	05/2001−12/2007	35386	4.3	71.4	14
Columba	11/2002−07/2006	4455	6.2	149.4	31
PostgreSQL	07/1996−05/2010	20431	4.5	101.3	25

Table 7 Data Preprocessing Methods

Source	Year	Dataset	Preprocessing model	Classifier	Evaluation metrics
Ref. [23]	2022	AEEEM, NASA	AJCC-Ram	XGBoost	F1-Score
Ref. [27]	2018	PROMISE	NCL, RUS	Adaboost	PD, PF, G-mean, AUC
Ref. [30]	2018	NASA, SOFTLAB, ReLink, AEEEM, MORPH	CTKCCA	Logistic regression	PD, PF, F-measure, G-mean, AUC
Ref. [33]	2019	NASA, PROMISE	STr-NN+TCA	Ensemble learning	F-measure, AUC, Recall, PF
Ref. [34]	2022	NASA, AEEEM, Relink	BiGAN	Random forest, SVM, naive Bayes	AUC, G-mean, F1-Score
Ref. [39]	2021	NASA, PROMISE, AEEEM, ReLink	EWFS	Decision tree, naive Bayes	F-measure, AUC
Ref. [45]	2017	ReLink, AEEEM	FESCH	Decision tree, naive Bayes, logistic regression	Precision, Recall, F-measure, AUC
Ref. [46]	2019	NASA	LSKDSA	Logistic regression	F-measure, AUC
Ref. [49]	2018	MORPH	HAL, KCPA	Logistic regression	F-measure, G-mean, Balance
Ref. [50]	2020	MORPH	CDS	Random forest, logistic regression, naive Bayes	F-measure, G-mean, Balance

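Table 8 Common Evaluation Indicators and Their Descriptions

Indicator	Description
Accuracy	Proportion of correctly predicted instances among all instances
Precision, Correctness	Proportion of instances predicted as defective that are actually defective
Recall, TPR	Proportion of actually defective instances that are predicted as defective
Specificity, TNR	Proportion of actually defect-free instances that are predicted as defect-free
FPR	Proportion of actually defect-free instances that are predicted as defective
FNR	Proportion of actually defective instances that are predicted as defect-free
AUC	Area under the ROC curve; the larger the AUC, the more effective the model
MCC	Matthews correlation coefficient: the correlation between observed and predicted classifications, ranging from −1 to 1
Balance	Trade-off between PD and PF: one minus the normalized Euclidean distance from (PF, PD) to the ideal point (0, 1) in ROC space
F-measure	Harmonic mean of recall and precision
F1-Score, F2-Score	Criteria for learning from imbalanced data that combine precision and recall (F2 weights recall more heavily than precision)
G-mean	Geometric mean of Recall and Precision
Error Rate	Proportion of misclassified instances among all instances
AAE	Average absolute error: the absolute difference between predicted and actual values
ARE	Average relative error: the absolute difference between predicted and actual values divided by the actual value
Completeness	Ratio of the actual defect value to the predicted defect value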
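
Several indicators in Table 8 can be computed from the same four confusion counts. The sketch follows common definitions (F-beta for F1/F2, the Matthews correlation coefficient for MCC, and Balance as one minus the normalized distance from (PF, PD) to the ideal point (0, 1)); `defect_metrics` is a hypothetical helper, zero denominators are mapped to 0 for simplicity, and the example counts are made up.

```python
import math

def _div(a, b):
    return a / b if b else 0.0

def defect_metrics(tp, tn, fp, fn):
    """Common defect-prediction indicators from confusion counts (sketch)."""
    pd = _div(tp, tp + fn)          # Recall / TPR / probability of detection
    pf = _div(fp, fp + tn)          # FPR / probability of false alarm
    precision = _div(tp, tp + fp)

    def f_beta(beta):
        return _div((1 + beta ** 2) * precision * pd, beta ** 2 * precision + pd)

    mcc = _div(tp * tn - fp * fn,
               math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    balance = 1 - math.sqrt(pf ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return {
        "precision": precision, "recall": pd,
        "specificity": _div(tn, tn + fp), "fpr": pf, "fnr": _div(fn, tp + fn),
        "f1": f_beta(1.0), "f2": f_beta(2.0),
        "mcc": mcc, "balance": balance,
        "error_rate": _div(fp + fn, tp + tn + fp + fn),
    }

if __name__ == "__main__":
    for name, value in defect_metrics(tp=30, tn=150, fp=20, fn=10).items():
        print(f"{name:12s}{value:.4f}")
```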


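Table 9 Comparison of Metric Evolution

Source	Year	Metrics	ML/DL method	Evaluation metrics	Comparison result
Ref. [59]	2021	Code smells, code metrics	RF, SVM, MLP, DT, NB	ROC, AUC, PR, F1-Score	Code smells outperform code metrics and also outperform a combination of the two
Ref. [62]	2008	Code change, code metrics	LR, NB, DT	FP, Recall, PC	Process metrics are more effective than code metrics
Ref. [64]	2016	Evolution-pattern metrics, code change metrics	NB, binary logistic regression, J48 decision tree	Precision, Recall, F-measure, ROC	Evolution metrics predict relatively better than code and code-change metrics
Ref. [81]	2018	Cross-entropy	LSTM-based recurrent neural network (RNN)	Precision, Recall, F1-Score, AUC	The cross-entropy metric predicts better than 50% of traditional metrics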

表 10 缺陷和漏洞的区别与联系

Table 10 Differences and Connections Between Defects and Vulnerabilities

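Category	Aspect	Defect	Vulnerability
Differences	Concept	An error or hidden functional fault in software or a program	A flaw introduced during the design, implementation, configuration, or use of software that may allow an attacker to access or damage the system without authorization
	Origin	Software architecture and design	Software code (source or binary code)
	Causes	Insufficient test coverage, imprecise requirements analysis, poorly defined team responsibilities, defects in hardware configuration, firmware, and processors, and defects in software configuration and operating systems	Programmer skill, hardware defects, software defects, protocol defects
	Disclosure	Defects are disclosed in software repositories; defect data are of higher quality than vulnerability data	Because disclosing a vulnerability can trigger a series of attacks, developers and vulnerability researchers usually restrict public disclosure of vulnerability information
	Quantity	Relatively many	Relatively few
Connections	Concept	A vulnerability is a software defect that an attacker can exploit to carry out an intrusion
	Origin	Related to the complexity of hardware and code and to programmer skill
	Impact	Both can cause great harm to enterprises and to people's daily lives
	Detection and prediction methods	Manual testing, automated testing, static analysis, dynamic analysis, code review, artificial intelligence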


Table 11 Challenges and Opportunities of Defect Prediction and Vulnerability Prediction Tasks

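Challenge	Opportunity
Dataset sourcing and processing	Build a high-quality, balanced, noise-free benchmark dataset
Representation of code as vectors	Construct a representation that captures as much syntactic and semantic information as possible
Improving pre-trained models	Use word embeddings trained in other domains to improve model performance
Exploring deep learning models	Explore deep learning models better suited to specific prediction tasks
Fine-grained prediction techniques	Locate more precisely where defects and vulnerabilities are likely to occur
Transfer of pre-trained models	Save time and resource costs through model transfer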


References (124)

[1]	Pachouly J, Ahirrao S, Kotecha K, et al. A systematic literature review on software defect prediction using artificial intelligence: Datasets, data validation methods, approaches, and tools[J]. Engineering Applications of Artificial Intelligence, 2022, 111: 1−33 doi: 10.1016/j.engappai.2022.104773
[2]	Chen Xiang, Gu Qing, Liu Wangshu, et al. Survey of static software defect prediction[J]. Journal of Software, 2016, 27(1): 1−25 (in Chinese) doi: 10.13328/j.cnki.jos.004923
[3]	Gu Mianxue, Sun Hongyu, Han Dan, et al. Software security vulnerability mining based on deep learning[J]. Journal of Computer Research and Development, 2021, 58(10): 2140−2162 (in Chinese) doi: 10.7544/issn1000-1239.2021.20210620
[4]	Trachtenberg M. Discovering how to ensure software reliability[J]. Radio Corporation of America Engineer, 1982, 27(1): 53−57
[5]	Qian Lianfen, Yao Qingchuan, Khoshgoftaar T M. Dynamic two-phase truncated Rayleigh model for release date prediction of software[J]. Journal of Software Engineering and Applications, 2010, 3(6): 603−609 doi: 10.4236/jsea.2010.36070
[6]	Bustamante A, Bustamante B. Multinomial-exponential reliability function: A software reliability model[J]. Reliability Engineering & System Safety, 2003, 79(3): 281−288
[7]	Zheng Yanyan, Xu Renzuo. An adaptive exponential smoothing approach for software reliability prediction[C]//Proc of 2008 4th Int Conf on Wireless Communications, Networking and Mobile Computing. Piscataway, NJ: IEEE, 2008: 1−4
[8]	Yamada S, Ohba M, Osaki S. S-shaped reliability growth modeling for software error detection[J]. IEEE Transactions on Reliability, 1983, 32(5): 475−484
[9]	Kececioglu D, Jiang S, Vassiliou P. The modified Gompertz reliability growth model[C]//Proc of Annual Reliability and Maintainability Symp (RAMS). Piscataway, NJ: IEEE, 1994: 160−165
[10]	Ahmad N, Imam M Z. Software reliability growth models with log-logistic testing-effort function: A comparative study[J]. International Journal of Computer Applications, 2014, 75(12): 8−11
[11]	Gong Lina, Jiang Shujuan, Jiang Li. Research progress of software defect prediction[J]. Journal of Software, 2019, 30(10): 3090−3114 (in Chinese) doi: 10.13328/j.cnki.jos.005790
[12]	Li Yiyao, Lee S Y, Wotawa F, et al. Using tri-relation networks for effective software fault-proneness prediction[J]. IEEE Access, 2019, 7: 63066−63080 doi: 10.1109/ACCESS.2019.2916615
[13]	Lee S Y, Wong W E, Li Yiyao, et al. Software fault-proneness analysis based on composite developer-module networks[J]. IEEE Access, 2021, 9: 155314−155334 doi: 10.1109/ACCESS.2021.3128438
[14]	Zhu Kun, Zhang Nana, Ying Shi, et al. Within-project and cross-project software defect prediction based on improved transfer naive Bayes algorithm[J]. Computers, Materials and Continua, 2020, 63(2): 891−910
[15]	Akiyama F. An example of software system debugging[J]. IFIP Congress, 1971, 71(1): 353−359
[16]	Halstead M H. Elements of Software Science (Operating and Programming Systems Series)[M]. New York: Elsevier Science Inc, 1977
[17]	Shepperd M, Song Qinbao, Sun Zhongbin, et al. Data quality: Some comments on the NASA software defect datasets[J]. IEEE Transactions on Software Engineering, 2013, 39(9): 1208−1215 doi: 10.1109/TSE.2013.11
[18]	Khoshgoftaar T M, Gao Kehan, Napolitano A, et al. A comparative study of iterative and non-iterative feature selection techniques for software defect prediction[J]. Information Systems Frontiers, 2014, 16(5): 801−822 doi: 10.1007/s10796-013-9430-0
[19]	Li Zhiqiang, Jing Xiaoyuan, Zhu Xiaoke, et al. Heterogeneous defect prediction through multiple kernel learning and ensemble learning[C]//Proc of 2017 IEEE Int Conf on Software Maintenance and Evolution (ICSME). Piscataway, NJ: IEEE, 2017: 91−102
[20]	Kubat M, Matwin S. Addressing the curse of imbalanced training sets: One-sided selection[C]//Proc of the 14th Int Conf on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 179−186
[21]	Kotsiantis S B, Pintelas P E. Mixture of expert agents for handling imbalanced data sets[J]. Annals of Mathematics, Computing & Teleinformatics, 2003, 1(1): 46−55
[22]	Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321−357 doi: 10.1613/jair.953
[23]	Rao Zhendan. Research on unbalanced data classification algorithm in software defect prediction[D]. Harbin: Harbin Normal University, 2022 (in Chinese)
[24]	He Haibo, Bai Yang, Garcia E A, et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning[C]//Proc of 2008 IEEE Int Joint Conf on Neural Networks (IEEE World Congress on Computational Intelligence). Piscataway, NJ: IEEE, 2008: 1322−1328
[25]	Ma Li, Fan Suohai. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests[J]. BMC Bioinformatics, 2017, 18(1): 1−18 doi: 10.1186/s12859-016-1414-x
[26]	Kim S, Zhang Hongyu, Wu Rongxin, et al. Dealing with noise in defect prediction[C]//Proc of 2011 33rd Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2011: 481−490
[27]	Chen Liu, Fang Bin, Shang Zhaowei, et al. Tackling class overlap and imbalance problems in software defect prediction[J]. Software Quality Journal, 2018, 26(1): 97−125 doi: 10.1007/s11219-016-9342-6
[28]	Tang Wei, Khoshgoftaar T M. Noise identification with the k-means algorithm[C]//Proc of 16th IEEE Int Conf on Tools with Artificial Intelligence. Piscataway, NJ: IEEE, 2004: 373−378
[29]	Goyal S. Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction[J]. Artificial Intelligence Review, 2022, 55(3): 2023−2064 doi: 10.1007/s10462-021-10044-w
[30]	Li Zhiqiang, Jing Xiaoyuan, Wu Fei, et al. Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction[J]. Automated Software Engineering, 2018, 25(2): 201−245 doi: 10.1007/s10515-017-0220-7
[31]	Yang Zhenyu, Jin Chufeng, Zhang Yue, et al. Software defect prediction: An ensemble learning approach[J]. Journal of Physics: Conference Series, 2022, 2171(1): 012008 doi: 10.1088/1742-6596/2171/1/012008
[32]	Jiang Feng, Yu Xu, Gong Dunwei, et al. A random approximate reduct-based ensemble learning approach and its application in software defect prediction[J]. Information Sciences, 2022, 609: 1147−1168 doi: 10.1016/j.ins.2022.07.130
[33]	Gong Lina, Jiang Shujuan, Bo Lili, et al. A novel class-imbalance learning approach for both within-project and cross-project defect prediction[J]. IEEE Transactions on Reliability, 2019, 69(1): 40−54
[34]	Zhang Shenggang, Jiang Shujuan, Yan Yue. A software defect prediction approach based on BiGAN anomaly detection[J]. Scientific Programming, 2022, 2022(1): 1−13
[35]	Rodriguez D, Herraiz I, Harrison R, et al. Preliminary comparison of techniques for dealing with imbalance in software defect prediction[C]//Proc of the 18th Int Conf on Evaluation and Assessment in Software Engineering. New York: ACM, 2014: 1−10
[36]	Eivazpour Z, Keyvanpour M R. CSSG: A cost-sensitive stacked generalization approach for software defect prediction[J]. Software Testing, Verification and Reliability, 2021, 31(5): e1761
[37]	Kohavi R, John G H. Wrappers for feature subset selection[J]. Artificial Intelligence, 1997, 97(1-2): 273−324 doi: 10.1016/S0004-3702(97)00043-X
[38]	He Xiaofei, Cai Deng, Niyogi P. Laplacian score for feature selection[C]//Proc of the 18th Int Conf on Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2005
[39]	Balogun A O, Basri S, Capretz L F, et al. Software defect prediction using wrapper feature selection based on dynamic re-ranking strategy[J]. Symmetry, 2021, 13(11): 2166−2189 doi: 10.3390/sym13112166
[40]	Thirumoorthy K. A feature selection model for software defect prediction using binary Rao optimization algorithm[J]. Applied Soft Computing, 2022, 131: 109737−109753 doi: 10.1016/j.asoc.2022.109737
[41]	Bahaweres R B, Suroso A I, Hutomo A W, et al. Tackling feature selection problems with genetic algorithms in software defect prediction for optimization[C]//Proc of 2020 Int Conf on Informatics, Multimedia, Cyber and Information System (ICIMCIS). Piscataway, NJ: IEEE, 2020: 64−69
[42]	Miao Linsong, Liu Mingxia, Zhang Daoqiang. Cost-sensitive feature selection with application in software defect prediction[C]//Proc of the 21st Int Conf on Pattern Recognition (ICPR2012). Piscataway, NJ: IEEE, 2012: 967−970
[43]	Liu Shulong, Chen Xiang, Liu Wangshu, et al. FECAR: A feature selection framework for software defect prediction[C]//Proc of 2014 IEEE 38th Annual Computer Software and Applications Conf. Piscataway, NJ: IEEE, 2014: 426−435
[44]	Nam J, Pan S J, Kim S. Transfer defect learning[C]//Proc of 2013 35th Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2013: 382−391
[45]	Ni Chao, Liu Wangshu, Chen Xiang, et al. A cluster based feature selection method for cross-project software defect prediction[J]. Journal of Computer Science and Technology, 2017, 32(6): 1090−1107 doi: 10.1007/s11390-017-1785-0
[46]	Li Zhiqiang, Qi Chao, Zhang Li, et al. Discriminant subspace alignment for cross-project defect prediction[C]//Proc of 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). Piscataway, NJ: IEEE, 2019: 1728−1733
[47]	Chen Jinfu, Wang Xiaoli, Cai Saihua, et al. A software defect prediction method with metric compensation based on feature selection and transfer learning[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 715−731
[48]	Lu Huihua, Kocaguneli E, Cukic B. Defect prediction between software versions with active learning and dimensionality reduction[C]//Proc of 2014 IEEE 25th Int Symp on Software Reliability Engineering. Piscataway, NJ: IEEE, 2014: 312−322
[49]	Xu Zhou, Liu Jin, Luo Xiapu, et al. Cross-version defect prediction via hybrid active learning with kernel principal component analysis[C]//Proc of 2018 IEEE 25th Int Conf on Software Analysis, Evolution and Reengineering (SANER). Piscataway, NJ: IEEE, 2018: 209−220
[50]	Zhang Jie, Wu Jiajing, Chen C, et al. CDS: A cross-version software defect prediction model with data selection[J]. IEEE Access, 2020, 8: 110059−110072 doi: 10.1109/ACCESS.2020.3001440
[51]	Marcus A, Maletic J I. Recovering documentation-to-source-code traceability links using latent semantic indexing[C]//Proc of 25th Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2003: 125−135
[52]	Menzies T, Dekhtyar A, Distefano J, et al. Problems with precision: A response to “comments on ‘data mining static code attributes to learn defect predictors’”[J]. IEEE Transactions on Software Engineering, 2007, 33(9): 637−640 doi: 10.1109/TSE.2007.70721
[53]	Yao Jingxiu, Shepperd M. The impact of using biased performance metrics on software defect prediction research[J]. Information and Software Technology, 2021, 139(11): 1−14
[54]	Qiao Hui. Research on software defect prediction techniques[D]. Zhengzhou: Information Engineering University, 2013 (in Chinese)
[55]	McCabe T J. A complexity measure[J]. IEEE Transactions on Software Engineering, 1976, 2(4): 308−320
[56]	Chidamber S R, Kemerer C F. A metrics suite for object oriented design[J]. IEEE Transactions on Software Engineering, 1994, 20(6): 476−493 doi: 10.1109/32.295895
[57]	Brito e Abreu F, Carapuça R. Candidate metrics for object-oriented software within a taxonomy framework[J]. Journal of Systems and Software, 1994, 26(1): 87−96 doi: 10.1016/0164-1212(94)90099-X
[58]	Bansiya J, Davis C G. A hierarchical model for object-oriented design quality assessment[J]. IEEE Transactions on Software Engineering, 2002, 28(1): 4−17 doi: 10.1109/32.979986
[59]	Sotto-Mayor B, Kalech M. Cross-project smell-based defect prediction[J]. Soft Computing, 2021, 25(22): 14171−14181 doi: 10.1007/s00500-021-06254-7
[60]	Khoshgoftaar T M, Szabo R M. Improving code churn predictions during the system test and maintenance phases[C]//Proc of 1994 Int Conf on Software Maintenance. Piscataway, NJ: IEEE, 1994: 58−67
[61]	Nagappan N, Ball T. Use of relative code churn measures to predict system defect density[C]//Proc of the 27th Int Conf on Software Engineering. New York: ACM, 2005: 284−292
[62]	Moser R, Pedrycz W, Succi G. A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction[C]//Proc of the 30th Int Conf on Software Engineering. New York: ACM, 2008: 181−190
[63]	Knab P, Pinzger M, Bernstein A. Predicting defect densities in source code files with decision tree learners[C]//Proc of the 2006 Int Workshop on Mining Software Repositories. New York: ACM, 2006: 119−125
[64]	Wang Dandan, Wang Qing. Improving the performance of defect prediction based on evolution data[J]. Journal of Software, 2016, 27(12): 3014−3029 (in Chinese) doi: 10.13328/j.cnki.jos.004869
[65]	Liu Yibin, Li Yanhui, Guo Jianbo, et al. Connecting software metrics across versions to predict defects[C]//Proc of 2018 IEEE 25th Int Conf on Software Analysis, Evolution and Reengineering (SANER). Piscataway, NJ: IEEE, 2018: 232−243
[66]	Mockus A, Weiss D M. Predicting risk of software changes[J]. Bell Labs Technical Journal, 2000, 5(2): 169−180
[67]	Weyuker E J, Ostrand T J, Bell R M. Using developer information as a factor for fault prediction[C]//Proc of Third Int Workshop on Predictor Models in Software Engineering. Piscataway, NJ: IEEE, 2007: 8−15
[68]	Ostrand T J, Weyuker E J, Bell R M. Programmer-based fault prediction[C]//Proc of the 6th Int Conf on Predictive Models in Software Engineering. New York: ACM, 2010: 1−10
[69]	Pinzger M, Nagappan N, Murphy B. Can developer-module networks predict failures?[C]//Proc of the 16th ACM SIGSOFT Int Symp on Foundations of Software Engineering. New York: ACM, 2008: 2−12
[70]	Nagappan N, Murphy B, Basili V. The influence of organizational structure on software quality[C]//Proc of 2008 ACM/IEEE 30th Int Conf on Software Engineering. New York: ACM, 2008: 521−530
[71]	Mockus A. Organizational volatility and its effects on software defects[C]//Proc of the 18th ACM SIGSOFT Int Symp on Foundations of Software Engineering. New York: ACM, 2010: 117−126
[72]	Zhou Yuming, Leung H. Empirical analysis of object-oriented design metrics for predicting high and low severity faults[J]. IEEE Transactions on Software Engineering, 2006, 32(10): 771−789 doi: 10.1109/TSE.2006.102
[73]	Pai G J, Dugan J B. Empirical analysis of software fault content and fault proneness using Bayesian methods[J]. IEEE Transactions on Software Engineering, 2007, 33(10): 675−686 doi: 10.1109/TSE.2007.70722
[74]	Seliya N, Khoshgoftaar T M. Software quality analysis of unlabeled program modules with semisupervised clustering[J]. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 2007, 37(2): 201−211 doi: 10.1109/TSMCA.2006.889473
[75]	Catal C, Sevim U, Diri B. Clustering and metrics thresholds based software fault prediction of unlabeled program modules[C]//Proc of 2009 6th Int Conf on Information Technology: New Generations. Piscataway, NJ: IEEE, 2009: 199−204
[76]	Arisholm E, Briand L C, Johannessen E B. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models[J]. Journal of Systems and Software, 2010, 83(1): 2−17 doi: 10.1016/j.jss.2009.06.055
[77]	Gyimóthy T, Ferenc R, Siket I. Empirical validation of object-oriented metrics on open source software for fault prediction[J]. IEEE Transactions on Software Engineering, 2005, 31(10): 897−910 doi: 10.1109/TSE.2005.112
[78]	Zheng Jun. Cost-sensitive boosting neural networks for software defect prediction[J]. Expert Systems with Applications, 2010, 37(6): 4537−4543 doi: 10.1016/j.eswa.2009.12.056
[79]	Shukla S, Radhakrishnan T, Muthukumaran K, et al. Multi-objective cross-version defect prediction[J]. Soft Computing, 2018, 22(6): 1959−1980 doi: 10.1007/s00500-016-2456-8
[80]	Zhao Liuchang, Shang Zhaowei, Zhao Ling, et al. Siamese dense neural network for software defect prediction with small data[J]. IEEE Access, 2018, 7: 7663−7677
[81]	Zhang Xian, Ben K, Zeng Jie. Cross-entropy: A new metric for software defect prediction[C]//Proc of 2018 IEEE Int Conf on Software Quality, Reliability and Security (QRS). Piscataway, NJ: IEEE, 2018: 111−122
[82]	Yang Xinli, Lo D, Xia Xin, et al. Deep learning for just-in-time defect prediction[C]//Proc of 2015 IEEE Int Conf on Software Quality, Reliability and Security. Piscataway, NJ: IEEE, 2015: 17−26
[83]	Wang Song, Liu Taiyue, Tan Lin. Automatically learning semantic features for defect prediction[C]//Proc of 2016 IEEE/ACM 38th Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2016: 297−308
[84]	Wang Song, Liu Taiyue, Nam J, et al. Deep semantic feature learning for software defect prediction[J]. IEEE Transactions on Software Engineering, 2018, 46(12): 1267−1293
[85]	Li Jian, He Pinjia, Zhu Jieming, et al. Software defect prediction via convolutional neural network[C]//Proc of 2017 IEEE Int Conf on Software Quality, Reliability and Security (QRS). Piscataway, NJ: IEEE, 2017: 318−328
[86]	Fan Guisheng, Diao Xuyang, Yu Huiqun, et al. Software defect prediction via attention-based recurrent neural network[J]. Scientific Programming, 2019, 2019(4): 1−14
[87]	Qiu Shaojian, Lu Lu, Cai Ziyi, et al. Cross-project defect prediction via transferable deep learning-generated and handcrafted features[C]//Proc of the 31st Int Conf on Software Engineering and Knowledge Engineering. Skokie: Knowledge Systems Institute Graduate School, 2019: 431−552
[88]	Liu Wangshu, Zhu Yongteng, Chen Xiang, et al. S²LMMD: Cross-project software defect prediction via statement semantic learning and maximum mean discrepancy[C]//Proc of 2021 28th Asia-Pacific Software Engineering Conf (APSEC). Piscataway, NJ: IEEE, 2021: 369−379
[89]	Dam H K, Pham T, Ng S W, et al. A deep tree-based model for software defect prediction[J]. ArXiv Preprint ArXiv: 1802.00921, 2018
[90]	Šikić L, Kurdija A S, Vladimir K, et al. Graph neural network for source code defect prediction[J]. IEEE Access, 2022, 10: 10402−10415 doi: 10.1109/ACCESS.2022.3144598
[91]	Phan A V, Le Nguyen M, Bui L T. Convolutional neural networks over control flow graphs for software defect prediction[C]//Proc of 2017 IEEE 29th Int Conf on Tools with Artificial Intelligence (ICTAI). Piscataway, NJ: IEEE, 2017: 45−52
[92]	Xu Jiaxi, Ai Jun, Liu Jingyu, et al. ACGDP: An augmented code graph-based system for software defect prediction[J]. IEEE Transactions on Reliability, 2022, 71(2): 850−864 doi: 10.1109/TR.2022.3161581
[93]	Li Zhen, Zou Deqing, Xu Shouhuai, et al. VulDeePecker: A deep learning-based system for vulnerability detection[J]. ArXiv Preprint ArXiv: 1801.01681, 2018
[94]	Huo Xuan, Yang Yang, Li Ming, et al. Learning semantic features for software defect prediction by code comments embedding[C]//Proc of 2018 IEEE Int Conf on Data Mining (ICDM). Piscataway, NJ: IEEE, 2018: 1049−1054
[95]	Qu Yu, Liu Ting, Chi Jianlei, et al. Node2defect: Using network embedding to improve software defect prediction[C]//Proc of 2018 33rd IEEE/ACM Int Conf on Automated Software Engineering (ASE). Piscataway, NJ: IEEE, 2018: 844−849
[96]	Zeng Cheng, Zhou Chunying, Lv Shengkai, et al. GCN2defect: Graph convolutional networks for SMOTETomek-based software defect prediction[C]//Proc of 2021 IEEE 32nd Int Symp on Software Reliability Engineering (ISSRE). Piscataway, NJ: IEEE, 2021: 69−79
[97]	Zhou Chunying, He Peng, Zeng Cheng, et al. Software defect prediction with semantic and structural information of codes based on graph neural networks[J]. Information and Software Technology, 2022, 152: 107057 doi: 10.1016/j.infsof.2022.107057
[98]	Yang Fengyu, Huang Yaxuan, Xu Haoming, et al. Fine-grained software defect prediction based on the method-call sequence[J]. Computational Intelligence and Neuroscience, 2022, 2022(8): 1−15
[99]	Uddin M N, Li Bixin, Ali Z, et al. Software defect prediction employing BiLSTM and BERT-based semantic feature[J]. Soft Computing, 2022, 26(16): 7877−7891 doi: 10.1007/s00500-022-06830-5
[100]	Shin Y, Williams L. An empirical model to predict security vulnerabilities using code complexity metrics[C]//Proc of the Second ACM-IEEE Int Symp on Empirical Software Engineering and Measurement. New York: ACM, 2008: 315−317
[101]	Gegick M, Williams L, Osborne J, et al. Prioritizing software security fortification through code-level security metrics[C]//Proc of Workshop on Quality of Protection. New York: ACM, 2008: 31−38
[102]	Meneely A, Williams L. Secure open source collaboration: An empirical study of Linus’ law[C]//Proc of the 16th ACM Conf on Computer and Communications Security. New York: ACM, 2009: 453−462
[103]	Shin Y, Meneely A, Williams L, et al. Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities[J]. IEEE Transactions on Software Engineering, 2010, 37(6): 772−787
[104]	Chowdhury I, Zulkernine M. Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities[J]. Journal of Systems Architecture, 2011, 57(3): 294−313 doi: 10.1016/j.sysarc.2010.06.003
[105]	Hovsepyan A, Scandariato R, Joosen W, et al. Software vulnerability prediction using text analysis techniques[C]//Proc of the 4th Int Workshop on Security Measurements and Metrics. New York: ACM, 2012: 7−10
[106]	Scandariato R, Walden J, Hovsepyan A, et al. Predicting vulnerable software components via text mining[J]. IEEE Transactions on Software Engineering, 2014, 40(10): 993−1006 doi: 10.1109/TSE.2014.2340398
[107]	Yamaguchi F, Lottmann M, Rieck K. Generalized vulnerability extrapolation using abstract syntax trees[C]//Proc of the 28th Annual Computer Security Applications Conf. New York: ACM, 2012: 359−368
[108]	Meng Qingkun, Wen Shameng, Feng Chao, et al. Predicting buffer overflow using semi-supervised learning[C]//Proc of 2016 9th Int Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). Piscataway, NJ: IEEE, 2016: 1959−1963
[109]	Pang Yulei, Xue Xiaozhen, Wang Huaying. Predicting vulnerable software components through deep neural network[C]//Proc of the 2017 Int Conf on Deep Learning Technologies. New York: ACM, 2017: 6−10
[110]	Dam H K, Tran T, Pham T, et al. Automatic feature learning for predicting vulnerable software components[J]. IEEE Transactions on Software Engineering, 2018, 47(1): 67−85
[111]	Kalouptsoglou I, Siavvas M, Kehagias D, et al. An empirical evaluation of the usefulness of word embedding techniques in deep learning-based vulnerability prediction[C]//Proc of Int ISCIS Security Workshop. Berlin: Springer, 2022: 23−37
[112]	Ma Qianhua. Deep learning-based software vulnerability prediction[D]. Beijing: Beijing University of Posts and Telecommunications, 2020 (in Chinese)
[113]	Zhou Yaqin, Liu Shangqing, Siow J, et al. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C]//Proc of 33rd Conf on Neural Information Processing Systems (NeurIPS). San Diego, CA, USA: NIPS, 2019: 10197−10207
[114]	Li Zhen, Zou Deqing, Xu Shouhuai, et al. SySeVR: A framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 19(4): 2244−2258
[115]	Li Yi, Wang Shaohua, Nguyen T N. Vulnerability detection with fine-grained interpretations[C]//Proc of the 29th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2021: 292−303
[116]	Fu M, Tantithamthavorn C. LineVul: A transformer-based line-level vulnerability prediction[C]//Proc of 2022 IEEE/ACM 19th Int Conf on Mining Software Repositories (MSR). Piscataway, NJ: IEEE, 2022: 608−620
[117]	Shin Y, Williams L. Is complexity really the enemy of software security?[C]//Proc of the 4th ACM Workshop on Quality of Protection. New York: ACM, 2008: 47−50
[118]	Viega J, McGraw G R. Building Secure Software: How to Avoid Security Problems the Right Way, Portable Documents[M]. London: Pearson Education, 2001
[119]	Gao Zhiwei, Yao Yao, Rao Fei, et al. Prediction model of vulnerabilities based on the type of vulnerability severity[J]. Acta Electronica Sinica, 2013, 41(9): 1784−1787 (in Chinese) doi: 10.3969/j.issn.0372-2112.2013.09.018
[120]	Pan Zhixin, Mishra P. A survey on hardware vulnerability analysis using machine learning[J]. IEEE Access, 2022, 10: 49508−49527 doi: 10.1109/ACCESS.2022.3173287
[121]	Palix N, Thomas G, Saha S, et al. Faults in Linux: Ten years later[C]//Proc of the 16th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2011: 305−318
[122]	Zimmermann T, Nagappan N, Williams L. Searching for a needle in a haystack: Predicting security vulnerabilities for windows vista[C]//Proc of 2010 3rd Int Conf on Software Testing, Verification and Validation. Piscataway, NJ: IEEE, 2010: 421−428
[123]	Shin Y, Williams L A. Can fault prediction models and metrics be used for vulnerability prediction?[R]. North Carolina, USA: North Carolina State University, Department of Computer Science, 2010
[124]	Shin Y, Williams L. Can traditional fault prediction models be used for vulnerability prediction?[J]. Empirical Software Engineering, 2013, 18(1): 25−59 doi: 10.1007/s10664-011-9190-8

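Citing articles (6)

Journal citations (2)

1.	Qiu Ziyun. End-to-end information flow control method based on node importance. Journal of Henan Institute of Science and Technology (Natural Science Edition), 2025(1): 58−65 (in Chinese)
2.	Niu Yue. Application of computer technology in office automation. Technology Innovation and Application, 2024(8): 187−190 (in Chinese)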