Survey of Open-Source Software Defect Prediction Methods

Tian Xiao; Chang Jiyou; Zhang Chi; Rong Jingfeng; Wang Ziyu; Zhang Guanghua; Wang He; Wu Gaofei; Hu Jinglu; Zhang Yuqing

doi:10.7544/issn1000-1239.202221046


Corresponding author:
Zhang Yuqing (zhangyq@nipc.org.cn)

CLC number: TP311
Publication history
- Received: 2023-03-29
- Revised: 2023-06-05
- Published online: 2023-07-04
- Issue date: 2023-06-30

Survey of Open-Source Software Defect Prediction Method

Tian Xiao^{1,2},
Chang Jiyou^{3},
Zhang Chi^{2},
Rong Jingfeng^{2,6},
Wang Ziyu^{3},
Zhang Guanghua^{3},
Wang He^{1,2},
Wu Gaofei^{1,2,4},
Hu Jinglu^{5},
Zhang Yuqing^{1,2,6,7}

1. School of Cyber Engineering, Xidian University, Xi’an 710126
2. National Computer Network Intrusion Protection Center (University of Chinese Academy of Sciences), Beijing 101408
3. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018
4. Guangxi Key Laboratory of Cryptography and Information Security (Guilin University of Electronic Technology), Guilin, Guangxi 541000
5. Graduate School of Information, Production and Systems, Waseda University, Japan 808-0135
6. College of Cyberspace Security, Hainan University, Haikou 570228
7. Zhongguancun Laboratory, Beijing 100094

Funds: This work was supported by the Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province (SKLACSS-202205), the Key Research and Development Program of Hainan Province (GHYF2022010, ZDYF202012), the National Natural Science Foundation of China (U1836210), the Natural Science Basic Research Plan in Shaanxi Province of China (2021JQ-192), and the Program of Guangxi Key Laboratory of Cryptography and Information Security (GCIS202123).


Author Bio:
Tian Xiao: born in 1999. Master candidate. Her main research interest includes network and information security

Chang Jiyou: born in 1999. Master candidate. His main research interest includes network and information security

Zhang Chi: born in 2002. Master candidate. His main research interest includes AI and security

Rong Jingfeng: born in 1986. PhD candidate. His main research interest includes network and information security

Wang Ziyu: born in 1998. Master candidate. His main research interest includes network and information security

Zhang Guanghua: born in 1979. PhD, professor, master supervisor. His main research interest includes network and information security

Wang He: born in 1987. PhD, lecturer, master supervisor. Her main research interests include cryptography, quantum cryptographic protocol

Wu Gaofei: born in 1987. PhD, lecturer, master supervisor. His main research interest includes cryptography

Hu Jinglu: born in 1962. PhD, professor, PhD supervisor. His main research interest includes computational intelligence

Zhang Yuqing: born in 1966. PhD, professor, PhD supervisor. His main research interest includes information security

Abstract:
Open-source software defect prediction mines data from software history repositories and applies machine learning or deep learning to defect-related metrics or to the syntactic and semantic features of the source code itself in order to find software defects in advance, thereby reducing software repair costs and improving product quality. Vulnerability prediction mines software instance repositories to extract and label code modules and predicts whether new code instances contain vulnerabilities, reducing the cost of discovering and fixing vulnerabilities. We survey the literature on software defect prediction published from 2000 to December 2022. Taking machine learning and deep learning as the starting point, we divide prediction models into two types: those based on software metrics and those based on syntax and semantics. Building on these two types of models, we analyze the differences and connections between software defect prediction and vulnerability prediction. We then analyze in detail six frontier issues: dataset sources and processing, code vector representation methods, improvement of pre-trained models, exploration of deep learning models, fine-grained prediction techniques, and model transfer between software defect prediction and vulnerability prediction. Finally, we point out future directions for software defect prediction.
- software defect prediction
- vulnerability prediction
- machine learning
- deep learning
- metric
- semantic and syntactic analysis


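A wireless body area network (WBAN)^[1] is a wireless communication network composed of wireless sensors (WSs) worn on or embedded in the human body. WBAN technology is widely used for medical data monitoring: different types of wireless medical sensors monitor different aspects of a patient's medical data and send the data to remote servers, where it can be professionally analyzed and integrated. However, when an open WBAN transmits patients' sensitive medical data, it faces risks such as leakage of patient privacy and malicious tampering with the medical data^[2].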

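Many researchers have proposed applying cryptosystems to WBANs to ensure the confidentiality of medical data during transmission and sharing. Mykletun et al.^[3] designed an encryption scheme based on traditional public key cryptography (PKC) that guarantees data confidentiality in wireless sensor networks. Nadir et al.^[4] combined PKC with elliptic curve cryptography to generate symmetric keys with which users encrypt data, ensuring the confidentiality of medical data transmitted and shared in wireless sensor networks. However, the PKC-based schemes^[3-4] require a trusted authority to manage user certificates; to eliminate the certificate management overhead, several WBAN schemes based on identity-based encryption^[5-7] were subsequently proposed. The schemes in Refs. [3−7] ensure the confidentiality of medical data in transit by encrypting it, but they do not authenticate the source of the medical data. Without authentication, hospitals may not only waste valuable medical resources on invalid diagnoses but may also misdiagnose patients on the basis of tampered medical data.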

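To achieve authentication of medical data in WBANs, Ahn et al.^[8] constructed an authentication scheme based on the Advanced Encryption Standard (AES) symmetric cryptosystem. Huang Yicai et al.^[9] designed an identity-based signature scheme that resists replay attacks. Cagalaban et al.^[10] introduced digital signcryption into healthcare systems, authenticating data while ensuring the confidentiality of medical data. Ullah et al.^[11] used hyperelliptic curves to design a certificate-based signcryption scheme. Although Refs. [8−11] achieve authentication of medical data, none of them considers multi-user application scenarios. To address the low computational efficiency of cryptographic schemes in multi-user WBANs, several schemes supporting an aggregate mode^[12-15] have been proposed based on techniques such as aggregate signatures and aggregate encryption. However, Refs. [8−15] do not consider how to search the ciphertexts stored in the WBAN cloud efficiently, so data users incur a high overhead when retrieving medical data.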

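Based on searchable encryption^[16] and ciphertext equality testing^[17], researchers have proposed several ciphertext retrieval schemes for WBANs^[18-21]. All of these schemes have shortcomings. For example, the searchable encryption schemes proposed by Zhang Jiayi^[18] and Andrew et al.^[19] only support searching medical data encrypted under the same public key; the equality-test encryption scheme designed by Ramadan et al.^[20] cannot authenticate the source of medical data; and the certificate-based ciphertext equality test scheme designed by Elhabob et al.^[21] suffers from the certificate management problem. In addition, doctors or medical institutions sometimes need to determine whether particular medical data of multiple patients are identical, or to integrate and archive the medical data of patients with the same condition. None of the ciphertext retrieval schemes in Refs. [18−21] considers multi-user retrieval or retrieval over multiple ciphertexts at once, which limits their use in practical WBAN deployments with many user nodes.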

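WBANs often need to match more than two ciphertexts, but traditional ciphertext equality testing can only split the ciphertexts into pairs and test the pairs one by one, which makes ciphertext retrieval inefficient in multi-user environments. To improve the efficiency of equality testing over multiple ciphertexts, Susilo et al.^[22] proposed public-key encryption with multi-ciphertext equality test (PKE-MET), which matches more than two ciphertexts simultaneously. In PKE-MET, each data owner participating in a multi-ciphertext equality test can specify a number $n$ and have their ciphertext matched against the ciphertexts of $n-1$ other data owners. Besides testing multiple ciphertexts at once, PKE-MET also supports simultaneous retrieval by multiple users: the tester can test the data owners' ciphertexts only after receiving the $n$ test trapdoors uploaded separately by the $n$ data users who wish to retrieve ciphertexts, so multiple data users can perform ciphertext matching simultaneously. However, PKE-MET incurs a large certificate management overhead and cannot authenticate the source of the data.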

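To address the above problems, this paper proposes a WBAN aggregate signcryption scheme that supports multi-ciphertext equality testing. Its main contributions are threefold:

1) Identity-based signcryption. The proposed scheme adopts identity-based signcryption, which eliminates the certificate management overhead of traditional public-key encryption schemes and ensures the confidentiality, integrity, and authenticity of medical data in the WBAN as well as the unforgeability of data owners' signatures.

2) Multi-user aggregate signcryption. With aggregate signcryption, a verifier can batch-verify the medical ciphertexts of multiple data owners, which improves the verification efficiency of signcryption in multi-user environments.

3) Multi-ciphertext equality test. With multi-ciphertext equality testing, the tester can use the test trapdoors uploaded by data users to match multiple ciphertexts simultaneously, enabling multi-user retrieval and multi-ciphertext equality testing and reducing the computational overhead of equality testing in multi-user environments.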

1. Preliminaries

1.1 Hard Problem

Computational Diffie-Hellman (CDH) problem: given $(P,aP,bP)$, where $P$ is a generator of a cyclic group $G$ of prime order $p$ and $a,b \in \mathbb{Z}_p^*$ are unknown, compute $abP$.

1.2 Cramer's Rule

Consider the nonhomogeneous linear system of $n$ equations in $n$ unknowns ${x_1},{x_2}, …,{x_n}$:

$\left\{ \begin{gathered} {a_{11}}{x_1} + {a_{12}}{x_2} + \cdots + {a_{1n}}{x_n} = {b_1} , \\ {a_{21}}{x_1} + {a_{22}}{x_2} + \cdots + {a_{2n}}{x_n} = {b_2} , \\ {\text{ }} \vdots \\ {a_{n1}}{x_1} + {a_{n2}}{x_2} + \cdots + {a_{nn}}{x_n} = {b_n} , \\ \end{gathered} \right.$

Its coefficient matrix is

${\boldsymbol{A}} = \left({\begin{array}{*{20}{c}} {{a_{11}}}&{{a_{12}}}& \cdots &{{a_{1n}}} \\ {{a_{21}}}&{{a_{22}}}& \cdots &{{a_{2n}}} \\ \vdots & \vdots &{}& \vdots \\ {{a_{n1}}}&{{a_{n2}}}& \cdots &{{a_{nn}}} \end{array}} \right),$

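The determinant of $\boldsymbol{A}$ is

$\det ({\boldsymbol{A}}) = \left| {\begin{array}{*{20}{c}} {{a_{11}}}&{{a_{12}}}& \cdots &{{a_{1n}}} \\ {{a_{21}}}&{{a_{22}}}& \cdots &{{a_{2n}}} \\ \vdots & \vdots &{}& \vdots \\ {{a_{n1}}}&{{a_{n2}}}& \cdots &{{a_{nn}}} \end{array}} \right| .$

If $\det ({\boldsymbol{A}}) \ne 0$, the system has a unique solution, given by $x_k = \det(\boldsymbol{A}_k)/\det(\boldsymbol{A})$ for $k = 1, 2, \cdots, n$, where $\boldsymbol{A}_k$ is obtained from $\boldsymbol{A}$ by replacing its $k$th column with $(b_1, b_2, \cdots, b_n)^{\mathrm{T}}$.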
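To make Cramer's rule concrete, the following Python sketch solves a small system exactly with `fractions.Fraction`. The determinant uses naive cofactor (Laplace) expansion, which takes exponential time and is only meant for tiny illustrative matrices; the function names `det` and `cramer` and the example system are hypothetical.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (toy sizes only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total

def cramer(a, b):
    """Solve A x = b with Cramer's rule: x_k = det(A_k) / det(A)."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    solution = []
    for k in range(len(a)):
        # A_k: column k of A replaced by the right-hand side b
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k), d))
    return solution

# 2*x1 + x2 = 5 and x1 + 3*x2 = 10 have the unique solution x1 = 1, x2 = 3
solution = cramer([[2, 1], [1, 3]], [5, 10])
```

The explicit `ValueError` mirrors the precondition $\det(\boldsymbol{A}) \ne 0$ stated in the rule.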

1.3 Vandermonde Matrix and Vandermonde Determinant

A matrix of the form

${\boldsymbol{V}} = \left( {\begin{array}{*{20}{c}} 1&{{a_1}}&{a_1^2}& \cdots &{a_1^{n - 1}} \\ 1&{{a_2}}&{a_2^2}& \cdots &{a_2^{n - 1}} \\ \vdots & \vdots & \vdots &{}& \vdots \\ 1&{{a_n}}&{a_n^2}& \cdots &{a_n^{n - 1}} \end{array}} \right)$

is called a Vandermonde matrix, and its Vandermonde determinant $\det ({\boldsymbol{V}})$ satisfies

$\det ({\boldsymbol{V}}) = \left| {\begin{array}{*{20}{c}} 1&{{a_1}}&{a_1^2}& \cdots &{a_1^{n - 1}} \\ 1&{{a_2}}&{a_2^2}& \cdots &{a_2^{n - 1}} \\ \vdots & \vdots & \vdots &{}& \vdots \\ 1&{{a_n}}&{a_n^2}& \cdots &{a_n^{n - 1}} \end{array}} \right| = \prod\limits_{1 \leqslant i \lt j \leqslant n} {({a_j} - {a_i})} .$

Hence $\det ({\boldsymbol{V}}) \ne 0$ if and only if $a_1, a_2, \cdots, a_n$ are pairwise distinct.
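The product formula can be checked numerically. The Python sketch below compares a determinant computed by exact Gaussian elimination with $\prod_{1 \le i < j \le n}(a_j - a_i)$ for sample nodes; the factor order $(a_j - a_i)$ fixes the sign of the result. The function names and sample nodes are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

def det_gauss(m):
    """Determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det

def vandermonde(xs):
    """Rows (1, x, x^2, ..., x^(n-1)) as in the matrix V above."""
    n = len(xs)
    return [[x ** e for e in range(n)] for x in xs]

def vandermonde_det_formula(xs):
    """prod over 1 <= i < j <= n of (x_j - x_i)."""
    prod = 1
    for i, j in combinations(range(len(xs)), 2):
        prod *= xs[j] - xs[i]
    return prod

nodes = [2, 3, 5, 7]
assert det_gauss(vandermonde(nodes)) == vandermonde_det_formula(nodes)
assert det_gauss(vandermonde([1, 4, 4])) == 0  # a repeated node makes V singular
```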

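2. Proposed Scheme

2.1 System Model

The system model of the proposed WBAN aggregate signcryption scheme supporting multi-ciphertext equality testing is shown in Fig. 1. It comprises six entities: the private key generator (PKG), the cloud storage provider, data owners (i.e., the wireless sensors worn by patients), the ciphertext equality tester, the aggregator, and data users (DUs).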

Figure 1. The proposed system model

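The entities are described as follows:

1) Private key generator. Generates keys for the data owners and data users in the WBAN.

2) Cloud storage provider. Stores the medical ciphertexts $CT_1, CT_2, \cdots, CT_n$ uploaded by users on cloud servers.

3) Data owners. The wireless sensors worn by patients; they signcrypt medical data and upload the medical ciphertexts to the cloud.

4) Tester. Performs the equality test on multiple medical ciphertexts downloaded from the cloud server and returns the test result to the cloud server.

5) Aggregator. Performs aggregate signcryption on the medical data of multiple data owners and uploads the aggregate medical ciphertext to the cloud.

6) Data users. Users who wish to obtain medical ciphertexts, such as doctors, medical institutions, and data processing centers; they upload equality test trapdoors to the tester, and they decrypt and authenticate the medical ciphertexts downloaded from the cloud server.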

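2.2 Security Goals

The proposed aggregate signcryption scheme with multi-ciphertext equality testing must consider two types of adversaries: a Type-I adversary cannot access data users' test trapdoors, whereas a Type-II adversary can obtain them. Against these two types of adversaries, the scheme aims to achieve the following security goals:

1) Confidentiality and integrity of medical data. Most data transmitted in a WBAN is sensitive medical data, and theft or tampering during transmission can have serious consequences. The scheme uses identity-based encryption to guarantee the confidentiality and integrity of medical data against Type-I adversaries. Confidentiality means that an adversary who intercepts a transmitted medical ciphertext cannot obtain any information about the plaintext; integrity means that an adversary cannot forge or tamper with the medical data during transmission.

2) Unforgeability of data owners' signatures. The scheme uses identity-based signcryption when verifying the validity of data owners' signatures, which guarantees unforgeability against Type-I adversaries, i.e., an adversary cannot forge a valid data owner signature.

3) One-wayness of test trapdoors. The tester performs equality tests on medical ciphertexts using the test trapdoors uploaded by data users. During testing, the trapdoors must be one-way against Type-II adversaries, i.e., an adversary cannot use a test trapdoor to obtain information about the plaintext medical data involved in the test.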

2.3 Scheme Construction

2.3.1 System Initialization

Given a security parameter $k$, the PKG chooses a large prime $p$ ($p \gt 2^k$), a cyclic additive group $G$ of order $p$, and a generator $P$ of $G$. The PKG randomly chooses $s \in \mathbb{Z}_p^*$ as the master key and keeps it secret, computes $P_{\text{pub}} = sP$ as the system public key, and defines six hash functions: $H_1:\{0,1\}^* \to \mathbb{Z}_p^*$, $H_2:\{0,1\}^* \times G \to \mathbb{Z}_p^*$, $H_3:\{0,1\}^* \times G \to \mathbb{Z}_p^*$, $H_4:G \to \{0,1\}^{l_0 + l_1}$, $H_5:\{0,1\}^* \to \mathbb{Z}_p^*$, and $H_6:\{0,1\}^* \to \{0,1\}^k$, where $l_0$ is the ciphertext length. It outputs the system parameters $params = \{p, P, P_{\text{pub}}, G, H_1, H_2, H_3, H_4, H_5, H_6\}$.

2.3.2 User Key Extraction

1) The user uploads $ID_i$ to the PKG, which computes $Q_i = H_1(ID_i)$ and $sk_{i,1} = sQ_i$;

2) The PKG randomly chooses $x_i \in \mathbb{Z}_p^*$ and computes $PK_{i,1} = x_iP$, $PK_{i,2} = H_1(ID_i||PK_{i,1})$, $sk_{i,2} = x_i + sPK_{i,2}$, $sk_{i,3} = H_1(ID_i||s)$, and $PK_{i,3} = sk_{i,3}P$;

3) The PKG outputs the public key $PK_i = (PK_{i,1}, PK_{i,2}, PK_{i,3})$ and the private key $sk_i = (sk_{i,1}, sk_{i,2}, sk_{i,3})$.

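2.3.3 Medical Data Signcryption and Upload

Let $n$ be the number of data owners participating in the ciphertext equality test and aggregate signcryption, $ID_i$ the identity of a data owner, and $ID_j$ the identity of a data user, where $i, j \in \{1, 2, \cdots, n\}$. The data owner performs steps 1) to 5) to signcrypt $m_i$: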

1) Randomly choose $a_i, b_i, N_i \in \mathbb{Z}_p^*$ and compute $C_{i,1} = a_iP$, $C_{i,2} = b_iP$, and $R_i = a_iQ_jP_{\text{pub}}$;

2) Compute $U_i = H_2(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1})$, $V_i = H_3(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1})$, $v_i = a_iU_i + sk_{i,2}V_i$, $C_{i,3} = v_iP$, and $C_{i,4} = H_4(R_i) \oplus (m_i||v_i)$;

3) Compute $f_{i,0} = H_5(m_i||n)$, $f_{i,1} = H_5(m_i||n||f_{i,0})$, $\cdots$, $f_{i,n-1} = H_5(m_i||n||f_{i,0}||\cdots||f_{i,n-2})$;

4) Compute $C_{i,5} = H_4(b_iPK_{j,3}) \oplus (N_i||f(N_i))$ and $C_{i,6} = H_6(n||C_{i,1}||\cdots||C_{i,5}||b_iPK_{j,3}||f_{i,0}||\cdots||f_{i,n-1})$, where $f(N_i) = f_{i,0} + f_{i,1}N_i + f_{i,2}N_i^2 + \cdots + f_{i,n-1}N_i^{n-1}$;

5) Upload the ciphertext $CT_i = (t_i, C_{i,1}, C_{i,2}, C_{i,3}, C_{i,4}, C_{i,5}, C_{i,6})$ to the cloud, where $t_i = n$.
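Steps 3) and 4) derive the coefficients $f_{i,0}, \cdots, f_{i,n-1}$ by chaining $H_5$ and then evaluate $f(N_i)$. The Python sketch below illustrates only this part. It assumes a hypothetical instantiation of $H_5$ as SHA-256 reduced into $\mathbb{Z}_p^*$, a hypothetical "||"-joined string encoding of the inputs, and the toy prime $p = 2^{127} - 1$; none of these choices comes from the scheme, and the $H_4$ masking and $C_{i,6}$ are omitted. Because the chain depends only on $m_i$ and $n$, data owners holding the same message obtain identical coefficients, which is what the equality test relies on.

```python
import hashlib

P_MOD = 2**127 - 1  # toy prime (a Mersenne prime) standing in for the group order p

def h5(*parts):
    """Hypothetical H5: {0,1}* -> Z_p^* via SHA-256.
    The '||'-joined string encoding of the inputs is an assumption of this sketch."""
    data = b"||".join(str(x).encode() for x in parts)
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest % (P_MOD - 1) + 1  # result lies in [1, p-1]

def coefficients(m, n):
    """f_0 = H5(m||n), f_k = H5(m||n||f_0||...||f_{k-1}) for k = 1..n-1."""
    fs = []
    for _ in range(n):
        fs.append(h5(m, n, *fs))
    return fs

def poly_eval(fs, x, p=P_MOD):
    """f(x) = f_0 + f_1 x + ... + f_{n-1} x^(n-1) mod p, by Horner's rule."""
    acc = 0
    for c in reversed(fs):
        acc = (acc * x + c) % p
    return acc
```

For example, `coefficients("heart-rate=72", 3)` returns the same three coefficients for every owner who holds that message with $n = 3$, while `poly_eval(fs, N)` gives the value $f(N_i)$ that step 4) masks into $C_{i,5}$ (here `N` stands for the owner's random $N_i$).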

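2.3.4 Multi-Ciphertext Equality Test

The $n$ data users each send their equality test trapdoor $tk_j = sk_{j,3}$ to the tester, where $j \in \{1, 2, \cdots, n\}$. The tester downloads from the cloud server the ciphertexts $CT_1, CT_2, \cdots, CT_n$ that the $n$ data owners want to test and performs the following steps 1) to 3):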

1) Check whether $t_1 = t_2 = \cdots = t_n = n$ holds; if so, the tester continues; otherwise, it aborts and outputs $\bot$;

2) For $i \in \{1, 2, \cdots, n\}$ and $j \in \{1, 2, \cdots, n\}$, the tester computes $N_i||f(N_i) = C_{i,5} \oplus H_4(C_{i,2}tk_j)$. By the signcryption algorithm, $f(N_i) = f_{i,0} + f_{i,1}N_i + f_{i,2}N_i^2 + \cdots + f_{i,n-1}N_i^{n-1}$, and combining the $n$ equations yields the system

$\left\{\begin{aligned} &f(N_1)=f_{1,0}+f_{1,1}N_1+f_{1,2}N_1^2+\cdots+f_{1,n-1}N_1^{n-1},\\ &f(N_2)=f_{2,0}+f_{2,1}N_2+f_{2,2}N_2^2+\cdots+f_{2,n-1}N_2^{n-1},\\ &\;\;\;\vdots\\ &f(N_n)=f_{n,0}+f_{n,1}N_n+f_{n,2}N_n^2+\cdots+f_{n,n-1}N_n^{n-1},\end{aligned}\right.$

Implicitly setting $f_{i,k} = f_{j,k}$ for $k \in \{0, 1, \cdots, n-1\}$, the tester inverts the Vandermonde matrix of this system to obtain its unique solution $f_{1,0}, f_{1,1}, \cdots, f_{1,n-1}$;

3) Check whether $C_{i,6} = H_6(n||C_{i,1}||C_{i,2}||C_{i,3}||C_{i,4}||C_{i,5}||C_{i,2}tk_j||f_{i,0}||f_{i,1}||\cdots||f_{i,n-1})$ holds for every $i$, using the recovered coefficients ($f_{i,k} = f_{1,k}$). If all these equations hold, the tester outputs the test result "1" to the cloud server; otherwise, it outputs "0".
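Step 2) amounts to polynomial interpolation over $\mathbb{Z}_p$: the pairs $(N_i, f(N_i))$ determine the coefficients uniquely when the $N_i$ are distinct. The Python sketch below assumes the tester has already unmasked these pairs from $C_{i,5}$ and solves the Vandermonde system by plain Gauss-Jordan elimination modulo a toy prime, which costs $O(n^3)$ operations rather than using the specialized inversion methods of Section 4.2. The $H_6$ comparison of step 3) is not modeled, and all names are hypothetical.

```python
import random

P_MOD = 2**127 - 1  # toy prime standing in for the group order p

def solve_vandermonde_mod_p(nodes, values, p=P_MOD):
    """Solve sum_k c_k * N_i^k = y_i (mod p) for c_0..c_{n-1} by Gauss-Jordan elimination.
    Raises ValueError if two nodes coincide (det(V) = 0 mod p)."""
    n = len(nodes)
    rows = [[pow(x, k, p) for k in range(n)] + [y % p] for x, y in zip(nodes, values)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular Vandermonde matrix: repeated N_i")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, p)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % p for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]

def poly_eval(coeffs, x, p=P_MOD):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def distinct_nonzero(n, p=P_MOD):
    """n distinct random elements of Z_p^*, standing in for the owners' N_i."""
    picked = set()
    while len(picked) < n:
        picked.add(random.randrange(1, p))
    return list(picked)

# Equal messages give equal coefficients, so interpolation recovers them exactly.
n = 4
shared = [random.randrange(1, P_MOD) for _ in range(n)]
nodes = distinct_nonzero(n)
values = [poly_eval(shared, x) for x in nodes]
assert solve_vandermonde_mod_p(nodes, values) == shared

# If one owner's f(N_1) comes from different coefficients, the unique solution differs
# from the shared coefficients (except with negligible probability), so an H6 check would fail.
other = [random.randrange(1, P_MOD) for _ in range(n)]
values[0] = poly_eval(other, nodes[0])
assert solve_vandermonde_mod_p(nodes, values) != shared
```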

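2.3.5 Medical Data Aggregate Signcryption and Upload

If the equality test result received by the cloud server is "1", meaning that the medical ciphertexts of the $n$ data owners all contain the same medical data, the cloud server sends all data owners' medical ciphertexts $CT_1, CT_2, \cdots, CT_n$ to the aggregator, which performs steps 1) and 2) to aggregate them:

1) Compute $X_{\text{agg}} = \displaystyle\sum\limits_{i = 1}^n C_{i,3}$;

2) Upload the aggregate medical ciphertext $\sigma_{\text{agg}} = (\{CT_i\}_{i = 1,2,\cdots,n}, X_{\text{agg}})$ to the cloud server for storage.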

2.3.6 Medical Data Download and Decryption

Let $ID_j$ be the identity of a data user, where $j \in \{1, 2, \cdots, n\}$. The data user downloads the aggregate medical ciphertext $\sigma_{\text{agg}}$ from the cloud, decrypts it, and verifies the data source as follows:

1) Compute $R_i' = sk_{j,1}C_{i,1}$ and $m_i'||v_i' = C_{i,4} \oplus H_4(R_i')$;

2) From $m_i'$, compute $f_{i,0}' = H_5(m_i'||n)$, $f_{i,1}' = H_5(m_i'||n||f_{i,0}')$, $\cdots$, $f_{i,n-1}' = H_5(m_i'||n||f_{i,0}'||\cdots||f_{i,n-2}')$, and compute $N_i'||f(N_i') = C_{i,5} \oplus H_4(C_{i,2}sk_{j,3})$;

3) Compute $U_i' = H_2(m_i', ID_i, ID_j, R_i', PK_{i,1}, PK_{j,1})$, $V_i' = H_3(m_i', ID_i, ID_j, R_i', PK_{i,1}, PK_{j,1})$, $X_{\text{agg}}' = \displaystyle\sum\limits_{i = 1}^n v_i'P$, and $X_{\text{agg}}^* = \displaystyle\sum\limits_{i = 1}^n U_i'C_{i,1} + \displaystyle\sum\limits_{i = 1}^n V_i'PK_{i,1} + \displaystyle\sum\limits_{i = 1}^n V_i'PK_{i,2}P_{\text{pub}}$;

4) Check whether the equations $C_{i,6} = H_6(n||C_{i,1}||C_{i,2}||C_{i,3}||C_{i,4}||C_{i,5}||C_{i,2}sk_{j,3}||f_{i,0}'||f_{i,1}'||\cdots||f_{i,n-1}')$, $X_{\text{agg}}^* = X_{\text{agg}}'$, and $f(N_i') = f_{i,0}' + f_{i,1}'N_i' + \cdots + f_{i,n-1}'(N_i')^{n-1}$ all hold.

If all of the above equations hold, the data user accepts the medical data $m_i'$; otherwise, it outputs $\bot$.

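3. Correctness Analysis and Security Proof

3.1 Correctness Analysis

1) Correctness of the decryption equation

The data user decrypts the ciphertext by computing $m_i'||v_i' = C_{i,4} \oplus H_4(R_i')$, where $R_i' = sk_{j,1}C_{i,1}$ and $sk_{j,1}$ is the data user's private key. Since $sk_{j,1} = sQ_j$, we have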

$R_i' = s{k_{j,1}}{C_{i,1}} = s{k_{j,1}}{a_i}P = s{Q_j}{a_i}P = {a_i}{Q_j}{P_{{\text{pub}}}} = {R_i} ,$

that is, $R_i' = R_i$, and hence

$m_i'||v_i' = {C_{i,4}} \oplus {H_4}(R_i') = {H_4}({R_i}) \oplus ({m_i}||{v_i}) \oplus {H_4}(R_i') = {m_i}||{v_i}{\kern 1pt} .$

Therefore, the proposed scheme satisfies the correctness of the decryption equation.
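The decryption step relies on the mask $H_4(R_i)$ cancelling under XOR once $R_i' = R_i$. The Python sketch below shows only this masking round trip. It assumes a hypothetical instantiation of $H_4$ with SHAKE-256 from Python's `hashlib`, used as an extendable-output function, and uses a placeholder byte string instead of a real encoding of the group element $R_i$; splitting $m_i||v_i$ back into its parts would additionally require fixed field lengths, which are not modeled here.

```python
import hashlib

def h4(r_encoding: bytes, length: int) -> bytes:
    """Hypothetical H4: derive `length` mask bytes from an encoding of R_i via SHAKE-256."""
    return hashlib.shake_256(r_encoding).digest(length)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def mask(r_encoding: bytes, payload: bytes) -> bytes:
    """H4(R) XOR payload; applying it twice with the same R restores the payload."""
    return xor_bytes(h4(r_encoding, len(payload)), payload)

# Toy m_i || v_i: an 11-byte message followed by a 16-byte encoding of v_i.
payload = b"glucose=5.6" + (123456789).to_bytes(16, "big")
r_sender = b"placeholder-encoding-of-R_i"    # signcryptor: R_i = a_i Q_j P_pub
c4 = mask(r_sender, payload)                 # plays the role of C_{i,4}
r_receiver = b"placeholder-encoding-of-R_i"  # receiver: R_i' = sk_{j,1} C_{i,1} = R_i
assert mask(r_receiver, c4) == payload       # equal encodings, so the mask cancels
```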

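2) Correctness of the signature verification equation

The data user verifies the validity of the aggregate ciphertext signature by checking whether $X_{\text{agg}}^* = X_{\text{agg}}'$ holds, where $X_{\text{agg}}' = \displaystyle\sum\limits_{i = 1}^n v_i'P$, $v_i' = v_i = a_iU_i + sk_{i,2}V_i$ by the correctness of decryption, and $sk_{i,2} = x_i + sPK_{i,2}$. Hence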

$\begin{aligned} X_{{\text{agg}}}' = &\sum\limits_{i = 1}^n {v_i'P} = \sum\limits_{i = 1}^n {{a_i}{U_i}P + \sum\limits_{i = 1}^n {s{k_{i,2}}{V_i}P} } = \\ &\sum\limits_{i = 1}^n {{a_i}{U_i}P + \sum\limits_{i = 1}^n {{x_i}{V_i}P + \sum\limits_{i = 1}^n {sP{K_{i,2}}{V_i}P} } } ,\end{aligned}$

Using $C_{i,1} = a_iP$, $PK_{i,1} = x_iP$, and $P_{\text{pub}} = sP$, we obtain

$X_{{\text{agg}}}' = \sum\limits_{i = 1}^n {{U_i}{C_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,2}}{P_{{\text{pub}}}}}.$

Furthermore, by the correctness of the decryption equation, $m_i'||v_i' = m_i||v_i$, so

$\begin{aligned} {U_i} =\;& {H_2}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}})= \\ & {H_2}(m_i',I{D_i},I{D_j},R_i',P{K_{i,1}},P{K_{j,1}}) =U_i',\\ {V_i} = & {H_3}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}}) =\\ &{H_3}(m_i',I{D_i},I{D_j},R_i',P{K_{i,1}},P{K_{j,1}}) = V_i', \end{aligned}$

that is, $U_i = U_i'$ and $V_i = V_i'$. Thus

$\begin{aligned} X_{{\text{agg}}}' = \;& \sum\limits_{i = 1}^n {{U_i}{C_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,2}}{P_{{\text{pub}}}}} = \\ &\sum\limits_{i = 1}^n {U_i^{'}{C_{i,1}} + } \sum\limits_{i = 1}^n {V_i'P{K_{i,1}} + } \sum\limits_{i = 1}^n {V_i'P{K_{i,2}}{P_{{\text{pub}}}}} = X_{{\text{agg}}}^* , \end{aligned}$

that is, $X_{\text{agg}}^* = X_{\text{agg}}'$ holds. Therefore, the proposed scheme satisfies the correctness of the signature verification equation.
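Both correctness derivations above are purely algebraic, so they can be sanity-checked in a toy group. The Python sketch below uses the additive group $\mathbb{Z}_q$ with generator $P = 1$ as a stand-in for $G$; this group is completely insecure (discrete logarithms are trivial) and serves only to check the equations. Random scalars replace the hash outputs $Q_j$, $PK_{i,2}$, $U_i$, and $V_i$, which is an assumption of the sketch.

```python
import random

q = 2**61 - 1  # Mersenne prime used as the toy group order

def smul(k, point):
    """Scalar multiplication k*point in the additive group Z_q (insecure toy stand-in for G)."""
    return k * point % q

def rand_scalar():
    return random.randrange(1, q)

P = 1                  # generator of the toy group
s = rand_scalar()      # PKG master key
P_pub = smul(s, P)     # P_pub = sP
n = 3

# Decryption: R_i' = sk_{j,1} C_{i,1} should equal R_i = a_i Q_j P_pub
Q_j = rand_scalar()                                # stands in for H1(ID_j)
sk_j1 = s * Q_j % q                                # sk_{j,1} = s Q_j
a = [rand_scalar() for _ in range(n)]
C1 = [smul(ai, P) for ai in a]                     # C_{i,1} = a_i P
R = [smul(ai * Q_j % q, P_pub) for ai in a]        # R_i = a_i Q_j P_pub
R_prime = [smul(sk_j1, c) for c in C1]             # R_i' = sk_{j,1} C_{i,1}
assert R == R_prime

# Aggregate verification: X_agg' = sum v_i P should equal X_agg^*
x = [rand_scalar() for _ in range(n)]
PK1 = [smul(xi, P) for xi in x]                    # PK_{i,1} = x_i P
PK2 = [rand_scalar() for _ in range(n)]            # stands in for H1(ID_i || PK_{i,1})
U = [rand_scalar() for _ in range(n)]              # stands in for H2(...)
V = [rand_scalar() for _ in range(n)]              # stands in for H3(...)
sk2 = [(x[i] + s * PK2[i]) % q for i in range(n)]  # sk_{i,2} = x_i + s PK_{i,2}
v = [(a[i] * U[i] + sk2[i] * V[i]) % q for i in range(n)]
X_agg = sum(smul(vi, P) for vi in v) % q           # sum of C_{i,3} = v_i P
X_star = (sum(smul(U[i], C1[i]) for i in range(n))
          + sum(smul(V[i], PK1[i]) for i in range(n))
          + sum(smul(V[i] * PK2[i] % q, P_pub) for i in range(n))) % q
assert X_agg == X_star
```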

3) Correctness of the equality test result

For $i \in \{1, 2, \cdots, n\}$ and $j \in \{1, 2, \cdots, n\}$, the tester determines whether the $n$ medical ciphertexts contain the same data by checking whether $C_{i,6} = H_6(n||C_{i,1}||\cdots||C_{i,5}||C_{i,2}tk_j||f_{i,0}||\cdots||f_{i,n-1})$ holds, where $f_{i,0} = H_5(m_i||n), \cdots, f_{i,n-1} = H_5(m_i||n||f_{i,0}||\cdots||f_{i,n-2})$. Suppose that all medical ciphertexts participating in the equality test contain the same data, i.e., $m_1 = m_2 = \cdots = m_n$. Then

$\begin{aligned} &H_5(m_1||n) = H_5(m_2||n) = \cdots = H_5(m_n||n), \\ &H_5(m_1||n||f_{1,0}) = H_5(m_2||n||f_{2,0}) = \cdots = H_5(m_n||n||f_{n,0}), \\ &\quad\vdots \\ &H_5(m_1||n||f_{1,0}||\cdots||f_{1,n-2}) = H_5(m_2||n||f_{2,0}||\cdots||f_{2,n-2}) = \cdots = H_5(m_n||n||f_{n,0}||\cdots||f_{n,n-2}), \end{aligned}$

that is, $f_{i,k} = f_{j,k}$ holds for all $i,j \in \{1, 2, \cdots, n\}$ and $k \in \{0, 1, \cdots, n-1\}$.

By the signcryption and upload algorithm, each data owner sets during signcryption

$f({N_i}) = {f_{i,0}} + {f_{i,1}}{N_i} + {f_{i,2}}N_i^2 + \cdots + {f_{i,n - 1}}N_i^{n - 1},$

which yields the system

$\left\{\begin{aligned} f(N_1)&=f_{1,0}+f_{1,1}N_1+f_{1,2}N_1^2+\cdots+f_{1,n-1}N_1^{n-1},\\ f(N_2)&=f_{2,0}+f_{2,1}N_2+f_{2,2}N_2^2+\cdots+f_{2,n-1}N_2^{n-1},\\ & \vdots \\ f(N_n)&=f_{n,0}+f_{n,1}N_n+f_{n,2}N_n^2+\cdots+f_{n,n-1}N_n^{n-1}.\end{aligned}\right.$

Since $f_{i,k} = f_{j,k}$, we can treat $f_{1,0}, f_{1,1}, \cdots, f_{1,n-1}$ as the unknowns of the system and the random numbers $N_i$ as its coefficients; the matrix of the system is then

${\boldsymbol{V}} = \left({\begin{array}{*{20}{c}} 1&{{N_1}}&{N_1^2}& \cdots &{N_1^{n - 1}} \\ 1&{{N_2}}&{N_2^2}& \cdots &{N_2^{n - 1}} \\ \vdots & \vdots & \vdots &{}& \vdots \\ 1&{{N_n}}&{N_n^2}& \cdots &{N_n^{n - 1}} \end{array}} \right) ,$

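By the property of Vandermonde matrices, its determinant is $\det ({\boldsymbol{V}}) = \displaystyle\prod\limits_{1 \leqslant i \lt j \leqslant n} {({N_j} - {N_i})}$.

By the signcryption process, the $N_i$ are random numbers chosen independently by $n$ different data owners when signcrypting their medical data. $\det(\boldsymbol{V}) = 0$ if and only if $N_i = N_j$ for some $i \ne j$; since each $N_i$ is uniform over $\mathbb{Z}_p^*$, which has $p-1$ elements, this happens with probability $1 - \prod_{t=0}^{n-1}\frac{p-1-t}{p-1} \leqslant \frac{n(n-1)}{2(p-1)}$, which is negligible because $p$ is a large prime. By Cramer's rule, when $\det(\boldsymbol{V}) \ne 0$ the system has exactly one solution $f_{1,0}, f_{1,1}, \cdots, f_{1,n-1}$, so $f_{i,k} = f_{j,k}$ holds for all $i, j \in \{1, 2, \cdots, n\}$ and $k \in \{0, 1, \cdots, n-1\}$, which is consistent with the assumption that all medical ciphertexts in the test contain the same data. Therefore, the proposed scheme satisfies the correctness of the multi-ciphertext equality test result.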
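The matrix $\boldsymbol{V}$ is singular exactly when two of the $N_i$ collide. The Python sketch below computes that collision probability exactly for independent, uniform $N_i \in \mathbb{Z}_p^*$, compares it with the birthday-style union bound $n(n-1)/(2(p-1))$, and cross-checks it by brute-force enumeration for tiny parameters; the toy values of $p$ and $n$ and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def prob_singular(n, p):
    """Exact Pr[N_i = N_j for some i != j], N_1..N_n uniform and independent in Z_p^*."""
    all_distinct = Fraction(1)
    for t in range(n):
        all_distinct *= Fraction(p - 1 - t, p - 1)
    return 1 - all_distinct

def union_bound(n, p):
    """Birthday-style upper bound n(n-1) / (2(p-1))."""
    return Fraction(n * (n - 1), 2 * (p - 1))

def prob_singular_bruteforce(n, p):
    """Enumerate all n-tuples over {1, ..., p-1}; feasible only for tiny p and n."""
    tuples = list(product(range(1, p), repeat=n))
    bad = sum(1 for tup in tuples if len(set(tup)) < n)
    return Fraction(bad, len(tuples))

assert prob_singular(3, 7) == prob_singular_bruteforce(3, 7)
assert prob_singular(3, 11) <= union_bound(3, 11)
```

For a realistic modulus the bound is tiny: `float(union_bound(1000, 2**256))` is roughly $4.3 \times 10^{-72}$.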

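3.2 Security Proof

By incorporating identity-based aggregate signcryption, the proposed scheme guarantees the confidentiality of medical data and the existential unforgeability of signatures against Type-I adversaries; for the proofs of confidentiality and unforgeability, see the scheme in Ref. [23]. In addition, the scheme achieves one-wayness against adaptive chosen-ciphertext attacks (OW-CCA2) for Type-II adversaries, which Theorem 1 proves below.

Theorem 1. If the CDH problem is hard, the proposed scheme is OW-CCA2 secure against Type-II adversaries in the random oracle model.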

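Proof. Let $\mathcal{C}$ be an algorithm that attempts to solve the CDH problem, and let $\mathcal{A}_2$ denote a Type-II adversary. $\mathcal{C}$ uses $\mathcal{A}_2$ as a subroutine and acts as the challenger in the following game. If $\mathcal{A}_2$ wins the game with non-negligible advantage in probabilistic polynomial time, then $\mathcal{C}$ can solve the CDH problem in probabilistic polynomial time.

Initialization. The CDH instance is $(P, aP, bP)$, where $a,b \in \mathbb{Z}_p^*$, and $\mathcal{C}$'s goal is to output the solution $abP$. $\mathcal{C}$ chooses a cyclic group $G$ of prime order $p$ with generator $P$, randomly chooses $a \in \mathbb{Z}_p^*$, and computes $P_{\text{pub}}' = aP$. Finally, it outputs the system parameters $params = \{p, P, P_{\text{pub}}, G, H_1, H_2, H_3, H_4, H_5, H_6\}$, keeps $a$ secret, and sends $params$ to $\mathcal{A}_2$.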

Query phase 1. To answer $\mathcal{A}_2$'s queries, $\mathcal{C}$ maintains the lists $L_1, L_2, L_3, L_4, L_5, L_6, L_{\text{td}}$, which track $\mathcal{A}_2$'s $H_1$, $H_2$, $H_3$, $H_4$, $H_5$, and $H_6$ hash queries and test trapdoor queries, respectively. $L_1$ also tracks key extraction queries. All lists are initially empty.

1) $H_1$ hash query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $H_1(ID_i, Q_i)$: if $ID_i \in \{ID_i\}_{i=1}^n$, it computes $PK_{i,1} = x_iP$, where $x_i$ is unknown, and stores $(\bot, Q_i, ID_i)$ in $L_1$; if $i \ne 1$, $\mathcal{C}$ randomly chooses $x_i, PK_{i,2} \in \mathbb{Z}_p^*$, sets $PK_{i,1} = x_iP$, returns $PK_{i,2} = H_1(ID_i||PK_{i,1})$ to $\mathcal{A}_2$, and stores $(x_i, PK_{i,1}, PK_{i,2}, ID_i)$ in $L_1$.

2) $H_2$ hash query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1}, U_i)$, it first checks whether $L_2$ already contains $(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1}, U_i, t_i, t_iP)$. If so, it sends $U_i$ to $\mathcal{A}_2$; otherwise, $\mathcal{C}$ chooses $U_i \in \mathbb{Z}_p^*$, adds $(U_i, t_i, t_iP)$ to $L_2$, and outputs $t_iP$.

3) $H_3$ hash query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1}, V_i)$, it first checks whether $L_3$ already contains $(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1}, V_i, w_i, w_iP)$. If so, it returns $V_i$ to $\mathcal{A}_2$; otherwise, $\mathcal{C}$ chooses $V_i \in \mathbb{Z}_p^*$, adds $(V_i, w_i, w_iP)$ to $L_3$, and outputs $w_iP$.

4) $H_4$ hash query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $(R_i, H_4(R_i))$: if $L_4$ already contains $(R_i, H_4(R_i))$, it returns $H_4(R_i)$ to $\mathcal{A}_2$; otherwise, $\mathcal{C}$ chooses $H_4(R_i) \in \{0,1\}^{l_0+l_1}$, adds $(R_i, H_4(R_i))$ to $L_4$, and outputs $H_4(R_i)$.

5) $H_5$ hash query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $f_{i,d}$, where $d \in \{1, 2, \cdots, n\}$: if $L_5$ contains $(m_i, n, f_{i,0}, \cdots, f_{i,d-2}, f_{i,d})$, it returns $f_{i,d}$ to $\mathcal{A}_2$; otherwise, $\mathcal{C}$ chooses $f_{i,d} \in \mathbb{Z}_p^*$, adds $(m_i, n, f_{i,0}, \cdots, f_{i,d-2}, f_{i,d})$ to $L_5$, and outputs $f_{i,d}$.

6) $H_6$ hash query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $C_{i,6}$: if $L_6$ already contains $C_{i,6}$, it returns $C_{i,6}$ to $\mathcal{A}_2$; otherwise, $\mathcal{C}$ chooses $C_{i,6} \in \{0,1\}^k$, adds the corresponding tuple to $L_6$, and outputs $C_{i,6}$.

7) Key extraction query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query for the private key of $ID_i$, it first checks whether $L_1$ contains $(x_i, PK_{i,1}, PK_{i,2}, ID_i)$; if not, it outputs $\bot$; otherwise, it returns $(x_i, PK_{i,1}, *, *)$. If $ID_i \notin \{ID_i\}_{i=1}^n$, $\mathcal{C}$ submits $ID_i$ as an $H_1$ hash query to obtain $Q_i = H_1(ID_i)$, computes $sk_{i,1} = aQ_i$ and $sk_{i,2} = x_i + aPK_{i,2}$, and returns $(PK_{i,1}, sk_{i,1}, PK_{i,2}, ID_i)$ to $\mathcal{A}_2$.

8) Public key replacement query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $(ID_i, PK_{i,1}, PK_{i,2})$: if $(x_i, PK_{i,1}, PK_{i,2}, ID_i)$ already exists in $L_1$, $\mathcal{C}$ replaces the original public key $(PK_{i,1}, PK_{i,2})$ of $ID_i$ with the $(PK_{i,1}, PK_{i,2})$ stored in $L_1$; otherwise, $\mathcal{C}$ adds $(x_i, PK_{i,1}, PK_{i,2}, ID_i)$ to $L_1$.

9) Signcryption query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $(m_i, ID_i, ID_j)$, it performs ① and ②:

① If $ID_i \ne ID_l$ and $\mathcal{A}_2$ has not replaced the public key of $ID_i$, $\mathcal{C}$ obtains $x_i$ and $sk_{i,2}$ through the $H_1$ hash query and the key extraction query, respectively, and signcrypts $m_i$. If the public key of $ID_i$ has been replaced, $\mathcal{C}$ first obtains $(PK_{i,1}, PK_{i,2})$ and $(PK_{j,1}, PK_{j,2})$ through $H_1$ queries. Using a random number $a_i \in \mathbb{Z}_p^*$, it then computes $C_{i,1} = a_iP$ and $R_i = a_iQ_jP_{\text{pub}}'$, and obtains $U_i = H_2(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1})$, $V_i = H_3(m_i, ID_i, ID_j, R_i, PK_{i,1}, PK_{j,1})$, and $H_4(R_i)$ through the $H_2$, $H_3$, and $H_4$ hash queries. It obtains the private key $sk_{i,2}$ through a key extraction query, computes $v_i = a_iU_i + sk_{i,2}V_i$, $C_{i,3} = v_iP$, and $C_{i,4} = H_4(R_i) \oplus (m_i||v_i)$, and finally outputs the ciphertext $\sigma_i = (C_{i,1}, C_{i,2}, C_{i,3}, PK_{i,1})$ to $\mathcal{A}_2$.

② If $ID_i = ID_l$, $\mathcal{C}$ first obtains $(PK_{i,1}, PK_{i,2})$ and $(PK_{j,1}, PK_{j,2})$ through $H_1$ queries, randomly chooses $y, z \in \mathbb{Z}_p^*$, and computes $C_{i,1} = zaP$. Then $\mathcal{C}$ obtains $(ID_j, a_j)$ and $H_4(R_j)$ through the $H_1$ and $H_4$ hash queries, computes $R_j = a_jQ_jP_{\text{pub}}'$ and $U_j = H_2(m_l, ID_l, ID_j, R_j, PK_{l,1}, PK_{j,1})$, adds $(m_l, ID_l, ID_j, R_j, PK_{l,1}, PK_{j,1}, U_j)$ to $L_2$, obtains $(m_l, ID_l, ID_j, R_l, PK_{l,1}, PK_{j,1}, V_l, w_l, w_lP)$ through the $H_3$ hash query, computes $v_l = yU_l$, $C_{l,3} = zv_lP_{\text{pub}}' + w_lPK_{l,1}$, and $C_{i,4} = H_4(R_l) \oplus (m_l||v_l)$, and finally outputs $\sigma_l = (C_{l,1}, C_{l,2}, C_{l,3}, PK_{l,1})$ to $\mathcal{A}_2$.

10) Unsigncryption query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query on $(CT_1, CT_2, \cdots, CT_n, \{ID_i\}_{i=1}^n, ID_j)$, it performs ① and ②:

① $\mathcal{C}$ makes $H_1$ hash queries on $(ID_1, ID_2, \cdots, ID_n, ID_j)$ to obtain $(Q_1, Q_2, \cdots, Q_n, Q_j)$ and $(PK_{1,1}, PK_{2,1}, \cdots, PK_{n,1}, PK_{j,1})$, and then runs the aggregate signature verification algorithm. If verification fails, it outputs $\bot$ and aborts the simulation; otherwise, it continues.

② If $ID_j \ne ID_l$, $\mathcal{C}$ obtains $(ID_j, a_j)$ through the $H_1$ hash query, computes $R_j = a_jC_{j,1}$, and checks whether $L_2$ contains a tuple $(*, ID_j, R_i, PK_{i,1}, PK_{j,1}, U_i)$; if so, $\mathcal{C}$ decrypts the ciphertext using the hash value $U_i$; otherwise, it randomly chooses $U_i \in \mathbb{Z}_p^*$ and uses it to decrypt. If $ID_j = ID_l$, $\mathcal{C}$ checks whether $L_2$ contains a tuple $(*, ID_j, *, PK_{i,1}, PK_{j,1}, U_i)$; if so, it decrypts using the hash value $U_i$; otherwise, it randomly chooses $U_i \in \mathbb{Z}_p^*$ and uses it to decrypt.

11) Test trapdoor query. When $\mathcal{C}$ receives $\mathcal{A}_2$'s query for $tk_j$: if $L_1$ contains the tuple $(x_i, PK_{i,1}, PK_{i,2}, ID_i)$, $\mathcal{C}$ obtains $sk_{i,3} = H_1(ID_i||s)$ through the $H_1$ query and returns $tk_j = sk_{i,3}$ to $\mathcal{A}_2$; otherwise, $\mathcal{C}$ chooses $tk_j \in \mathbb{Z}_p^*$, sends it to $\mathcal{A}_2$, and adds $(x_i, PK_{i,1}, PK_{i,2}, ID_i)$ to $L_{\text{td}}$.

Challenge phase. $\mathcal{A}_2$ outputs two messages $m_0^* = \{m_{i,0}^*\}_{i=1}^n$ and $m_1^* = \{m_{i,1}^*\}_{i=1}^n$, together with the identities $\{ID_i^*\}_{i=1}^n$ and $ID_j^*$. $\mathcal{C}$ makes an $H_1$ hash query on $ID_j^*$; if $L_1$ contains no tuple related to $ID_j^*$, $\mathcal{C}$ fails the challenge. Otherwise, $\mathcal{C}$ retrieves from $L_1$ the public keys $\{PK_{i,1}^*, PK_{i,2}^*\}_{i=1}^n$ corresponding to $\{ID_i^*\}_{i=1}^n$, randomly chooses $\{sk_{i,2} \in \mathbb{Z}_p^*\}_{i=1}^n$, and computes $\{C_{i,1} = sk_{i,2}cP\}_{i=1}^n$. Then $\mathcal{C}$ retrieves $\{U_i\}_{i=1}^n$ and $\{V_i\}_{i=1}^n$ from $L_2$ and $L_3$ and computes $v_i^* = a_iU_i + sk_{i,2}V_i = t_iC_{i,1}^* + sk_{i,2}w_iPK_{i,1}^*$, where $t_i$, $w_i$, and $sk_{i,2}$ come from the $H_2$ hash query, the $H_3$ hash query, and the key extraction query on $ID_j^*$, respectively. Next, $\mathcal{C}$ randomly chooses $\mu \in \{0,1\}$, computes $C_{i,4}^* = H_4(R_i) \oplus (m_{i,\mu}||v_i^*)$ and $C_{i,3}^* = v_i^*P$, obtains the public keys $\{PK_{i,1}^*\}_{i=1}^n$ through $H_1$ hash queries, and outputs $\sigma^* = (C_{1,1}^*, \cdots, C_{n,1}^*, C_{1,3}^*, \cdots, C_{n,3}^*, C_{1,4}^*, \cdots, C_{n,4}^*, PK_{1,1}^*, \cdots, PK_{n,1}^*)$ to $\mathcal{A}_2$.

Query phase 2. $\mathcal{A}_2$ makes polynomially many adaptive queries as in query phase 1, but may not make unsigncryption queries on the ciphertexts corresponding to $ID_i^*$ and $ID_j^*$.

Guess phase. $\mathcal{A}_2$ outputs a guess $\mu' \in \{0,1\}$ for $\mu$; if $\mu' = \mu$, $\mathcal{A}_2$ wins the game. $\mathcal{C}$ then selects $(R_i, H_4(R_i))$ from the list $L_4$ and outputs $R_i = abP$ as the solution to the CDH problem, which contradicts the widely accepted hardness of the CDH problem. Therefore, the proposed scheme is OW-CCA2 secure against the Type-II adversary $\mathcal{A}_2$. This completes the proof.

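4. Comparative Analysis

4.1 Functional Comparison

Table 1 compares the functionality of the proposed scheme with the schemes in Refs. [22−26]. Compared with Refs. [23−24], the proposed scheme adds equality testing, enabling secure retrieval of medical ciphertexts stored in the cloud. Compared with Refs. [22,25−26], it adds aggregate signcryption, which ensures the confidentiality, integrity, and authenticity of medical data in WBANs and improves the efficiency of signcrypting and verifying medical data in multi-user environments. The equality test methods in Refs. [25−26] can only compare two ciphertexts at a time, whereas the proposed scheme matches multiple ciphertexts simultaneously, reducing the tester's overhead for ciphertext equality testing. In addition, compared with Refs. [22−23,25−26], the proposed scheme achieves one-wayness against adaptive chosen-ciphertext attacks, which improves its security.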

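Table 1. Comparison of Functional Characteristics

| Scheme | Equality test | Multi-ciphertext equality test | Signcryption | Aggregate signcryption | Security |
|---|---|---|---|---|---|
| Ref. [22] | √ | √ | × | × | One-wayness under chosen-plaintext attack |
| Ref. [23] | × | × | √ | √ | Indistinguishability under chosen-ciphertext attack |
| Ref. [24] | × | × | √ | √ | Indistinguishability under adaptive chosen-ciphertext attack |
| Ref. [25] | √ | × | × | × | One-wayness under chosen-ciphertext attack |
| Ref. [26] | √ | × | √ | × | One-wayness under chosen-ciphertext attack |
| Proposed scheme | √ | √ | √ | √ | One-wayness under adaptive chosen-ciphertext attack |

Note: "×" indicates that the scheme lacks the feature; "√" indicates that it has the feature.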

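4.2 Complexity of Vandermonde Matrix Inversion

When executing the multi-ciphertext equality test, the tester inverts a Vandermonde matrix to extract the coefficients related to the data owners' plaintexts. The time complexity of inverting an $n \times n$ Vandermonde matrix depends on the inversion method; researchers have proposed both serial^[27-28] and parallel^[29-30] methods for computing the inverse of a Vandermonde matrix, whose time complexities are listed in Table 2.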

Table 2. Complexity of Inversion for Vandermonde Matrix

| Scheme | Time complexity |
|---|---|
| Ref. [27] | $O(n^2)$ |
| Ref. [28] | $O(n^2)$ |
| Ref. [29] | $O(\log n)$ |
| Ref. [30] | $O((\log n)^2)$ |
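As an illustration of how the Vandermonde system $\boldsymbol{V}\boldsymbol{c} = \boldsymbol{y}$ can be solved in $O(n^2)$ field operations without forming $\boldsymbol{V}^{-1}$ explicitly, the Python sketch below uses Newton divided differences followed by expansion of the Newton form modulo a prime. This is a standard textbook technique and is not claimed to be the algorithm of Refs. [27-28]; the toy prime and the function name are hypothetical.

```python
P_MOD = 2**127 - 1  # toy prime standing in for the group order p

def interpolate_coeffs_mod_p(nodes, values, p=P_MOD):
    """Monomial coefficients c_0..c_{n-1} of the unique polynomial of degree < n through
    (nodes[i], values[i]) mod p, using O(n^2) field operations."""
    n = len(nodes)
    dd = [v % p for v in values]
    # Divided differences, in place: afterwards dd[k] = f[x_0, ..., x_k]
    for level in range(1, n):
        for i in range(n - 1, level - 1, -1):
            denom = (nodes[i] - nodes[i - level]) % p
            if denom == 0:
                raise ValueError("repeated node: Vandermonde matrix is singular")
            dd[i] = (dd[i] - dd[i - 1]) * pow(denom, -1, p) % p
    # Expand the Newton form with Horner's scheme:
    # poly = dd[n-1]; then poly = poly * (x - x_k) + dd[k] for k = n-2, ..., 0
    coeffs = [dd[n - 1]]
    for k in range(n - 2, -1, -1):
        shifted = [0] + coeffs  # x * poly
        for j in range(len(coeffs)):
            shifted[j] = (shifted[j] - nodes[k] * coeffs[j]) % p
        shifted[0] = (shifted[0] + dd[k]) % p
        coeffs = shifted
    return coeffs
```

For example, the points $(1, 8)$, $(2, 21)$, $(3, 42)$ lie on $3 + x + 4x^2$, so `interpolate_coeffs_mod_p([1, 2, 3], [8, 21, 42])` returns `[3, 1, 4]`.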

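4.3 Computational Overhead Analysis

We compare the computation time of the proposed scheme with that of the schemes in Refs. [25−26]. Let $n$ be the number of users participating in the ciphertext equality test. The proposed scheme and the compared schemes were simulated with the PBC library in VC6.0 on Windows 10, using an Intel i7-8750H 2.20 GHz processor and 8 GB of RAM; the results are shown in Table 3. The scalar multiplication time is $T_{\text{sm}} = 0.0004$ ms, the group element multiplication time is $T_{\text{mul}} = 0.0314$ ms, the hash function time is $T_{\text{h}} = 0.0001$ ms, the exponentiation time is $T_{\text{e}} = 6.9866$ ms, and the bilinear pairing time is $T_{\text{bp}} = 9.6231$ ms; the Vandermonde matrix inversion time $T_{\text{inv}}$ depends on the inversion method. As Table 3 shows, because the proposed scheme involves no expensive bilinear pairings, its ciphertext generation time is significantly lower than that of Refs. [25−26]. In the decryption and verification phase, the non-aggregate schemes in Refs. [25−26] require each data user to verify and decrypt the data one by one, whereas data users in the proposed scheme can batch-verify the aggregate ciphertext, which improves verification efficiency compared with Refs. [25−26].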

Table 3. Computation Amount Comparison (ms)

| Scheme | Ciphertext generation time | Ciphertext equality test time | Data decryption and verification time |
|---|---|---|---|
| Ref. [25] | $nT_{\text{mul}} + 3nT_{\text{bp}} + 6nT_{\text{h}} + 5nT_{\text{e}}$ ($63.8343n$) | $(n-1)(4T_{\text{bp}} + 2T_{\text{h}})$ ($38.4926n - 38.4926$) | $2nT_{\text{bp}} + 4nT_{\text{h}} + 2nT_{\text{e}}$ ($33.2198n$) |
| Ref. [26] | $6nT_{\text{sm}} + 2nT_{\text{bp}} + 7nT_{\text{h}} + 2nT_{\text{e}}$ ($33.2225n$) | $(n-1)(4T_{\text{bp}} + 2T_{\text{h}})$ ($38.4926n - 38.4926$) | $3nT_{\text{sm}} + nT_{\text{mul}} + 5nT_{\text{bp}} + 5nT_{\text{h}}$ ($48.1486n$) |
| Proposed scheme | $7nT_{\text{sm}} + nT_{\text{mul}} + n(n+4)T_{\text{h}}$ ($0.0346n + 0.0001n^2$) | $nT_{\text{sm}} + 2nT_{\text{h}} + T_{\text{inv}}$ ($T_{\text{inv}} + 0.0006n$) | $n(2+4n)T_{\text{sm}} + n^2T_{\text{mul}} + n(n+4)T_{\text{h}}$ ($0.0012n + 0.0331n^2$) |

Note: $n$ is the number of users participating in the ciphertext equality test; $T_{\text{sm}}$ is the scalar multiplication time; $T_{\text{mul}}$ is the group element multiplication time; $T_{\text{h}}$ is the hash function time; $T_{\text{e}}$ is the exponentiation time; $T_{\text{bp}}$ is the bilinear pairing time; $T_{\text{inv}}$ is the Vandermonde matrix inversion time.
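The closed-form costs in Table 3 can be re-evaluated for any $n$ from the unit costs measured in Section 4.3. The Python sketch below does this; $T_{\text{inv}}$ is left as a parameter because it depends on the chosen inversion method, and the function names are hypothetical. With these unit costs, the Ref. [26] ciphertext-generation formula evaluates to about $33.2225n$ ms.

```python
# Unit costs in ms, as reported for the PBC-based simulation in Section 4.3
T_SM, T_MUL, T_H, T_E, T_BP = 0.0004, 0.0314, 0.0001, 6.9866, 9.6231

def cost_ref25(n):
    """(generation, equality test, decryption + verification) for Ref. [25], in ms."""
    gen = n * T_MUL + 3 * n * T_BP + 6 * n * T_H + 5 * n * T_E
    test = (n - 1) * (4 * T_BP + 2 * T_H)
    dec = 2 * n * T_BP + 4 * n * T_H + 2 * n * T_E
    return gen, test, dec

def cost_ref26(n):
    """(generation, equality test, decryption + verification) for Ref. [26], in ms."""
    gen = 6 * n * T_SM + 2 * n * T_BP + 7 * n * T_H + 2 * n * T_E
    test = (n - 1) * (4 * T_BP + 2 * T_H)
    dec = 3 * n * T_SM + n * T_MUL + 5 * n * T_BP + 5 * n * T_H
    return gen, test, dec

def cost_proposed(n, t_inv):
    """(generation, equality test, decryption + verification) for the proposed scheme, in ms.
    t_inv is the Vandermonde inversion time, which depends on the inversion method."""
    gen = 7 * n * T_SM + n * T_MUL + n * (n + 4) * T_H
    test = n * T_SM + 2 * n * T_H + t_inv
    dec = n * (2 + 4 * n) * T_SM + n * n * T_MUL + n * (n + 4) * T_H
    return gen, test, dec

for n in (10, 100):
    print(n, [round(c, 4) for c in cost_ref25(n)],
          [round(c, 4) for c in cost_ref26(n)],
          [round(c, 4) for c in cost_proposed(n, t_inv=0.0)])
```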

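Moreover, the schemes in Refs. [25−26] can only match users' ciphertexts in pairs, so the number of bilinear pairings in their equality test algorithms grows linearly with the number of participating users. In the proposed scheme, by contrast, the tester matches the ciphertexts of all $n$ users at once, and the test involves no bilinear pairings. The equality test time of the proposed scheme depends mainly on the algorithm the tester uses to invert the Vandermonde matrix, and this inversion involves only efficient operations such as scalar addition and multiplication^[28]. Therefore, the proposed scheme is also more efficient than Refs. [25−26] in ciphertext equality testing.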

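5. Conclusion

To address the low computational efficiency of existing WBAN cryptographic schemes in multi-user environments, this paper proposes a WBAN aggregate signcryption scheme that supports multi-ciphertext equality testing. The scheme adopts identity-based cryptography, which eliminates the certificate management overhead of traditional public-key schemes; it introduces multi-ciphertext equality testing, which lets multiple data users retrieve multiple medical ciphertexts simultaneously and reduces the computational overhead of equality testing in multi-user environments; and it uses aggregate signcryption to improve the efficiency of signcrypting the medical data of multiple users. The scheme guarantees the confidentiality, integrity, and authenticity of medical data during transmission, as well as the unforgeability of data owners' signatures and the one-wayness of test trapdoors. Comparison with similar schemes shows that the proposed scheme supports more security properties at a lower computational cost. In future work, we will attempt to design a quantum-resistant WBAN signcryption scheme that supports multi-ciphertext equality testing.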

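Author contributions: Yang Xiaodong designed the overall idea of the paper and the experimental plan; Zhou Hang designed the scheme and wrote the paper; Ren Ningning carried out the scheme simulation and efficiency analysis; Yuan Sen collected materials on application scenarios; Wang Caifen provided guidance and revised the paper.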

Figure 1. Defect prediction model based on software metrics

Figure 2. Defect prediction model based on syntactic and semantic information

Figure 3. Number of publications related to defect prediction and vulnerability prediction

Figure 4. Defect prediction framework

Figure 5. Summary of evaluation metrics

Figure 6. Timeline of metric development

Figure 7. Code example

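Table 1 Software Defect State Descriptions

State	Description
New	The defect appears for the first time during testing and is logged by a quality engineer
Pending	The defect has been reported and is awaiting confirmation
Open	Confirmed as a defect; awaiting assignment and repair
Assigned	After initial triage, the defect is assigned to the appropriate team for repair
Rejected	The defect does not need to be fixed, or is not actually a defect
In Progress	The defect has been confirmed and a developer is working on the fix
Fixed	The developer has changed the code or configuration and marked the defect as fixed
Test	The fixed defect is awaiting retesting to verify that the fix works
Re-open	The defect reappears after being fixed and retested, and is flagged again
Resolved	The defect has been fixed, and retesting has confirmed that the fix works
Closed	The defect is confirmed as resolved and needs no further action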
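The states in Table 1 form a workflow that issue trackers usually enforce as a state machine. Table 1 does not list which transitions are allowed, so the transition map below is an assumption inferred from the state descriptions (for example, that a defect in Test either becomes Resolved or is Re-opened). Real trackers configure their own workflows. A minimal Python sketch:

```python
from enum import Enum


class DefectState(Enum):
    """Defect states from Table 1."""
    NEW = "New"
    PENDING = "Pending"
    OPEN = "Open"
    ASSIGNED = "Assigned"
    REJECTED = "Rejected"
    IN_PROGRESS = "In Progress"
    FIXED = "Fixed"
    TEST = "Test"
    REOPEN = "Re-open"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


S = DefectState

# ASSUMED transition map (illustrative only; not specified by Table 1).
TRANSITIONS: dict[DefectState, set[DefectState]] = {
    S.NEW: {S.PENDING},
    S.PENDING: {S.OPEN, S.REJECTED},
    S.OPEN: {S.ASSIGNED, S.REJECTED},
    S.ASSIGNED: {S.IN_PROGRESS},
    S.IN_PROGRESS: {S.FIXED},
    S.FIXED: {S.TEST},
    S.TEST: {S.RESOLVED, S.REOPEN},  # retest passes or the defect reappears
    S.REOPEN: {S.ASSIGNED},
    S.RESOLVED: {S.CLOSED},
    S.REJECTED: {S.CLOSED},
    S.CLOSED: set(),                 # terminal state in this sketch
}


def advance(current: DefectState, target: DefectState) -> DefectState:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def replay(history: list[DefectState]) -> DefectState:
    """Validate a full state history and return its final state."""
    state = history[0]
    for nxt in history[1:]:
        state = advance(state, nxt)
    return state


history = [S.NEW, S.PENDING, S.OPEN, S.ASSIGNED, S.IN_PROGRESS, S.FIXED,
           S.TEST, S.REOPEN, S.ASSIGNED, S.IN_PROGRESS, S.FIXED, S.TEST,
           S.RESOLVED, S.CLOSED]
print(replay(history).value)  # Closed
```

Keeping the transitions as data means a different workflow (for instance, allowing Closed to go back to Re-open) is a one-line change to `TRANSITIONS`.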

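Table 2 Comparison of Defect Detection and Defect Prediction Methods

Method	Category	Accuracy	Coverage	Time cost	Limitations
Manual testing	Defect detection	Relatively accurate	Relatively small	Very high	Prone to human error
Automated analysis	Defect detection	Mostly accurate	Relatively large	Moderate	Struggles with visual and user-experience issues
Static analysis	Defect detection	Mostly accurate	Relatively small	Low	Cannot detect runtime behavior or integration issues
Code review	Defect detection	Relatively accurate	Relatively small	Very high	Depends on the reviewer's experience and skill
Artificial intelligence	Defect prediction	Mostly accurate	Large	Moderate	Depends on data quality and the techniques used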

Table 3 Public Repository Data Sources for Software Defect Modeling

Dataset	Projects	Metric type	No. of papers	Granularity	Data link
NASA	13	Code metrics	33	Function	http://promise.site.uottawa.ca/SERepository/datasets-page.html
SOFTLAB	5	Code metrics	2	Function	https://github.com/bharlow058/AEEEM-and-other-SDP-datasets/tree/master/dataset/SOFTLAB
PROMISE	38	Code metrics	41	Class	https://zenodo.org/search?page=1&size=20&q=Marian%20Jureckzo&file_type=csv#
ReLink	3	Code metrics	7	File	https://github.com/ai-se/HDP_pyjnius/tree/master/dataset/Relink
AEEEM	5	Code metrics, process metrics	14	Class	https://bug.inf.usi.ch/download.php
MORPH	9	Code metrics	2	Class	https://github.com/bharlow058/AEEEM-and-other-SDP-datasets/tree/master/dataset/MORPH

Table 4 Attributes of the Publicly Available Datasets

Dataset	Repository	Language	Attributes	Instances	Defective instances	Defect rate/%
CM1	NASA	C	22	498	49	9.84
JM1	NASA	C	22	10885	2106	19.35
KC1	NASA	C++	22	2109	326	15.46
KC2	NASA	C++	22	522	105	20.11
KC3	NASA	Java	40	458	43	9.39
KC4	NASA	Perl	40	125	61	48.80
MC1	NASA	C++	39	9466	68	0.72
MC2	NASA	C++	40	161	52	32.30
MW1	NASA	C	40	403	61	15.14
PC1	NASA	C	40	1107	76	6.87
PC2	NASA	C	40	5589	23	0.41
PC3	NASA	C	40	1563	160	10.24
PC4	NASA	C	40	1458	178	12.21
PC5	NASA	C++	39	17186	516	3.00
ant-1.7	PROMISE	Java	21	745	166	22.30
ivy-2.0	PROMISE	Java	21	352	40	11.40
camel-1.6	PROMISE	Java	21	965	188	19.50
jedit-4.0	PROMISE	Java	21	306	75	24.50
log4j-1.2	PROMISE	Java	21	109	37	33.90
lucene-2.4	PROMISE	Java	21	195	91	46.70
poi-2.0	PROMISE	Java	21	314	37	11.80
synapse-1.1	PROMISE	Java	21	222	60	27.00
velocity-1.6	PROMISE	Java	21	229	78	34.10
xerces-1.3	PROMISE	Java	21	453	69	15.20
tomcat	PROMISE	Java	21	858	77	8.97
xalan-2.4	PROMISE	Java	21	723	110	15.20
EQ	AEEEM	Java	62	324	129	39.81
JDT	AEEEM	Java	62	997	206	20.66
LC	AEEEM	Java	62	691	64	9.26
ML	AEEEM	Java	62	1862	245	13.16
PDE	AEEEM	Java	62	1497	209	13.96
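The defect rate column is defective instances divided by total instances. A related number that matters for training is the imbalance ratio (non-defective to defective instances); Table 4 does not list it, but it follows from the same two columns. A small sketch using a few rows copied from Table 4; for MC1 it comes out at roughly 138 non-defective instances per defective one, which is why class-imbalance handling recurs in the preprocessing methods of Table 7:

```python
def defect_stats(instances: int, defective: int) -> tuple[float, float]:
    """Return (defect rate in %, imbalance ratio of non-defective to defective instances)."""
    if not 0 < defective <= instances:
        raise ValueError("need 0 < defective <= instances")
    rate = 100 * defective / instances
    imbalance = (instances - defective) / defective
    return round(rate, 2), round(imbalance, 2)


# (dataset, instances, defective instances), copied from Table 4.
rows = [("CM1", 498, 49), ("KC1", 2109, 326), ("MC1", 9466, 68), ("EQ", 324, 129)]
for name, n, d in rows:
    rate, ratio = defect_stats(n, d)
    print(f"{name}: defect rate {rate}%, imbalance {ratio}:1")
```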

Table 5 Number of Defects in Open-Source Software Projects

Source	Project	Versions	Granularity	Lines of code	Defects
Ref. [12]	Camel	2	File	112367	62.00
Ref. [12]	Flume	2	File	95782	47.00
Ref. [12]	Tika	2	File	85341	16.00
Ref. [12]	Gedit	2	File	60441	18.50
Ref. [12]	Nginx	2	File	80618	18.00
Ref. [12]	Redis	2	File	45991	21.00
Ref. [13]	Gedit	314	Function	2012	58.96
Ref. [13]	Nagios Core	93	Function	1750	4.82
Ref. [13]	Nginx	455	Function	1975	6.17
Ref. [13]	Redis	173	Function	2350	57.31

Table 6 Impact of Code Changes on Defects in Open-Source Software Projects

Project	Change period	Files	Files per change	Average changed lines of code	Defect-inducing change rate/%
Bugzilla	08/1998−12/2006	4620	2.3	37.5	36
Platform	05/2001−12/2007	64250	4.3	72.2	14
Mozilla	01/2000−12/2006	98275	5.3	106.5	5
JDT	05/2001−12/2007	35386	4.3	71.4	14
Columba	11/2002−07/2006	4455	6.2	149.4	31
PostgreSQL	07/1996−05/2010	20431	4.5	101.3	25
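The "files per change" and "average changed lines of code" columns are code-churn measures. One way to count changed lines between two versions of a file is Python's standard `difflib.SequenceMatcher`. The sketch below counts added plus deleted lines per change and averages them over a hypothetical list of commits, where each commit is assumed to be a dict mapping file paths to (old, new) contents. It does not determine which changes induced defects; that requires linking each change to later bug fixes, which is out of scope here.

```python
import difflib


def churn(old: str, new: str) -> tuple[int, int]:
    """Return (added, deleted) line counts between two versions of one file."""
    a, b = old.splitlines(), new.splitlines()
    # autojunk=False avoids difflib's popularity heuristic on long files.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    added = deleted = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, deleted


def summarize(commits: list[dict[str, tuple[str, str]]]) -> tuple[float, float]:
    """Average files touched per change and average changed lines (added + deleted) per change.

    Each commit is a hypothetical mapping {file path: (old contents, new contents)}.
    """
    files_per_change = [len(c) for c in commits]
    lines_per_change = [sum(sum(churn(old, new)) for old, new in c.values()) for c in commits]
    n = len(commits)
    return sum(files_per_change) / n, sum(lines_per_change) / n


commits = [
    {"src/a.c": ("a\nb\nc\n", "a\nB\nc\nd\n")},        # one line edited, one appended
    {"src/b.c": ("x\n", ""), "src/c.c": ("", "y\n")},  # one line deleted, one added
]
print(summarize(commits))  # (1.5, 2.5)
```

An edited line counts as one deletion plus one addition, which matches the usual "lines added + lines deleted" churn convention.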

Table 7 Data Preprocessing Methods

Source	Year	Datasets	Preprocessing model	Classifier	Evaluation metrics
Ref. [23]	2022	AEEEM, NASA	AJCC-Ram	XGBoost	F1-Score
Ref. [27]	2018	PROMISE	NCL, RUS	AdaBoost	PD, PF, G-mean, AUC
Ref. [30]	2018	NASA, SOFTLAB, ReLink, AEEEM, MORPH	CTKCCA	Logistic regression	PD, PF, F-measure, G-mean, AUC
Ref. [33]	2019	NASA, PROMISE	STr-NN+TCA	Ensemble learning	F-measure, AUC, Recall, PF
Ref. [34]	2022	NASA, AEEEM, ReLink	BiGAN	Random forest, support vector machine, naive Bayes	AUC, G-mean, F1-Score
Ref. [39]	2021	NASA, PROMISE, AEEEM, ReLink	EWFS	Decision tree, naive Bayes	F-measure, AUC
Ref. [45]	2017	ReLink, AEEEM	FESCH	Decision tree, naive Bayes, logistic regression	Precision, Recall, F-measure, AUC
Ref. [46]	2019	NASA	LSKDSA	Logistic regression	F-measure, AUC
Ref. [49]	2018	MORPH	HAL, KPCA	Logistic regression	F-measure, G-mean, Balance
Ref. [50]	2020	MORPH	CDS	Random forest, logistic regression, naive Bayes	F-measure, G-mean, Balance
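RUS (random undersampling), listed for Ref. [27], is the simplest of these preprocessing steps: majority-class (non-defective) instances are discarded at random until both classes have the same size. A minimal sketch assuming binary labels with 1 = defective; it does not implement NCL or any other method in the table:

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def random_undersample(
    X: Sequence[T], y: Sequence[int], minority: int = 1, seed: int = 0
) -> tuple[list[T], list[int]]:
    """Randomly drop majority-class rows until both classes have the same count."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    if len(majority_idx) <= len(minority_idx):
        return list(X), list(y)  # nothing to drop
    kept = minority_idx + rng.sample(majority_idx, len(minority_idx))
    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: 2 defective (1) and 8 clean (0) modules with one feature each.
X = [[float(i)] for i in range(10)]
y = [1, 1] + [0] * 8
X_bal, y_bal = random_undersample(X, y)
print(y_bal.count(1), y_bal.count(0))  # 2 2
```

Undersampling should be applied to the training split only; the test split keeps its natural class distribution so that the metrics in Table 8 reflect realistic conditions.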

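Table 8 Common Evaluation Metrics and Their Descriptions

Metric	Description
Accuracy	Proportion of all instances that the model classifies correctly
Precision, Correctness	Proportion of instances predicted as defective that are actually defective
Recall, TPR	Proportion of actually defective instances that the model predicts as defective
Specificity, TNR	Proportion of actually non-defective instances that the model predicts as non-defective
FPR	Proportion of actually non-defective instances that the model predicts as defective
FNR	Proportion of actually defective instances that the model predicts as non-defective
AUC	Area under the ROC curve; the larger the AUC, the better the model
MCC	Matthews correlation coefficient: the correlation between the observed and predicted classifications, ranging from −1 to 1
Balance	1 minus the normalized Euclidean distance from the point (PF, PD) to the ideal point (0, 1) in ROC space; larger values are better
F-measure	Harmonic mean of recall and precision
F1-Score, F2-Score	F-scores combining precision and recall, commonly used for imbalanced datasets; F1 weights them equally, while F2 weights recall more heavily
G-mean	Geometric mean of Recall and Specificity
Error Rate	Proportion of all instances that are misclassified
AAE	Average absolute error: the mean absolute difference between predicted and actual values
ARE	Average relative error: the mean ratio of the absolute difference between predicted and actual values to the actual value
Completeness	Proportion of all actual defects that are contained in the modules predicted as defective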
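Most of the classification metrics in Table 8 can be computed from the four cells of a confusion matrix, treating "defective" as the positive class (so PD = Recall and PF = FPR, the abbreviations used in Table 7). The sketch below follows common defect-prediction definitions, which vary somewhat across papers: G-mean as sqrt(Recall × Specificity), Balance as 1 − sqrt(PF² + (1 − PD)²)/√2, and AUC computed as the probability that a randomly chosen defective instance scores higher than a randomly chosen clean one. The regression-style metrics (AAE, ARE, Completeness) are not covered.

```python
import math


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score; beta = 1 gives F1, beta = 2 (F2) weights recall more heavily."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Table 8 classification metrics, with 'defective' as the positive class."""
    pos, neg = tp + fn, tn + fp             # actual defective / actual non-defective
    total = pos + neg
    recall = tp / pos if pos else 0.0       # Recall = TPR = PD
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / neg if neg else 0.0  # TNR
    fpr = fp / neg if neg else 0.0          # FPR = PF
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "fpr": fpr,
        "fnr": fn / pos if pos else 0.0,
        "f1": f_beta(precision, recall, 1.0),
        "f2": f_beta(precision, recall, 2.0),
        "g_mean": math.sqrt(recall * specificity),
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        # Balance: 1 minus the normalized distance from (PF, PD) to the ideal point (0, 1).
        "balance": 1 - math.sqrt(fpr ** 2 + (1 - recall) ** 2) / math.sqrt(2),
    }


def auc(scores: list[float], labels: list[int]) -> float:
    """Rank-based AUC: P(defective score > clean score), with ties counted as 0.5."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


m = confusion_metrics(tp=30, fp=10, tn=50, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```

The pairwise AUC loop is quadratic in the number of instances; it is written this way to mirror the definition, and a sort-based rank computation gives the same value faster on large datasets.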

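Table 9 Comparison of Metric Evolution

Source	Year	Metrics compared	ML/DL models	Evaluation metrics	Findings
Ref. [59]	2021	Code smells vs. code metrics	RF, SVM, MLP, DT, NB	ROC, AUC, PR, F1-Score	Code-smell metrics outperform code metrics, and also outperform a combination of the two
Ref. [62]	2008	Code change metrics vs. code metrics	LR, NB, DT	FP, Recall, PC	Process (change) metrics are more effective than code metrics
Ref. [64]	2016	Evolution-pattern metrics vs. code change metrics	NB, binary logistic regression, J48 decision tree	Precision, Recall, F-measure, ROC	Evolution metrics predict relatively better than code metrics and code change metrics
Ref. [81]	2018	Cross-entropy	LSTM-based recurrent neural network (RNN)	Precision, Recall, F1-Score, AUC	The cross-entropy metric has better predictive power than 50% of the traditional metrics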
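The cross-entropy metric (Ref. [81]) scores a piece of code by how "surprising" its token sequence is to a language model trained on the project's code; Table 9 lists an LSTM-based RNN for that work. The sketch below swaps in a much simpler add-one-smoothed bigram model so that it stays self-contained; it only illustrates how the metric is computed, namely the average of −log2 p(token | previous token) over a file's tokens. Tokenization is assumed to have been done already, and the toy corpus is hypothetical.

```python
import math
from collections import Counter

BigramModel = tuple[Counter, Counter, int]


def train_bigram(corpus: list[list[str]]) -> BigramModel:
    """Count context and bigram frequencies over tokenized files."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<unk>"}  # reserve probability mass for unseen tokens
    for tokens in corpus:
        seq = ["<s>"] + tokens  # <s> marks the start of a file
        contexts.update(seq[:-1])
        bigrams.update(zip(seq[:-1], seq[1:]))
        vocab.update(tokens)
    return contexts, bigrams, len(vocab)


def cross_entropy(tokens: list[str], model: BigramModel) -> float:
    """Average -log2 p(token | previous token) under add-one smoothing (bits per token)."""
    if not tokens:
        raise ValueError("need at least one token")
    contexts, bigrams, vocab_size = model
    seq = ["<s>"] + tokens
    bits = 0.0
    for prev, cur in zip(seq[:-1], seq[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        bits -= math.log2(p)
    return bits / len(tokens)


corpus = [["int", "x", "=", "0", ";"], ["int", "y", "=", "1", ";"]]
model = train_bigram(corpus)
print(cross_entropy(["int", "x", "=", "1", ";"], model))  # familiar ordering: lower
print(cross_entropy([";", "=", "int", "x", "0"], model))  # unusual ordering: higher
```

Used as a defect-prediction metric, each file's cross-entropy would be computed against a model trained on the rest of the project and then fed to a classifier alongside (or instead of) the traditional metrics.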

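Table 10 Differences and Connections Between Defects and Vulnerabilities

Relation	Aspect	Defect	Vulnerability
Differences	Concept	An error or hidden functional fault in software or a program	A flaw introduced during the design, implementation, configuration, or use of software that may allow an attacker to access or damage the system without authorization
	Origin	Software architecture and design	Software code (source code or binary code)
	Causes	Insufficient test coverage, imprecise requirements analysis, poorly defined team responsibilities, flaws in hardware configuration, firmware, or processors, and flaws in software configuration or the operating system	Programmer skill, hardware flaws, software flaws, protocol flaws
	Disclosure	Defects are disclosed in software repositories; defect data is of higher quality than vulnerability data	Disclosing a vulnerability can trigger a series of attacks, so developers and vulnerability researchers usually limit the public disclosure of vulnerability information
	Quantity	Relatively many	Relatively few
Connections	Concept	A vulnerability is a software defect that an attacker may exploit to carry out an intrusion
	Origin	Both relate to hardware, code complexity, and programmer skill
	Impact	Both can cause serious harm to businesses and to people's lives
	Detection and prediction methods	Manual testing, automation, static analysis, dynamic analysis, code review, artificial intelligence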

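Table 11 Challenges and Opportunities of Defect Prediction and Vulnerability Prediction Tasks

Challenge	Opportunity
Dataset sourcing and processing	Build a high-quality, balanced, noise-free benchmark dataset
Code vector representation	Design a representation that captures as much syntactic and semantic information as possible
Improving pre-trained models	Use word embeddings trained in other domains to improve model performance
Exploring deep learning models	Find deep learning models better suited to specific prediction tasks
Fine-grained prediction	Locate where defects and vulnerabilities may occur more precisely
Transferring pre-trained models	Save time and resources through model transfer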

References (124)

[1]	Pachouly J, Ahirrao S, Kotecha K, et al. A systematic literature review on software defect prediction using artificial intelligence: Datasets, data validation methods, approaches, and tools[J]. Engineering Applications of Artificial Intelligence, 2022, 111: 1−33 doi: 10.1016/j.engappai.2022.104773
[2]	Chen Xiang, Gu Qing, Liu Wangshu, et al. Survey of static software defect prediction[J]. Journal of Software, 2016, 27(1): 1−25 (in Chinese) doi: 10.13328/j.cnki.jos.004923
[3]	Gu Mianxue, Sun Hongyu, Han Dan, et al. Software security vulnerability mining based on deep learning[J]. Journal of Computer Research and Development, 2021, 58(10): 2140−2162 (in Chinese) doi: 10.7544/issn1000-1239.2021.20210620
[4]	Trachtenberg M. Discovering how to ensure software reliability[J]. Radio Corporation of America Engineer, 1982, 27(1): 53−57
[5]	Qian Lianfen, Yao Qingchuan, Khoshgoftaar T M. Dynamic two-phase truncated Rayleigh model for release date prediction of software[J]. Journal of Software Engineering and Applications, 2010, 3(6): 603−609 doi: 10.4236/jsea.2010.36070
[6]	Bustamante A, Bustamante B. Multinomial-exponential reliability function: A software reliability model[J]. Reliability Engineering & System Safety, 2003, 79(3): 281−288
[7]	Zheng Yanyan, Xu Renzuo. An adaptive exponential smoothing approach for software reliability prediction[C]//Proc of 2008 4th Int Conf on Wireless Communications, Networking and Mobile Computing. Piscataway, NJ: IEEE, 2008: 1−4
[8]	Yamada S, Ohba M, Osaki S. S-shaped reliability growth modeling for software error detection[J]. IEEE Transactions on Reliability, 1983, 32(5): 475−484
[9]	Kececioglu D, Jiang S, Vassiliou P. The modified Gompertz reliability growth model[C]//Proc of Annual Reliability and Maintainability Symp (RAMS). Piscataway, NJ: IEEE, 1994: 160−165
[10]	Ahmad N, Imam M Z. Software reliability growth models with log-logistic testing-effort function: A comparative study[J]. International Journal of Computer Applications, 2014, 75(12): 8−11
[11]	Gong Lina, Jiang Shujuan, Jiang Li. Research progress of software defect prediction[J]. Journal of Software, 2019, 30(10): 3090−3114 (in Chinese) doi: 10.13328/j.cnki.jos.005790
[12]	Li Yiyao, Lee S Y, Wotawa F, et al. Using tri-relation networks for effective software fault-proneness prediction[J]. IEEE Access, 2019, 7: 63066−63080 doi: 10.1109/ACCESS.2019.2916615
[13]	Lee S Y, Wong W E, Li Yiyao, et al. Software fault-proneness analysis based on composite developer-module networks[J]. IEEE Access, 2021, 9: 155314−155334 doi: 10.1109/ACCESS.2021.3128438
[14]	Zhu Kun, Zhang Nana, Ying Shi, et al. Within-project and cross-project software defect prediction based on improved transfer naive Bayes algorithm[J]. Computers, Materials and Continua, 2020, 63(2): 891−910
[15]	Akiyama F. An example of software system debugging[J]. IFIP Congress, 1971, 71(1): 353−359
[16]	Halstead M H. Elements of Software Science (Operating and Programming Systems Series)[M]. New York: Elsevier Science Inc, 1977
[17]	Shepperd M, Song Qinbao, Sun Zhongbin, et al. Data quality: Some comments on the NASA software defect datasets[J]. IEEE Transactions on Software Engineering, 2013, 39(9): 1208−1215 doi: 10.1109/TSE.2013.11
[18]	Khoshgoftaar T M, Gao Kehan, Napolitano A, et al. A comparative study of iterative and non-iterative feature selection techniques for software defect prediction[J]. Information Systems Frontiers, 2014, 16(5): 801−822 doi: 10.1007/s10796-013-9430-0
[19]	Li Zhiqiang, Jing Xiaoyuan, Zhu Xiaoke, et al. Heterogeneous defect prediction through multiple kernel learning and ensemble learning[C]//Proc of 2017 IEEE Int Conf on Software Maintenance and Evolution (ICSME). Piscataway, NJ: IEEE, 2017: 91−102
[20]	Kubat M, Matwin S. Addressing the curse of imbalanced training sets: One-sided selection[C]//Proc of the 14th Int Conf on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 179−186
[21]	Kotsiantis S B, Pintelas P E. Mixture of expert agents for handling imbalanced data sets[J]. Annals of Mathematics, Computing & Teleinformatics, 2003, 1(1): 46−55
[22]	Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321−357 doi: 10.1613/jair.953
[23]	Rao Zhendan. Research on unbalanced data classification algorithm in software defect prediction[D]. Harbin: Harbin Normal University, 2022 (in Chinese)
[24]	He Haibo, Bai Yang, Garcia E A, et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning[C]//Proc of 2008 IEEE Int Joint Conf on Neural Networks (IEEE World Congress on Computational Intelligence). Piscataway, NJ: IEEE, 2008: 1322−1328
[25]	Ma Li, Fan Suohai. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests[J]. BMC Bioinformatics, 2017, 18(1): 1−18 doi: 10.1186/s12859-016-1414-x
[26]	Kim S, Zhang Hongyu, Wu Rongxin, et al. Dealing with noise in defect prediction[C]//Proc of 2011 33rd Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2011: 481−490
[27]	Chen Liu, Fang Bin, Shang Zhaowei, et al. Tackling class overlap and imbalance problems in software defect prediction[J]. Software Quality Journal, 2018, 26(1): 97−125 doi: 10.1007/s11219-016-9342-6
[28]	Tang Wei, Khoshgoftaar T M. Noise identification with the k-means algorithm[C]//Proc of 16th IEEE Int Conf on Tools with Artificial Intelligence. Piscataway, NJ: IEEE, 2004: 373−378
[29]	Goyal S. Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction[J]. Artificial Intelligence Review, 2022, 55(3): 2023−2064 doi: 10.1007/s10462-021-10044-w
[30]	Li Zhiqiang, Jing Xiaoyuan, Wu Fei, et al. Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction[J]. Automated Software Engineering, 2018, 25(2): 201−245 doi: 10.1007/s10515-017-0220-7
[31]	Yang Zhenyu, Jin Chufeng, Zhang Yue, et al. Software defect prediction: An ensemble learning approach[J]. Journal of Physics:Conf Series, 2022, 2171(1): 012008 doi: 10.1088/1742-6596/2171/1/012008
[32]	Jiang Feng, Yu Xu, Gong Dunwei, et al. A random approximate reduct-based ensemble learning approach and its application in software defect prediction[J]. Information Sciences, 2022, 609: 1147−1168 doi: 10.1016/j.ins.2022.07.130
[33]	Gong Lina, Jiang Shujuan, Bo Lili, et al. A novel class-imbalance learning approach for both within-project and cross-project defect prediction[J]. IEEE Transactions on Reliability, 2019, 69(1): 40−54
[34]	Zhang Shenggang, Jiang Shujuan, Yan Yue. A software defect prediction approach based on BiGAN anomaly detection[J]. Scientific Programming, 2022, 2022(1): 1−13
[35]	Rodriguez D, Herraiz I, Harrison R, et al. Preliminary comparison of techniques for dealing with imbalance in software defect prediction[C]//Proc of the 18th Int Conf on Evaluation and Assessment in Software Engineering. New York: ACM, 2014: 1−10
[36]	Eivazpour Z, Keyvanpour M R. CSSG: A cost-sensitive stacked generalization approach for software defect prediction[J]. Software Testing, Verification and Reliability, 2021, 31(5): e1761
[37]	Kohavi R, John G H. Wrappers for feature subset selection[J]. Artificial Intelligence, 1997, 97(1-2): 273−324 doi: 10.1016/S0004-3702(97)00043-X
[38]	He Xiaofei, Cai Deng, Niyogi P. Laplacian score for feature selection[C]//Proc of the 18th Int Conf on Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2005
[39]	Balogun A O, Basri S, Capretz L F, et al. Software defect prediction using wrapper feature selection based on dynamic re-ranking strategy[J]. Symmetry, 2021, 13(11): 2166−2189 doi: 10.3390/sym13112166
[40]	Thirumoorthy K. A feature selection model for software defect prediction using binary Rao optimization algorithm[J]. Applied Soft Computing, 2022, 131: 109737−109753 doi: 10.1016/j.asoc.2022.109737
[41]	Bahaweres R B, Suroso A I, Hutomo A W, et al. Tackling feature selection problems with genetic algorithms in software defect prediction for optimization[C]//Proc of 2020 Int Conf on Informatics, Multimedia, Cyber and Information System (ICIMCIS). Piscataway, NJ: IEEE, 2020: 64−69
[42]	Miao Linsong, Liu Mingxia, Zhang Daoqiang. Cost-sensitive feature selection with application in software defect prediction[C]//Proc of the 21st Int Conf on Pattern Recognition (ICPR2012). Piscataway, NJ: IEEE, 2012: 967−970
[43]	Liu Shulong, Chen Xiang, Liu Wangshu, et al. FECAR: A feature selection framework for software defect prediction[C]//Proc of 2014 IEEE 38th Annual Computer Software and Applications Conf. Piscataway, NJ: IEEE, 2014: 426−435
[44]	Nam J, Pan S J, Kim S. Transfer defect learning[C]//Proc of 2013 35th Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2013: 382−391
[45]	Ni Chao, Liu Wangshu, Chen Xiang, et al. A cluster based feature selection method for cross-project software defect prediction[J]. Journal of Computer Science and Technology, 2017, 32(6): 1090−1107 doi: 10.1007/s11390-017-1785-0
[46]	Li Zhiqiang, Qi Chao, Zhang Li, et al. Discriminant subspace alignment for cross-project defect prediction[C]//Proc of 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). Piscataway, NJ: IEEE, 2019: 1728−1733
[47]	Chen Jinfu, Wang Xiaoli, Cai Saihua, et al. A software defect prediction method with metric compensation based on feature selection and transfer learning[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 715−731
[48]	Lu Huihua, Kocaguneli E, Cukic B. Defect prediction between software versions with active learning and dimensionality reduction[C]//Proc of 2014 IEEE 25th Int Symp on Software Reliability Engineering. Piscataway, NJ: IEEE, 2014: 312−322
[49]	Xu Zhou, Liu Jin, Luo Xiapu, et al. Cross-version defect prediction via hybrid active learning with kernel principal component analysis[C]//Proc of 2018 IEEE 25th Int Conf on Software Analysis, Evolution and Reengineering (SANER). Piscataway, NJ: IEEE, 2018: 209−220
[50]	Zhang Jie, Wu Jiajing, Chen C, et al. CDS: A cross-version software defect prediction model with data selection[J]. IEEE Access, 2020, 8: 110059−110072 doi: 10.1109/ACCESS.2020.3001440
[51]	Marcus A, Maletic J I. Recovering documentation-to-source-code traceability links using latent semantic indexing[C]//Proc of 25th Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2003: 125−135
[52]	Menzies T, Dekhtyar A, Distefano J, et al. Problems with precision: A response to “comments on ‘data mining static code attributes to learn defect predictors’”[J]. IEEE Transactions on Software Engineering, 2007, 33(9): 637−640 doi: 10.1109/TSE.2007.70721
[53]	Yao Jingxiu, Shepperd M. The impact of using biased performance metrics on software defect prediction research[J]. Information and Software Technology, 2021, 139(11): 1−14
[54]	Qiao Hui. Research on software defect prediction techniques[D]. Zhengzhou: Information Engineering University, 2013 (in Chinese)
[55]	McCabe T J. A complexity measure[J]. IEEE Transactions on Software Engineering, 1976, 2(4): 308−320
[56]	Chidamber S R, Kemerer C F. A metrics suite for object oriented design[J]. IEEE Transactions on Software Engineering, 1994, 20(6): 476−493 doi: 10.1109/32.295895
[57]	Brito E A F, Carapuça R. Candidate metrics for object-oriented software within a taxonomy framework[J]. Journal of Systems and Software, 1994, 26(1): 87−96 doi: 10.1016/0164-1212(94)90099-X
[58]	Bansiya J, Davis C G. A hierarchical model for object-oriented design quality assessment[J]. IEEE Transactions on Software Engineering, 2002, 28(1): 4−17 doi: 10.1109/32.979986
[59]	Sotto-Mayor B, Kalech M. Cross-project smell-based defect prediction[J]. Soft Computing, 2021, 25(22): 14171−14181 doi: 10.1007/s00500-021-06254-7
[60]	Khoshgoftaar T M, Szabo R M. Improving code churn predictions during the system test and maintenance phases[C]//Proc of 1994 Int Conf on Software Maintenance. Piscataway, NJ: IEEE, 1994: 58−67
[61]	Nagappan N, Ball T. Use of relative code churn measures to predict system defect density[C]//Proc of the 27th Int Conf on Software Engineering. New York: ACM, 2005: 284−292
[62]	Moser R, Pedrycz W, Succi G. A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction[C]//Proc of the 30th Int Conf on Software Engineering. New York: ACM, 2008: 181−190
[63]	Knab P, Pinzger M, Bernstein A. Predicting defect densities in source code files with decision tree learners[C]//Proc of the 2006 Int Workshop on Mining Software Repositories. New York: ACM, 2006: 119−125
[64]	Wang Dandan, Wang Qing. Improving the performance of defect prediction based on evolution data[J]. Journal of Software, 2016, 27(12): 3014−3029 (in Chinese) doi: 10.13328/j.cnki.jos.004869
[65]	Liu Yibin, Li Yanhui, Guo Jianbo, et al. Connecting software metrics across versions to predict defects[C]//Proc of 2018 IEEE 25th Int Conf on Software Analysis, Evolution and Reengineering (SANER). Piscataway, NJ: IEEE, 2018: 232−243
[66]	Mockus A, Weiss D M. Predicting risk of software changes[J]. Bell Labs Technical Journal, 2000, 5(2): 169−180
[67]	Weyuker E J, Ostrand T J, Bell R M. Using developer information as a factor for fault prediction[C]//Proc of Third Int Workshop on Predictor Models in Software Engineering. Piscataway, NJ: IEEE, 2007: 8−15
[68]	Ostrand T J, Weyuker E J, Bell R M. Programmer-based fault prediction[C]//Proc of the 6th Int Conf on Predictive Models in Software Engineering. New York: ACM, 2010: 1−10
[69]	Pinzger M, Nagappan N, Murphy B. Can developer-module networks predict failures?[C]//Proc of the 16th ACM SIGSOFT Int Symp on Foundations of Software Engineering. New York: ACM, 2008: 2−12
[70]	Nagappan N, Murphy B, Basili V. The influence of organizational structure on software quality[C]//Proc of 2008 ACM/IEEE 30th Int Conf on Software Engineering. New York: ACM, 2008: 521−530
[71]	Mockus A. Organizational volatility and its effects on software defects[C]//Proc of the 18th ACM SIGSOFT Int Symp on Foundations of Software Engineering. New York: ACM, 2010: 117−126
[72]	Zhou Yuming, Leung H. Empirical analysis of object-oriented design metrics for predicting high and low severity faults[J]. IEEE Transactions on Software Engineering, 2006, 32(10): 771−789 doi: 10.1109/TSE.2006.102
[73]	Pai G J, Dugan J B. Empirical analysis of software fault content and fault proneness using Bayesian methods[J]. IEEE Transactions on Software Engineering, 2007, 33(10): 675−686 doi: 10.1109/TSE.2007.70722
[74]	Seliya N, Khoshgoftaar T M. Software quality analysis of unlabeled program modules with semisupervised clustering[J]. IEEE Transactions on Systems, Man, and Cybernetics-Part A:Systems and Humans, 2007, 37(2): 201−211 doi: 10.1109/TSMCA.2006.889473
[75]	Catal C, Sevim U, Diri B. Clustering and metrics thresholds based software fault prediction of unlabeled program modules[C]//Proc of 2009 6th Int Conf on Information Technology: New Generations. Piscataway, NJ: IEEE, 2009: 199−204
[76]	Arisholm E, Briand L C, Johannessen E B. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models[J]. Journal of Systems and Software, 2010, 83(1): 2−17 doi: 10.1016/j.jss.2009.06.055
[77]	Gyimóthy T, Ferenc R, Siket I. Empirical validation of object-oriented metrics on open source software for fault prediction[J]. IEEE Transactions on Software Engineering, 2005, 31(10): 897−910 doi: 10.1109/TSE.2005.112
[78]	Zheng Jun. Cost-sensitive boosting neural networks for software defect prediction[J]. Expert Systems with Applications, 2010, 37(6): 4537−4543 doi: 10.1016/j.eswa.2009.12.056
[79]	Shukla S, Radhakrishnan T, Muthukumaran K, et al. Multi-objective cross-version defect prediction[J]. Soft Computing, 2018, 22(6): 1959−1980 doi: 10.1007/s00500-016-2456-8
[80]	Zhao Liuchang, Shang Zhaowei, Zhao Ling, et al. Siamese dense neural network for software defect prediction with small data[J]. IEEE Access, 2018, 7: 7663−7677
[81]	Zhang Xian, Ben K, Zeng Jie. Cross-entropy: A new metric for software defect prediction[C]//Proc of 2018 IEEE Int Conf on Software Quality, Reliability and Security (QRS). Piscataway, NJ: IEEE, 2018: 111−122
[82]	Yang Xinli, Lo D, Xia Xin, et al. Deep learning for just-in-time defect prediction[C]//Proc of 2015 IEEE Int Conf on Software Quality, Reliability and Security. Piscataway, NJ: IEEE, 2015: 17−26
[83]	Wang Song, Liu Taiyue, Tan Lin. Automatically learning semantic features for defect prediction[C]//Proc of 2016 IEEE/ACM 38th Int Conf on Software Engineering (ICSE). Piscataway, NJ: IEEE, 2016: 297−308
[84]	Wang Song, Liu Taiyue, Nam J, et al. Deep semantic feature learning for software defect prediction[J]. IEEE Transactions on Software Engineering, 2018, 46(12): 1267−1293
[85]	Li Jian, He Pinjia, Zhu Jieming, et al. Software defect prediction via convolutional neural network[C]//Proc of 2017 IEEE Int Conf on Software Quality, Reliability and Security (QRS). Piscataway, NJ: IEEE, 2017: 318−328
[86]	Fan Guisheng, Diao Xuyang, Yu Huiqun, et al. Software defect prediction via attention-based recurrent neural network[J]. Scientific Programming, 2019, 2019(4): 1−14
[87]	Qiu Shaojian, Lu Lu, Cai Ziyi, et al. Cross-project defect prediction via transferable deep learning-generated and handcrafted features[C]//Proc of the 31st Int Conf on Software Engineering and Knowledge Engineering. Skokie: Knowledge Systems Institute Graduate School, 2019: 431−552
[88]	Liu Wangshu, Zhu Yongteng, Chen Xiang, et al. S² LMMD: Cross-project software defect prediction via statement semantic learning and maximum mean discrepancy[C]//Proc of 2021 28th Asia-Pacific Software Engineering Conf (APSEC). Piscataway, NJ: IEEE, 2021: 369−379
[89]	Dam H K, Pham T, Ng S W, et al. A deep tree-based model for software defect prediction[J]. ArXiv Preprint ArXiv: 1802.00921, 2018
[90]	Šikić L, Kurdija A S, Vladimir K, et al. Graph neural network for source code defect prediction[J]. IEEE Access, 2022, 10: 10402−10415 doi: 10.1109/ACCESS.2022.3144598
[91]	Phan A V, Le Nguyen M, Bui L T. Convolutional neural networks over control flow graphs for software defect prediction[C]//Proc of 2017 IEEE 29th Int Conf on Tools with Artificial Intelligence (ICTAI). Piscataway, NJ: IEEE, 2017: 45−52
[92]	Xu Jiaxi, Ai Jun, Liu Jingyu, et al. ACGDP: An augmented code graph-based system for software defect prediction[J]. IEEE Transactions on Reliability, 2022, 71(2): 850−864 doi: 10.1109/TR.2022.3161581
[93]	Li Zhen, Zou Deqing, Xu Shouhuai, et al. VulDeePecker: A deep learning-based system for vulnerability detection[J]. ArXiv Preprint ArXiv: 1801.01681, 2018
[94]	Huo Xuan, Yang Yang, Li Ming, et al. Learning semantic features for software defect prediction by code comments embedding[C]//Proc of 2018 IEEE Int Conf on Data Mining (ICDM). Piscataway, NJ: IEEE, 2018: 1049−1054
[95]	Qu Yu, Liu Ting, Chi Jianlei, et al. Node2defect: Using network embedding to improve software defect prediction[C]//Proc of 2018 33rd IEEE/ACM Int Conf on Automated Software Engineering (ASE). Piscataway, NJ: IEEE, 2018: 844−849
[96]	Zeng Cheng, Zhou Chunying, Lv Shengkai, et al. GCN2defect: Graph convolutional networks for smotetomek-based software defect prediction[C]//Proc of 2021 IEEE 32nd Int Symp on Software Reliability Engineering (ISSRE). Piscataway, NJ: IEEE, 2021: 69−79
[97]	Zhou Chunying, He Peng, Zeng Cheng, et al. Software defect prediction with semantic and structural information of codes based on graph neural networks[J]. Information and Software Technology, 2022, 152: 107057 doi: 10.1016/j.infsof.2022.107057
[98]	Yang Fengyu, Huang Yaxuan, Xu Haoming, et al. Fine-grained software defect prediction based on the method-call sequence[J]. Computational Intelligence and Neuroscience, 2022, 2022(8): 1−15
[99]	Uddin M N, Li Bixin, Ali Z, et al. Software defect prediction employing BiLSTM and BERT-based semantic feature[J]. Soft Computing, 2022, 26(16): 7877−7891 doi: 10.1007/s00500-022-06830-5
[100]	Shin Y, Williams L. An empirical model to predict security vulnerabilities using code complexity metrics[C]//Proc of the Second ACM-IEEE Int Symp on Empirical Software Engineering and Measurement. New York: ACM, 2008: 315−317
[101]	Gegick M, Williams L, Osborne J, et al. Prioritizing software security fortification through code-level security metrics[C]//Proc of Workshop on Quality of Protection. New York: ACM, 2008: 31−38
[102]	Meneely A, Williams L. Secure open source collaboration: An empirical study of Linus’ law[C]//Proc of the 16th ACM Conf on Computer and Communications Security. New York: ACM, 2009: 453−462
[103]	Shin Y, Meneely A, Williams L, et al. Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities[J]. IEEE Transactions on Software Engineering, 2010, 37(6): 772−787
[104]	Chowdhury I, Zulkernine M. Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities[J]. Journal of Systems Architecture, 2011, 57(3): 294−313 doi: 10.1016/j.sysarc.2010.06.003
[105]	Hovsepyan A, Scandariato R, Joosen W, et al. Software vulnerability prediction using text analysis techniques[C]//Proc of the 4th Int Workshop on Security Measurements and Metrics. New York: ACM, 2012: 7−10
[106]	Scandariato R, Walden J, Hovsepyan A, et al. Predicting vulnerable software components via text mining[J]. IEEE Transactions on Software Engineering, 2014, 40(10): 993−1006 doi: 10.1109/TSE.2014.2340398
[107]	Yamaguchi F, Lottmann M, Rieck K. Generalized vulnerability extrapolation using abstract syntax trees[C]//Proc of the 28th Annual Computer Security Applications Conf. New York: ACM, 2012: 359−368
[108]	Meng Qingkun, Wen Shameng, Feng Chao, et al. Predicting buffer overflow using semi-supervised learning[C]//Proc of 2016 9th Int Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). Piscataway, NJ: IEEE, 2016: 1959−1963
[109]	Pang Yulei, Xue Xiaozhen, Wang Huaying. Predicting vulnerable software components through deep neural network[C]//Proc of the 2017 Int Conf on Deep Learning Technologies. New York: ACM, 2017: 6−10
[110]	Dam H K, Tran T, Pham T, et al. Automatic feature learning for predicting vulnerable software components[J]. IEEE Transactions on Software Engineering, 2018, 47(1): 67−85
[111]	Kalouptsoglou I, Siavvas M, Kehagias D, et al. An empirical evaluation of the usefulness of word embedding techniques in deep learning-based vulnerability prediction[C]//Proc of Int ISCIS Security Workshop. Berlin: Springer, 2022: 23−37
[112]	Ma Qianhua. Deep learning-based software vulnerability prediction[D]. Beijing: Beijing University of Posts and Telecommunications, 2020 (in Chinese)
[113]	Zhou Yaqin, Liu Shangqing, Siow J, et al. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C]//Proc of 33rd Conf on Neural Information Processing Systems (NeurIPS). San Diego, CA, USA: NIPS, 2019: 10197−10207
[114]	Li Zhen, Zou Deqing, Xu Shouhuai, et al. SySeVR: A framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 19(4): 2244−2258
[115]	Li Yi, Wang Shouhua, Nguyen T N. Vulnerability detection with fine-grained interpretations[C]//Proc of the 29th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering. New York: ACM, 2021: 292−303
[116]	Fu M, Tantithamthavorn C. LineVul: A transformer-based line-level vulnerability prediction[C]//Proc of 2022 IEEE/ACM 19th Int Conf on Mining Software Repositories (MSR). Piscataway, NJ: IEEE, 2022: 608−620
[117]	Shin Y, Williams L. Is complexity really the enemy of software security?[C]//Proc of the 4th ACM Workshop on Quality of Protection. New York: ACM, 2008: 47−50
[118]	Viega J, McGraw G R. Building Secure Software: How to Avoid Security Problems the Right Way, Portable Documents[M]. London: Pearson Education, 2001
[119]	Gao Zhiwei, Yao Yao, Rao Fei, et al. Prediction model of vulnerabilities based on the type of vulnerability severity[J]. Acta Electronica Sinica, 2013, 41(9): 1784−1787 (in Chinese) doi: 10.3969/j.issn.0372-2112.2013.09.018
[120]	Pan Zhixin, Mishra P. A survey on hardware vulnerability analysis using machine learning[J]. IEEE Access, 2022, 10: 49508−49527 doi: 10.1109/ACCESS.2022.3173287
[121]	Palix N, Thomas G, Saha S, et al. Faults in Linux: Ten years later[C]//Proc of the 16th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2011: 305−318
[122]	Zimmermann T, Nagappan N, Williams L. Searching for a needle in a haystack: Predicting security vulnerabilities for windows vista[C]//Proc of 2010 3rd Int Conf on Software Testing, Verification and Validation. Piscataway, NJ: IEEE, 2010: 421−428
[123]	Shin Y, Williams L A. Can fault prediction models and metrics be used for vulnerability prediction?[R]. North Carolina, USA: North Carolina State University, Department of Computer Science, 2010
[124]	Shin Y, Williams L. Can traditional fault prediction models be used for vulnerability prediction?[J]. Empirical Software Engineering, 2013, 18(1): 25−59 doi: 10.1007/s10664-011-9190-8