数据库索引调优技术综述

赖思超; 吴小莹; 彭煜玮; 彭智勇

doi:10.7544/issn1000-1239.202220931


赖思超¹, 吴小莹¹, 彭煜玮¹, 彭智勇¹,²

1. 武汉大学计算机学院　武汉　430072
2. 武汉大学大数据研究院　武汉　430072

基金项目: 国家自然科学基金项目（U1811263）；CCF-华为数据库创新研究计划项目（CCF-HuaweiDBIR003A）

详细信息

作者简介:
赖思超: 1993年生. 博士研究生. CCF学生会员. 主要研究方向为数据库调优、数据库索引、AI4DB

吴小莹: 1973年生. 博士，副教授. CCF会员. 主要研究方向为数据管理、数据查询处理和优化、关键字查询、模式挖掘、语义网、数据集成

彭煜玮: 1980年生. 博士，副教授. CCF会员. 主要研究方向为数据库系统、数字水印

彭智勇: 1963年生. 博士，教授. CCF会士. 主要研究方向为数据库、大数据管理与分析、可信数据管理、复杂数据管理

通讯作者:
彭智勇（peng@whu.edu.cn）

中图分类号: TP18
出版历程
- 收稿日期: 2022-11-10
- 修回日期: 2023-06-12
- 网络出版日期: 2024-01-29
- 刊出日期: 2024-04-05

Survey on Database Index Tuning Techniques

1. School of Computer Science, Wuhan University, Wuhan 430072
2. Big Data Institute, Wuhan University, Wuhan 430072

Funds: This work was supported by the National Natural Science Foundation of China (U1811263) and the CCF-Huawei Database Innovation Research Program (CCF-HuaweiDBIR003A).

More Information

Author Bio:
Lai Sichao: born in 1993. PhD candidate. Student member of CCF. His main research interests include database tuning, database indexes, and AI4DB

Wu Xiaoying: born in 1973. PhD, associate professor. Member of CCF. Her main research interests include data management, data query processing and optimization, keyword search, pattern mining, the Semantic Web, and data integration

Peng Yuwei: born in 1980. PhD, associate professor. Member of CCF. His main research interests include database systems and digital watermarks

Peng Zhiyong: born in 1963. PhD, professor. Fellow of CCF. His main research interests include databases, big data management and analysis, trusted data management, and complex data management

摘要:
索引调优是数据库调优的重要组成部分，一直受到广泛关注. 由于索引调优问题的理论复杂性和大数据时代的到来，通过DBA手动调优的方案已经无法满足现代数据库的发展需求，调优方案逐渐开始向自动化、智能化的方向发展. 随着机器学习技术的发展，越来越多的索引选择方案开始引入机器学习技术，并取得了一定的研究成果. 将索引调优问题的解决方案归结为一种基于搜索的调优范式，归纳了其研究内容，阐述了其面临的挑战，对调优范式内的索引配置空间的生成、索引配置的评价以及索引配置的枚举与搜索3方面的研究成果进行了归纳、总结和对比. 对动态工作负载下的索引选择问题（index selection problem，ISP）所面临的新挑战进行了分析，并基于在线反馈控制回路框架对其解决方案进行梳理. 讨论了索引调优工具的发展与现状，通过对现有研究的分析论述，为后来研究者提供参考和研究思路，并对索引选择方案的未来进行了展望.
关键词: 数据库索引; 索引选择; 索引调优; 性能调优; 机器学习
Abstract:
Index tuning is an important part of database performance tuning and has long received wide attention. Because of the theoretical complexity of the index tuning problem and the advent of the big data era, manual tuning by DBAs can no longer meet the needs of modern database systems, and tuning solutions have been moving toward automation and intelligence. With the development of machine learning, more and more index selection solutions have adopted machine learning techniques, and some progress has been made. In this survey, we formulate solutions to the index tuning problem as a search-based tuning paradigm, summarize its research content, and describe its challenges. We categorize, summarize, and compare existing work on the three components of the paradigm: generating the search space of index configurations, evaluating index configurations, and enumerating and searching index configurations. We then analyze the new challenges of the index selection problem (ISP) under dynamic workloads and review existing solutions under an online feedback control loop framework. Finally, we discuss the development and current state of index tuning tools, offer reference points and research ideas for later researchers through the analysis of existing work, and outline future directions for index selection.
Keywords: database index; index selection; index tuning; performance tuning; machine learning


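A wireless body area network^[1] (WBAN) is a wireless communication network formed by wireless sensors (WSs) worn on or implanted in the human body. WBANs are widely used for medical data monitoring: different types of wireless medical sensors monitor different aspects of a patient's condition and send the data to remote servers, where it can be professionally analyzed and integrated. However, when an open WBAN transmits sensitive medical data, the patient's privacy may be leaked or the data may be maliciously tampered with^[2].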

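Many researchers have proposed applying cryptographic schemes to WBANs to keep medical data confidential during transmission and sharing. Mykletun et al.^[3] designed an encryption scheme based on traditional public key cryptography (PKC) that protects data confidentiality in wireless sensor networks. Nadir et al.^[4] combined PKC with elliptic curve cryptography to generate symmetric keys for users; data encrypted under these keys stays confidential when transmitted and shared over a wireless sensor network. However, PKC-based schemes^[3-4] require a trusted center to manage user certificates. To remove the certificate management overhead, several WBAN schemes based on identity-based encryption^[5-7] were proposed. The schemes in [3−7] protect confidentiality by encrypting the data, but they do not authenticate its source. Without authentication of medical data, hospitals may waste scarce medical resources on futile diagnoses and may even misdiagnose a patient on the basis of tampered data.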

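To authenticate medical data in WBANs, Ahn et al.^[8] constructed an authentication scheme based on the symmetric Advanced Encryption Standard (AES). Huang Yicai et al.^[9] designed an identity-based signature scheme that resists replay attacks. Cagalaban et al.^[10] introduced digital signcryption into healthcare systems, providing confidentiality and authentication of medical data at the same time. Ullah et al.^[11] designed a certificate-based signcryption scheme using hyperelliptic curves. Although the schemes in [8−11] authenticate medical data, none of them considers multi-user settings. To address the low computational efficiency of cryptographic schemes in multi-user WBANs, several schemes supporting aggregation^[12-15] were proposed, based on techniques such as aggregate signatures and aggregate encryption. However, the schemes in [8−15] do not consider how to search WBAN ciphertexts stored in the cloud efficiently, so retrieving medical data is costly for data users.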

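Building on searchable encryption^[16] and ciphertext equality testing^[17], researchers have proposed several ciphertext retrieval schemes for WBANs^[18-21]. These schemes all have shortcomings. The searchable encryption schemes of Zhang Jiayi^[18] and Andrew et al.^[19] can only search medical data encrypted under the same public key; the equality test encryption scheme of Ramadan et al.^[20] cannot authenticate the source of medical data; and the certificate-based ciphertext equality test scheme of Elhabob et al.^[21] suffers from certificate management. In addition, doctors or medical institutions sometimes need to decide whether particular medical data of several patients is identical, or to consolidate and archive the data of patients with the same condition. None of the retrieval schemes in [18−21] considers retrieval by multiple users or simultaneous retrieval over multiple ciphertexts, which limits them in practical WBAN deployments with many user nodes.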

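WBANs often need to match more than two ciphertexts. Traditional ciphertext equality testing can only split the ciphertexts into pairs and test each pair in turn, which makes retrieval inefficient in multi-user settings. To improve the efficiency of equality testing over multiple ciphertexts, Susilo et al.^[22] proposed public-key encryption with multi-ciphertext equality test (PKE-MET), which matches more than two ciphertexts at once. In PKE-MET, each data owner taking part in the test can specify a number n and have their ciphertext matched against the ciphertexts of n−1 other data owners. Besides testing multiple ciphertexts at once, PKE-MET supports retrieval by multiple users at once: the tester can test the data owners' ciphertexts only after receiving n test trapdoors, one uploaded by each of the n data users who wish to retrieve ciphertexts. However, PKE-MET incurs high certificate management overhead and cannot authenticate the source of the data.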

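To address these problems, this paper proposes a WBAN aggregate signcryption scheme supporting multi-ciphertext equality test. Its contributions fall into three areas:

1) Identity-based signcryption. The scheme uses identity-based signcryption, which removes the certificate management overhead of traditional public-key encryption schemes and ensures the confidentiality, integrity, and authenticity of medical data in the WBAN as well as the unforgeability of data owners' signatures.

2) Aggregate signcryption of multi-user ciphertexts. With aggregate signcryption, a verifier can batch-verify the medical ciphertexts of multiple data owners, which improves verification efficiency in multi-user settings.

3) Multi-ciphertext equality test. With multi-ciphertext equality testing, the tester can use the test trapdoors uploaded by data users to match multiple ciphertexts at once. This supports multi-user retrieval and multi-ciphertext equality testing and reduces the computational cost of equality testing in multi-user settings.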

1. Preliminaries

1.1 Hardness Assumption

Computational Diffie-Hellman (CDH) problem: given $(P,aP,bP)$ with $a,b \in \mathbb{Z}_p^*$, compute $abP$.

1.2 Cramer's Rule

Consider a non-homogeneous system of $n$ linear equations in $n$ unknowns ${x_1},{x_2}, \cdots ,{x_n}$:

$\left\{ \begin{gathered} {a_{11}}{x_1} + {a_{12}}{x_2} + \cdots + {a_{1n}}{x_n} = {b_1} , \\ {a_{21}}{x_1} + {a_{22}}{x_2} + \cdots + {a_{2n}}{x_n} = {b_2} , \\ {\text{ }} \vdots \\ {a_{n1}}{x_1} + {a_{n2}}{x_2} + \cdots + {a_{nn}}{x_n} = {b_n} , \\ \end{gathered} \right.$

Its coefficient matrix is

${\boldsymbol{A}} = \left({\begin{array}{*{20}{c}} {{a_{11}}}&{{a_{12}}}& \cdots &{{a_{1n}}} \\ {{a_{21}}}&{{a_{22}}}& \cdots &{{a_{2n}}} \\ \vdots & \vdots &{}& \vdots \\ {{a_{n1}}}&{{a_{n2}}}& \cdots &{{a_{nn}}} \end{array}} \right),$

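The determinant of ${\boldsymbol{A}}$ is

$\det ({\boldsymbol{A}}) = \left| {\begin{array}{*{20}{c}} {{a_{11}}}&{{a_{12}}}& \cdots &{{a_{1n}}} \\ {{a_{21}}}&{{a_{22}}}& \cdots &{{a_{2n}}} \\ \vdots & \vdots &{}& \vdots \\ {{a_{n1}}}&{{a_{n2}}}& \cdots &{{a_{nn}}} \end{array}} \right| .$

If $\det ({\boldsymbol{A}}) \ne 0$, the system has a unique solution, given by ${x_j} = \det ({{\boldsymbol{A}}_j})/\det ({\boldsymbol{A}})$, where ${{\boldsymbol{A}}_j}$ is ${\boldsymbol{A}}$ with its $j$-th column replaced by $({b_1},{b_2}, \cdots ,{b_n})^{\text{T}}$.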

1.3 Vandermonde Matrix and Vandermonde Determinant

A matrix of the form

${\boldsymbol{V}} = \left( {\begin{array}{*{20}{c}} 1&{{a_1}}&{a_1^2}& \cdots &{a_1^{n - 1}} \\ 1&{{a_2}}&{a_2^2}& \cdots &{a_2^{n - 1}} \\ \vdots & \vdots & \vdots &{}& \vdots \\ 1&{{a_n}}&{a_n^2}& \cdots &{a_n^{n - 1}} \end{array}} \right)$

is called a Vandermonde matrix. Its determinant $\det ({\boldsymbol{V}})$ satisfies

$\det ({\boldsymbol{V}}) = \left| {\begin{array}{*{20}{c}} 1&{{a_1}}&{a_1^2}& \cdots &{a_1^{n - 1}} \\ 1&{{a_2}}&{a_2^2}& \cdots &{a_2^{n - 1}} \\ \vdots & \vdots & \vdots &{}& \vdots \\ 1&{{a_n}}&{a_n^2}& \cdots &{a_n^{n - 1}} \end{array}} \right| = \prod\limits_{1 \leqslant i \lt j \leqslant n} {({a_j} - {a_i})} .$
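The product formula can be checked numerically. The sketch below is illustrative only: it builds a small Vandermonde matrix over exact rationals, computes its determinant by Gaussian elimination, and compares it with the product of $({a_j} - {a_i})$ over $i \lt j$. The function names are hypothetical. The determinant vanishes exactly when two of the ${a_i}$ coincide, which is the property the equality test relies on.

```python
from fractions import Fraction
from itertools import combinations


def vandermonde(points):
    """Row i is (1, a_i, a_i^2, ..., a_i^(n-1))."""
    n = len(points)
    return [[Fraction(a) ** k for k in range(n)] for a in points]


def det(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero column means the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def product_formula(points):
    """Product of (a_j - a_i) over all pairs i < j."""
    out = Fraction(1)
    for i, j in combinations(range(len(points)), 2):
        out *= Fraction(points[j]) - Fraction(points[i])
    return out


if __name__ == "__main__":
    pts = [2, 3, 5]
    print(det(vandermonde(pts)), product_formula(pts))
```

For an odd number of pairs, as with three points, the order of subtraction changes the sign of the product, so the convention matters when the exact value is needed. It does not affect whether the determinant is zero.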

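2. The Proposed Scheme

2.1 System Model

Figure 1 shows the system model of the proposed WBAN aggregate signcryption scheme supporting multi-ciphertext equality test. It involves six entities: a private key generator (PKG), a cloud storage provider, data owners (the wireless sensors worn by patients), a ciphertext equality tester, an aggregator, and data users (DUs).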

图 1 本文系统模型

Figure 1. The proposed system model


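The entities are described as follows:

1) Private key generator. Generates keys for the data owners and data users in the WBAN.

2) Cloud storage provider. Stores the medical ciphertexts $C{T_1}$, $C{T_2}$, …, $C{T_n}$ uploaded by users on the cloud server.

3) Data owner. A wireless sensor worn by a patient. It signcrypts medical data and uploads the resulting ciphertext to the cloud.

4) Tester. Runs the equality test on multiple medical ciphertexts downloaded from the cloud server and returns the result to the cloud server.

5) Aggregator. Aggregates the signcrypted medical data of multiple data owners and uploads the aggregate medical ciphertext to the cloud.

6) Data user. A doctor, medical institution, data processing center, or other party that wants to obtain medical ciphertexts. It uploads equality test trapdoors to the tester, and decrypts and authenticates the medical ciphertexts downloaded from the cloud server.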

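2.2 Security Goals

The proposed aggregate signcryption scheme supporting multi-ciphertext equality test considers two types of adversaries. A Type 1 adversary cannot access the data users' test trapdoors; a Type 2 adversary can obtain them. Against these two types of adversaries, the scheme aims to achieve the following security goals:

1) Confidentiality and integrity of medical data. Most data transmitted in a WBAN is sensitive medical data, and its theft or tampering in transit can have serious consequences. The scheme uses identity-based encryption to ensure confidentiality and integrity against Type 1 adversaries. Confidentiality means that an adversary who intercepts a medical ciphertext in transit learns nothing about the plaintext; integrity means that an adversary cannot forge or tamper with medical data in transit.

2) Unforgeability of data owners' signatures. When verifying the validity of a data owner's signature, the scheme uses identity-based signcryption to ensure unforgeability against Type 1 adversaries: an adversary cannot forge a valid signature of a data owner.

3) One-wayness of test trapdoors. The tester uses the test trapdoors uploaded by data users to run the equality test on medical ciphertexts. During the test, the trapdoors must be one-way against Type 2 adversaries: an adversary cannot use a test trapdoor to learn anything about the plaintexts of the medical data under test.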

2.3 Construction

2.3.1 System Setup

Given a security parameter $k$, the PKG chooses a large prime $p$ ($p \gt {2^k}$), an additive cyclic group $G$ of order $p$, and a generator $P$ of $G$. The PKG randomly selects $s \in \mathbb{Z}_p^*$ as the master key and keeps it secret, computes ${P_{{\text{pub}}}} = sP$ as the system public key, and defines six hash functions: ${H_1}:{\{ 0,1\} ^*} \to \mathbb{Z}_p^*$, ${H_2}:{\{ 0,1\} ^*} \times G \to \mathbb{Z}_p^*$, ${H_3}:{\{ 0,1\} ^*} \times G \to \mathbb{Z}_p^*$, ${H_4}:G \to {\{ 0,1\} ^{{l_0} + {l_1}}}$, ${H_5}:{\{ 0,1\} ^*} \to \mathbb{Z}_p^*$, and ${H_6}:{\{ 0,1\} ^*} \to {\{ 0,1\} ^k}$, where ${l_0}$ is the ciphertext length. It outputs the system parameters $params = \{ p,P,{P_{{\text{pub}}}},G,{H_1},{H_2},{H_3},{H_4},{H_5},{H_6}\}$.

2.3.2 User Key Extraction

1) The user sends $I{D_i}$ to the PKG, which computes ${Q_i} = {H_1}(I{D_i})$ and $s{k_{i,1}} = s{Q_i}$;

2) The PKG randomly selects ${x_i} \in \mathbb{Z}_p^*$ and computes $P{K_{i,1}} = {x_i}P$, $P{K_{i,2}} = {H_1}(I{D_i}||P{K_{i,1}})$, $s{k_{i,2}} = {x_i} + sP{K_{i,2}}$, $s{k_{i,3}} = {H_1}(I{D_i}||s)$, and $P{K_{i,3}} = s{k_{i,3}}P$;

3) The PKG outputs the public parameters $P{K_i} = (P{K_{i,1}},P{K_{i,2}},P{K_{i,3}})$ and the private key $s{k_i} = (s{k_{i,1}},s{k_{i,2}},s{k_{i,3}})$.

2.3.3 Signcryption and Upload of Medical Data

Let $n$ be the number of data owners taking part in the equality test and aggregate signcryption, let $I{D_i}$ be the identity of a data owner, and let $I{D_j}$ be the identity of a data user, where $i,j \in \{ 1,2, \cdots ,n\}$. The data owner signcrypts ${m_i}$ in steps 1) to 5):

1) Randomly select ${a_i},{b_i},{N_i} \in \mathbb{Z}_p^*$ and compute ${C_{i,1}} = {a_i}P$, ${C_{i,2}} = {b_i}P$, and ${R_i} = {a_i}{Q_j}{P_{{\text{pub}}}}$;

2) Compute ${U_i} = {H_2}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}})$, ${V_i} = {H_3}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}})$, ${v_i} = {a_i}{U_i} + s{k_{i,2}}{V_i}$, ${C_{i,3}} = {v_i}P$, and ${C_{i,4}} = {H_4}({R_i}) \oplus ({m_i}||{v_i})$;

3) Compute ${f_{i,0}} = {H_5}({m_i}||n)$, ${f_{i,1}} = {H_5}({m_i}||n||{f_{i,0}})$, $\cdots$, ${f_{i,n - 1}} = {H_5}({m_i}||n||{f_{i,0}}|| \cdots ||{f_{i,n - 2}})$;

4) Compute ${C_{i,5}} = {H_4}({b_i}P{K_{j,3}}) \oplus ({N_i}||f({N_i}))$ and ${C_{i,6}} = {H_6}(n||{C_{i,1}}|| \cdots ||{C_{i,5}}||{b_i}P{K_{j,3}}||{f_{i,0}}|| \cdots ||{f_{i,n - 1}})$, where $f({N_i}) = {f_{i,0}} + {f_{i,1}}{N_i} + {f_{i,2}}N_i^2 + \cdots + {f_{i,n - 1}}N_i^{n - 1}$;

5) Upload the ciphertext $C{T_i} = ({t_i},{C_{i,1}},{C_{i,2}},{C_{i,3}},{C_{i,4}},{C_{i,5}},{C_{i,6}})$ to the cloud, where ${t_i} = n$.
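Steps 3) and 4) derive a chain of coefficients from the message and evaluate a degree $n-1$ polynomial at the random point ${N_i}$. The following minimal sketch illustrates that derivation under stated assumptions: SHA-256 reduced into $\mathbb{Z}_p^*$ stands in for ${H_5}$, a fixed Mersenne prime stands in for $p$, and the byte encoding of the concatenation is arbitrary. None of these choices comes from the scheme itself.

```python
import hashlib

# Toy modulus (assumption): the Mersenne prime 2^127 - 1 stands in for p.
P = 2**127 - 1


def h5(data: bytes) -> int:
    """Hypothetical stand-in for H_5: {0,1}* -> Z_p^*, built from SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1) + 1


def coefficients(message: bytes, n: int) -> list:
    """f_0 = H5(m||n), f_k = H5(m||n||f_0||...||f_{k-1}) for k = 1..n-1."""
    coeffs = []
    prefix = message + b"||" + str(n).encode()
    for _ in range(n):
        value = h5(prefix)
        coeffs.append(value)
        prefix += b"||" + str(value).encode()  # extend the chain
    return coeffs


def evaluate(coeffs, x: int) -> int:
    """Horner evaluation of f(x) = f_0 + f_1 x + ... + f_{n-1} x^(n-1) mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


if __name__ == "__main__":
    f = coefficients(b"heart_rate=72", 3)
    print(evaluate(f, 123456789))
```

Because the coefficients depend only on the message and $n$, two data owners holding the same message derive the same polynomial while evaluating it at different random points. This is what later lets the tester interpolate.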

2.3.4 Multi-Ciphertext Equality Test

The $n$ data users each send their equality test trapdoor $t{k_j} = s{k_{j,3}}$ to the tester, where $j \in \{ 1,2, \cdots ,n\}$. The tester downloads from the cloud server the ciphertexts $C{T_1},C{T_2}, \cdots ,C{T_n}$ that the $n$ data owners want tested and performs steps 1) to 3):

1) Check whether ${t_1} = {t_2} = \cdots = {t_n} = n$ holds. If it does, the tester continues; otherwise it aborts and outputs “ $\bot$ ”;

2) For $i \in \{ 1,2, \cdots ,n\}$ and $j \in \{ 1,2, \cdots ,n\}$, the tester computes ${N_i}||f({N_i}) = {C_{i,5}} \oplus {H_4}({C_{i,2}}t{k_j})$. By the signcryption algorithm, $f({N_i}) = {f_{i,0}} + {f_{i,1}}{N_i} + {f_{i,2}}N_i^2 + \cdots + {f_{i,n - 1}}N_i^{n - 1}$. The tester combines the $n$ equations into the system

$\left\{\begin{aligned} &f({N}_{1})={f}_{1,0}+{f}_{1,1}{N}_{1}+{f}_{1,2}{N}_{1}^{2}+\cdots +{f}_{1,n-1}{N}_{1}^{n-1},\\ &f({N}_{2})={f}_{2,0}+{f}_{2,1}{N}_{2}+{f}_{2,2}{N}_{2}^{2}+\cdots +{f}_{2,n-1}{N}_{2}^{n-1},\\ & \;\;\; \vdots \\ &f({N}_{n})={f}_{n,0}+{f}_{n,1}{N}_{n}+{f}_{n,2}{N}_{n}^{2}+\cdots +{f}_{n,n-1}{N}_{n}^{n-1},\end{aligned}\right.$

and implicitly sets ${f_{i,k}} = {f_{j,k}}$ for $k \in \{ 0,1, \cdots ,n - 1\}$. By inverting the Vandermonde matrix of this system, the tester obtains its unique solution ${f_{1,0}},{f_{1,1}}, \cdots ,{f_{1,n - 1}}$;

3) Check whether ${C_{i,6}} = {H_6}(n||{C_{i,1}}||{C_{i,2}}||{C_{i,3}}||{C_{i,4}}||{C_{i,5}}||{C_{i,2}}t{k_j}|| {f_{i,0}}||{f_{i,1}}|| \cdots ||{f_{i,n - 1}})$ holds. If it does, the tester outputs the result “1” to the cloud server; otherwise it outputs “0”.
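The core of step 2) is recovering $n$ coefficients from $n$ points $({N_i},f({N_i}))$ by solving a Vandermonde system over $\mathbb{Z}_p$. The sketch below uses plain Gauss-Jordan elimination modulo a prime rather than a dedicated Vandermonde inversion algorithm. The modulus and the function name are assumptions made for illustration, and the hash check of step 3) is not modeled.

```python
# Toy modulus (assumption): the Mersenne prime 2^127 - 1 stands in for p.
P = 2**127 - 1


def solve_vandermonde(points, values, p=P):
    """Solve V * f = y (mod p), where row i of V is (1, N_i, ..., N_i^(n-1))."""
    n = len(points)
    if len({x % p for x in points}) != n:
        raise ValueError("evaluation points must be distinct modulo p")
    # Augmented matrix [V | y].
    rows = [[pow(x, k, p) for k in range(n)] + [y % p]
            for x, y in zip(points, values)]
    for col in range(n):
        # Distinct points make V invertible, so a non-zero pivot exists.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, p)  # modular inverse (Python 3.8+)
        rows[col] = [(v * inv) % p for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p
                           for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


if __name__ == "__main__":
    coeffs = [5, 7, 11]
    pts = [2, 3, 10]
    vals = [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P for x in pts]
    print(solve_vandermonde(pts, vals))
```

If all $n$ owners used the same polynomial, the recovered coefficients match the ones bound into each ${C_{i,6}}$. If even one owner's point comes from a different polynomial, the interpolated coefficients differ and the hash comparison in step 3) fails.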

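2.3.5 Aggregate Signcryption and Upload of Medical Data

If the equality test result received by the cloud server is “1”, meaning that the medical data underlying the ciphertexts of the $n$ data owners is identical, the cloud server sends the medical ciphertexts $C{T_1},C{T_2}, \cdots ,C{T_n}$ of all data owners to the aggregator, which aggregates them in steps 1) and 2):

1) Compute ${X_{{\text{agg}}}} = \displaystyle\sum\limits_{i = 1}^n {{C_{i,3}}}$;

2) Upload the aggregate medical ciphertext ${\sigma _{{\text{agg}}}} = ({\{ C{T_i}\} _{i = 1,2, \cdots ,n}},{X_{{\text{agg}}}})$ to the cloud server.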

2.3.6 Download and Decryption of Medical Data

Let $I{D_j}$ be the identity of a data user, where $j \in \{ 1, 2, \cdots , n\}$. The data user downloads the aggregate medical ciphertext ${\sigma _{{\text{agg}}}}$ from the cloud, decrypts it, and verifies the source of the data, as follows:

1) Compute $R_i' = s{k_{j,1}}{C_{i,1}}$ and $m_i'||v_i' = {C_{i,4}} \oplus {H_4}(R_i')$;

2) From $m_i'$, compute $f_{i,0}' = {H_5}(m_i'||n)$, $f_{i,1}' = {H_5}(m_i'||n||f_{i,0}')$, $\cdots$, $f_{i,n - 1}' = {H_5}(m_i'||n||f_{i,0}'|| \cdots ||f_{i,n - 2}')$, and $N_i'||f(N_i') = {C_{i,5}} \oplus {H_4}({C_{i,2}}s{k_{j,3}})$;

3) Compute $U_i' = {H_2}(m_i',I{D_i},I{D_j},R_i',P{K_{i,1}},P{K_{j,1}})$, $V_i' = {H_3}(m_i',I{D_i},I{D_j},R_i',P{K_{i,1}},P{K_{j,1}})$, $X_{{\text{agg}}}' = \displaystyle\sum\limits_{i = 1}^n {v_i'P}$, and $X_{{\text{agg}}}^* = \displaystyle\sum\limits_{i = 1}^n {U_i'{C_{i,1}}} + \displaystyle\sum\limits_{i = 1}^n {V_i'P{K_{i,1}}} + \displaystyle\sum\limits_{i = 1}^n {V_i'P{K_{i,2}}{P_{{\text{pub}}}}}$;

4) Check whether ${C_{i,6}} = {H_6}(n||{C_{i,1}}||{C_{i,2}}||{C_{i,3}}||{C_{i,4}}||{C_{i,5}}||{C_{i,2}}s{k_{j,3}}||f_{i,0}'||f_{i,1}'|| \cdots ||f_{i,n - 1}')$, $X_{{\text{agg}}}^* = X_{{\text{agg}}}'$, and $f(N_i') = f_{i,0}' + f_{i,1}'N_i' + \cdots + f_{i,n - 1}'{(N_i')^{n - 1}}$ all hold.

If all of these equations hold, the data user accepts the medical data $m_i'$; otherwise it outputs “ $\bot$ ”.

3. Correctness Analysis and Security Proof

3.1 Correctness Analysis

1) Correctness of decryption

The data user decrypts a ciphertext by computing $m_i'||v_i' = {C_{i,4}} \oplus {H_4}(R_i')$, where $R_i' = s{k_{j,1}}{C_{i,1}}$ and $s{k_{j,1}}$ is the data user's private key. Since $s{k_{j,1}} = s{Q_j}$, we have

$R_i' = s{k_{j,1}}{C_{i,1}} = s{k_{j,1}}{a_i}P = s{Q_j}{a_i}P = {a_i}{Q_j}{P_{{\text{pub}}}} = {R_i},$

that is, $R_i' = {R_i}$, and therefore

$m_i'||v_i' = {C_{i,4}} \oplus {H_4}(R_i') = {H_4}({R_i}) \oplus ({m_i}||{v_i}) \oplus {H_4}(R_i') = {m_i}||{v_i}.$

Hence the scheme satisfies correctness of decryption.

2) Correctness of signature verification

The data user verifies the validity of the aggregate signature by checking whether $X_{{\text{agg}}}^* = X_{{\text{agg}}}'$ holds, where $X_{{\text{agg}}}' = \displaystyle\sum\limits_{i = 1}^n {v_i'P}$, ${v_i'} = {a_i}{U_i} + s{k_{i,2}}{V_i}$, and $s{k_{i,2}} = {x_i} + sP{K_{i,2}}$. Then

$\begin{aligned} X_{{\text{agg}}}' = &\sum\limits_{i = 1}^n {v_i'P} = \sum\limits_{i = 1}^n {{a_i}{U_i}P + \sum\limits_{i = 1}^n {s{k_{i,2}}{V_i}P} } = \\ &\sum\limits_{i = 1}^n {{a_i}{U_i}P + \sum\limits_{i = 1}^n {{x_i}{V_i}P + \sum\limits_{i = 1}^n {sP{K_{i,2}}{V_i}P} } } ,\end{aligned}$

Combining this with ${C_{i,1}} = {a_i}P$, $P{K_{i,1}} = {x_i}P$, and ${P_{{\text{pub}}}} = sP$ gives

$X_{{\text{agg}}}' = \sum\limits_{i = 1}^n {{U_i}{C_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,2}}{P_{{\text{pub}}}}}.$

Furthermore, correctness of decryption gives $m_i'||v_i' = {m_i}||{v_i}$, so

$\begin{aligned} {U_i} =\;& {H_2}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}})= \\ & {H_2}(m_i',I{D_i},I{D_j},R_i',P{K_{i,1}},P{K_{j,1}}) =U_i',\\ {V_i} = & {H_3}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}}) =\\ &{H_3}(m_i',I{D_i},I{D_j},R_i',P{K_{i,1}},P{K_{j,1}}) = V_i', \end{aligned}$

that is, ${U_i} = U_i'$ and ${V_i} = V_i'$, and therefore

$\begin{aligned} X_{{\text{agg}}}' = \;& \sum\limits_{i = 1}^n {{U_i}{C_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,1}} + } \sum\limits_{i = 1}^n {{V_i}P{K_{i,2}}{P_{{\text{pub}}}}} = \\ &\sum\limits_{i = 1}^n {U_i^{'}{C_{i,1}} + } \sum\limits_{i = 1}^n {V_i'P{K_{i,1}} + } \sum\limits_{i = 1}^n {V_i'P{K_{i,2}}{P_{{\text{pub}}}}} = X_{{\text{agg}}}^* \text{，} \end{aligned}$

that is, $X_{{\text{agg}}}^* = X_{{\text{agg}}}'$ holds. Hence the proposed scheme satisfies correctness of signature verification.
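The algebra behind the aggregate check can be replayed in a toy model. In the sketch below, the additive group $G$ is replaced by the integers modulo a prime with a fixed generator. This makes discrete logarithms trivial, so it offers no security and serves only to confirm that both sides of the verification equation agree. Random values stand in for the hash outputs ${U_i}$, ${V_i}$, and $P{K_{i,2}}$; all names and constants are assumptions.

```python
import random

# Toy additive group (assumption, insecure): integers mod the prime 2^61 - 1.
Q = 2**61 - 1
G = 7  # plays the role of the generator P


def smul(k: int, point: int) -> int:
    """Scalar multiplication kP in the toy group."""
    return (k * point) % Q


def aggregate_check(n: int = 4, seed: int = 0):
    """Return (X_agg, X_star) for n simulated data owners."""
    rng = random.Random(seed)

    def rand() -> int:
        return rng.randrange(1, Q)

    s = rand()                 # master key
    p_pub = smul(s, G)         # P_pub = sP
    x_agg = 0                  # sum of C_{i,3} = v_i P
    x_star = 0                 # right-hand side recomputed by the data user
    for _ in range(n):
        x_i = rand()
        pk1 = smul(x_i, G)     # PK_{i,1} = x_i P
        pk2 = rand()           # stand-in for H_1(ID_i || PK_{i,1})
        sk2 = (x_i + s * pk2) % Q
        a_i = rand()
        c1 = smul(a_i, G)      # C_{i,1} = a_i P
        u_i, v_hash = rand(), rand()  # stand-ins for H_2 and H_3 outputs
        v_i = (a_i * u_i + sk2 * v_hash) % Q
        x_agg = (x_agg + smul(v_i, G)) % Q
        x_star = (x_star + smul(u_i, c1) + smul(v_hash, pk1)
                  + smul(v_hash * pk2 % Q, p_pub)) % Q
    return x_agg, x_star


if __name__ == "__main__":
    left, right = aggregate_check()
    print(left == right)
```

A real instantiation would use an elliptic-curve group, where the same identity holds but the scalars cannot be recovered from the points.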

3) Correctness of the equality test result

For $i \in \{ 1,2, \cdots ,n\}$ and $j \in \{ 1,2, \cdots ,n\}$, the tester checks whether ${C_{i,6}} = {H_6}(n||{C_{i,1}}|| \cdots ||{C_{i,5}}||{C_{i,2}}t{k_j}||{f_{i,0}}|| \cdots ||{f_{i,n - 1}})$ holds in order to decide whether the plaintexts of the $n$ medical ciphertexts are identical, where ${f_{i,0}} = {H_5}({m_i}||n), \cdots ,{f_{i,n - 1}} = {H_5}({m_i}||n||{f_{i,0}}|| \cdots ||{f_{i,n - 2}})$. Suppose all medical data taking part in the test is identical, i.e., ${m_1} = {m_2} = \cdots = {m_n}$. Then

$\begin{aligned} &{H_5}({m_1}||n) = {H_5}({m_2}||n) = \cdots = {H_5}({m_n}||n),\\ &{H_5}({m_1}||n||{f_{1,0}}) = {H_5}({m_2}||n||{f_{2,0}}) = \cdots = {H_5}({m_n}||n||{f_{n,0}}),\\ &\qquad \vdots \\ &{H_5}({m_1}||n||{f_{1,0}}|| \cdots ||{f_{1,n - 2}}) = {H_5}({m_2}||n||{f_{2,0}}|| \cdots ||{f_{2,n - 2}}) = \cdots = {H_5}({m_n}||n||{f_{n,0}}|| \cdots ||{f_{n,n - 2}}), \end{aligned}$

that is, ${f_{i,k}} = {f_{j,k}}$ holds for all $i,j \in \{ 1,2, \cdots ,n\}$ and $k \in \{ 0,1, \cdots ,n - 1\}$.

By the signcryption and upload algorithm, each data owner sets

$f({N_i}) = {f_{i,0}} + {f_{i,1}}{N_i} + {f_{i,2}}N_i^2 + \cdots + {f_{i,n - 1}}N_i^{n - 1},$

which yields the system

$\left\{\begin{aligned} f({N}_{1})&={f}_{1,0}+{f}_{1,1}{N}_{1}+{f}_{1,2}{N}_{1}^{2}+\cdots +{f}_{1,n-1}{N}_{1}^{n-1},\\ f({N}_{2})&={f}_{2,0}+{f}_{2,1}{N}_{2}+{f}_{2,2}{N}_{2}^{2}+\cdots +{f}_{2,n-1}{N}_{2}^{n-1},\\ & \vdots \\ f({N}_{n})&={f}_{n,0}+{f}_{n,1}{N}_{n}+{f}_{n,2}{N}_{n}^{2}+\cdots +{f}_{n,n-1}{N}_{n}^{n-1}.\end{aligned}\right.$

Since ${f_{i,k}} = {f_{j,k}}$, we can treat ${f_{1,0}},{f_{1,1}}, \cdots ,{f_{1,n - 1}}$ as the unknowns of the system and the powers of the random numbers ${N_i}$ as its coefficients. The matrix of the system is then

${\boldsymbol{V}} = \left({\begin{array}{*{20}{c}} 1&{{N_1}}&{N_1^2}& \cdots &{N_1^{n - 1}} \\ 1&{{N_2}}&{N_2^2}& \cdots &{N_2^{n - 1}} \\ \vdots & \vdots & \vdots &{}& \vdots \\ 1&{{N_n}}&{N_n^2}& \cdots &{N_n^{n - 1}} \end{array}} \right) ,$

and by the property of the Vandermonde matrix its determinant is $\det ({\boldsymbol{V}}) = \displaystyle\prod\limits_{1 \leqslant i \lt j \leqslant n} {({N_j} - {N_i})}$.

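By the signcryption procedure, each ${N_i}$ is a random number chosen by one of the $n$ different data owners when signcrypting its medical data. Hence $\det ({\boldsymbol{V}}) = 0$ only if two of the ${N_i}$ coincide, which happens with probability at most $n(n - 1)/(2(p - 1))$, where $p - 1$ is the order of the group $\mathbb{Z}_p^*$; this is negligible for large $p$. By Cramer's rule, when $\det ({\boldsymbol{V}}) \ne 0$ the system has exactly one solution ${f_{1,0}},{f_{1,1}}, \cdots ,{f_{1,n - 1}}$, so ${f_{i,k}} = {f_{j,k}}$ holds for all $i,j \in \{ 1,2, \cdots ,n\}$ and $k \in \{ 0,1, \cdots ,n - 1\}$, which is consistent with the assumption that all medical data taking part in the test is identical. Hence the proposed scheme satisfies correctness of the multi-ciphertext equality test.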
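The Vandermonde matrix is singular exactly when two of the ${N_i}$ coincide, so the failure probability is a birthday-style collision probability. The sketch below assumes the ${N_i}$ are independent and uniform over $\mathbb{Z}_p^*$, which has $p - 1$ elements, and compares the exact collision probability with the union bound $n(n - 1)/(2(p - 1))$. The function names are illustrative.

```python
from fractions import Fraction


def collision_probability(n: int, p: int) -> Fraction:
    """Exact probability that n uniform draws from Z_p^* are not all distinct."""
    distinct = Fraction(1)
    for k in range(n):
        distinct *= Fraction(p - 1 - k, p - 1)
    return 1 - distinct


def union_bound(n: int, p: int) -> Fraction:
    """Upper bound: number of pairs times the chance that one pair collides."""
    return Fraction(n * (n - 1), 2 * (p - 1))


if __name__ == "__main__":
    p = 2**127 - 1  # toy prime (assumption)
    print(float(collision_probability(10, p)), float(union_bound(10, p)))
```

For a cryptographic-size $p$ and any realistic number of data owners, both quantities are vanishingly small, so the tester almost never encounters a singular system.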

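3.2 Security Proof

The proposed scheme uses identity-based aggregate signcryption, which ensures confidentiality of medical data and existential unforgeability of signatures against Type 1 adversaries; the proofs of confidentiality and unforgeability follow those of the scheme in [23]. The scheme is also one-way against adaptive chosen ciphertext attack (OW-CCA2) with respect to Type 2 adversaries, which Theorem 1 establishes.

Theorem 1. If the CDH problem is hard, then the proposed scheme is OW-CCA2 secure against Type 2 adversaries in the random oracle model.

Proof. Let $\mathcal{C}$ be an algorithm that tries to solve the CDH problem, and let ${\mathcal{A}_2}$ denote a Type 2 adversary. $\mathcal{C}$ acts as the challenger in the following game and uses ${\mathcal{A}_2}$ as a subroutine. If ${\mathcal{A}_2}$ wins the game in probabilistic polynomial time with non-negligible advantage, then $\mathcal{C}$ can solve the CDH problem in probabilistic polynomial time.

Setup. The CDH instance is $(P,aP,bP)$ with $a,b \in \mathbb{Z}_p^*$, and the goal of $\mathcal{C}$ is to output the solution $abP$. $\mathcal{C}$ chooses a cyclic group $G$ of prime order $p$ with generator $P$, randomly selects $a \in \mathbb{Z}_p^*$, and computes $P_{{\text{pub}}}' = aP$. Finally, $\mathcal{C}$ outputs the system parameters $params = \{ p,P,P_{{\text{pub}}}',G,{H_1},{H_2},{H_3},{H_4},{H_5},{H_6}\}$, keeps $a$ secret, and sends $params$ to ${\mathcal{A}_2}$.

Query phase 1. To answer the queries of ${\mathcal{A}_2}$, $\mathcal{C}$ maintains the lists ${L_1},{L_2},{L_3},{L_4},{L_5},{L_6},{L_{{\text{td}}}}$, which track the ${H_1}$, ${H_2}$, ${H_3}$, ${H_4}$, ${H_5}$, and ${H_6}$ hash queries and the test trapdoor queries of ${\mathcal{A}_2}$, respectively. ${L_1}$ is also used to track key extraction queries. All lists are initially empty.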

1) ${H_1}$ hash query. When $\mathcal{C}$ receives a query on ${H_1}(I{D_i},{Q_i})$ from ${\mathcal{A}_2}$: if $I{D_i} \in \{ I{D_i}\} _{i = 1}^n$, it computes $P{K_{i,1}} = {x_i}P$, where ${x_i}$ is unknown, and stores $( \bot ,{Q_i},I{D_i})$ in ${L_1}$; if $i \ne 1$, $\mathcal{C}$ randomly chooses ${x_i},P{K_{i,2}} \in \mathbb{Z}_p^*$, sets $P{K_{i,1}} = {x_i}P$, returns $P{K_{i,2}} = {H_1}(I{D_i}||P{K_{i,1}})$ to ${\mathcal{A}_2}$, and stores $({x_i},P{K_{i,1}},P{K_{i,2}},I{D_i})$ in ${L_1}$.

2) ${H_2}$ hash query. When $\mathcal{C}$ receives a query on $({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}},{U_i})$ from ${\mathcal{A}_2}$, it first looks up $({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}},{U_i},{t_i},{t_i}P)$ in ${L_2}$. If the tuple exists, $\mathcal{C}$ sends ${U_i}$ to ${\mathcal{A}_2}$; otherwise $\mathcal{C}$ chooses ${U_i} \in \mathbb{Z}_p^*$, adds $({U_i},{t_i},{t_i}P)$ to ${L_2}$, and outputs ${t_i}P$.

3) ${H_3}$ hash query. When $\mathcal{C}$ receives a query on $({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}},{V_i})$ from ${\mathcal{A}_2}$, it first looks up $({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}},{V_i},{w_i},{w_i}P)$ in ${L_3}$. If the tuple exists, $\mathcal{C}$ returns ${V_i}$ to ${\mathcal{A}_2}$; otherwise $\mathcal{C}$ chooses ${V_i} \in \mathbb{Z}_p^*$, adds $({V_i},{w_i},{w_i}P)$ to ${L_3}$, and outputs ${w_i}P$.

4) ${H_4}$ hash query. When $\mathcal{C}$ receives a query on $({R_i},{H_4}({R_i}))$ from ${\mathcal{A}_2}$: if $({R_i},{H_4}({R_i}))$ is already in ${L_4}$, it returns ${H_4}({R_i})$ to ${\mathcal{A}_2}$; otherwise $\mathcal{C}$ chooses ${H_4}({R_i}) \in {\{ 0,1\} ^{{l_0} + {l_1}}}$, adds $({R_i},{H_4}({R_i}))$ to ${L_4}$, and outputs ${H_4}({R_i})$.

5) ${H_5}$ hash query. When $\mathcal{C}$ receives a query on ${f_{i,d}}$ from ${\mathcal{A}_2}$, where $d \in \{ 1,2, \cdots ,n\}$: if ${L_5}$ contains $({m_i},n,{f_{i,0}}, \cdots ,{f_{i,d - 2}},{f_{i,d}})$, it returns ${f_{i,d}}$ to ${\mathcal{A}_2}$; otherwise $\mathcal{C}$ chooses ${f_{i,d}} \in \mathbb{Z}_p^*$, adds $({m_i},n,{f_{i,0}}, \cdots ,{f_{i,d - 2}},{f_{i,d}})$ to ${L_5}$, and outputs ${f_{i,d}}$.

6) ${H_6}$ hash query. When $\mathcal{C}$ receives a query on ${C_{i,6}}$ from ${\mathcal{A}_2}$: if ${C_{i,6}}$ is already in ${L_6}$, it returns ${C_{i,6}}$ to ${\mathcal{A}_2}$; otherwise $\mathcal{C}$ chooses ${C_{i,6}} \in {\{ 0,1\} ^k}$, adds the corresponding tuple to ${L_6}$, and outputs ${C_{i,6}}$.

7) Key extraction query. When $\mathcal{C}$ receives a query for the private key of $I{D_i}$ from ${\mathcal{A}_2}$, it first checks whether $({x_i},P{K_{i,1}},P{K_{i,2}},I{D_i})$ is in ${L_1}$. If not, it outputs “ $\bot$ ”; otherwise it returns $({x_i},P{K_{i,1}},*,*)$. If $I{D_i} \notin \{ I{D_i}\} _{i = 1}^n$, $\mathcal{C}$ uses $I{D_i}$ as the input of an ${H_1}$ hash query to obtain ${Q_i} = {H_1}(I{D_i})$, computes $s{k_{i,1}} = a{Q_i}$ and $s{k_{i,2}} = {x_i} + aP{K_{i,2}}$, and returns $(P{K_{i,1}},s{k_{i,1}},P{K_{i,2}},I{D_i})$ to ${\mathcal{A}_2}$.

8) Public key replacement query. When $\mathcal{C}$ receives a query on $(I{D_i},P{K_{i,1}},P{K_{i,2}})$ from ${\mathcal{A}_2}$: if $({x_i},P{K_{i,1}},P{K_{i,2}},I{D_i})$ is already in ${L_1}$, $\mathcal{C}$ replaces the original public key $(P{K_{i,1}},P{K_{i,2}})$ of $I{D_i}$ with the $(P{K_{i,1}},P{K_{i,2}})$ in list ${L_1}$; otherwise $\mathcal{C}$ adds $({x_i},P{K_{i,1}},P{K_{i,2}},I{D_i})$ to ${L_1}$.

9) Signcryption query. When $\mathcal{C}$ receives a query on $({m_i},I{D_i},I{D_j})$ from ${\mathcal{A}_2}$, it proceeds as in ① and ②:

① If $I{D_i} \ne I{D_l}$ and ${\mathcal{A}_2}$ has not issued a public key replacement query on $I{D_i}$, $\mathcal{C}$ obtains ${x_i}$ and $s{k_{i,2}}$ through the ${H_1}$ hash query and the key extraction query, respectively, and signcrypts ${m_i}$. If the public key of $I{D_i}$ has been replaced, $\mathcal{C}$ first obtains $(P{K_{i,1}},P{K_{i,2}})$ and $(P{K_{j,1}},P{K_{j,2}})$ through ${H_1}$ queries, then uses a random ${a_i} \in \mathbb{Z}_p^*$ to compute ${C_{i,1}} = {a_i}P$ and ${R_i} = {a_i}{Q_j}P_{{\text{pub}}}'$, and obtains ${U_i} = {H_2}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}})$, ${V_i} = {H_3}({m_i},I{D_i},I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}})$, and ${H_4}({R_i})$ through the ${H_2}$, ${H_3}$, and ${H_4}$ hash queries. It obtains the private key $s{k_{i,2}}$ through the key extraction query, computes ${v_i} = {a_i}{U_i} + s{k_{i,2}}{V_i}$, ${C_{i,3}} = {v_i}P$, and ${C_{i,4}} = {H_4}({R_i}) \oplus ({m_i}||{v_i})$, and finally outputs the ciphertext ${\sigma _i} = ({C_{i,1}},{C_{i,2}},{C_{i,3}},P{K_{i,1}})$ to ${\mathcal{A}_2}$.

② If $I{D_i} = I{D_l}$, $\mathcal{C}$ first obtains $(P{K_{i,1}},P{K_{i,2}})$ and $(P{K_{j,1}},P{K_{j,2}})$ through ${H_1}$ queries, randomly chooses $y,z \in \mathbb{Z}_p^*$, and computes ${C_{i,1}} = zaP$. Then $\mathcal{C}$ obtains $(I{D_j},{a_j})$ and ${H_4}({R_j})$ through the ${H_1}$ and ${H_4}$ hash queries, computes ${R_j} = {a_j}{Q_j}P_{{\text{pub}}}'$ and ${U_j} = {H_2}({m_l},I{D_l},I{D_j},{R_j},P{K_{l,1}},P{K_{j,1}})$, and adds $({m_l},I{D_l},I{D_j},{R_j},P{K_{l,1}},P{K_{j,1}},{U_j})$ to ${L_2}$. It obtains $({m_l},I{D_l},I{D_j},{R_l},P{K_{l,1}},P{K_{j,1}},{V_l},{w_l},{w_l}P)$ through the ${H_3}$ hash query, computes ${v_l} = y{U_l}$, ${C_{l,3}} = z{v_l}P_{{\text{pub}}}' + {w_l}P{K_{l,1}}$, and ${C_{i,4}} = {H_4}({R_l}) \oplus ({m_l}||{v_l})$, and finally outputs ${\sigma _l} = ({C_{l,1}},{C_{l,2}},{C_{l,3}},P{K_{l,1}})$ to ${\mathcal{A}_2}$.

10) Unsigncryption query. When $\mathcal{C}$ receives a query on $(C{T_1},C{T_2}, \cdots ,C{T_n},\{ I{D_i}\} _{i = 1}^n,I{D_j})$ from ${\mathcal{A}_2}$, it proceeds as in ① and ②:

① Issue ${H_1}$ hash queries on $(I{D_1},I{D_2}, \cdots ,I{D_n},I{D_j})$ to obtain $({Q_1},{Q_2}, \cdots ,{Q_n},{Q_j})$ and $(P{K_{1,1}},P{K_{2,1}}, \cdots ,P{K_{n,1}},P{K_{j,1}})$. Then $\mathcal{C}$ runs the aggregate signature verification algorithm. If verification fails, it outputs “ $\bot$ ” and aborts the simulation; otherwise it continues.

② If $I{D_j} \ne I{D_l}$, $\mathcal{C}$ obtains $(I{D_j},{a_j})$ through the ${H_1}$ hash query, computes ${R_j} = {a_j}{C_{j,1}}$, and checks whether ${L_2}$ contains a tuple $(*,I{D_j},{R_i},P{K_{i,1}},P{K_{j,1}},{U_i})$. If it does, $\mathcal{C}$ decrypts the ciphertext using the hash value ${U_i}$; otherwise $\mathcal{C}$ randomly chooses ${U_i} \in \mathbb{Z}_p^*$ and decrypts the ciphertext with it. If $I{D_j} = I{D_l}$, $\mathcal{C}$ checks whether ${L_2}$ contains a tuple $(*,I{D_j},*,P{K_{i,1}},P{K_{j,1}},{U_i})$. If it does, $\mathcal{C}$ decrypts the ciphertext using the hash value ${U_i}$; otherwise it randomly chooses ${U_i} \in \mathbb{Z}_p^*$ and decrypts the ciphertext with it.

11) Test trapdoor query. When $\mathcal{C}$ receives a query on $t{k_j}$ from ${\mathcal{A}_2}$: if ${L_1}$ contains the tuple $({x_i},P{K_{i,1}},P{K_{i,2}},I{D_i})$, $\mathcal{C}$ obtains $s{k_{i,3}} = {H_1}(I{D_i}||s)$ through an ${H_1}$ query and returns $t{k_j} = s{k_{i,3}}$ to ${\mathcal{A}_2}$; otherwise $\mathcal{C}$ chooses $t{k_j} \in \mathbb{Z}_p^*$, sends it to ${\mathcal{A}_2}$, and adds $({x_i},P{K_{i,1}},P{K_{i,2}},I{D_i})$ to ${L_{{\text{td}}}}$.

Challenge. ${\mathcal{A}_2}$ outputs two messages $m_0^* = \{ m_{i,0}^*\} _{i = 1}^n$ and $m_1^* = \{ m_{i,1}^*\} _{i = 1}^n$ together with the identities $\{ ID_i^*\} _{i = 1}^n$ and $ID_j^*$. $\mathcal{C}$ issues an ${H_1}$ hash query on $ID_j^*$. If ${L_1}$ contains no tuple related to $ID_j^*$, $\mathcal{C}$ fails; otherwise $\mathcal{C}$ retrieves from ${L_1}$ the public keys $\{ PK_{i,1}^*,PK_{i,2}^*\} _{i = 1}^n$ corresponding to $\{ ID_i^*\} _{i = 1}^n$, randomly chooses $\{ s{k_{i,2}} \in \mathbb{Z}_p^*\} _{i = 1}^n$, and computes $\{ {C_{i,1}} = s{k_{i,2}}cP\} _{i = 1}^n$. Then $\mathcal{C}$ retrieves $\{ {U_i}\} _{i = 1}^n$ and $\{ {V_i}\} _{i = 1}^n$ from ${L_2}$ and ${L_3}$ and computes $v_i^* = {a_i}{U_i} + s{k_{i,2}}{V_i} = {t_i}C_{i,1}^* + s{k_{i,2}}{w_i}PK_{i,1}^*$, where ${t_i}$, ${w_i}$, and $s{k_{i,2}}$ come from the ${H_2}$ hash query, the ${H_3}$ hash query, and the key extraction query on $ID_j^*$, respectively. Next, $\mathcal{C}$ randomly chooses $\mu \in \{ 0,1\}$ and computes $C_{i,4}^* = {H_4}({R_i}) \oplus ({m_{i,\mu }}||v_i^*)$ and $C_{i,3}^* = v_i^*P$, obtains the public keys $\{ PK_{i,1}^*\} _{i = 1}^n$ through ${H_1}$ hash queries, and outputs ${\sigma ^*} = (C_{1,1}^*, \cdots ,C_{n,1}^*,C_{1,3}^*, \cdots ,C_{n,3}^*,C_{1,4}^*, \cdots ,C_{n,4}^*,PK_{1,1}^*, \cdots ,PK_{n,1}^*)$ to ${\mathcal{A}_2}$.

Query phase 2. ${\mathcal{A}_2}$ issues a polynomially bounded number of adaptive queries as in query phase 1, but may not issue unsigncryption queries on the ciphertexts corresponding to $ID_i^*$ and $ID_j^*$.

Guess. ${\mathcal{A}_2}$ outputs a guess $\mu ' \in \{ 0,1\}$ for $\mu$. If $\mu ' = \mu$, ${\mathcal{A}_2}$ wins the game. $\mathcal{C}$ then selects $({R_i},{H_4}({R_i}))$ from list ${L_4}$ and outputs ${R_i} = abP$ as the solution to the CDH problem, which contradicts the widely accepted hardness of the CDH problem. Hence the proposed scheme is OW-CCA2 secure against the adversary ${\mathcal{A}_2}$. This completes the proof.

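4. Comparative Analysis

4.1 Functional Characteristics

Table 1 compares the functional characteristics of the proposed scheme with those of the schemes in [22−26]. Compared with the schemes in [23−24], the proposed scheme adds equality testing, which enables secure retrieval of medical ciphertexts stored in the cloud. Compared with the schemes in [22,25−26], it adds aggregate signcryption, which ensures the confidentiality, integrity, and authenticity of medical data in the WBAN and improves the efficiency of signcryption and verification in multi-user settings. The equality test used by the schemes in [25−26] can only compare two ciphertexts, whereas the proposed scheme matches multiple ciphertexts at once, reducing the tester's cost. In addition, compared with the schemes in [22−23,25−26], the proposed scheme achieves one-wayness under adaptive chosen ciphertext attack, which is a stronger security guarantee.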

表 1 功能特性比较

Table 1. Comparison of Functional Characteristics

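Scheme	Equality test	Multi-ciphertext equality test	Signcryption	Aggregate signcryption	Security
Scheme of [22]	√	√	×	×	One-wayness under chosen plaintext attack
Scheme of [23]	×	×	√	√	Indistinguishability under chosen ciphertext attack
Scheme of [24]	×	×	√	√	Indistinguishability under adaptive chosen ciphertext attack
Scheme of [25]	√	×	×	×	One-wayness under chosen ciphertext attack
Scheme of [26]	√	×	√	×	One-wayness under chosen ciphertext attack
Proposed scheme	√	√	√	√	One-wayness under adaptive chosen ciphertext attack
Note: “×” means the scheme does not provide the feature; “√” means it does.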

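4.2 Complexity of Vandermonde Matrix Inversion

When running the multi-ciphertext equality test, the tester inverts a Vandermonde matrix to extract the coefficients derived from the data owners' plaintexts. The time complexity of inverting an $n$-th order Vandermonde matrix depends on the inversion method. Serial^[27-28] and parallel^[29-30] methods have been proposed, with the time complexities listed in Table 2.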

表 2 范德蒙矩阵求逆算法复杂度

Table 2. Complexity of Inversion for Vandermonde Matrix

Scheme	Time complexity
Scheme of [27]	$O({n^2})$
Scheme of [28]	$O({n^2})$
Scheme of [29]	$O((\log n))$
Scheme of [30]	$O({(\log n)^2})$

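4.3 Computation Cost

We compare the computation time of the proposed scheme with that of the schemes in [25−26]. Let n be the number of users taking part in the equality test. The proposed scheme and the comparison schemes were simulated with the PBC library under VC6.0 on a machine with an i7-8750H 2.20 GHz processor, 8 GB of memory, and Windows 10. Table 3 shows the results. The measured unit times are T_sm = 0.0004 ms for scalar multiplication, T_mul = 0.0314 ms for group element multiplication, T_h = 0.0001 ms for a hash function evaluation, T_e = 6.9866 ms for exponentiation, and T_bp = 9.6231 ms for a bilinear pairing; the Vandermonde matrix inversion time T_inv depends on the inversion method. As Table 3 shows, the proposed scheme uses no costly bilinear pairings, so its ciphertext generation time is significantly lower than that of the schemes in [25−26]. In the decryption and verification phase, the non-aggregate schemes in [25−26] require every data user to verify and decrypt the data one by one, whereas data users in the proposed scheme can batch-verify the aggregate ciphertext, which improves verification efficiency over the schemes in [25−26].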

表 3 计算量比较

Table 3. Comparison of Computation Costs (ms)

Scheme	Ciphertext generation time	Ciphertext equality test time	Decryption and verification time
Scheme of [25]	$\begin{aligned} & n{T_{ {\text{mul} } } } + 3n{T_{ {\text{bp} } } } + 6n{T_{\text{h} } } + 5n{T_{\text{e} } } \\ &\quad( 63.8343n )\end{aligned}$	$\begin{aligned} & (n - 1)(4{T_{ {\text{bp} } } } + 2{T_{\text{h} } }) \\ &\quad ( 38.4926n - 38.4926) \end{aligned}$	$\begin{aligned} & 2n{T_{ {\text{bp} } } } + 4n{T_{\text{h} } } + 2n{T_{{\rm{e}} } }\\ &\quad (33.2198n) \end{aligned}$
Scheme of [26]	$\begin{aligned} & 6n{T_{ {\text{sm} } } } + 2n{T_{ {\text{bp} } } } + 7n{T_{\text{h} } } + 2n{T_{\text{e} } } \\ &\quad( 33.2225n) \end{aligned}$	$\begin{aligned} & (n - 1)(4{T_{ {\text{bp} } } } + 2{T_{\text{h} } }) \\ &\quad( 38.4926n - 38.4926) \end{aligned}$	$\begin{aligned}& 3n{T_{ {\text{sm} } } } + n{T_{ {\text{mul} } } } + 5n{T_{ {\text{bp} } } } + 5n{T_{\text{h} } }\\ &\quad ( 48.1486n )\end{aligned}$
Proposed scheme	$\begin{aligned} & 7n{T_{ {\text{sm} } } } + n{T_{ {\text{mul} } } } + n(n + 4){T_{\text{h} } }\\ &\quad ( 0.0346n + 0.0001{n^2})\end{aligned}$	$\begin{aligned} & n{T_{ {\text{sm} } } } + 2n{T_{\text{h} } } + {T_{ {\text{inv} } } }\\ &\quad ( {T_{ {\text{inv} } } } + 0.0006n) \end{aligned}$	$\begin{aligned} & n(2 + 4n){T_{ {\text{sm} } } } + {n^2}{T_{ {\text{mul} } } } + n(n + 4){T_{\text{h} } } \\ &\quad ( 0.0012n + 0.0331{n^2}) \end{aligned}$
Note: $n$ is the number of users taking part in the equality test; $T_{\text{sm}}$ is the scalar multiplication time; $T_{\text{mul}}$ is the group element multiplication time; $T_{\text{h}}$ is the hash function evaluation time; $T_{\text{e}}$ is the exponentiation time; $T_{\text{bp}}$ is the bilinear pairing time; $T_{\text{inv}}$ is the Vandermonde matrix inversion time.
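The symbolic costs in Table 3 can be evaluated directly from the reported unit times. The sketch below does only that arithmetic; the function names are hypothetical, and $T_{\text{inv}}$ is left as a parameter because it depends on the chosen inversion method. With these unit times, the ciphertext generation expression for the scheme of [26] evaluates to 33.2225 ms per user.

```python
# Unit times in milliseconds, as reported in the text.
T_SM, T_MUL, T_H, T_E, T_BP = 0.0004, 0.0314, 0.0001, 6.9866, 9.6231


def scheme_25(n):
    """(generation, equality test, decryption) times for the scheme of [25]."""
    return (n * (T_MUL + 3 * T_BP + 6 * T_H + 5 * T_E),
            (n - 1) * (4 * T_BP + 2 * T_H),
            n * (2 * T_BP + 4 * T_H + 2 * T_E))


def scheme_26(n):
    """(generation, equality test, decryption) times for the scheme of [26]."""
    return (n * (6 * T_SM + 2 * T_BP + 7 * T_H + 2 * T_E),
            (n - 1) * (4 * T_BP + 2 * T_H),
            n * (3 * T_SM + T_MUL + 5 * T_BP + 5 * T_H))


def proposed(n, t_inv=0.0):
    """Times for the proposed scheme; t_inv is the matrix inversion time."""
    return (7 * n * T_SM + n * T_MUL + n * (n + 4) * T_H,
            n * T_SM + 2 * n * T_H + t_inv,
            n * (2 + 4 * n) * T_SM + n * n * T_MUL + n * (n + 4) * T_H)


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, scheme_25(n), scheme_26(n), proposed(n))
```

The decryption cost of the proposed scheme grows quadratically in $n$, so the comparison with the linear costs of [25−26] depends on how many users take part. For the values of $n$ printed here it remains lower.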

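In addition, the schemes in [25−26] can only match the ciphertexts of multiple users in pairs, and the number of bilinear pairings in their equality test grows linearly with the number of users taking part. In the proposed scheme, the tester matches the ciphertexts of $n$ users at once, and the test involves no bilinear pairings. The equality test time of the proposed scheme depends mainly on the algorithm the tester uses to invert the Vandermonde matrix, and this inversion involves only efficient operations such as scalar addition and multiplication^[28]. The equality test of the proposed scheme is therefore also more efficient than that of the schemes in [25−26].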

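5. Conclusion

To address problems such as the low computational efficiency of existing WBAN cryptographic schemes in multi-user settings, this paper proposed a WBAN aggregate signcryption scheme supporting multi-ciphertext equality test. The scheme uses identity-based cryptography, which removes the certificate management overhead of traditional public-key schemes. It uses multi-ciphertext equality testing, which lets multiple data users retrieve multiple medical ciphertexts at once and reduces the cost of equality testing in multi-user settings. It also uses aggregate signcryption, which improves the efficiency of signcrypting the medical data of multiple users. The scheme provides confidentiality, integrity, and authenticity of medical data in transit, as well as unforgeability of data owners' signatures and one-wayness of test trapdoors. The comparison with similar schemes shows that the proposed scheme supports more security properties at lower computation cost. In future work, we will try to design a WBAN signcryption scheme supporting multi-ciphertext equality test that resists quantum computing attacks.

Author contributions: Yang Xiaodong designed the overall approach and the experimental plan; Zhou Hang designed the scheme and wrote the paper; Ren Ningning carried out the simulation and efficiency analysis; Yuan Sen collected material on application scenarios; Wang Caifen provided guidance and revised the paper.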

图 1 基于搜索的索引选择框架

Figure 1. Search-based index selection framework


图 2 最优索引配置的搜索

Figure 2. The search for the best index configuration


图 3 索引选择问题相关方法分类

Figure 3. Classification of related ISP methods


图 4 基于在线反馈控制回路的在线索引选择框架

Figure 4. Online index selection framework based on online feedback control loop


表 1 代表性方法对比

Table 1 Comparison of Representative Methods

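Method	Search space generation	Configuration evaluation	Configuration construction strategy	Guiding metric	Termination condition	Handling of index interactions
AutoAdmin^[3]	Rules + optimizer	what-if + derivation	Bottom-up	Cost reduction	Index count limit	Implicit
DB2Advis^[11]	Rules + optimizer	what-if	Bottom-up	Benefit per unit of storage	Storage budget	Limited
BEN_KNAP^[9]	Rules + optimizer	what-if + derivation	Bottom-up	Benefit per unit of storage	Storage budget	Explicit
Extend^[23]	Rules	what-if	Bottom-up	Cost reduction per unit of storage	Storage budget	Implicit
DROP^[39]	Complete	External cost model	Top-down	Lowest cost	Lowest cost	Implicit
Relaxation^[36]	Optimizer	what-if + derivation	Top-down	Relaxation penalty	Tuning time limit	Implicit
CoPhy^[52]	Rules	what-if + derivation	IP	IP	Solver finishes	Implicit
DingIdxAdvis^[29]	Rules + optimizer	Machine learning model	Bottom-up	Cost reduction	Index count limit	Implicit
NoDBA^[85]	Single-column indexes	Actual execution	Bottom-up (RL)	Cost reduction	Index count limit	Implicit
LanIdxAdvis^[86]	Rules	what-if	Bottom-up (RL)	Cost reduction + ε-greedy	Constraint violation	Implicit
MCTS_IT^[88]	Rules + optimizer	what-if + derivation	Bottom-up (RL)	Cost reduction + ε-greedy	Index count limit	Implicit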

References (137)

[1]	Van Aken D, Pavlo A, Gordon G J, et al. Automatic database management system tuning through large-scale machine learning [C] //Proc of the 2017 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2017: 1009–1024
[2]	崔跃生,张勇,曾春,等. 数据库物理结构优化技术[J]. 软件学报,2013,24(4):761−780 Cui Yuesheng, Zhang Yong, Zeng Chun, et al. Database physical structure optimization technology[J]. Journal of Software, 2013, 24(4): 761−780 (in Chinese)
[3]	Chaudhuri S, Narasayya V R. An efficient cost-driven index selection tool for Microsoft SQL Server [C] //Proc of the 23rd Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 1997: 146–155
[4]	Lum V Y, Ling H. An optimization problem on the selection of secondary keys [C] //Proc of the 26th Annual Conf. New York: ACM, 1971: 349–356
[5]	Bayer R, McCreight E. Organization and maintenance of large ordered indices [C] //Proc of the 1970 ACM SIGFIDET Workshop on Data Description, Access and Control. New York: ACM, 1970: 107–141
[6]	Lum V Y. Multi-attribute retrieval with combined indexes[J]. Communications of the ACM, 1970, 13(11): 660−665 doi: 10.1145/362790.362794
[7]	Stonebraker M. The choice of partial inversions and combined indices[J]. International Journal of Computer & Information Sciences, 1974, 3(2): 167−188
[8]	Comer D. The difficulty of optimum index selection[J]. ACM Transactions on Database Systems, 1978, 3(4): 440−445 doi: 10.1145/320289.320296
[9]	Chaudhuri S, Datar M, Narasayya V. Index selection for databases: A hardness study and a principled heuristic solution[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(11): 1313−1323 doi: 10.1109/TKDE.2004.75
[10]	Agrawal S, Chaudhuri S, Kollar L, et al. Database tuning advisor for Microsoft SQL Server 2005 [C] //Proc of the 30th Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 2004: 1110–1121
[11]	Valentin G, Zuliani M, Zilio D C, et al. DB2Advisor: An optimizer smart enough to recommend its own indexes [C] //Proc of the 16th Int Conf on Data Engineering. Los Alamitos, CA: IEEE Computer Society, 2000: 101–110
[12]	Zilio D C, Rao Jun, Lightstone S, et al. DB2 design advisor: Integrated automatic physical database design [C] //Proc of the 30th Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 2004: 1087–1097
[13]	Dageville B, Das D, Dias K, et al. Automatic SQL tuning in Oracle 10g [C] //Proc of the 30th Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 2004: 1098–1109
[14]	Li Guoliang, Zhou Xuanhe, Sun Ji, et al. A survey of machine learning based database techniques[J]. Chinese Journal of Computers, 2020, 43(11): 2019−2049 (in Chinese)
[15]	Sadri Z, Gruenwald L, Leal E. DRLindex: Deep reinforcement learning index advisor for a cluster database [C/OL] //Proc of the 24th Symp on Int Database Engineering & Applications. New York: ACM, 2020[2022-09-01]. https://dl.acm.org/doi/10.1145/3410566.3410603
[16]	Das S, Grbic M, Ilic I, et al. Automatically indexing millions of databases in Microsoft Azure SQL database [C] //Proc of the 2019 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2019: 666–679
[17]	Idreos S, Kersten M L, Manegold S. Database cracking [C/OL] //Proc of the 3rd Biennial Conf on Innovative Data Systems Research. 2007[2020-09-01]. https://www.cidrdb.org/cidr2007/
[18]	Graefe G, Kuno H. Adaptive indexing for relational keys [C] //Proc of the 26th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2010: 69–74
[19]	Graefe G, Idreos S, Kuno H, et al. Benchmarking adaptive indexing [G]//LNPSE 6417: Proc of the 2nd Technology Conf on Performance Evaluation and Benchmarking. Berlin: Springer, 2011: 169–184
[20]	Bruno N. Automated Physical Database Design and Tuning [M]. 1st ed. Boca Raton, FL: CRC Press, 2011
[21]	Kossmann J, Halfpap S, Jankrift M, et al. Magic mirror in my hand, which is the best in the land? An experimental evaluation of index selection algorithms[J]. Proceedings of the VLDB Endowment, 2020, 13(11): 2382−2395
[22]	Faerber F, Kemper A, Larson P Å, et al. Main memory database systems[J]. Foundations and Trends in Databases, 2017, 8(1/2): 1−130
[23]	Schlosser R, Kossmann J, Boissier M. Efficient scalable multi-attribute index selection using recursive strategies [C] //Proc of the 35th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2019: 1238–1249
[24]	Weikum G, Moenkeberg A, Hasse C, et al. Self-tuning database technology and information services: From wishful thinking to viable engineering [C] //Proc of the 28th Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 2002: 20–31
[25]	Chaudhuri S, Narasayya V. AutoAdmin “what-if” index analysis utility [C] //Proc of the 1998 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 1998: 367–378
[26]	Dias K, Ramacher M, Shaft U, et al. Automatic performance diagnosis and tuning in Oracle [C/OL] //Proc of the 2nd Biennial Conf on Innovative Data Systems Research. 2005[2022-09-01]. https://www.cidrdb.org/cidr2005/
[27]	Chaudhuri S, Narasayya V, Weikum G. Database tuning using combinatorial search [M]//Encyclopedia of Database Systems. New York: Springer Press, 2018: 985–989
[28]	Bruno N, Nehme R V. Configuration-parametric query optimization for physical design tuning [C] //Proc of the 2008 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2008: 941−952
[29]	Ding Bailu, Das S, Marcus R, et al. AI meets AI: Leveraging query executions to improve index recommendations [C] //Proc of the 2019 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2019: 1241–1258
[30]	Schnaitter K, Polyzotis N, Getoor L. Index interactions in physical design tuning: Modeling, analysis, and applications[J]. Proceedings of the VLDB Endowment, 2009, 2(1): 1234−1245 doi: 10.14778/1687627.1687766
[31]	Finkelstein S, Schkolnick M, Tiberio P. Physical database design for relational databases[J]. ACM Transactions on Database Systems, 1988, 13(1): 91−128 doi: 10.1145/42201.42205
[32]	Siddiqui T, Jo S, Wu Wentao, et al. ISUM: Efficiently compressing large and complex workloads for scalable index tuning [C] //Proc of the 2022 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2022: 660–673
[33]	Sattler K-U, Schallehn E, Geist I. Autonomous query-driven index tuning [C] //Proc of the 8th Int Database Engineering and Applications Symp. Los Alamitos, CA: IEEE Computer Society, 2004: 439–448
[34]	Schnaitter K, Abiteboul S, Milo T, et al. On-line index selection for shifting workloads [C] //Proc of the 23rd Int Conf on Data Engineering Workshop. Piscataway, NJ: IEEE, 2007: 459–468
[35]	Jimenez I, Sanchez H, Tran Q T, et al. Kaizen: A semi-automatic index advisor [C] //Proc of the 2012 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2012: 685−688
[36]	Bruno N, Chaudhuri S. Automatic physical database tuning: A relaxation-based approach [C] //Proc of the 2005 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2005: 227–238
[37]	Bruno N, Chaudhuri S. To tune or not to tune? A lightweight physical design alerter [C] //Proc of the 32nd Int Conf on Very Large Data Bases. New York: ACM, 2006: 499–510
[38]	Hammer M, Chan A. Index selection in a self-adaptive data base management system [C] //Proc of the 1976 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 1976: 1–8
[39]	Whang K Y. Index selection in relational databases [C] //Proc of the 2nd Int Conf on Foundations of Data Organization. New York: Plenum, 1985: 487–500
[40]	Schkolnick M. The optimal selection of secondary indices for files[J]. Information Systems, 1975, 1(4): 141−146 doi: 10.1016/0306-4379(75)90003-4
[41]	Ip M Y L, Saxton L V, Raghavan V V. On the selection of an optimal set of indexes[J]. IEEE Transactions on Software Engineering, 1983, SE-9(2): 135−143 doi: 10.1109/TSE.1983.236458
[42]	Chaudhuri S, Narasayya V. Index merging [C] //Proc of the 15th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 1999: 296–303
[43]	Nehme R, Bruno N. Automated partitioning design in parallel database systems [C] //Proc of the 2011 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2011: 1137–1148
[44]	Bruno N, Chaudhuri S. Physical design refinement: The ‘merge-reduce’ approach[J]. ACM Transactions on Database Systems, 2007, 32(4): 28−71 doi: 10.1145/1292609.1292618
[45]	Deep S, Gruenheid A, Koutris P, et al. Comprehensive and efficient workload compression[J]. Proceedings of the VLDB Endowment, 2020, 14(3): 418−430
[46]	Chaudhuri S, Gupta A K, Narasayya V. Compressing SQL workloads [C] //Proc of the 2002 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2002: 488–499
[47]	Kołaczkowski P. Compressing very large database workloads for continuous online index selection [G]//LNISA 5181: Proc of the 19th Int Conf on Database and Expert Systems Applications. Berlin: Springer, 2008: 791–799
[48]	Ma Lin, Van Aken D, Hefny A, et al. Query-based workload forecasting for self-driving database management systems [C] //Proc of the 2018 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2018: 631–645
[49]	Perera R M, Oetomo B, Rubinstein B I P, et al. DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees [C] //Proc of the 37th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2021: 600–611
[50]	Kossmann J, Kastius A, Schlosser R. SWIRL: Selection of workload-aware indexes using reinforcement learning [C/OL] //Proc of the 25th Int Conf on Extending Database Technology. 2022[2022-12-01]. https://openproceedings.org/2022/conf/edbt/
[51]	Zhou Xuanhe, Liu Luyang, Li Wenbo, et al. AutoIndex: An incremental index management system for dynamic workloads [C] //Proc of the 38th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2022: 2196–2208
[52]	Dash D, Polyzotis N, Ailamaki A. CoPhy: A scalable, portable, and interactive index advisor for large workloads[J]. Proceedings of the VLDB Endowment, 2011, 4(6): 362−372 doi: 10.14778/1978665.1978668
[53]	Jain S, Howe B, Yan Jiaqi, et al. Query2Vec: An evaluation of NLP techniques for generalized workload analytics [J]. arXiv preprint, arXiv: 1801.05613, 2018
[54]	Chaudhuri S, Ganesan P, Narasayya V R. Primitives for workload summarization and implications for SQL [C] //Proc of the 29th Int Conf on Very Large Data Bases. San Diego, CA: Morgan Kaufmann, 2003: 730–741
[55]	Kul G, Luong D, Xie T, et al. Ettu: Analyzing query intents in corporate databases [C] //Proc of the 25th Int Conf Companion on World Wide Web. Geneva: Int World Wide Web Conf Steering Committee, 2016: 463−466
[56]	Whang K Y, Wiederhold G, Sagalowicz D. Separability—An approach to physical database design[J]. IEEE Transactions on Computers, 1984, 33(3): 209−222
[57]	Choenni S, Blanken H M, Chang T. On the selection of secondary indices in relational databases[J]. Data & Knowledge Engineering, 1993, 11(3): 207−233
[58]	Papadomanolakis S, Dash D, Ailamaki A. Efficient use of the query optimizer for automated physical design [C] //Proc of the 33rd Int Conf on Very Large Data Bases. New York: ACM, 2007: 1093–1104
[59]	Papadomanolakis S, Ailamaki A. An integer linear programming approach to database design [C] //Proc of the 23rd Int Conf on Data Engineering Workshop. Piscataway, NJ: IEEE, 2007: 442–449
[60]	Chaudhuri S, Narasayya V. Anytime algorithm of database tuning advisor for Microsoft SQL Server [EB/OL]. [2020-11-11]. https://www.microsoft.com/en-us/research/publication/anytime-algorithm-of-database-tuning-advisor-for-microsoft-sql-server/
[61]	Konig A C, Nabar S U. Scalable exploration of physical database design [C] //Proc of the 22nd Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2006: 37–37
[62]	Meng Xiaofeng, Ma Chaohong, Yang Chen. Survey on machine learning for database systems[J]. Journal of Computer Research and Development, 2019, 56(9): 1803−1820 (in Chinese) doi: 10.7544/issn1000-1239.2019.20190446
[63]	Leis V, Gubichev A, Mirchev A, et al. How good are query optimizers, really?[J]. Proceedings of the VLDB Endowment, 2015, 9(3): 204−215 doi: 10.14778/2850583.2850594
[64]	Wang Xiaoying, Qu Changbo, Wu Weiyuan, et al. Are we ready for learned cardinality estimation?[J]. Proceedings of the VLDB Endowment, 2021, 14(9): 1640−1654 doi: 10.14778/3461535.3461552
[65]	Kipf A, Kipf T, Radke B, et al. Learned cardinalities: Estimating correlated joins with deep learning [C/OL] //Proc of the 9th Biennial Conf on Innovative Data Systems Research. 2019[2022-09-01]. https://www.cidrdb.org/cidr2019/
[66]	Sun Ji, Li Guoliang. An end-to-end learning-based cost estimator[J]. Proceedings of the VLDB Endowment, 2019, 13(3): 307−319 doi: 10.14778/3368289.3368296
[67]	Siddiqui T, Wu Wentao, Narasayya V, et al. DISTILL: Low-overhead data-driven techniques for filtering and costing indexes for scalable index tuning[J]. Proceedings of the VLDB Endowment, 2022, 15(10): 2019−2031 doi: 10.14778/3547305.3547309
[68]	Gao Jianling, Zhao Nan, Wang Ning, et al. Automatic index selection with learned cost estimator[J]. Information Sciences, 2022, 612: 706−723 doi: 10.1016/j.ins.2022.08.051
[69]	Yuan Haitao, Li Guoliang, Feng Ling, et al. Automatic view generation with deep learning and reinforcement learning [C] //Proc of the 36th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2020: 1501–1512
[70]	Kimura H, Huo G, Rasin A, et al. CORADD: Correlation aware database designer for materialized views and indexes[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 1103−1113
[71]	Bruno N, Chaudhuri S. An online approach to physical design tuning [C] //Proc of the 23rd Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2007: 826–835
[72]	Bruno N, Chaudhuri S. Constrained physical design tuning[J]. Proceedings of the VLDB Endowment, 2008, 1(1): 4−15 doi: 10.14778/1453856.1453863
[73]	Caprara A, Fischetti M, Maio D. Exact and approximate algorithms for the index selection problem in physical database design[J]. IEEE Transactions on Knowledge and Data Engineering, 1995, 7(6): 955−967 doi: 10.1109/69.476501
[74]	Frank M R, Omiecinski E, Navathe S B. Adaptive and automated index selection in RDBMS [G]//LNCS 580: Proc of the 3rd Int Conf on Extending Database Technology. Berlin: Springer, 1992: 277–292
[75]	Gurobi. Gurobi optimizer [EB/OL]. [2022-09-01]. https://www.gurobi.com
[76]	IBM. IBM ILOG CPLEX optimizer [EB/OL]. [2022-09-01]. https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer
[77]	Li Guoliang, Zhou Xuanhe, Li Shifu, et al. QTune: A query-aware database tuning system with deep reinforcement learning[J]. Proceedings of the VLDB Endowment, 2019, 12(12): 2118−2130 doi: 10.14778/3352063.3352129
[78]	Zhang Ji, Zhou Ke, Li Guoliang, et al. CDBTune+: An efficient deep reinforcement learning-based automatic cloud database tuning system[J]. The VLDB Journal, 2021, 30(6): 959−987 doi: 10.1007/s00778-021-00670-9
[79]	Hilprecht B, Binnig C, Roehm U. Learning a partitioning advisor with deep reinforcement learning [J]. arXiv preprint, arXiv: 1904.01279, 2019
[80]	Heitz J, Stockinger K. Join query optimization with deep reinforcement learning algorithms [J]. arXiv preprint, arXiv: 1911.11689, 2019
[81]	Krishnan S, Yang Z, Goldberg K, et al. Learning to optimize join queries with deep reinforcement learning [J]. arXiv preprint, arXiv: 1808.03196, 2019
[82]	Marcus R, Papaemmanouil O. Deep reinforcement learning for join order enumeration [C/OL] //Proc of the 1st Int Workshop on Exploiting Artificial Intelligence Techniques for Data Management. New York: ACM, 2018[2020-09-01]. https://dl.acm.org/doi/10.1145/3211954.3211957
[83]	Liang Xi, Elmore A J, Krishnan S. Opportunistic view materialization with deep reinforcement learning [J]. arXiv preprint, arXiv: 1903.01363, 2019
[84]	Basu D, Lin Qian, Chen Weidong, et al. Regularized cost-model oblivious database tuning with reinforcement learning [J]. Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVIII. Berlin: Springer, 2016: 96–132
[85]	Sharma A, Schuhknecht F M, Dittrich J. The case for automatic database administration using deep reinforcement learning [J]. arXiv preprint, arXiv: 1801.05643, 2018
[86]	Lan Hai, Bao Zhifeng, Peng Yuwei. An index advisor using deep reinforcement learning [C] //Proc of the 29th ACM Int Conf on Information and Knowledge Management. New York: ACM, 2020: 2105–2108
[87]	Lai Sichao, Wu Xiaoying, Wang Senyang, et al. Learning an index advisor with deep reinforcement learning [G] // LNISA 12859: Proc of the 5th APWeb and WAIM Joint Int Conf on Web and Big Data. Berlin: Springer, 2021: 178–185
[88]	Wu Wentao, Wang Chi, Siddiqui T, et al. Budget-aware index tuning with reinforcement learning [C] //Proc of the 2022 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2022: 1528–1541
[89]	Schaarschmidt M, Kuhnle A, Ellis B, et al. LIFT: Reinforcement learning in computer systems by learning from demonstrations [J]. arXiv preprint, arXiv: 1808.07903, 2018
[90]	Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. 2nd ed. Cambridge, MA: MIT Press, 2018
[91]	Licks G P, Couto J C, Miehe P F, et al. SmartIX: A database indexing agent based on reinforcement learning[J]. Applied Intelligence, 2020, 50(8): 2575−2588 doi: 10.1007/s10489-020-01674-8
[92]	Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning [J]. arXiv preprint, arXiv: 1312.5602, 2013
[93]	TPC. TPC-H benchmark [EB/OL]. [2022-09-01]. https://www.tpc.org/tpch
[94]	TPC. TPC-DS benchmark [EB/OL]. [2022-09-01]. https://www.tpc.org/tpcds
[95]	Schnaitter K, Polyzotis N. A benchmark for online index selection [C] //Proc of the 25th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2009: 1701–1708
[96]	Stillger M, Lohman G M, Markl V, et al. LEO-DB2’s learning optimizer [C] //Proc of the 27th Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 2001: 19–28
[97]	Holze M, Ritter N. Towards workload shift detection and prediction for autonomic databases [C]//Proc of the 1st ACM PhD Workshop in CIKM. New York: ACM, 2007: 109–116
[98]	Schnaitter K, Polyzotis N. Semi-automatic index tuning: Keeping DBAs in the loop[J]. Proceedings of the VLDB Endowment, 2012, 5(5): 478−489 doi: 10.14778/2140436.2140444
[99]	Bruno N, Chaudhuri S. Interactive physical design tuning [C] //Proc of the 26th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2010: 1161–1164
[100]	Chaudhuri S, Konig A C, Narasayya V. SQLCM: A continuous monitoring framework for relational database engines [C] //Proc of the 20th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2004: 473–484
[101]	Thiem A, Sattler K U. An integrated approach to performance monitoring for autonomous tuning [C] //Proc of the 25th Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2009: 1671–1678
[102]	Calzarossa M, Serazzi G. Workload characterization: A survey[J]. Proceedings of the IEEE, 1993, 81(8): 1136−1150 doi: 10.1109/5.236191
[103]	Yu P S, Chen M S, Heiss H U, et al. On workload characterization of relational database environments[J]. IEEE Transactions on Software Engineering, 1992, 18(4): 347−355 doi: 10.1109/32.129222
[104]	Elnaffar S, Martin P, Schiefer B, et al. Is it DSS or OLTP: Automatically identifying DBMS workloads[J]. Journal of Intelligent Information Systems, 2008, 30(3): 249−271 doi: 10.1007/s10844-006-0036-6
[105]	Wasserman T J, Martin P, Skillicorn D B, et al. Developing a characterization of business intelligence workloads for sizing new database systems [C] //Proc of the 7th ACM Int Workshop on Data Warehousing and OLAP. New York: ACM, 2004: 7–13
[106]	Jain S, Yan Jiaqi, Cruanes T, et al. Database-agnostic workload management [C/OL] //Proc of the 9th Biennial Conf on Innovative Data Systems Research. 2019[2022-09-01]. https://www.cidrdb.org/cidr2019/
[107]	Paul D, Cao Jie, Li Feifei, et al. Database workload characterization with query plan encoders[J]. Proceedings of the VLDB Endowment, 2021, 15(4): 923−935 doi: 10.14778/3503585.3503600
[108]	Elnaffar S S, Martin P. An intelligent framework for predicting shifts in the workloads of autonomic database management systems [C/OL] //Proc of the 2004 IEEE Int Conf on Advances in Intelligent Systems–Theory and Applications. Los Alamitos, CA: IEEE Computer Society, 2004[2022-09-01]. https://research.cs.queensu.ca/home/cords2/aista04.pdf
[109]	Holze M, Ritter N. Autonomic databases: Detection of workload shifts with n-Gram-Models [G] // LNISA 5207: Proc of the 12th East European Conf on Advances in Databases and Information Systems. Berlin: Springer, 2008: 127–142
[110]	Huang Xiangji, Peng Fuchun, An Aijun, et al. Dynamic web log session identification with statistical language models[J]. Journal of the American Society for Information Science and Technology, 2004, 55(14): 1290−1303 doi: 10.1002/asi.20084
[111]	Yao Qingsong, An Aijun, Huang Xiangji. Finding and analyzing database user sessions [G] // LNISA 3453: Proc of the 10th Int Conf on Database Systems for Advanced Applications. Berlin: Springer, 2005: 851–862
[112]	Luhring M, Sattler K U, Schmidt K, et al. Autonomous management of soft indexes [C] //Proc of the 23rd Int Conf on Data Engineering Workshop. Piscataway, NJ: IEEE, 2007: 450–458
[113]	Sattler K U, Geist I, Schallehn E. QUIET: Continuous query-driven index tuning [C] //Proc of the 29th Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 2003: 1129–1132
[114]	Deerwester S, Dumais S T, Furnas G W, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391−407 doi: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[115]	Agrawal S, Chu E, Narasayya V. Automatic physical design tuning: Workload as a sequence [C] //Proc of the 2006 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2006: 683–694
[116]	Kimura H, Coffrin C, Rasin A, et al. Optimizing index deployment order for evolving OLAP [C] //Proc of the 15th Int Conf on Extending Database Technology. New York: ACM, 2012: 276–287
[117]	Agrawal S, Chaudhuri S, Narasayya V R. Automated selection of materialized views and indexes in SQL databases [C] //Proc of the 26th Int Conf on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann, 2000: 496–505
[118]	Microsoft. AutoAdmin project [EB/OL]. [2022-09-01]. https://www.microsoft.com/research/project/autoadmin
[119]	IBM. db2advis: Tools for designing indexes [EB/OL]. [2022-09-01]. https://www.ibm.com/docs/en/db2/11.5?topic=indexes-tools-designing
[120]	Percona. pt-index-usage: Percona toolkit documentation [EB/OL]. [2022-09-01]. https://docs.percona.com/percona-toolkit/pt-index-usage.html
[121]	ankane. dexter: The automatic indexer for Postgres [EB/OL]. [2022-09-01]. https://github.com/ankane/dexter
[122]	powa-team. PoWA: PostgreSQL workload analyzer [EB/OL]. [2022-09-01]. https://github.com/powa-team/powa
[123]	Duboce Labs, Inc. Index advisor (in-app in pganalyze) [EB/OL]. [2022-09-01]. https://pganalyze.com/docs/index-advisor
[124]	EnterpriseDB. EDB Postgres advanced server guide: Index advisor [EB/OL]. [2022-09-01]. https://www.enterprisedb.com/docs/epas/latest/epas_guide/03_database_administration/02_index_advisor
[125]	EverSQL. Online PostgreSQL/MySQL index advisor: Automatic indexing recommendations [EB/OL]. [2022-09-01]. https://www.eversql.com/index-advisor-automatic-indexing-recommendations/
[126]	Oracle. Oracle database performance tuning guide: Release 21 [EB/OL]. [2022-09-01]. https://docs.oracle.com/en/database/oracle/oracle-database/21/tdppt/index.html
[127]	Microsoft. Automatic tuning [EB/OL]. [2022-09-01]. https://learn.microsoft.com/sql/relational-databases/automatic-tuning/automatic-tuning
[128]	openGauss. Index advisor: Index recommendation [EB/OL]. [2023-01-25]. https://docs.opengauss.org/en/docs/3.1.0/docs/Developerguide/index-advisor-index-recommendation.html
[129]	OtterTune. Index recommendations [EB/OL]. [2023-01-25]. https://docs.ottertune.com/documentation/database-instance-dashboard-and-recommendations/recommendations/index-recommendations
[130]	Alibaba Cloud. Database autonomy service [EB/OL]. [2023-01-25]. https://www.alibabacloud.com/help/en/database-autonomy-service
[131]	Microsoft. Azure SQL — Family of SQL cloud databases [DB/OL]. [2022-09-01].https://azure.microsoft.com/en-us/products/azure-sql/
[132]	Oracle. Oracle Exadata [DB/OL]. [2022-09-01]. https://www.oracle.com/engineered-systems/exadata
[133]	IBM. IBM Cloud database solutions [DB/OL]. [2022-09-01]. https://www.ibm.com/cloud/databases
[134]	Kraska T, Alizadeh M, Beutel A, et al. SageDB: A learned database system [C/OL] //Proc of the 9th Biennial Conf on Innovative Data Systems Research. 2019[2022-09-01]. http://www.cidrdb.org/cidr2019
[135]	Marcus R, Negi P, Mao Hongzi, et al. Bao: Making learned query optimization practical[J]. ACM SIGMOD Record, 2022, 51(1): 6−13 doi: 10.1145/3542700.3542703
[136]	Chockchowwat S. Tuning hierarchical learned indexes on disk and beyond [C] //Proc of the 2022 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2022: 2515–2517
[137]	Abu-Libdeh H, Altınbüken D, Beutel A, et al. Learned indexes for a Google-scale disk-based database [J]. arXiv preprint, arXiv: 2012.12501, 2020