
多模态深度伪造及检测技术综述

李泽宇, 张旭鸿, 蒲誉文, 伍一鸣, 纪守领

李泽宇, 张旭鸿, 蒲誉文, 伍一鸣, 纪守领. 多模态深度伪造及检测技术综述[J]. 计算机研究与发展, 2023, 60(6): 1396-1416. DOI: 10.7544/issn1000-1239.202111119
Li Zeyu, Zhang Xuhong, Pu Yuwen, Wu Yiming, Ji Shouling. A Survey on Multimodal Deepfake and Detection Techniques[J]. Journal of Computer Research and Development, 2023, 60(6): 1396-1416. DOI: 10.7544/issn1000-1239.202111119
李泽宇, 张旭鸿, 蒲誉文, 伍一鸣, 纪守领. 多模态深度伪造及检测技术综述[J]. 计算机研究与发展, 2023, 60(6): 1396-1416. CSTR: 32373.14.issn1000-1239.202111119
Li Zeyu, Zhang Xuhong, Pu Yuwen, Wu Yiming, Ji Shouling. A Survey on Multimodal Deepfake and Detection Techniques[J]. Journal of Computer Research and Development, 2023, 60(6): 1396-1416. CSTR: 32373.14.issn1000-1239.202111119


详细信息
    作者简介:

    李泽宇: 1999年生. 硕士研究生. 主要研究方向为人工智能安全和计算机视觉

    张旭鸿: 1988年生. 博士,副研究员. CCF会员. 主要研究方向为人工智能安全、数据驱动软件与系统安全、大数据系统与分析

    蒲誉文: 1993年生. 博士. 主要研究方向为隐私计算和人工智能安全

    伍一鸣: 1996年生. 博士研究生. 主要研究方向为数据驱动安全、黑灰产业挖掘、网络犯罪研究

    纪守领: 1986年生. 博士,教授,博士生导师. CCF高级会员. 主要研究方向为数据驱动安全和隐私、人工智能安全、大数据挖掘与分析

    通讯作者:

纪守领(sji@zju.edu.cn)

  • 中图分类号: TP391

A Survey on Multimodal Deepfake and Detection Techniques

More Information
    Author Bio:

    Li Zeyu: born in 1999. Master candidate. His main research interests include AI security and computer vision

    Zhang Xuhong: born in 1988. PhD, associate professor. Member of CCF. His main research interests include AI security, data driven software and system security, big data systems and analytics

    Pu Yuwen: born in 1993. PhD. His main research interests include privacy computing and AI security

    Wu Yiming: born in 1996. PhD candidate. Her main research interests include data driven security, black industry mining, and cybercrime research

    Ji Shouling: born in 1986. PhD, professor, PhD supervisor. Senior member of CCF. His main research interests include data-driven security and privacy, AI security, big data mining and analytics

  • 摘要:

    随着各种深度学习生成模型在各领域的应用,生成的多媒体文件的真伪越来越难以辨别,深度伪造技术也因此得以诞生和发展. 深度伪造技术通过深度学习相关技术能够篡改视频或者图片中的人脸身份信息、表情和肢体动作,以及生成特定人物的虚假语音. 自2018年Deepfakes技术在社交网络上掀起换脸热潮开始,大量的深度伪造方法被提出,并展现了其在教育、娱乐等领域的潜在应用. 但同时深度伪造技术在社会舆论、司法刑侦等方面产生的负面影响也不容忽视. 因此有越来越多的对抗手段被提出用于防止深度伪造被不法分子所应用,如深度伪造的检测和水印. 首先,针对不同模态类型的深度伪造技术以及相应的检测技术进行了回顾和总结,并根据研究目的和研究方法对现有的研究进行了分析和归类;其次,总结了近年研究中广泛使用的视频和音频数据集;最后,探讨了该领域未来发展面临的机遇和挑战.

    Abstract:

With the application of various deep learning generative models in many fields, the authenticity of the multimedia files they generate has become increasingly difficult to distinguish, and deepfake technology has thus emerged and developed. Utilizing deep learning related techniques, deepfake technology can tamper with the facial identity information, expressions, and body movements in videos or pictures, and generate fake voices of specific people. Since 2018, when Deepfakes sparked a wave of face swapping on social networks, a large number of deepfake methods have been proposed, which have demonstrated their potential applications in education, entertainment, and other fields. At the same time, however, the negative impact of deepfake on public opinion, judicial and criminal investigation, etc. cannot be ignored. As a consequence, more and more countermeasures have been proposed to prevent deepfake from being exploited by criminals, such as deepfake detection and watermarking. Firstly, deepfake technologies of different modal types and the corresponding detection technologies are reviewed and summarized, and the existing studies are analyzed and classified according to their research purposes and methods. Secondly, the video and audio datasets widely used in recent studies are summarized. Finally, the opportunities and challenges for future development in this field are discussed.

  • 在大数据时代背景下,利用数据进行分析和决策成为了不同领域的重要工作,各种人工智能算法为此提供了不可忽视的助力. 然而,“没有免费午餐”定理表明,不存在一个在所有问题上具备优越性能的“最优”算法[1]. 如何从大量可行方法中为给定任务选择满足需求的合适算法成为了工程中的关键问题,即算法选择问题[2]. 算法选择问题可以通过人工选择方法和机器学习方法解决. 人工选择方法包括实验试错法和专家知识法. 前者通过大量的实验获得算法性能,分析结果并选择算法;后者则按照领域专家的指导进行算法选择. 然而,实验试错法成本较高,专家知识法基于专家的经验知识,存在人为偏差且获取较为困难. 机器学习方法提取问题的抽象特征,设计模型实现自动化的算法选择,包括基于元学习的方法和基于协同过滤的方法[3-4]. 其中,基于元学习的方法研究基础较为深厚,具有灵活性高、适用范围广、计算成本低等优点,成为了算法选择的主要方法[5-7].

    基于元学习的算法选择利用问题的元特征(抽象特征)和历史学习经验训练元算法,构建问题元特征到合适算法的映射. 元特征和元算法是其中的关键内容,而现有研究在元特征和元算法的选择和使用上存在一些缺点:在元特征方面,多数研究选择固定的元特征[8-10],可扩展性较弱,且难以充分利用元特征的互补性;在元算法方面,研究或采用单个元算法,泛化性能不足,或采用同构集成元算法[11-14],没有利用不同元算法的优势,导致多样性不足.

    集成学习的相关研究表明,利用准确且多样的基学习器进行集成可以获得更精确、更稳定的学习性能[15-16]. 选择性集成方法根据基学习器的准确性和多样性从多个基学习器中选取部分进行集成,可以减少集成算法的存储空间和预测时间并提升其泛化性能,是集成学习的热点方向之一[17-19]. 演化算法有着鲁棒性强、适用性高、可以全局搜索等特性,在选择性集成研究中得到了广泛应用[20-22].

    蚁狮优化(ant lion optimizer,ALO)[23]算法是近年来出现的演化算法,它模拟蚁狮捕食蚂蚁的行为机制对最优解进行搜索,具有寻优性能强、参数设置少、易于实现等优点. 研究人员将ALO扩展应用于多目标优化问题,在污水处理、工程设计等领域取得了良好的应用效果[24-26].

    为了提升基于元学习的算法选择性能,提出基于多目标混合蚁狮优化的选择性集成算法选择方法(selective ensemble algorithm selection method based on multi-objective hybrid ant lion optimizer,SAMO),该方法包括一种算法选择模型和一种多目标混合蚁狮优化算法. 算法选择模型以集成元算法的准确性和多样性作为优化目标,同时选择元特征和构建选择性集成元算法. 多目标混合蚁狮优化算法对模型进行优化,通过离散型编码选择元特征子集,使用连续型编码构建集成元算法,在此基础上应用增强游走策略和偏好精英选择机制提升寻优性能.

    本文的主要贡献包括3个方面:

    1)提出一种算法选择模型,同时从元特征和元算法2个方面提升基于元学习的算法选择性能. 模型通过元特征选择寻找互补性较强的元特征子集,通过选择性集成构建异构集成元算法.

    2)提出一种多目标混合蚁狮优化算法,求解算法选择模型. 通过混合编码机制使个体同时进行元特征选择和选择性集成;采用增强游走策略,在个体搜索过程中引入随机性,增加算法的种群多样性;应用偏好精英选择机制,以不同的优化目标为偏好选择精英个体,增强算法的寻优能力.

    3)使用260个数据集、150种元特征和9种候选算法构建分类算法选择问题,在此基础上将所提方法与8种典型方法在4种性能指标上进行对比分析,结果证明了所提方法的有效性和优越性.

基于元学习的算法选择框架[27-28]如图1所示. 框架首先提取历史数据集的元特征,并通过性能测度评估确定历史数据集的最优算法,在过程中采用不同的性能测度方法可能获得不同的最优算法;然后以元特征为属性,以最优算法为标签组成元实例构建元数据集,并在元数据集上训练元算法;最后,提取新数据集的元特征输入已训练的元算法,元算法即对其最优算法进行预测输出.

    图  1  基于元学习的算法选择框架
    Figure  1.  Framework of algorithm selection based on meta-learning
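为便于理解上述框架,下面给出基于元学习的算法选择流程的一个简化示意(Python/sklearn实现;其中的元特征、候选算法与数据均为演示用假设,并非本文的实际实验设置):

```python
# 元学习算法选择流程的极简示意:先为历史数据集确定最优算法标签,
# 再以元特征为属性、最优算法为标签训练元算法,并为新数据集预测合适算法.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def best_algorithm(X, y, candidates):
    """用交叉验证评估各候选算法,返回平均准确率最高者的名称(性能测度步骤)."""
    scores = {name: cross_val_score(clf, X, y, cv=5).mean()
              for name, clf in candidates.items()}
    return max(scores, key=scores.get)

candidates = {"kNN": KNeighborsClassifier(), "RF": RandomForestClassifier(), "SVM": SVC()}
X, y = load_iris(return_X_y=True)
print(best_algorithm(X, y, candidates))          # 某一历史数据集的最优算法标签

# 假设已为若干历史数据集提取元特征并得到最优算法标签(此处以随机数据代替):
rng = np.random.default_rng(0)
meta_X = rng.random((50, 10))                    # 50个历史数据集、10维元特征(演示)
meta_y = rng.choice(list(candidates), size=50)   # 对应的最优算法标签(演示)
meta_learner = RandomForestClassifier(random_state=0).fit(meta_X, meta_y)
print(meta_learner.predict(rng.random((1, 10)))) # 为新数据集预测合适算法
```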

    根据反映数据集特性的不同,元特征分为4类:

    1)基于统计和信息论的元特征采用统计学和信息论的方法提取数据集的抽象特征[27]. 该类型元特征应用广泛且易于提取,然而其难以较好地刻画数据集特性.

    2)基于决策树的元特征在数据集上训练决策树,使用决策树的结构信息作为元特征[28]. 其能够发掘数据集的整体特性,但提取成本高昂.

    3)基于基准的元特征将基准算法在数据集上的性能指标值作为元特征[29],其提取方法较为简单,然而对基准算法的依赖度较高.

    4)基于问题复杂度的元特征从类重叠、类不平衡、数据稀疏度等方面对数据集的几何复杂度进行量化评估[30]. 该类型元特征反映了求解问题的困难程度,应用效果较好,但是计算复杂度较大.

    按照训练过程的技术特点,将元算法分为4类:

    1)基于规则的元算法生成候选算法的选择规则,通过将元特征与规则进行匹配,为新数据集选择合适算法,该方法具备较强的可解释性,但泛化性能较差. 针对负荷预测问题,文献[8]提取18种数据集元特征,训练分类回归树(classification and regression tree,CART)选择最优算法.

    2)基于距离的元算法通过元特征测度距离,评估数据集之间的相似度,使用相似数据集的最优算法作为新数据集的合适算法. k最近邻(k nearest neighbors,kNN)是研究中常见的元算法,其实现简单且运行较快,但关键参数k的最优值设置较为困难. 文献[9]使用32种分类数据集元特征,基于kNN构建元特征到最优算法的映射.

    3)基于回归的元算法预测各候选算法的性能指标值并进行比较,从而完成算法选择决策,使用较为广泛的元算法为支持向量回归(support vector regression,SVR). 该类方法可以灵活地增减候选算法,但训练成本较高. 文献[10]选用12种元特征,应用SVR学习元特征与候选分类算法性能的映射关系.

    4)基于集成的元算法通过一定策略组合多个性能较弱的元算法,得到具有较强性能的集成元算法,常见的元算法包括随机森林(random forest,RF)、极端梯度提升(extreme gradient boosting,XGB)等,这些方法的泛化性能较好,但训练速度较慢且参数设置较为复杂. 文献[11]提取22种元特征,采用RF学习元特征到最优分类算法的映射. 文献[12]使用22种元特征,采用随机森林回归(random forest regression,RFR)预测候选分类算法的性能. 文献[13]提取98种图像元特征,分别训练RF和XGB并预测最优图像分割算法. 文献[14]选择39种分类数据集元特征,应用轻量梯度提升机(light gradient boosting machine,LGBM)进行算法选择.

    在上述研究中,元特征和元算法的选用仍存在一些问题:在元特征方面,研究通常根据需求选取固定的元特征,耦合度较高,可扩展性较弱;不同元特征从不同方面描述和提取数据集的抽象特征,具备一定的互补性,然而现有的方法难以有效利用. 在元算法方面,现有研究或使用泛化性能较弱的单个元算法,或采用同构基学习器的集成元算法,未能充分利用不同元算法的优势和多样性.

    多目标优化是指同时优化多个目标且目标间相互矛盾的问题,该问题通常是NP难的,无法获取唯一最优解,而是一组次优解[31]. 不失一般性地以最小化问题为例,给出多目标优化问题的定义.

定义1. 多目标优化问题. 给定一个具有n维决策变量、m(m≥2)个目标的优化问题,其数学模型的定义如式(1)所示:

    \min {\boldsymbol{F}}({\boldsymbol{x}}) = {({f_1}({\boldsymbol{x}}),{f_2}({\boldsymbol{x}}), … ,{f_m}({\boldsymbol{x}}))^{\text{T}}},{\boldsymbol{x}} \in \varOmega \text{,} (1)

其中x=(x1, x2,…, xn),x为决策向量,Ω为决策空间;F: Ω→Λ,F为目标函数向量,Λ为由m个优化目标构成的目标空间.

    定义2. 帕累托支配. 给定2个解xy,当满足如式(2)所示约束条件时,则称x支配y,表示为{\boldsymbol{x}} \prec{\boldsymbol{ y }}.

\left\{\begin{aligned} & \forall i\in \{1,2,… ,m\},{f}_{i}({\boldsymbol{x}})\le {f}_{i}({\boldsymbol{y}})\text{,}\\ & \exists j\in \{1,2,… ,m\},{f}_{j}({\boldsymbol{x}}) < {f}_{j}({\boldsymbol{y}}). \end{aligned}\right. (2)

    定义3. 帕累托最优解. 给定解xΩ,当且仅当不存在另一个解支配它,即{\nexists }{\boldsymbol{y}}\in \varOmega :{\boldsymbol{y}}\prec {\boldsymbol{x}}时,x为帕累托最优解.

    定义4. 帕累托解集(Pareto set,PS). 决策空间Ω中所有帕累托最优解的集合称为帕累托解集,如式(3)所示:

    PS=\{{\boldsymbol{x}}\in \varOmega |{\nexists }{\boldsymbol{y}}\in \varOmega ,{\boldsymbol{y}}\prec {\boldsymbol{x}}\} . (3)

    定义5. 帕累托前沿(Pareto front,PF). PS对应的目标向量集合称为帕累托前沿,如式(4)所示:

    PF = \{ {\boldsymbol{F}}({\boldsymbol{x}}) \in \varLambda |{\boldsymbol{x}} \in PS\} . (4)

    多目标优化旨在获取靠近真实PF的解集S,从2个方面评估解集S的质量:收敛性,即解集S与真实PF的接近程度;分布性,即解集S在目标空间中的散布程度.
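按照定义2~定义4,帕累托支配判断与帕累托解集筛选可用如下简化代码示意(最小化问题,Python实现,示例数据为演示用假设):

```python
# 帕累托支配与帕累托解集筛选的示意实现(最小化问题).
import numpy as np

def dominates(fx, fy):
    """若目标向量fx支配fy(各目标不劣且至少一个严格更优),返回True(定义2)."""
    fx, fy = np.asarray(fx), np.asarray(fy)
    return bool(np.all(fx <= fy) and np.any(fx < fy))

def pareto_set(objectives):
    """从目标向量列表中筛选非支配解(帕累托解)的下标(定义3、定义4)."""
    return [i for i, fi in enumerate(objectives)
            if not any(dominates(fj, fi) for j, fj in enumerate(objectives) if j != i)]

points = [(0.2, 0.9), (0.3, 0.4), (0.5, 0.3), (0.6, 0.6)]
print(pareto_set(points))   # (0.6,0.6)被(0.3,0.4)支配,输出[0, 1, 2]
```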

    ALO通过蚂蚁的随机游走模拟蚁狮使用陷阱诱捕蚂蚁的过程,实现个体解的搜索更新. 蚂蚁各维度的随机游走方法如式(5)所示:

    \begin{split}{\boldsymbol{X}}({{t}})=&(0,cusum\left({r}_{{t}_{1}}\left(rand\right)\right),cusum\left({r}_{{t}_{2}}\left(rand\right)\right),\\ &\cdots ,cusum\left({r}_{{t}_{T}}\left(rand\right)\right))\text{,}\end{split} (5)

    其中t为当前迭代次数,T为最大迭代次数,Xt)表示随机游走位置,cusum表示随机游走步长的累加和,{r}_{t}(rand)=\left\{\begin{array}{l}1,\text{ }rand > 0.5 \\ -1,\text{ }rand\leqslant 0.5 \end{array} \right. 用于生成随机游走步长,rand表示位于[0,1]之间均匀分布的随机数.

    在随机游走过程中,蚂蚁逐渐滑入陷阱,其游走边界随之缩小,如式(6)所示:

    {{\boldsymbol{c}}^t} = \frac{{\boldsymbol{c}}}{I},{{\boldsymbol{d}}^t} = \frac{{\boldsymbol{d}}}{I} . (6)

其中c和d分别表示个体各维度值的上界和下界, {{\boldsymbol{c}}^t} 和 {{\boldsymbol{d}}^t} 分别表示第t次迭代中蚂蚁各维度值搜索范围的上界和下界,缩小程度I的计算如式(7)所示:

    I = {10^w}\frac{t}{T} \text{,} (7)

其中倍率因子w的值取决于当前迭代次数t:当t ≤ 0.1T时,w = 0;当t > 0.1T时,w = 2;当t > 0.5T时,w = 3;当t > 0.75T时,w = 4;当t > 0.9T时,w = 5;当t > 0.95T时,w = 6. 可见随着t的增加,10^w和I值呈递增趋势,而 {{\boldsymbol{c}}^t} 和 {{\boldsymbol{d}}^t} 呈递减趋势.

    ALO将适应度最优的蚁狮选为精英蚁狮,每只蚂蚁通过轮盘赌方法选择1个蚁狮,围绕精英蚁狮和轮盘赌蚁狮进行随机游走,并根据式(8)更新位置:

    {{\boldsymbol{A}}^t} = \frac{{{\boldsymbol{R}}_{\boldsymbol{e}}^t + {\boldsymbol{R}}_{\boldsymbol{r}}^t}}{2} \text{,} (8)

其中At为第t次迭代蚂蚁的位置向量,e和r分别表示精英蚁狮和轮盘赌蚁狮,{\boldsymbol{R}}_{\boldsymbol{e}}^t为第t次迭代蚂蚁围绕e进行随机游走得到的向量,{\boldsymbol{R}}_{\boldsymbol{r}}^t为第t次迭代蚂蚁围绕r进行随机游走所得向量.
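结合式(5)~(8),蚂蚁的随机游走与位置更新过程可用如下简化代码示意(Python实现;边界取值与缩放方式均为演示所作的假设,未包含完整的ALO细节):

```python
# ALO中随机游走与位置更新(式(5)(8))的示意实现.
import numpy as np

def random_walk(T):
    """生成长度T+1的随机游走序列:步长为±1的累加和,首元素为0(式(5))."""
    steps = np.where(np.random.rand(T) > 0.5, 1, -1)
    return np.concatenate(([0.0], np.cumsum(steps)))

def normalize_walk(walk, lb, ub):
    """将随机游走序列线性缩放到当前搜索边界[lb, ub]内(演示用的常见做法)."""
    a, b = walk.min(), walk.max()
    return (walk - a) / (b - a + 1e-12) * (ub - lb) + lb

T = 100
walk_elite = normalize_walk(random_walk(T), lb=-1.0, ub=1.0)     # 围绕精英蚁狮的游走
walk_roulette = normalize_walk(random_walk(T), lb=-1.0, ub=1.0)  # 围绕轮盘赌蚁狮的游走
ant_positions = (walk_elite + walk_roulette) / 2                 # 式(8):取两次游走的均值
print(ant_positions[-1])
```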

    为了充分利用元特征的互补性和元算法的多样性来提升算法选择性能,SAMO设计算法选择模型,并提出多目标混合蚁狮优化算法对模型进行求解,从而构建准确性和多样性较强的集成元算法.

    使用SAMO进行算法选择的流程如图2所示. 将元数据集输入SAMO后,SAMO利用所提优化算法对模型进行优化,输出集成元算法集合,然后从其中选择一个集成元算法用于预测新数据集的最优算法.

    图  2  SAMO算法选择流程
    Figure  2.  Algorithm selection process of SAMO

    模型使用分类错误率(error rate,ER),评估集成元算法的准确性.

设有m个测试元实例X={x1, x2,…, xm},各个元实例对应的最优算法标签为Y={y1, y2,…, ym},且有yk∈A,1 ≤ k ≤ m,其中A={a1, a2,…, al}为包含l个候选算法标签的集合,ER的计算公式为

    E{R_E} = \frac{1}{m}\sum\limits_{i = 1}^m {e({{\hat y}_i},{y_i})} \text{,} (9)

其中E表示集成元算法,e({{\hat y}_i},{y_i}) = \left\{ \begin{aligned} & 1, {{\hat y}_i} \ne {y_i} \\ & 0, {{\hat y}_i} = {y_i} \end{aligned} \right.,{{\hat y}_i}和{y_i}分别表示第i个测试元实例的预测最优算法标签和真实最优算法标签. 通过加权投票法,利用集成权重W={w1, w2,…, wv}组合v个基分类器B={b1, b2,…, bv}构建集成元算法E,E对测试元实例xk∈X进行预测,如式(10)所示:

    E({{\boldsymbol{x}}_k}) = \mathop {{\rm{arg\;max}} }_{a \in A} \left(\sum\limits_{i = 1}^v {{w_i}I({b_i}({{\boldsymbol{x}}_k}) = a)} \right) \text{.} (10)

其中E(xk)表示预测最优算法结果;a为A中的一个候选算法标签;bi表示B中的第i个基分类器,bi(xk)表示其对xk的预测结果;I()为示性函数,当其中的表达式逻辑为真时,I()=1,否则I()=0;wi为第i个基分类器的集成权重;函数 \mathop {\arg \max }\limits_{a \in A} ()表示取使其内部表达式最大的算法标签a. ER值越小,表示集成元算法的性能越好.
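式(10)的加权投票预测可用如下简化代码示意(Python实现,基分类器的预测结果与权重均为演示用假设):

```python
# 式(10)加权投票集成预测的示意实现.
def weighted_vote(predictions, weights, labels):
    """predictions: 各基分类器对单个元实例的预测标签; weights: 对应集成权重."""
    scores = {a: 0.0 for a in labels}
    for pred, w in zip(predictions, weights):
        scores[pred] += w                  # 对每个候选算法标签累加支持权重
    return max(scores, key=scores.get)     # 返回加权支持度最大的算法标签

labels = ["kNN", "RF", "SVM"]
preds = ["RF", "RF", "SVM", "kNN"]         # 4个基分类器的预测(演示)
weights = [0.4, 0.1, 0.3, 0.2]
print(weighted_vote(preds, weights, labels))   # RF的加权支持度为0.5,输出"RF"
```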

    采用不同的多样性指标(diversity indicator,DI),度量集成元算法的多样性,包括Q统计量(Q statistic,QS)、K统计量(K statistic,KS)、相关系数(correlation coefficient,CC)、一致度量(agreement measure,AM)和双错测度(double fault,DF).

基分类器bi和bj对X的预测结果列联表如表1所示.

表  1  预测结果列联表
Table  1.  Contingency Table of Prediction Results
                 bi(xk)=yk    bi(xk)≠yk
bj(xk)=yk        c            p
bj(xk)≠yk        q            d

表1中,c表示bi和bj均预测正确的元实例数;p表示bi预测正确而bj预测错误的元实例数;q表示bi预测错误而bj预测正确的元实例数;d表示bi和bj均预测错误的元实例数. 根据表1,bi和bj的QS,KS,CC,AM,DF指标计算公式为

    Q{S_{{b_i}}}_{,{b_j}} = \frac{{c\times d - p\times q}}{{c\times d + p\times q}} \text{,} (11)
    K{S_{{b_i}}}_{,{b_j}} = \frac{{2(c \times d - p\times q)}}{{(c + p)(p + d) + (c + q)(q + d)}} \text{,} (12)
    C{C_{{b_i}}}_{,{b_j}} = \frac{{c\times d - p \times q}}{{\sqrt {(c + p)(c + q)(q + d)(p + d)} }} \text{,} (13)
    A{M_{{b_i}}}_{,{b_j}} = \frac{{c + d}}{m} \text{,} (14)
    D{F_{{b_i}}}_{,{b_j}} = \frac{d}{m} . (15)

式(11)~(15)所述的指标值越小代表基分类器多样性越强,除AM和DF指标的值域为[0,1]外,其余指标的值域均为[−1,1];此外,这5个指标均为成对指标,即满足DI_{{b_i,b_j}}=DI_{{b_j,b_i}}. 集成元算法E的DI值为所有成对基分类器DI值的平均值,如式(16)所示:

    D{I_E} = \frac{{\displaystyle\sum\limits_{j = i + 1}^v {\displaystyle\sum\limits_{i = 1}^{v - 1} {D{I_{{b_i}}}_{,{b_j}}} } }}{{({v^2} - v)/2}} . (16)
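表1中的列联表计数、式(14)的AM指标以及式(16)的集成DI值,可用如下简化代码示意(Python实现,预测结果为演示用假设;其余成对指标可按式(11)~(13)(15)类似实现):

```python
# 成对多样性指标(以AM为例)与集成DI值的示意实现.
import numpy as np
from itertools import combinations

def contingency(pred_i, pred_j, y):
    """统计两个基分类器预测结果的列联表计数c, p, q, d(见表1)."""
    ci, cj = pred_i == y, pred_j == y
    c = np.sum(ci & cj)      # 两者均预测正确
    p = np.sum(ci & ~cj)     # bi正确而bj错误
    q = np.sum(~ci & cj)     # bi错误而bj正确
    d = np.sum(~ci & ~cj)    # 两者均预测错误
    return c, p, q, d

def am(pred_i, pred_j, y):
    c, p, q, d = contingency(pred_i, pred_j, y)
    return (c + d) / len(y)                        # 式(14)

def ensemble_di(all_preds, y, pair_di=am):
    """集成元算法的DI值:所有成对基分类器指标的平均(式(16))."""
    pairs = combinations(range(len(all_preds)), 2)
    return np.mean([pair_di(all_preds[i], all_preds[j], y) for i, j in pairs])

y = np.array([0, 1, 1, 0, 2])
preds = [np.array([0, 1, 0, 0, 2]), np.array([0, 1, 1, 1, 2]), np.array([2, 1, 1, 0, 0])]
print(ensemble_di(preds, y))
```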

    综上,算法选择模型的目标函数向量为

    {{\boldsymbol{F}}_{{\rm{SAMO}}}} = \min (E{R_E},D{I_E}) \text{.} (17)

    模型从元特征和元算法2个方面提升算法选择性能. 在元特征方面,选择互补性较强的元特征子集,从而有效利用元特征;在元算法方面,通过选择性集成方法,对异构基分类器进行集成,构建具有较强泛化性能和多样性的集成元算法. 为了利用不同基分类器的优势和多样性,采用kNN、支持向量机(support vector machine,SVM)和CART作为基分类器,且本文设置每种基分类器的个数相等.

    在对元数据集进行元特征选择的基础上,算法选择模型构建集成元算法的过程如图3所示. 首先使用自助法对训练集进行若干次抽样生成多个训练子集;然后运用基分类器在训练子集上进行训练形成基分类器集合;最后通过选择性集成方法,从集合中选择准确性和多样性较强的基分类器并基于加权投票法策略进行组合,得到集成元算法.

    图  3  集成元算法构建过程
    Figure  3.  Construction process of ensemble meta-learner

    多目标混合蚁狮优化算法针对元特征选择的离散特性(元特征选择与否),使用离散型编码选择元特征子集;采用连续型编码构建集成元算法,该连续型编码包括1个用于控制基分类器选择概率的选择阈值编码,以及若干个基分类器权重编码,用于选择基分类器的训练子集并生成集成权重.

    个体的混合编码方式如图4所示. 设元数据集含有M个元特征,则个体的离散型编码数为M;设基分类器个数为V,则自助法抽样生成的训练子集个数为V,个体依次为kNN,SVM,CART使用V个编码选择训练子集和生成集成权重,因此基分类器权重编码数为3V,个体的连续型编码数为3V+1.

    图  4  个体编码方式
    Figure  4.  Coding pattern of individuals

    根据混合编码机制,对个体各维度的编码值进行初始化,如式(18)所示:

    {A}_{i}=\left\{\begin{array}{l}R(rand),i\le M\text{,}\\ rand,\text{}i > M\text{,}\end{array} \right. (18)

    其中Ai为个体第i维的编码值,个体前M维编码为离散型的元特征选择编码,R(rand) = \left\{ \begin{gathered} 1,{\text{ }}rand \geqslant 0.5, \\ 0,{\text{ }}rand < 0.5 , \\ \end{gathered} \right.其值为1表示第i维编码对应的元特征被选中,值为0表示未被选中. 个体第M+1维编码为选择阈值编码,其后的3V个编码为基分类器权重编码,前V个基分类器权重编码为kNN选择训练子集,如式(19)所示:

    J({A}_{i})=\left\{\begin{array}{l}1,\text{}{A}_{i}\ge {A}_{M+1}\text{,}\\ 0,\text{}{A}_{i} < {A}_{M+1}\text{,}\end{array}M+1 < i\le M+V+1\text{,}\right. (19)

其中AM+1为选择阈值编码值,J(Ai)=1表示第i维编码对应的训练子集被选中,J(Ai)=0表示未被选中,以此类推可得SVM和CART的训练子集. 3种基分类器分别在选择的训练子集上独立训练,得到基分类器集合B={b1,b2,…,bv},通过基分类器权重编码值生成这3种基分类器集合对应的集成权重W={w1,w2,…,wv},如式(20)所示:

    {w_i} = \frac{{{A_i}}}{{\displaystyle\sum\limits_{j = 1}^v {{A_j}} }} \text{,} (20)

    其中wi为第i个基分类器的集成权重,Ai为第i个基分类器对应的基分类器权重编码值,Aj为第j个基分类器对应的基分类器权重编码值. 通过式(20)的归一化方法,使得集成权重和为1,即\displaystyle\sum\limits_{i = 1}^v {{w_i}} = 1.
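上述混合编码的解码过程(式(18)~(20))可用如下简化代码示意(Python实现;其中对空选择情形的处理为演示所作的假设):

```python
# 混合编码个体解码的示意实现:前M维为离散型元特征选择编码,
# 第M+1维为选择阈值编码,其后3V维为基分类器权重编码.
import numpy as np

def decode(individual, M, V):
    feat_mask = individual[:M] >= 0.5            # 元特征选择(离散型编码)
    threshold = individual[M]                    # 选择阈值编码
    clf_codes = individual[M + 1:]               # 3V个基分类器权重编码
    selected = clf_codes >= threshold            # 式(19):选中的训练子集/基分类器
    if not selected.any():                       # 避免空集成的退化情况(演示用处理)
        selected[np.argmax(clf_codes)] = True
    weights = clf_codes * selected
    weights = weights / weights.sum()            # 式(20):集成权重归一化
    return feat_mask, selected, weights

M, V = 150, 15
rng = np.random.default_rng(1)
ind = np.concatenate([(rng.random(M) >= 0.5).astype(float), rng.random(3 * V + 1)])
feat_mask, selected, weights = decode(ind, M, V)
print(feat_mask.sum(), selected.sum(), round(weights.sum(), 6))
```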

    在混合编码机制的基础上,采用增强游走策略对个体进行搜索更新.

    在迭代过程中,个体的离散型编码使用离散随机游走方法,如式(21)所示:

    {R}_{{\rm{dis}},i}^{t}=\left\{\begin{array}{l}{L}_{{\rm{dis}},i}^{t},\text{ }{X}^{t}/t\ge m(t)\text{,}\\ 1-{L}_{{\rm{dis}},i}^{t},\text{ }{X}^{t}/t < m(t)\text{.}\end{array}\right. (21)

    其中R_{{\rm{dis}},i}^t为第t次迭代蚂蚁的离散型编码进行离散随机游走后所得离散向量的第i维元素值;L_{{\rm{dis}},i}^t为第t次迭代蚂蚁所围绕游走的蚁狮第i维离散型编码值;X t为式(5)中第t次迭代对应的随机游走步长累加和,由式(5)可知其值在[−t, t]之间,由此可得X^t/t 位于[−1,1]之间,取反阈值m(t)计算如式(22)所示:

    m(t) = (2rand - 1)\cos \frac{{\pi (t - 1)}}{{2(T - 1)}} . (22)

    其中\cos \dfrac{{\pi (t - 1)}}{{2(T - 1)}}在[0,1]之间随着迭代次数t的增加呈现非线性递减趋势,(2rand−1)的值域为[−1,1],使得mt)在[−1,1]之间随t值增加呈现随机性的非线性递减趋势,如图5所示.

    图  5  m(t)变化趋势
    Figure  5.  Change trend of m(t)

    随机游走后,蚂蚁离散型编码值更新为

    {A}_{{\rm{dis}},i}^{t}=\left\{\begin{array}{l}{R}_{{\rm{dis}},{\boldsymbol{e}},i}^{t},\text{}rand\ge 0.5\text{,}\\ {R}_{{\rm{dis}},{\boldsymbol{r}},i}^{t},\text{}rand < 0.5\text{,}\end{array}\right. (23)

    其中A_{{\rm{dis}},i}^t为第t次迭代蚂蚁的第i维离散型编码值,R_{{\rm{dis}},{\boldsymbol{e}},i}^tR_{{\rm{dis}},{\boldsymbol{r}},i}^t分别为蚂蚁在第t次迭代围绕精英蚁狮和轮盘赌蚁狮进行离散随机游走所得离散向量的第i维元素值.

    对于个体的连续型编码,在其游走更新过程中引入随机性. ALO中每只蚂蚁的搜索边界{{\boldsymbol{c}}^t}{{\boldsymbol{d}}^t}变化趋势相同,种群多样性不足. 为了增强所提算法的种群多样性,对式(6)进行改进,如式(24)所示:

    \begin{aligned} & {{\boldsymbol{c}}}^{t}=\frac{{\boldsymbol{c}}}{I}\left(\mathrm{cos}\frac{\pi (t-1)}{3(T-1)}+rand-0.5\right)\text{,}\\ & {{\boldsymbol{d}}}^{t}=\frac{{\boldsymbol{d}}}{I}\left(\mathrm{cos}\frac{\pi (t-1)}{3(T-1)}+rand-0.5 \right)\text{,} \end{aligned} (24)

    其中\cos \dfrac{{\pi (t - 1)}}{{3(T - 1)}}在[0.5,1]之间随着迭代次数t的增加呈现非线性递减趋势,使得\left(\cos \dfrac{{\pi (t - 1)}}{{3(T - 1)}} + rand - 0.5\right)在[0,1.5]之间呈现随机性的非线性递减趋势,从而在{{\boldsymbol{c}}^t}{{\boldsymbol{d}}^t}的变化过程中引入一定随机性. 本文所提算法与ALO的搜索边界变化趋势对比如图6所示.

    图  6  搜索边界变化趋势
    Figure  6.  Change trend of search boundary

    通过上述设计,增强游走策略对个体不同类型的编码进行搜索更新,保留了蚂蚁搜索边界变化的递减趋势,同时在该过程中引入了随机性,增加了算法的种群多样性.

    算法将帕累托解作为蚁狮保存至外部存档S,为了提升算法的寻优能力,分别以不同优化目标为偏好,从存档S中选择在该目标上最优的精英蚁狮. 为了增强解的分布性,通过小生境技术计算蚁狮解的稀疏度从而量化其分布性,然后利用稀疏度通过轮盘赌选择蚁狮,蚁狮解xS的稀疏度为

    s({{\boldsymbol{x}}_S},\varphi ) = \frac{{\left| S \right| - 1}}{{\left| {\left\{ {{{\boldsymbol{y}}_S} \in S\left| {\left\| {{\boldsymbol{F}}({{\boldsymbol{x}}_S}) - {\boldsymbol{F}}({{\boldsymbol{y}}_S})} \right\| < \varphi } \right.} \right\}} \right|}} . (25)

    其中s({{\boldsymbol{x}}_S},\varphi ) 表示给定半径\varphi 的小生境范围内解xS的稀疏度,yS表示存档S中的另一个解,半径\varphi 计算为

    \varphi =\frac{{\displaystyle \sum _{j=i+1}^{m}{\displaystyle \sum _{i=1}^{m-1}\Vert {\boldsymbol{F}}({{\boldsymbol{e}}}_{i})-{\boldsymbol{F}}({{\boldsymbol{e}}}_{j})\Vert }}}{({m}^{2}-m)c/2}\text{,}0 < c\le \left|S\right| \text{,} (26)

其中m(m≥2)为优化目标个数,ei和ej分别表示第i个和第j个优化目标对应的精英蚁狮, \left\| {{\boldsymbol{F}}({{\boldsymbol{e}}_i}) - {\boldsymbol{F}}({{\boldsymbol{e}}_j})} \right\| 计算二者目标函数值向量的欧氏距离,c为常数; \left\| {{\boldsymbol{F}}({{\boldsymbol{x}}_S}) - {\boldsymbol{F}}({{\boldsymbol{y}}_S})} \right\| 计算解xS和解yS目标函数值向量的欧氏距离,其值小于半径\varphi 时表示yS为xS的相邻解. 给定解在存档中的相邻解越多,则该解的稀疏度越低,分布性越差. 以2个优化目标为例,稀疏度计算示意如图7所示.

    图  7  稀疏度计算示意图
    Figure  7.  Schematic diagram of sparsity calculation
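式(25)所述稀疏度的计算可用如下简化代码示意(Python实现;存档解与半径取值均为演示用假设,实际半径应按式(26)计算):

```python
# 式(25)小生境稀疏度计算的示意实现.
import numpy as np

def sparsity(archive_objs, idx, radius):
    """archive_objs: 存档中各解的目标向量; 返回第idx个解的稀疏度."""
    objs = np.asarray(archive_objs)
    dists = np.linalg.norm(objs - objs[idx], axis=1)
    neighbors = np.sum(dists < radius)        # 小生境内的邻居数(含自身)
    return (len(objs) - 1) / neighbors        # 邻居越多,稀疏度越低

archive = [(0.50, 0.56), (0.52, 0.55), (0.60, 0.50), (0.70, 0.45)]
radius = 0.05                                 # 演示用半径,实际由式(26)给出
print([round(sparsity(archive, i, radius), 2) for i in range(len(archive))])
# 输出[1.5, 1.5, 3.0, 3.0]:相互靠近的前两个解稀疏度较低
```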

    在迭代过程中,对于每个优化目标,每只初始化蚂蚁根据蚁狮的稀疏度通过轮盘赌选择1个蚁狮,围绕该优化目标的精英蚁狮和轮盘赌蚁狮进行随机游走,产生新的个体解. 由此可得个体数是初始化种群的新蚂蚁种群的m倍,将新蚂蚁种群加入外部存档,筛选保留存档中的帕累托解作为新一代蚁狮.

    通过偏好精英选择机制,分别围绕不同优化目标上的最优个体进行迭代更新,增强算法的寻优性能;采用轮盘赌方法以较大概率选择分布性较好的个体,并围绕该个体进行搜索,使解的分布性更强.

    SAMO流程如图8所示. 方法首先输入元数据集;然后根据混合编码机制初始化蚂蚁种群;计算蚂蚁的适应度,通过帕累托支配关系从中选出帕累托解作为蚁狮保存至外部存档;按照偏好精英选择机制,从蚁狮中选出准确性和多样性优化目标上的精英个体,分别称为准确性精英蚁狮和多样性精英蚁狮,并计算蚁狮的稀疏度;在迭代过程中,每只初始化蚂蚁根据蚁狮稀疏度进行2次轮盘赌选择蚁狮,围绕准确性精英蚁狮和第1次轮盘赌蚁狮,以及围绕多样性精英蚁狮和第2次轮盘赌蚁狮,基于增强游走策略进行随机游走,生成2只新的蚂蚁;计算新蚂蚁种群的适应度并将其加入存档,更新存档和精英蚁狮;最后,达到最大迭代时,输出蚁狮构建的集成元算法集合. 在该过程中,个体的适应度为集成元算法的准确性和多样性指标值.

    图  8  SAMO流程
    Figure  8.  Process of SAMO

    SAMO方法的伪代码如算法1所示.

    算法1. SAMO方法.

    输入:元数据集D、基分类器个数V、多样性指标DI、最大迭代次数T、种群规模N

    输出:蚁狮构建的集成元算法集合ES.

① initAnts←initialize(D,V,N);/*获取D的元特征数M,根据M和V确定个体不同类型编码的维度数,基于混合编码机制,通过式(18)获取N个蚂蚁的初始化种群initAnts*/

② trSubsets←bootStrap(V);/*对训练集使用自助法抽样生成V个训练子集trSubsets*/

③ for each ant A in initAnts

④  mfSubset←mfSelection(A);/*根据蚂蚁A的离散型编码选择元特征子集mfSubset*/

⑤  EA←selEnsemble(A,mfSubset,trSubsets);/*在使用mfSubset的基础上,根据A的连续型编码,通过式(19)(20)为基分类器选择训练子集和生成集成权重,训练基分类器并集成得到集成元算法EA*/

⑥  FA=[ER_EA,DI_EA]←caculateFit(EA,DI);/*通过式(9)(16)计算EA的ER值和DI值作为A的适应度值FA*/

⑦ end for

⑧ S←updateArchive(initAnts);/*将initAnts加入外部存档S,根据式(2)计算解的支配关系,保留存档中的帕累托解作为蚁狮*/

⑨ eER,eDI←updateElites(S);/*从S中选择ER值最优的准确性精英蚁狮eER和DI值最优的多样性精英蚁狮eDI*/

⑩ while iter=1,iter≤T

⑪  sparS←caculateSparsity(S);/*通过式(25)计算存档中蚁狮的稀疏度sparS*/

⑫  for each ant A in initAnts

⑬   r←rouletteWheel(A,sparS);/*A根据稀疏度sparS通过轮盘赌选择蚁狮r*/

⑭   Re,Rr←randomWalk(A,eER,r);/*A通过式(21)~(24)围绕蚁狮eER和r进行随机游走产生向量Re和Rr*/

⑮   A′←updateAnt(Re,Rr);/*使用Re和Rr,通过式(8)(23)生成新蚂蚁个体A′*/

⑯   将步骤⑭~⑮中的eER和A′替换为eDI和A″后,重复步骤⑭~⑮;

⑰   newAnts←newAnts∪{A′,A″};/*将A′和A″加入新蚂蚁种群newAnts*/

⑱  end for

⑲  将步骤④~⑨中的initAnts替换为newAnts后,重复步骤④~⑨;

⑳  iter=iter+1;

㉑ end while

㉒ 输出S中蚁狮构建的集成元算法集合ES.

现对SAMO的时间复杂度进行分析,设元特征数为M,基分类器个数为V,可得个体维度数D=M+3V+1;设种群规模为N,最大迭代次数为T,则蚂蚁随机游走进行迭代的时间复杂度为O(2N×D×T),计算解的支配关系和稀疏度的时间复杂度为O(N²×T). 综上所述,SAMO的时间复杂度为O(N×T×(N+2D)).

    由于分类算法应用的广泛性,通过分类算法选择问题进行测试实验. 实验使用来自于UCI[32],KEEL[33],StatLib[34],OpenML[35]的260个分类数据集,这些数据集的数据来源领域各异,实例数从13~149332不等,属性数从1~1558不等,类数从2~188不等,具有一定的差异性,构成多样化的数据集,从而能够有效评估方法的性能. 实验数据集信息如表2所示.

    表  2  实验数据集信息
    Table  2.  Information of Experimental Datasets
序号 数据集 属性数 实例数 类数 | 序号 数据集 属性数 实例数 类数 | 序号 数据集 属性数 实例数 类数
    1abalone841772988divorce541702175online-shoppers17123302
    2absenteeism207401889dna18031863176optdigits64382310
    3ada-agnostic484562290dry-bean16136117177ozone-1hr7225362
    4advertisement15583279291echocardiogram11753178ozone-8hr7225342
    5aids450292ecoli73368179page-blocks1054735
    6allrep293772493eeg-eyestate14149802180parkinson-speech2610402
    7amazon-employ93276994electricity8451322181pc12111092
    8acd-assessment1513495energy-eff976837182pc33715632
    9acd-authorship70841496engine153833183pc43714582
    10acd-bankruptcy650297eucalyptus197365184penbased161099210
    11acd-birthday3365798fabert80082377185phishing-websites3024562
    12acd-bondrate1157599first-order5161186186phoneme554042
    13acd-boxing131202100flag281948187pima87682
    14acd-boxing231322101flare1110666188polish-bankruptcy16470272
    15acd-braziltour84127102gas-drift128139106189polish-bankruptcy56455002
    16acd-broadway9955103german2010002190popularkids104783
    17acd-broadwaym72857104gesture-phase3298735191post-operative8903
    18acd-chall10121382105gina-prior2784346810192primary-tumor1733922
    19acd-creditscore61002106glass92147193prnn-fglass92146
    20acd-currency3317107haberman33062194ring2074002
    21acd-cyyoung810972108hayes-roth-test4284195risk-factors3585826
    22acd-cyyoung910922109hayes-roth-train41324196rmftsa-sleep210244
    23acd-dmft47976110hcv-egyptian2813854197robot-failures-lp4901173
    24acd-draft436512111heart-statlog132702198saheart94622
    25acd-esr2322112helena2765196100199sat11-hand-runtime11529614
    26acd-germangss54004113hepatitis191552200satimage3664357
    27acd-halloffame1713403114hill-valley10012122201sat-test3620006
    28acd-homerun261622115horse-colic-test27682202sat-train3644356
    29acd-lawsuit42642116horse-colic-train273002203seeds72103
    30acd-mapleleafs1843117house-votes162322204semeion256159310
    31acd-marketing323105118ilpd105832205sensor-readings-242454564
    32acd-supreme7405210119image-seg-test192107206sensor-readings-4454564
    33acd-votesurvey4484120image-seg-train1921007207servo41672
    34anneal387986121indian-pines22091448208shuttle9580007
    35anomalydata-5410502122internet-usage701010846209shuttle-landing6152
    36anomalydata-5h1010502123ionosphere343512210smartphone-har661806
    37appendicitis71062124iris41503211socmob511562
    38arrhythmia27945216125isolet1234617623826212sonar602082
    39artificial-charac71021810126isolet5617155926213soybean-large3530719
    40asp-potassco140129411127japanese-vowels1499619214soybean-small35474
    41audiology6922624128jungle-chess-l-e4647043215spambase5745972
    42australian146902129jungle-chess-p-l4647043216spect-test221872
    43autism-adult207042130jungle-chess-r-e4658803217spect-train22802
    44autohorse-fixed68201186131kc12121092218spectf-test442692
    45automobile252057132kc2215222219spectf-train44802
    46autouniv1-10002010002133kr-vs-kp3631962220spectrometer10153148
    47autouniv4-250010025003134kropt62805618221speech40036862
    48autouniv6-10004010008135leaf1534030222splice6031903
    49autouniv6-750407508136leaves-margin641600100223steel-plates-faults2719417
    50autouniv7-11001211005137leaves-shape641600100224student-mat3039521
    51autouniv7-500125005138leaves-texture641600100225student-por3064921
    52bach-choral165665102139led2424320010226surveillance7153
    53balance-scale46253140led7digit750010227synthetic-control606006
    54ballon4162141lense5243228tae51513
    55banana253002142letter162000026229tamilnadu34578120
    56bank-marketing16452112143libras-move9036015230texture40500011
    57banknote413722144lung-cancer56323231thyroid2172003
    58biodeg4110552145lupus3872232thyroid-allbp2628005
    59blood-trans47482146lymphography181484233thyroid-allhyper2628005
    60breast-cancer92862147madelon50026002234tic-tac-toe99582
    61breast-cancer-w96992148magic10190202235titanic322012
    62bupa63452149marketing1389939236toronto-apartment61124188
    63cacao8179542150mc13894662237touch2102658
    64calendar-dow323995151meta-all62716238trains32102
    65car617284152meta-stream744516413239twonorm2074002
    66car-evaluation2117284153mfeat-fac216200010240unix-user291009
    67cardiotocograph3521263154mfeat-fou76200010241user-knowledge54035
    68castmetal1373272155mfeat-kar64200010242usps256929810
    69chess3631962156mfeat-mor6200010243vehicle188464
    70churn2050002157mfeat-pix240200010244vehicle-reproduced188464
    71clean216565982158mfeat-zer47200010245volcanoes-a1332524
    72cleveland132975159miceprotein7610808246volcanoes-d2391724
    73click-prediction9399482160micro-a220200005247volcanoes-e2310804
    74climate-model205402161micro-mass130057120248vowel1399011
75cmc914733162monks1-test61222249walking-activity414933222
    76cnae985610809163monks1-train61242250waveform2150003
    77coil20008598222164monks2-test64322251waveform-noise4050003
    78colleges-aaup1411614165monks2-train61692252wdbc305692
    79collins19100030166monks3-test64322253wifi-localization720004
    80contraceptive914733167monks3-train61222254wilt543392
    81cpmp-2015245274168mozilla45155452255wine131783
    82credit-card23300002169mushroom2281242256winequality-r11159910
    83crx156532170newthyroid52153257winequality-w11489810
    84cylinder-bands195392171nursery8129605258wpbc321982
    85dbworld-bodies6437212172obs-network2010754259yeast8148413
    86dermatology343666173oil-spill499372260zoo161017
    87diggle-table-a283109174olivetti-faces409640040

    通过元特征提取工具MFE[36]提取常用的150种分类数据集元特征,包括77种基于统计和信息论的元特征、24种基于决策树的元特征、14种基于基准的元特征和35种基于问题复杂度的元特征. 元特征信息如表3所示.

    表  3  元特征信息
    Table  3.  Information of Meta-Features
元特征类型 | 元特征名称
基于统计和信息论的元特征
    attr_conc.mean,attr_conc.sd,attr_ent.mean,attr_ent.sd,attr_to_inst,can_cor.mean,can_cor.sd,cat_to_num,class_conc.mean,class_conc.sd,class_ent,cor.mean,cor.sd,cov.mean,cov.sd,eigenvalues.mean,eigenvalues.sd,eq_num_attr,freq_class.mean,freq_class.sd,g_mean.mean,g_mean.sd,gravity,h_mean.mean,h_mean.sd,inst_to_attr,iq_range.mean,iq_range.sd,joint_ent.mean,joint_ent.sd,kurtosis.mean,kurtosis.sd,lh_trace,mad.mean,mad.sd,max.mean,max.sd,mean.mean,mean.sd,median.mean,median.sd,min.mean,min.sd,mut_inf.mean,mut_inf.sd,nr_attr,nr_bin,nr_cat,nr_class,nr_cor_attr,nr_disc,nr_inst,nr_norm,nr_num,nr_outliers,ns_ratio,num_to_cat,one_itemset.mean,one_itemset.sd,p_trace,range.mean,range.sd,roy_root,sd.mean,sd.sd,sd_ratio,skewness.mean,skewness.sd,sparsity.mean,sparsity.sd,t_mean.mean,t_mean.sd,two_itemset.mean,two_itemset.sd,var.mean,var.sd,w_lambda
基于决策树的元特征
    leaves,leaves_branch.mean,leaves_branch.sd,leaves_corrob.mean,leaves_corrob.sd,leaves_homo.mean,leaves_homo.sd,leaves_per_class.mean,leaves_per_class.sd,nodes,nodes_per_attr,nodes_per_inst,nodes_per_level.mean,nodes_per_level.sd,nodes_repeated.mean,nodes_repeated.sd,tree_depth.mean,tree_depth.sd,tree_imbalance.mean,tree_imbalance.sd,tree_shape.mean,tree_shape.sd,var_importance.mean,var_importance.sd
基于基准的元特征
    best_node.mean,best_node.sd,elite_nn.mean,elite_nn.sd,linear_discr.mean,linear_discr.sd,naive_bayes.mean,naive_bayes.sd,one_nn.mean,one_nn.sd,random_node.mean,random_node.sd,worst_node.mean,worst_node.sd
基于问题复杂度的元特征
    c1,c2,cls_coef,density,f1.mean,f1.sd,f1v.mean,f1v.sd,f2.mean,f2.sd,f3.mean,f3.sd,f4.mean,f4.sd,hubs.mean,hubs.sd,l1.mean,l1.sd,l2.mean,l2.sd,l3.mean,l3.sd,lsc,n1,n2.mean,n2.sd,n3.mean,n3.sd,n4.mean,n4.sd,t1.mean,t1.sd,t2,t3,t4

    使用8种sklearn[37]机器学习平台的分类算法,包括kNN、RF、SVM、逻辑回归(logistic regression,LR)、朴素贝叶斯(naive Bayes,NB)、线性判别分析(linear discriminant analysis,LDA)、ID3决策树和多层感知机(multi-layer perceptron,MLP),以及基于Keras[38]构建的卷积神经网络(convolutional neural network,CNN)作为候选算法. 基于sklearn实现的算法均采用其默认参数设置,对于CNN,使用1个含有32个卷积核的1维卷积层、1个全连接层以及1个最大池化层构成其隐藏层,并设置其卷积层和全连接层使用ReLU(rectified linear unit)激活函数,输出层使用softmax激活函数.
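按照上述描述,候选CNN的搭建可用如下Keras代码示意(卷积核大小、全连接层宽度、输入维度与类别数等均为演示所作的假设,并非本文实验的完整配置):

```python
# 按文中描述搭建1维CNN候选算法的示意.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_cnn(n_features, n_classes):
    model = Sequential([
        Conv1D(32, kernel_size=3, activation="relu",
               input_shape=(n_features, 1)),      # 含32个卷积核的1维卷积层,ReLU激活
        MaxPooling1D(pool_size=2),                # 最大池化层
        Flatten(),
        Dense(64, activation="relu"),             # 全连接层,ReLU激活(宽度为演示假设)
        Dense(n_classes, activation="softmax"),   # 输出层使用softmax激活函数
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn(n_features=20, n_classes=3)     # 输入维度与类别数仅为演示
model.summary()
```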

    使用准确率(Acc)、查准率(Pre)、查全率(Rec)、F1得分(F1)和ARR(adjusted ratio of ratios)[39]指标测度并比较候选算法的性能. ARR指标综合考虑算法的运行时间和准确率,其计算如式(27)(28)所示:

ARR_{a_i,a_j} = \frac{Acc_{a_i}/Acc_{a_j}}{1 + \alpha \times \lg (Rt_{a_i}/Rt_{a_j})},\ 1 \leqslant i \ne j \leqslant l \text{,} (27)
ARR_{a_i} = \frac{\displaystyle\sum_{j = 1 \wedge j \ne i}^{l} ARR_{a_i,a_j}}{l - 1},\ 1 \leqslant i \ne j \leqslant l \text{.} (28)

其中ai和aj表示候选算法,l为候选算法数,Acc和Rt分别表示算法在数据集上的准确率和运行时间;α为可变参数,表示准确率和运行时间的相对重要程度. 实验中ARR的α值取研究中常用的0.1,0.01,0.001,并分别记为ARR1,ARR2,ARR3指标,以获得较为全面的算法性能测度结果.
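式(27)(28)的ARR指标计算可用如下简化代码示意(Python实现,准确率与运行时间为演示用假设数据):

```python
# ARR指标计算的示意实现.
import numpy as np

def arr_pair(acc_i, acc_j, rt_i, rt_j, alpha):
    """两候选算法之间的ARR(式(27)):准确率之比经运行时间比的对数项修正."""
    return (acc_i / acc_j) / (1 + alpha * np.log10(rt_i / rt_j))

def arr(accs, rts, i, alpha):
    """候选算法a_i的ARR值(式(28)):对其余候选算法的成对ARR取平均."""
    others = [j for j in range(len(accs)) if j != i]
    return np.mean([arr_pair(accs[i], accs[j], rts[i], rts[j], alpha) for j in others])

accs = [0.90, 0.85, 0.88]    # 各候选算法的准确率(演示数据)
rts = [12.0, 1.5, 4.0]       # 各候选算法的运行时间(演示数据)
print([round(arr(accs, rts, i, alpha=0.1), 3) for i in range(len(accs))])
```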

通过5次10折交叉验证获取候选算法在各数据集上的性能指标值,对指标值进行比较从而确定数据集的最优算法,将最优算法作为标签与元特征结合,构建相应性能指标的元数据集. 采用DAcc,DPre,DRec,DF1分别表示通过准确率、查准率、查全率和F1得分指标构建的元数据集;使用DARR1,DARR2,DARR3分别表示通过ARR1,ARR2,ARR3指标生成的元数据集. 此外,当应用基于回归的元算法时,为各候选算法构建单独的元数据集,其中的元实例标签为性能指标值,训练元算法预测各候选算法的性能指标值,比较预测值从而选出最优算法.

    采用kNN,SVM,CART,SVR,RF,RFR,XGB,LGBM作为元算法与SAMO进行对比实验. 研究表明kNN的距离度量采用欧氏距离,邻居数k值位于元数据集实例数的10%~15%之间时,其表现更好[9,40]. 经过对比择优,本文设置k=30,距离度量采用以距离倒数为权重的加权欧氏距离. 除kNN外,其他基分类器和元算法均采用sklearn中的默认参数设置.
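按照上述设置,kNN元算法可基于sklearn构建,示意如下(仅为演示,训练数据需替换为实际的元数据集):

```python
# 按文中设置构建kNN元算法的示意:k=30,以距离倒数为权重的加权欧氏距离.
from sklearn.neighbors import KNeighborsClassifier

knn_meta = KNeighborsClassifier(n_neighbors=30, weights="distance", metric="euclidean")
# knn_meta.fit(meta_X_train, meta_y_train) 后,即可对新数据集的元特征预测最优算法
```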

    通过5折交叉验证,即将元数据集随机划为5份,选择其中4份作为训练集,1份为测试集,计算各方法的性能指标值.

    对构建的7个元数据集进行分析,候选算法在各元数据集中的胜出次数如表4所示.

表  4  候选算法胜出次数
Table  4.  Win Times of the Candidate Algorithms
候选算法    DAcc    DPre    DRec    DF1    DARR1    DARR2    DARR3
kNN         10      15      13      16     45       21       8
RF          106     100     84      94     10       66       98
SVM         37      28      20      21     21       32       32
LR          27      26      25      22     2        21       26
NB          11      13      21      13     51       17       14
LDA         20      24      26      25     66       40       22
ID3         21      27      38      38     65       46       34
MLP         23      18      23      24     0        14       21
CNN         5       9       10      7      0        3        5
注:黑体数值表示最多胜出次数.

    表4可以看出,在本文的测试环境中,RF具有较为优越的分类性能,但是其在 D_{AR{R_{{1}}}} 元数据集中的胜出次数较少,表明运行时间是其短板. 与RF相对的是kNN和NB,得益于算法较快的运行速度,kNN和NB在ARR指标上具备一定优势,但其分类性能较差,因此,随着α值的减小,运行时间的重要性降低,kNN和NB在ARR指标元数据集中的胜出次数减少. 相较于kNN和NB,LR的分类性能略优但时间开销更高,在7个指标上的表现较为平庸. SVM的分类性能较优且具有合适的时间开销,在7个指标上均取得了较好结果. LDA和ID3展现了较好的分类性能,另一方面,LDA和ID3在ARR指标的元数据集中也取得了较多次数的胜出,说明其在运行时间和准确率上较为均衡. MLP的分类性能具有一定优势,但其在各数据集上的运行时间较长,因此MLP在ARR指标的元数据集中取得的胜出次数较少. CNN的分类性能较弱且运行开销较高,在7个指标上的结果较差.

    为了确定合适的参数设置,本节设置种群规模N=50和最大迭代次数T=300,对方法的关键参数,即基分类器个数V和多样性指标DI进行敏感性分析.

    为了便于说明,以DAcc元数据集为例,DI∈{QS, KS, CC, AM, DF},V∈{10, 15, 20, 25, 30},对方法独立运行20次的平均帕累托解数量和准确性精英蚁狮的ER平均值进行对比,结果分别如图9图10所示.

    图  9  不同多样性指标时的帕累托解数量
    Figure  9.  Pareto solution numbers of different diversity indicators
    图  10  不同基分类器个数时的错误率
    Figure  10.  Error rates of different base classifier numbers

观察图9中的帕累托解数量,其在5种多样性指标上均呈现相同的变化趋势,即随着基分类器个数V的增加,帕累托解数量整体呈现出先增加后减少的趋势,当V=15时,SAMO在5种DI上的帕累托解数量相对较多. 另一方面,当使用QS,KS,CC指标时,SAMO产生的帕累托解数量较为接近,优于AM和DF指标上的结果.

    分析图10结果可以发现,随着基分类器个数V的增加,SAMO的ER性能在5种多样性指标上均呈现出先降低后提升的趋势,当基分类器个数V=15时,5种指标上的ER值取得最优结果. 此外,当基分类器个数V相同时,5种指标上的ER性能较为接近,其中AM指标上的整体表现优于其他指标.

    进一步分析图9图10,随着V的增加,基分类器的多样性得到提升,集成的效果变好;而当V>15时,基分类器多样性的提升有限,却引入了部分准确性较差的基分类器,同时方法个体的搜索空间扩大,搜索效率降低,导致了方法性能的下降.

    综合上述结果,为了获取具有较好算法选择性能的集成元算法,设置基分类器个数V=15较为合适,多样性指标DI宜使用AM指标.

    对本文所提多目标混合蚁狮优化算法、多目标蚁狮优化(multi-objective ALO,MALO)[41]、第2代非支配排序遗传算法(non-dominated sorting genetic algorithm 2,NSGA2)[42]、速度约束多目标粒子群优化(speed-constrained multi-objective particle swarm optimization,SMPSO)[43]以及第2代强度帕累托进化算法(strength Pareto evolutionary algorithm 2,SPEA2)[44]的优化性能进行比较,其中NSGA2,SMPSO,SPEA2基于框架jMetalPy[45]实现. 各对比算法均采用连续型编码,设置阈值为0.5对个体的元特征选择编码进行离散化. 具体而言,设Ai为个体的第i维元特征选择编码,当Ai≥0.5时表示第i个元特征被选择,Ai<0.5时表示未被选择;在个体的选择性集成编码部分设置与本文算法相同的编码和解码方式.

设置基分类器个数V=15,多样性指标DI为AM,算法种群规模N=50,最大迭代次数T=300,取算法独立运行20次的平均结果. 对算法准确性最优个体的ER值、多样性最优个体的DI值和帕累托解数量进行比较,结果如表5~7所示.

表  5  各算法错误率结果
Table  5.  Error Rate Results of the Algorithms %
元数据集    本文算法    MALO    NSGA2    SMPSO    SPEA2
DAcc        50.3       52.4    54.0     54.4     53.8
DPre        54.1       56.7    57.1     57.7     57.2
DRec        56.3       58.2    60.1     61.2     59.8
DF1         51.7       54.4    55.5     56.3     55.8
DARR1       51.6       54.3    57.8     59.2     58.9
DARR2       55.7       57.9    60.5     60.9     60.8
DARR3       50.9       52.2    53.9     54.2     53.8
注:黑体数值表示最优结果.
表  6  各算法多样性指标结果
Table  6.  Diversity Indicator Results of the Algorithms
元数据集    本文算法    MALO     NSGA2    SMPSO    SPEA2
DAcc        0.561      0.590    0.654    0.64     0.651
DPre        0.592      0.628    0.668    0.665    0.662
DRec        0.560      0.601    0.636    0.631    0.639
DF1         0.549      0.582    0.634    0.627    0.637
DARR1       0.501      0.521    0.587    0.584    0.579
DARR2       0.568      0.601    0.646    0.646    0.657
DARR3       0.561      0.598    0.647    0.642    0.638
注:黑体数值表示最优结果.
表  7  各算法帕累托解数量结果
Table  7.  Pareto Solution Number Results of the Algorithms
元数据集    本文算法    MALO    NSGA2    SMPSO    SPEA2
DAcc        12.6       5.9     6.2      7.2      6.6
DPre        9.3        5.5     5.8      7.2      5.8
DRec        10.4       5.4     6.1      5.9      6.2
DF1         12.2       5.6     6.1      5.9      5.7
DARR1       9.0        5.2     4.9      5.1      4.8
DARR2       10.2       5.7     5.5      6.0      5.1
DARR3       11.0       6.0     6.6      6.8      6.3
注:黑体数值表示最优结果.

    表5表6可以看出,本文算法的准确性最优个体和多样性最优个体分别可以取得最低的ER值和DI值,说明其寻优性能优于其他对比算法. 对比MALO,NSGA2,SMPSO,SPEA2的结果,不难发现MALO的性能优于另外3种算法.

    表7中,本文算法可以产生较多的帕累托解数量,而其他算法的帕累托解数量较之有明显的差距,进一步验证了本文算法具有较强的寻优性能.

使用非支配比率(non-dominance ratio,NR)[46]、超体积(hyper volume,HV)[47]和空间指标(spacing,SP)[48],对各算法的解集质量进行评估. NR将不同算法产生的多个解集合并为1个解集并从中筛选出帕累托解构成解集Sm,然后计算某一个解集中的解在Sm中所占的比例,比值越大说明该解集的质量较之其他解集的质量越优. HV计算解集与目标空间中的最差点所构成空间的面积,ER和DI最差的指标值均为1,因此最差点为(1,1),HV值越大表示解集质量越好. SP计算解集中相距最近的2个解距离的标准差,其值越小代表解的分布性越好. 各算法在NR,HV,SP上的比较结果如表8~10所示.

表  8  各算法NR结果
Table  8.  NR Results of the Algorithms
元数据集    本文算法    MALO     NSGA2    SMPSO    SPEA2
DAcc        0.743      0.252    0        0.010    0
DPre        0.834      0.166    0        0        0
DRec        0.713      0.297    0        0        0
DF1         0.778      0.225    0        0        0
DARR1       0.711      0.289    0        0        0
DARR2       0.764      0.229    0.010    0        0
DARR3       0.757      0.229    0.010    0        0
注:黑体数值表示最优结果.
表  9  各算法HV结果
Table  9.  HV Results of the Algorithms
元数据集    本文算法    MALO     NSGA2    SMPSO    SPEA2
DAcc        0.209      0.190    0.157    0.160    0.158
DPre        0.181      0.158    0.140    0.139    0.142
DRec        0.184      0.164    0.141    0.139    0.141
DF1         0.208      0.186    0.159    0.158    0.157
DARR1       0.235      0.215    0.171    0.167    0.170
DARR2       0.184      0.164    0.137    0.135    0.132
DARR3       0.206      0.186    0.159    0.160    0.162
注:黑体数值表示最优结果.
表  10  各算法SP结果
Table  10.  SP Results of the Algorithms
元数据集    本文算法    MALO     NSGA2    SMPSO    SPEA2
DAcc        0.013      0.019    0.017    0.024    0.021
DPre        0.018      0.014    0.020    0.016    0.018
DRec        0.012      0.011    0.019    0.018    0.014
DF1         0.013      0.021    0.015    0.016    0.022
DARR1       0.015      0.016    0.017    0.017    0.022
DARR2       0.013      0.011    0.018    0.014    0.018
DARR3       0.016      0.023    0.019    0.017    0.018
注:黑体数值表示最优结果.

    对比表8中的NR结果,相较于其他算法,本文算法有明显的优势,展现了更强的收敛性. MALO可以搜索到本文算法所未能发现的帕累托解,具备一定的寻优性能. NSGA2,SMPSO,SPEA2在NR上的表现较差.

    分析表9,所提算法在各元数据集上取得了最大的HV值,表明算法产生的解在目标空间中距离最差点较远,解集的质量高于其他算法. MALO在个别元数据集上可以取得与本文算法相近的HV值,其性能优于NSGA2,SMPSO,SPEA2.

    表10的结果显示5种算法在不同元数据集上具有较优的SP值,其中本文算法和MALO分别在部分元数据集上都取得了最优结果,NSGA2,SMPSO,SPEA2的结果较为接近. 结合帕累托解数量结果,本文算法产生较多的帕累托解,同时可以保持较好的分布性.

    综合上述分析,在对算法选择模型进行优化时,相较于其他4种算法,本文算法可以在2个优化目标上搜索到更优解,并且可以发现数量更多且质量更高的帕累托解,在收敛性和分布性上有较大的优势,说明本文算法在寻优性能上具备优越性.

    采用与3.4节相同的方法参数设置,取SAMO运行20次所得的准确性精英蚁狮的性能均值进行对比,各方法的ER比较结果如表11所示.

表  11  各方法ER结果
Table  11.  ER Results of the Methods %
方法    DAcc    DPre    DRec    DF1     DARR1    DARR2    DARR3    排名
SAMO    50.3    54.1    56.3    51.7    51.6     55.7     50.9     1
kNN     59.2    66.5    68.5    63.8    74.6     73.5     59.6     7
SVM     59.6    61.9    66.9    63.1    78.1     75.4     63.1     8
CART    69.6    71.9    76.2    69.6    64.6     73.1     69.2     9
SVR     59.2    61.5    67.3    63.8    75.0     75.0     62.7     6
RF      55.8    55.4    58.8    54.2    54.2     57.7     54.6     2
RFR     60.0    62.3    63.8    60.8    66.5     71.2     66.9     5
XGB     59.6    57.7    56.9    54.2    54.2     58.1     56.5     4
LGBM    57.7    53.8    60      55.8    53.5     58.8     56.5     3
注:黑体数值表示最优结果.

    表11进行分析,可以看出SAMO在各元数据集上有更优越的性能,其在除DPre外的其他元数据集上均取得了最好结果. kNN,SVM,CART的性能较弱,SAMO的性能明显优于这3种元算法. SVR预测各候选算法的性能指标值,比较预测值并选择最优算法,计算开销较高,且其在ER指标上不具备性能优势. 在集成方法中,除SAMO外,RF具有较好的性能,在各元数据集上均有较好的表现;RFR结合集成和回归方法,预测算法的性能指标值,其性能优于SVR但明显劣于其他集成方法;XGB在DRec元数据集上的性能强于RF,但是方法的整体性能不如RF;LGBM的整体表现并不突出,但其在DPre元数据集上取得了最优结果. 将SAMO与平均性能排名第2的RF进行比较,SAMO有着2.9%的平均性能领先,证明了SAMO在ER指标上的优越性.

    进一步使用查准率、查全率和F1得分指标比较方法性能,结果如表12~14所示.

表  12  各方法查准率结果
Table  12.  Precision Results of the Methods %
方法    DAcc    DPre    DRec    DF1     DARR1    DARR2    DARR3    排名
SAMO    35.7    35.9    38.5    37.7    38.6     32.8     32.8     1
kNN     16.1    14.2    18.6    21.3    17.4     16.7     18.3     7
SVM     5.0     4.4     7.0     7.5     5.1      2.9      4.4      8
CART    19.9    18.0    17.3    19.7    30.1     21.4     22.3     6
SVR     5.0     4.4     4.7     5.3     7.3      3.0      4.4      9
RF      27.1    42.4    35.9    36.1    34.8     28.8     29.5     2
RFR     27.9    21.0    25.7    23.6    22.9     15.9     12.7     5
XGB     26.7    26.6    40.6    36.3    36.2     30.2     31.1     4
LGBM    27.7    37.3    36.9    37.3    38.1     27.0     29.1     3
注:黑体数值表示最优结果.
表  13  各方法查全率结果
Table  13.  Recall Results of the Methods %
方法    DAcc    DPre    DRec    DF1     DARR1    DARR2    DARR3    排名
SAMO    28.0    29.7    30.9    31.6    38.9     31.2     27.4     1
kNN     18.8    16.1    17.8    18.9    17.7     17.0     20.8     7
SVM     11.9    11.3    11.9    12.1    15.8     11.3     11.4     8
CART    19.9    20.6    20.5    26.1    31.2     23.1     22.2     6
SVR     12.0    11.4    11.4    11.6    15.8     11.5     11.6     9
RF      22.8    28.9    31.2    29.4    33.4     28.7     25.7     4
RFR     29.4    25.7    25.7    26.5    24.9     17.7     13.9     5
XGB     23.7    27.4    36.3    34.0    34.7     31.2     27.0     2
LGBM    24.4    31.9    31.6    30.6    34.8     29.2     25.9     3
注:黑体数值表示最优结果.
表  14  各方法F1得分结果
Table  14.  F1 Score Results of the Methods %
方法    DAcc    DPre    DRec    DF1     DARR1    DARR2    DARR3    排名
SAMO    28.0    28.8    29.5    30.4    36.6     29.5     26.4     1
kNN     16.3    13.8    15.0    16.5    16.4     15.0     18.4     7
SVM     7.0     6.2     6.7     7.2     6.1      4.6      6.3      8
CART    18.7    17.8    17.2    20.7    27.9     21.1     20.6     5
SVR     7.0     6.3     5.9     6.4     6.9      4.7      6.4      9
RF      22.5    30      29.6    27.5    32.1     26.5     24.5     4
RFR     25.0    20.9    22.7    22.1    22.0     14.8     11.5     6
XGB     23.3    25.4    34.8    31.6    33.1     29.3     26.6     2
LGBM    24.4    30.7    30.8    29.9    33.6     27.0     25.4     3
注:黑体数值表示最优结果.

    分析表12查准率结果,SAMO在5个元数据集上均获得了最好性能. kNN和CART在查准率指标上的表现明显优于SVM和SVR. 在除SAMO外的集成方法中,RFR的性能不如其他方法. RF,XGB,LGBM的性能较优,其中RF在DPre元数据集上表现突出,XGB在DRec元数据集上具有优越性能. SAMO相较于kNN,SVM,CART,SVR有较大程度的性能优势,其与RF相比平均性能提升了2.5%,表明SAMO在查准率指标上具有优越性.

表13中,SAMO在ARR指标的元数据集上取得了较好的查全率结果. CART的查全率性能优于kNN和SVM,SVR的性能表现稍劣于SVM. RFR的性能优于CART,在DAcc上具有优异表现,但是在其他元数据集上的表现一般. RF,XGB,LGBM的查全率性能较好,其中XGB在DRec,DF1和DARR2元数据集上以及LGBM在DPre元数据集上表现优异. SAMO的性能相较于RF平均领先了2.5%,相较于XGB平均领先了0.5%,说明SAMO在查全率指标上具备一定的有效性.

    表14可以看出,SAMO在各元数据集上的F1得分性能较好,在部分元数据集上取得了最好的结果. 在kNN,SVM,CART中,CART的性能优于kNN和SVM. SVR与SVM的性能较为接近,与其他方法的差距较大. RFR的性能稍劣于CART,表现较为一般. RF,XGB,LGBM具有较好的F1得分性能,其中XGB和LGBM分别在不同元数据集上表现突出. SAMO的F1得分优于kNN,SVM,CART,另一方面,其相较于RF性能平均提升了2.4%,与XGB相比则性能平均提升了0.8%,验证了SAMO在F1得分指标上的有效性.

将SAMO与RF,RFR,XGB,LGBM这4种集成方法进行对比,SAMO使用3种基分类器进行选择性集成,而其他方法仅使用CART基学习器进行集成,SAMO生成的集成元算法具有更强的多样性. 另一方面,在实验测试中,SAMO构建的集成元算法在DAcc,DPre,DRec,DF1,DARR1,DARR2,DARR3上平均集成的基分类器数量分别为7.7,6.5,6.7,7.2,5.5,7.1,7.8,而其他方法固定使用100个基学习器进行集成,由此可见SAMO通过选择性集成有效减少了基学习器的数量,从而降低集成元算法的存储和运行开销.

    综上所述,SAMO选择具有互补性的元特征子集,并使用不同类型的基分类器进行选择性集成,生成具有较好算法选择性能和多样性的集成元算法. 其在测试指标上的性能均明显优于kNN,SVM,CART,SVR方法,而与RF,XGB等采用同构基学习器的集成方法相比,也具有相对优越的性能表现,证明了SAMO的有效性和优越性.

    为了提升基于元学习的算法选择性能,提出基于多目标混合蚁狮优化的算法选择方法SAMO. 设计算法选择模型,引入元特征选择和选择性集成,同时寻找互补性较强的元特征子集和对多种元算法进行选择性集成;提出多目标混合蚁狮优化算法求解模型,使用混合编码机制选择元特征并构建集成元算法,采用增强游走策略和偏好精英选择机制提高寻优性能. 构建分类算法选择问题进行实验测试,将SAMO与8种典型的元算法进行对比,结果显示SAMO具有优异的算法选择性能,明显优于kNN,SVM,CART,SVR元算法,较之RF,RFR,XGB,LGBM这4种集成元算法也有一定优势,并且具备更强的多样性,综合体现了SAMO的有效性和优越性.

    未来的工作包括3个方面:

    1)由于算法选择相关因素较多,问题较为复杂,因此目前本文所提方法SAMO的效果有限,后续将深入研究数据集、元特征和元算法的内在关系,对SAMO进行改进,显著提升算法选择性能.

    2)本文方法的时间复杂度较高,未来将通过并行计算、迁移学习等手段降低方法的训练开销.

    3)将SAMO拓展应用于实际工程,构建算法选择系统.

    作者贡献声明:李庚松提出方法设计,负责实验方案的实现和论文内容的撰写;刘艺提出研究问题,梳理论文结构;郑奇斌整理论文内容,负责实验数据集的收集和处理;李翔参与方法的实验结果分析;刘坤提出方法思路的改进建议;秦伟提出论文修改和优化建议;王强负责实验方案的实现指导;杨长虹给出完善论文整体框架的建议.

  • 图  1   pix2pix 和 CycleGAN 的原理示意图

    Figure  1.   Illustration of principle of pix2pix and CycleGAN

    图  2   Faceshifter网络结构示意图[21]

    Figure  2.   Schematic diagram of the network structure of Faceshifter[21]

    图  3   Tacotron2 系统结构[79]

    Figure  3.   Systematic structure of Tacotron2[79]

    图  4   多注意力深度伪造检测方法结构图[140]

    Figure  4.   Structure diagram of multi-attention deepfake detection method[140]

    图  5   深度伪造对抗样本生成方法的网络结构[158]

    Figure  5.   Network structure of deepfake adversarial examples generation method[158]

    表  1   图像和视频伪造检测方法总结

    Table  1   Summary of Image and Video Fake Detection Methods

检测方法 | 特点 | 适用场景 | 实验数据集 | 检测性能 | 模型主干网络
Exploiting Visual Artifact[92] | 通过提取牙齿、眼睛及脸部轮廓等特征进行伪造检测 | 使用Deepfakes方法和face2face方法生成的深度伪造视频 | FaceForensics | 0.866(AUC) | 逻辑回归、多层感知机
FDFL[95] | 使用频域特征,优化难度小 | 检测面部替换、面部重现等伪造图片和视频 | FaceForensics++[12] | 0.994(ACC),0.997(AUC) | CNN
Generalizing Face Forgery Detection[96] | 利用图像的高频噪声,泛化能力较强 | 针对未知伪造方法生成图像的检测,需要高泛化性检测方法的场景 | FaceForensics++[12] | 0.994(AUC) | CNN、注意力机制
Face x-ray[99] | 较高的泛化性 | 需要高泛化性检测方法的场景 | FaceForensics++[12],DFDC[148],celebDF[149] | 0.985(AUC,FF++),0.806(泛化AUC,celebDF),0.809(泛化AUC,DFDC) | CNN
LRNet[106] | 通过帧间时序特征识别伪造视频,同时有较强的鲁棒性 | 针对存在压缩和破损等情况的深度伪造视频检测 | FaceForensics++[12],celebDF[149] | 0.957(AUC,FF++,c40压缩),0.554(AUC,celebDF,c40压缩) | CNN+RNN
Exposing Inconsistent Head Poses[110] | 通过检测人物头部姿态判断是否为伪造视频 | 深度伪造视频检测 | 自建数据集 | 0.974(AUC) | SVM
F3-Net[116] | 基于频域特征的深度伪造检测 | 被压缩的伪造视频检测 | FaceForensics++[12] | 0.958(AUC) | CNN
Two-branch Recurrent Network[117] | 融合了RGB域信息和频域的高频信息 | 深度伪造视频检测 | FaceForensics++[12],DFDC[148],celebDF[149] | 0.987(AUC,单帧),0.991(AUC,视频) | CNN+LSTM
Id-reveal[122] | 通过比对待测视频和参考视频中人脸身份信息判断伪造 | 拥有指定人物参考视频的深度伪造视频检测 | DFD[150] | 0.86(AUC) | CNN
Emotions Don't Lie[127] | 通过提取多模态情感信息之间的差异来检测伪造 | 带有音频的深度伪造视频检测 | DF-TIMIT[151],DFDC[148] | 0.844(AUC,DFDC) | CNN
Fakespotter[131] | 通过神经网络可解释性方法检测伪造视频 | 针对GAN等生成模型的深度伪造检测 | Celeb-DF v2[152] | 0.668(AUC) | 深度人脸识别模型
On the Detection of Digital Face Manipulation[132] | 基于注意力机制的深度伪造检测 | 需要可视化伪造区域的检测场景 | 自建数据集 | 0.997(AUC) | CNN、注意力机制
FReTal[137] | 通过知识蒸馏和迁移学习,解决针对新出现的伪造方法的检测 | 适用于检测较新的伪造生成方法 | FaceForensics++[12] | 0.925(泛化AUC) | CNN
Multi-attentional deepfake detection[140] | 聚合高维的语义信息和低维的纹理信息 | 图像和视频深度伪造检测 | FaceForensics++[12],DFDC[148],celebDF[149] | 0.993(AUC,FF++) | CNN、注意力机制
CviT[145] | 引入视觉transformer检测深度伪造 | 图像和视频深度伪造检测 | FaceForensics++[12],DFDC[148] | 0.915(ACC,DFDC) | CNN+视觉transformer

    表  2   深度伪造视频和图片数据集

    Table  2   Deepfake Video and Image Datasets

数据集 | 发布年份 | 伪造方法 | 数据集描述 | 数据集大小 | 真伪样本数量比
DFD[150] | 2019 | Deepfakes | 篡改视频均使用C0,C23和C40这3种压缩方式 | 363个原始视频、3068个篡改视频、28个演员和16个不同场景 | 1∶8.45
Deepfake-TIMIT[151] | 2018 | FaceSwap-GAN | 从VidTIMIT数据库中选取相近人脸伪造构建 | 320个视频、每个视频有高清(128×128)和低清(64×64)版本 | 1∶1
DFDC(deepfake detection challenge)Preview[148] | 2019 | 未知 | DFDC预赛中使用的数据集 | 5214个视频 | 1∶3.57
DFDC[170] | 2020 | 8种伪造方法 | DFDC比赛中使用的数据集 | 119154个视频 | 1∶5.26
FaceForensics++(FF++)[12] | 2019 | Deepfakes,FaceSwap,Face2face,Neuraltexture,faceshifter | Google推出的另一个数据集,前身为FaceForensics,目前仍在持续更新 | 6000个视频 | 1∶5
Celeb-DF[149] | 2020 | Deepfakes | 视频数量较少,已有后续版本Celeb-DF v2[152]和DFGC(deepfake game competition)[171] | 590个真实视频、5639个伪造视频 | 1∶9.56
Wild Deepfake[172] | 2020 | 网络途径获取 | 通过网络获取的伪造数据集,效果较好 | 707个伪造视频、100个演员 |
DeeperForensics 1.0[173] | 2020 | deepfake-VAE | 大型深度伪造数据集,包含多种灯光条件和面部角度,同时使用了改进的生成方法,较之前数据集更为真实 | 60000个视频、1760万帧 | 1∶5
Video Forensics HQ[174] | 2020 | Neural Textures | 高清视频伪造数据集 |  |
FFIW-10K[175] | 2021 | 3种合成方法 | 同一个视频片段中出现多个可能被篡改的人脸,平均每帧3.15个人脸 | 10000个真实视频和10000个篡改视频 | 1∶1
ForgeryNet[176] | 2021 | 15种合成方法(7种图像级方法、8种视频级方法) | 支持多种任务的超大数据集(630万个分类标签、290万个操纵区域标注和221247个时空伪造段标签) | 290万张图像、221247个视频 | 视频1∶1.22,图片1∶1.01
FakeAVCeleb[177] | 2021 | 5种伪造方法 | 多模态数据集、伪造视频包含音频 | 25500个视频 | 1∶51.02

    表  3   ForgeryNet中的定位伪造区域的检测任务[176]

    Table  3   Spatial Forgery Localization Detection Task in ForgeryNet[176]

伪造方法:面部替换、面部重现、面部特征编辑、真实人脸(原表中的图片样本、检测结果与标注均为图像内容,此处从略)

    表  4   深度伪造语音数据集

    Table  4   Deepfake Audio Datasets

数据集 | 发布年份 | 描述 | 数据集大小
ASVspoof 2015[178] | 2015 | 语音合成与转换 | 16651段原始音频、246500段合成转换音频
ASVspoof 2017[179] | 2017 | 录音重放 | 3566段非重放音频、14466段重放音频
ASVspoof 2019[180] | 2019 | 录音重放、语音合成与转换 | 15928段原始音频、117996段合成转换音频
  • [1]

    Mirsky Y, Lee W. The creation and detection of deepfakes: A survey[J]. ACM Computing Surveys, 2021, 54(1): 264−263

    [2]

    Kingma D P, Welling M. Auto-encoding variational Bayes[J]. arXiv preprint, arXiv: 1312.6114, 2013

    [3]

    Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C] //Proc of the 27th Int Conf on Neural Information Processing Systems. La Jolla, CA : NIPS, 2014: 2672−2680

    [4]

    Isola P, Zhu Junyan, Zhou Tinghui, et al. Image-to-image translation with conditional adversarial networks[C] //Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 1125−1134

    [5]

    Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C] //Proc of the 18th Int Conf on Medical Image Computing and Computer-assisted Intervention. Berlin: Springer, 2015: 234−241

    [6]

    Wang Tingchun, Liu Mingyu, Zhu Yanjun, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C] //Proc of the 31st IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 8798−8807

    [7]

    Wang Tingchun, Liu Mingyu, Zhu Yanjun, et al. Video-to-video synthesis[J]. arXiv preprint, arXiv: 1808.06601, 2018

    [8]

    Zhu Junyan, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C] //Proc of the 30th IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 2223−2232

    [9]

    Huang Gao, Liu Zhuang, Van Der Maaten L, et al. Densely connected convolutional networks[C] //Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 4700−4708

    [10]

    He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition[C] //Proc of the 29th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770−778

    [11]

    Chollet F. Xception: Deep learning with depthwise separable convolutions [C] //Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 1251−1258

    [12]

    Rossler A, Cozzolino D, Verdoliva L, et al. FaceForensics++: Learning to detect manipulated facial images [C] //Proc of the 17th IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 1−11

    [13]

    Dale K, Sunkavalli K, Johnson M K, et al. Video face replacement [J]. ACM Transactions on Graphics, 2011, 30(6): 8: 1−8: 10

    [14]

torzdf. Deepfakes [CP/OL]. 2017 [2021-10-15]. https://github.com/deepfakes/faceswap

    [15]

    Korshunova I, Shi Wenzhe, Dambre J, et al. Fast face-swap using convolutional neural networks[C] //Proc of the 16th IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 3677−3685

    [16]

    Ulyanov D, Lebedev V, Vedaldi A, et al. Texture networks: Feed-forward synthesis of textures and stylized images[C] //Proc of the 33rd Int Conf on Machine Learning. New York: PMLR, 2016: 1349− 1357

    [17]

Shaoanlu. Faceswap-GAN [CP/OL]. 2017 [2021-10-15]. https://github.com/shaoanlu/faceswap-GAN

    [18]

    Natsume R, Yatagawa T, Morishima S. FsNet: An identity-aware generative model for image-based face swapping[C] //Proc of the 14th Asian Conf on Computer Vision. Berlin: Springer, 2018: 117−132

    [19]

    Natsume R, Yatagawa T, Morishima S. RSGAN: Face swapping and editing using face and hair representation in latent spaces[J]. arXiv preprint, arXiv: 1804.03447, 2018.

    [20]

    Nirkin Y, Keller Y, Hassner T. FSGAN: Subject agnostic face swapping and reenactment[C] //Proc of the 17th IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 7184−7193

    [21]

    Li Lingzhi, Bao Jianmin, Yang Hao, et al. Faceshifter: Towards high fidelity and occlusion aware face swapping[J]. arXiv preprint, arXiv: 1912.13457, 2019

    [22]

    Chen Renwang, Chen Xuanhong, Ni Bingbing, et al. Simswap: An efficient framework for high fidelity face swapping[C] //Proc of the 28th ACM Int Conf on Multimedia. New York: ACM, 2020: 2003−2011

    [23]

    Zhu Yuhao, Li Qi, Wang Jian, et al. One shot face swapping on megapixels [C] //Proc of the 18th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 4834−4844

    [24]

    Lin Yuan, Lin Qian, Tang Feng, et al. Face replacement with large-pose differences[C] //Proc of the 20th ACM Int Conf on Multimedia. New York: ACM, 2012: 1249−1250

    [25]

    Min Feng, Sang Nong, Wang Zhefu. Automatic face replacement in video based on 2D morphable model[C] //Proc of the 20th Int Conf on Pattern Recognition. Piscataway, NJ: IEEE, 2010: 2250−2253

    [26]

    Moniz J R A, Beckham C, Rajotte S, et al. Unsupervised depth estimation, 3D face rotation and replacement[J]. arXiv preprint, arXiv: 1803.09202, 2018

    [27]

    Thies J, Zollhofer M, Niessner M, et al. Real-time expression transfer for facial reenactment[J]. ACM Transactions on Graphics, 2015, 34(6): 183: 1−183: 4

    [28]

    Thies J, Zollhofer M, Stamminger M, et al. Face2Face: Real-time face capture and reenactment of rgb videos[C] //Proc of the 29th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 2387−2395

    [29]

    Thies J, Zollhofer M, Theobalt C, et al. Headon: Real-time reenactment of human portrait videos[J]. ACM Transactions on Graphics, 2018, 37(4): 164: 1−164: 13

    [30]

    Kim H, Garrido P, Tewari A, et al. Deep video portraits[J]. ACM Transactions on Graphics, 2018, 37(4): 163: 1−163: 14

    [31]

    Nagano K, Seo J, Xing Jun, et al. PaGAN: Real-time avatars using dynamic textures[J]. ACM Transactions on Graphics (TOG), 2018, 37(6): 258: 1−258: 12

    [32]

    Geng Jiahao, Shao Tianjia, Zheng Youyi, et al. Warp-guided GANs for single-photo facial animation[J]. ACM Transactions on Graphics, 2018, 37(6): 231: 1−231: 12

    [33]

    Wang Yaohui, Bilinski P, Bremond F, et al. Imaginator: Conditional spatio-temporal GAN for video generation[C] //Proc of the 20th IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2020: 1160−1169

    [34]

    Siarohin A, Lathuiliere S, Tulyakov S, et al. Animating arbitrary objects via deep motion transfer[C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 2377−2386

    [35]

    Siarohin A, Lathuiliere S, Tulyakov S, et al. First order motion model for image animation[C] //Proc of the 32nd Int Conf on Neural Information Processing Systems. La Jolla, CA : NIPS, 2019: 7137−7147

    [36]

    Qian Shengju, Lin K Y, Wu W, et al. Make a face: Towards arbitrary high fidelity face manipulation[C] //Proc of the 32nd IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 10033−10042

    [37]

    Song Linsen, Wu W, Fu Chaoyou, et al. Pareidolia face reenactment[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 2236−2245

    [38]

    Pumarola A, Agudo A, Martinez A M, et al. GANimation: Anatomically-aware facial animation from a single image[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 818−833

    [39]

    Tripathy S, Kannala J, Rahtu E. FACEGAN: Facial attribute controllable reenactment gan[C] //Proc of the 21st IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2021: 1329−1338

    [40]

    Gu Kuangxiao, Zhou Yuqian, Huang T. FLNet: Landmark driven fetching and learning network for faithful talking facial animation synthesis[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 10861−10868

    [41]

    Xu Runze, Zhou Zhiming, Zhang Weinan, et al. Face transfer with generative adversarial network[J]. arXiv preprint, arXiv: 1710.06090, 2017

    [42]

    Bansal A, Ma Shugao, Ramanan D, et al. RecycleGan: Unsupervised video retargeting[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 119−135

    [43]

    Wu W, Zhang Yunxuan, Li Cheng, et al. ReenactGAN: Learning to reenact faces via boundary transfer[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 603−619

    [44]

    Zhang Jiangning, Zeng Xianfang, Wang Mengmeng, et al. FReeNet: Multi-identity face reenactment[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 5326−5335

    [45]

    Zhang Jiangning, Zeng Xianfang, Pan Yusu, et al. FaceSwapNet: Landmark guided many-to-many face reenactment[J]. arXiv preprint, arXiv: 1905.11805, 2019

    [46]

    Tripathy S, Kannala J, Rahtu E. ICface: Interpretable and controllable face reenactment using GANs[C] //Proc of the 20th IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2020: 3385−3394

    [47]

    Wiles O, Koepke A, Zisserman A. X2Face: A network for controlling face generation using images, audio, and pose codes[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 670−686

    [48]

    Shen Yujun, Luo Ping, Yan Junjie, et al. Faceid-GAN: Learning a symmetry three-player GAN for identity-preserving face synthesis[C] //Proc of the 31st IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 821−830

    [49]

    Shen Yujun, Zhou Bolei, Luo Ping, et al. FaceFeat-GAN: A two-stage approach for identity-preserving face synthesis[J]. arXiv preprint, arXiv: 1812.01288, 2018

    [50]

Publication History
  • Received: 2021-11-11
  • Revised: 2022-08-28
  • Available online: 2023-03-19
  • Published in issue: 2023-05-31
