Citation: Li Zeyu, Zhang Xuhong, Pu Yuwen, Wu Yiming, Ji Shouling. A Survey on Multimodal Deepfake and Detection Techniques[J]. Journal of Computer Research and Development, 2023, 60(6): 1396-1416. DOI: 10.7544/issn1000-1239.202111119
With the application of deep learning generative models across many fields, the multimedia files they produce have become increasingly difficult to distinguish from authentic ones; this has given rise to deepfake technology. Using deep learning techniques, deepfakes can tamper with the facial identity, expressions, and body movements in videos or images, and can synthesize the voice of a specific person. Since Deepfakes sparked a wave of face swapping on social networks in 2018, a large number of deepfake methods have been proposed, demonstrating potential applications in education, entertainment, and other fields. At the same time, however, the negative impact of deepfakes on public opinion, judicial and criminal investigation, and other areas cannot be ignored. As a consequence, more and more countermeasures, such as deepfake detection and watermarking, have been proposed to prevent deepfakes from being exploited by criminals. This survey first reviews and summarizes deepfake technologies of different modalities and the corresponding detection technologies, analyzing and classifying existing research by purpose and method. It then summarizes the video and audio datasets widely used in recent studies. Finally, it discusses the opportunities and challenges for future development in this field.
[1] |
Mirsky Y, Lee W. The creation and detection of deepfakes: A survey[J]. ACM Computing Surveys, 2021, 54(1): 264−263
|
[2] |
Kingma D P, Welling M. Auto-encoding variational Bayes[J]. arXiv preprint, arXiv: 1312.6114, 2013
|
[3] |
Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C] //Proc of the 27th Int Conf on Neural Information Processing Systems. La Jolla, CA: NIPS, 2014: 2672−2680
|
[4] |
Isola P, Zhu Junyan, Zhou Tinghui, et al. Image-to-image translation with conditional adversarial networks[C] //Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 1125−1134
|
[5] |
Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C] //Proc of the 18th Int Conf on Medical Image Computing and Computer-assisted Intervention. Berlin: Springer, 2015: 234−241
|
[6] |
Wang Tingchun, Liu Mingyu, Zhu Junyan, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C] //Proc of the 31st IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 8798−8807
|
[7] |
Wang Tingchun, Liu Mingyu, Zhu Junyan, et al. Video-to-video synthesis[J]. arXiv preprint, arXiv: 1808.06601, 2018
|
[8] |
Zhu Junyan, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C] //Proc of the 30th IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 2223−2232
|
[9] |
Huang Gao, Liu Zhuang, Van Der Maaten L, et al. Densely connected convolutional networks[C] //Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 4700−4708
|
[10] |
He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition[C] //Proc of the 29th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770−778
|
[11] |
Chollet F. Xception: Deep learning with depthwise separable convolutions [C] //Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 1251−1258
|
[12] |
Rossler A, Cozzolino D, Verdoliva L, et al. FaceForensics++: Learning to detect manipulated facial images [C] //Proc of the 17th IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 1−11
|
[13] |
Dale K, Sunkavalli K, Johnson M K, et al. Video face replacement [J]. ACM Transactions on Graphics, 2011, 30(6): 8: 1−8: 10
|
[14] |
torzdf. Deepfakes [CP/OL]. 2017 [2021-10-15]. https://github.com/deepfakes/faceswap
|
[15] |
Korshunova I, Shi Wenzhe, Dambre J, et al. Fast face-swap using convolutional neural networks[C] //Proc of the 16th IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 3677−3685
|
[16] |
Ulyanov D, Lebedev V, Vedaldi A, et al. Texture networks: Feed-forward synthesis of textures and stylized images[C] //Proc of the 33rd Int Conf on Machine Learning. New York: PMLR, 2016: 1349−1357
|
[17] |
Shaoanlu. Faceswap-GAN [CP/OL]. 2017 [2021-10-15]. https://github.com/shaoanlu/faceswap-GAN
|
[18] |
Natsume R, Yatagawa T, Morishima S. FsNet: An identity-aware generative model for image-based face swapping[C] //Proc of the 14th Asian Conf on Computer Vision. Berlin: Springer, 2018: 117−132
|
[19] |
Natsume R, Yatagawa T, Morishima S. RSGAN: Face swapping and editing using face and hair representation in latent spaces[J]. arXiv preprint, arXiv: 1804.03447, 2018
|
[20] |
Nirkin Y, Keller Y, Hassner T. FSGAN: Subject agnostic face swapping and reenactment[C] //Proc of the 17th IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 7184−7193
|
[21] |
Li Lingzhi, Bao Jianmin, Yang Hao, et al. Faceshifter: Towards high fidelity and occlusion aware face swapping[J]. arXiv preprint, arXiv: 1912.13457, 2019
|
[22] |
Chen Renwang, Chen Xuanhong, Ni Bingbing, et al. Simswap: An efficient framework for high fidelity face swapping[C] //Proc of the 28th ACM Int Conf on Multimedia. New York: ACM, 2020: 2003−2011
|
[23] |
Zhu Yuhao, Li Qi, Wang Jian, et al. One shot face swapping on megapixels [C] //Proc of the 18th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 4834−4844
|
[24] |
Lin Yuan, Lin Qian, Tang Feng, et al. Face replacement with large-pose differences[C] //Proc of the 20th ACM Int Conf on Multimedia. New York: ACM, 2012: 1249−1250
|
[25] |
Min Feng, Sang Nong, Wang Zhefu. Automatic face replacement in video based on 2D morphable model[C] //Proc of the 20th Int Conf on Pattern Recognition. Piscataway, NJ: IEEE, 2010: 2250−2253
|
[26] |
Moniz J R A, Beckham C, Rajotte S, et al. Unsupervised depth estimation, 3D face rotation and replacement[J]. arXiv preprint, arXiv: 1803.09202, 2018
|
[27] |
Thies J, Zollhofer M, Niessner M, et al. Real-time expression transfer for facial reenactment[J]. ACM Transactions on Graphics, 2015, 34(6): 183: 1−183: 4
|
[28] |
Thies J, Zollhofer M, Stamminger M, et al. Face2Face: Real-time face capture and reenactment of rgb videos[C] //Proc of the 29th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 2387−2395
|
[29] |
Thies J, Zollhofer M, Theobalt C, et al. Headon: Real-time reenactment of human portrait videos[J]. ACM Transactions on Graphics, 2018, 37(4): 164: 1−164: 13
|
[30] |
Kim H, Garrido P, Tewari A, et al. Deep video portraits[J]. ACM Transactions on Graphics, 2018, 37(4): 163: 1−163: 14
|
[31] |
Nagano K, Seo J, Xing Jun, et al. PaGAN: Real-time avatars using dynamic textures[J]. ACM Transactions on Graphics (TOG), 2018, 37(6): 258: 1−258: 12
|
[32] |
Geng Jiahao, Shao Tianjia, Zheng Youyi, et al. Warp-guided GANs for single-photo facial animation[J]. ACM Transactions on Graphics, 2018, 37(6): 231: 1−231: 12
|
[33] |
Wang Yaohui, Bilinski P, Bremond F, et al. Imaginator: Conditional spatio-temporal GAN for video generation[C] //Proc of the 20th IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2020: 1160−1169
|
[34] |
Siarohin A, Lathuiliere S, Tulyakov S, et al. Animating arbitrary objects via deep motion transfer[C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 2377−2386
|
[35] |
Siarohin A, Lathuiliere S, Tulyakov S, et al. First order motion model for image animation[C] //Proc of the 32nd Int Conf on Neural Information Processing Systems. La Jolla, CA: NIPS, 2019: 7137−7147
|
[36] |
Qian Shengju, Lin K Y, Wu W, et al. Make a face: Towards arbitrary high fidelity face manipulation[C] //Proc of the 32nd IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 10033−10042
|
[37] |
Song Linsen, Wu W, Fu Chaoyou, et al. Pareidolia face reenactment[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 2236−2245
|
[38] |
Pumarola A, Agudo A, Martinez A M, et al. GANimation: Anatomically-aware facial animation from a single image[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 818−833
|
[39] |
Tripathy S, Kannala J, Rahtu E. FACEGAN: Facial attribute controllable reenactment gan[C] //Proc of the 21st IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2021: 1329−1338
|
[40] |
Gu Kuangxiao, Zhou Yuqian, Huang T. FLNet: Landmark driven fetching and learning network for faithful talking facial animation synthesis[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 10861−10868
|
[41] |
Xu Runze, Zhou Zhiming, Zhang Weinan, et al. Face transfer with generative adversarial network[J]. arXiv preprint, arXiv: 1710.06090, 2017
|
[42] |
Bansal A, Ma Shugao, Ramanan D, et al. Recycle-GAN: Unsupervised video retargeting[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 119−135
|
[43] |
Wu W, Zhang Yunxuan, Li Cheng, et al. ReenactGAN: Learning to reenact faces via boundary transfer[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 603−619
|
[44] |
Zhang Jiangning, Zeng Xianfang, Wang Mengmeng, et al. FReeNet: Multi-identity face reenactment[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 5326−5335
|
[45] |
Zhang Jiangning, Zeng Xianfang, Pan Yusu, et al. FaceSwapNet: Landmark guided many-to-many face reenactment[J]. arXiv preprint, arXiv: 1905.11805, 2019
|
[46] |
Tripathy S, Kannala J, Rahtu E. ICface: Interpretable and controllable face reenactment using GANs[C] //Proc of the 20th IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2020: 3385−3394
|
[47] |
Wiles O, Koepke A, Zisserman A. X2Face: A network for controlling face generation using images, audio, and pose codes[C] //Proc of the 15th European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 670−686
|
[48] |
Shen Yujun, Luo Ping, Yan Junjie, et al. Faceid-GAN: Learning a symmetry three-player GAN for identity-preserving face synthesis[C] //Proc of the 31st IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 821−830
|
[49] |
Shen Yujun, Zhou Bolei, Luo Ping, et al. FaceFeat-GAN: A two-stage approach for identity-preserving face synthesis[J]. arXiv preprint, arXiv: 1812.01288, 2018
|
[50] |
Wang Tingchun, Liu Mingyu, Tao A, et al. Few-shot video-to-video synthesis[J]. arXiv preprint, arXiv: 1910.12713, 2019
|
[51] |
Zakharov E, Shysheya A, Burkov E, et al. Few-shot adversarial learning of realistic neural talking head models[C] //Proc of the 32nd IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 9459−9468
|
[52] |
Burkov E, Pasechnik I, Grigorev A, et al. Neural head reenactment with latent pose descriptors[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 13786−13795
|
[53] |
Ha S, Kersner M, Kim B, et al. MarioNETte: Few-shot face reenactment preserving identity of unseen targets[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 10893−10900
|
[54] |
Hao Hanxiang, Baireddy S, Reibman A R, et al. Far-GAN for one-shot face reenactment[J]. arXiv preprint, arXiv: 2005.06402, 2020
|
[55] |
Fried O, Tewari A, Zollhofer M, et al. Text-based editing of talking-head video[J]. ACM Transactions on Graphics, 2019, 38(4): 68: 1−68: 14
|
[56] |
Kumar R, Sotelo J, Kumar K, et al. ObamaNet: Photo-realistic lip-sync from text[J]. arXiv preprint, arXiv: 1801.01442, 2017
|
[57] | |
[58] |
Jamaludin A, Chung J S, Zisserman A. You said that?: Synthesising talking faces from audio[J]. International Journal of Computer Vision, 2019, 127(11): 1767−1779
|
[59] |
Vougioukas K, Petridis S, Pantic M. Realistic speech-driven facial animation with GANs[J]. International Journal of Computer Vision, 2020, 128(5): 1398−1413 doi: 10.1007/s11263-019-01251-8
|
[60] |
Suwajanakorn S, Seitz S M, Kemelmacher-shlizerman I. Synthesizing Obama: Learning lip sync from audio[J]. ACM Transactions on Graphics, 2017, 36(4): 95: 1−95: 13
|
[61] |
Chen Lele, Maddox R K, Duan Zhiyao, et al. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss[C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 7832−7841
|
[62] |
Zhou Hang, Liu Yu, Liu Ziwei, et al. Talking face generation by adversarially disentangled audio-visual representation[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 9299−9306
|
[63] |
Thies J, Elgharib M, Tewari A, et al. Neural voice puppetry: Audio-driven facial reenactment[C] //Proc of the 16th European Conf on Computer Vision (ECCV). Berlin: Springer, 2020: 716−731
|
[64] |
Hannun A, Case C, Casper J, et al. DeepSpeech: Scaling up end-to-end speech recognition[J]. arXiv preprint, arXiv: 1412.5567, 2014
|
[65] |
Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks[C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 4401−4410
|
[66] |
Karras T, Laine S, Aittala M, et al. Analyzing and improving the image quality of StyleGAN[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 8110−8119
|
[67] |
Karras T, Aittala M, Laine S, et al. Alias-free generative adversarial networks[J]. arXiv preprint, arXiv: 2106.12423, 2021
|
[68] |
Choi Y, Choi M, Kim M, et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation[C] //Proc of the 31st IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 8789−8797
|
[69] |
Choi Y, Uh Y, Yoo J, et al. StarGAN v2: Diverse image synthesis for multiple domains[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 8188−8197
|
[70] |
Sanchez E, Valstar M. Triple consistency loss for pairing distributions in GAN-based face synthesis[J]. arXiv preprint, arXiv: 1811.03492, 2018
|
[71] |
Kim D, Khan M A, Choo J. Not just compete, but collaborate: Local image-to-image translation via cooperative mask prediction[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 6509−6518
|
[72] |
Li Xinyang, Zhang Shengchuan, Hu Jie, et al. Image-to-image translation via hierarchical style disentanglement[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 8639−8648
|
[73] |
Aberman K, Shi Mingyi, Liao Jing, et al. Deep video-based performance cloning[J]. Computer Graphics Forum, 2019, 38(2): 219−233 doi: 10.1111/cgf.13632
|
[74] |
Chan C, Ginosar S, Zhou Tinghui, et al. Everybody Dance Now [C] //Proc of the 32nd IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 5933−5942
|
[75] |
Liu Lingjie, Xu Weipeng, Zollhofer M, et al. Neural rendering and reenactment of human actor videos[J]. ACM Transactions on Graphics, 2019, 38(5): 139: 1−139: 14
|
[76] |
Tokuda K, Nankaku Y, Toda T, et al. Speech synthesis based on hidden Markov models[J]. Proceedings of the IEEE, 2013, 101(5): 1234−1252 doi: 10.1109/JPROC.2013.2251852
|
[77] |
Oord A, Dieleman S, Zen H, et al. WaveNet: A generative model for raw audio[J]. arXiv preprint, arXiv: 1609.03499, 2016
|
[78] |
Wang Yuxuan, Skerry-ryan R, Stanton D, et al. Tacotron: A fully end-to-end text-to-speech synthesis model[J]. arXiv preprint, arXiv: 1703.10135, 2017
|
[79] |
Shen J, Pang Ruoming, Weiss R J, et al. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions[C] //Proc of the 43rd IEEE Int Conf on Acoustics, Speech and Signal Processing(ICASSP). Piscataway, NJ: IEEE, 2018: 4779−4783
|
[80] |
Fu Ruibo, Tao Jianhua, Wen Zhengqi, et al. Focusing on attention: Prosody transfer and adaptative optimization strategy for multi-speaker end-to-end speech synthesis[C] //Proc of the 45th IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2020: 6709−6713
|
[81] |
Kumar K, Kumar R, de Boissiere T, et al. MelGAN: Generative adversarial networks for conditional waveform synthesis[J]. arXiv preprint, arXiv: 1910.06711, 2019
|
[82] |
Yang Geng, Yang Shan, Liu Kai, et al. Multi-band MelGAN: Faster waveform generation for high-quality text-to-speech[C] //Proc of the 8th IEEE Spoken Language Technology Workshop (SLT). Piscataway, NJ: IEEE, 2021: 492−498
|
[83] |
Kaneko T, Kameoka H. CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks[C] //Proc of the 27th European Signal Processing Conf (EUSIPCO). Piscataway, NJ: IEEE, 2018: 2100−2104
|
[84] |
Kaneko T, Kameoka H, Tanaka K, et al. CycleGAN-VC2: Improved cyclegan-based non-parallel voice conversion[C] //Proc of the 44th IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2019: 6820−6824
|
[85] |
Kaneko T, Kameoka H, Tanaka K, et al. CycleGAN-VC3: Examining and improving CycleGAN-VCs for mel-spectrogram conversion[J]. arXiv preprint, arXiv: 2010.11672, 2020
|
[86] |
Kameoka H, Kaneko T, Tanaka K, et al. StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks[C] //Proc of the 7th IEEE Spoken Language Technology Workshop (SLT). Piscataway, NJ: IEEE, 2018: 266−273
|
[87] |
Kaneko T, Kameoka H, Tanaka K, et al. StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion[J]. arXiv preprint, arXiv: 1907.12279, 2019
|
[88] |
Liu Ruolan, Chen Xiao, Wen Xue. Voice conversion with transformer network[C] //Proc of the 45th IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2020: 7759−7759
|
[89] |
Luong H T, Yamagishi J. Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech[C] //Proc of the 16th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Piscataway, NJ: IEEE, 2019: 200−207
|
[90] |
Zhang Mingyang, Zhou Yi, Zhao Li, et al. Transfer learning from speech synthesis to voice conversion with non-parallel training data[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29(1): 1290−1302
|
[91] |
Huang Wenqin, Hayashi T, Wu Yiqiao, et al. Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-speech pretraining[J]. arXiv preprint, arXiv: 1912.06813, 2019
|
[92] |
Matern F, Riess C, Stamminger M. Exploiting visual artifacts to expose deepfakes and face manipulations[C] //Proc of the 20th IEEE Winter Applications of Computer Vision Workshops (WACVW). Piscataway, NJ: IEEE, 2019: 83−92
|
[93] |
Zhou Peng, Han Xintong, Morariu V I, et al. Two-stream neural networks for tampered face detection[C] //Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway, NJ: IEEE, 2017: 1831−1839
|
[94] |
Nataraj L, Mohammed T M, Manjunath B, et al. Detecting GAN generated fake images using co-occurrence matrices[J]. Electronic Imaging, 2019: 1−7
|
[95] |
Li Jiaming, Xie Hongtao, Li Jiahong, et al. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 6458−6467
|
[96] |
Luo Yuchen, Zhang Yong, Yan Junchi, et al. Generalizing face forgery detection with high-frequency features[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 16317−16326
|
[97] |
Shang Zhihua, Xie Hongtao, Zha Zhengjun, et al. PrrNet: Pixel-region relation network for face forgery detection[J/OL]. Pattern Recognition, 2021, 116 [2021-10-15]. https://doi.org/10.1016/j.patcog.2021.107950
|
[98] |
Li Yuezun, Lyu Siwei. Exposing deepfake videos by detecting face warping artifacts[J]. arXiv preprint, arXiv: 1811.00656, 2018
|
[99] |
Li Lingzhi, Bao Jianmin, Zhang Ting, et al. Face x-ray for more general face forgery detection[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 5001−5010
|
[100] |
Li Xurong, Yu Kun, Ji Shouling, et al. Fighting against deepfake: Patch&pair convolutional neural networks (PPCNN)[C] //Proc of the 29th Web Conf. New York: ACM, 2020: 88−89
|
[101] |
Nguyen H, Fang Fuming, Yamagishi J, et al. Multi-task learning for detecting and segmenting manipulated facial images and videos[J]. arXiv preprint, arXiv: 1906.06876, 2019
|
[102] |
Nirkin Y, Wolf L, Keller Y, et al. Deepfake detection based on the discrepancy between the face and its context[J]. arXiv preprint, arXiv: 2008.12262, 2020
|
[103] |
Amerini I, Caldelli R. Exploiting prediction error in consistencies through LSTM-based classifiers to detect deepfake videos[C] //Proc of the 8th ACM Workshop on Information Hiding and Multimedia Security. New York: ACM, 2020: 97−102
|
[104] |
Amerini I, Galteri L, Caldelli R, et al. Deepfake video detection through optical flow based CNN[C] //Proc of the 32nd IEEE/CVF Int Conf on Computer Vision Workshops. Piscataway, NJ: IEEE, 2019: 1205−1207
|
[105] |
Guera D, Delp E J. Deepfake video detection using recurrent neural networks[C/OL] //Proc of the 15th IEEE Int Conf on Advanced Video and Signal Based Surveillance (AVSS). Piscataway, NJ: IEEE, 2018 [2021-10-15]. https://doi.org/10.1109/AVSS.2018.8639163
|
[106] |
Sun Zekun, Han Yujie, Hua Zeyu, et al. Improving the efficiency and robustness of deepfakes detection through precise geometric features[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 3609−3618
|
[107] |
Sabir E, Cheng Jiaxin, Jaiswal A, et al. Recurrent convolutional strategies for face manipulation detection in videos[C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2019: 80−87
|
[108] |
Agarwal S, Farid H, Gu Yuming, et al. Protecting world leaders against deep fakes [C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2019: 38−45
|
[109] |
Agarwal S, Farid H, Fried O, et al. Detecting deep-fake videos from phoneme-viseme mismatches[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2020: 660−661
|
[110] |
Yang Xin, Li Yuezun, Lyu Siwei. Exposing deep fakes using inconsistent head poses[C] //Proc of the 44th IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2019: 8261−8265
|
[111] |
Ciftci U A, Demir I, Yin Lijun. FakeCatcher: Detection of synthetic portrait videos using biological signals[J/OL]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020 [2021-10-15]. https://doi.org/10.1109/TPAMI.2020.3009287
|
[112] |
Fernandes S, Raj S, Ewetz R, et al. Detecting deepfake videos using attribution-based confidence metric[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2020: 308−309
|
[113] |
Jha S, Raj S, Fernandes S, et al. Attribution-based confidence metric for deep neural networks[C] //Proc of the 32nd Int Conf on Neural Information Processing Systems. La Jolla, CA: NIPS, 2019: 11826−11837
|
[114] |
McCloskey S, Albright M. Detecting GAN-generated imagery using color cues[J]. arXiv preprint, arXiv: 1812.08247, 2018
|
[115] |
Guarnera L, Giudice O, Battiato S. Deepfake detection by analyzing convolutional traces[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2020: 666−667
|
[116] |
Qian Yuyang, Yin Guojun, Sheng Lu, et al. Thinking in frequency: Face forgery detection by mining frequency-aware clues[C] //Proc of the 16th European Conf on Computer Vision. Berlin: Springer, 2020: 86−103
|
[117] |
Masi I, Killekar A, Mascarenhas R M, et al. Two-branch recurrent network for isolating deepfakes in videos[C] //Proc of the 16th European Conf on Computer Vision. Berlin: Springer, 2020: 667−684
|
[118] |
Liu Honggu, Li Xiaodan, Zhou Wenbo, et al. Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 772−781
|
[119] |
Agarwal S, Farid H, EL-Gaaly T, et al. Detecting deepfake videos from appearance and behavior[C/OL] //Proc of the 12th IEEE Int Workshop on Information Forensics and Security (WIFS). Piscataway, NJ: IEEE, 2020 [2021-10-15]. https://doi.org/10.1109/WIFS49906.2020.9360904
|
[120] |
Wiles O, Koepke A, Zisserman A. Self-supervised learning of a facial attribute embedding from video[J]. arXiv preprint, arXiv: 1808.06882, 2018
|
[121] |
Cozzolino D, Rossler A, Thies J, et al. Id-reveal: Identity-aware deepfake video detection[J]. arXiv preprint, arXiv: 2012.02512, 2020
|
[122] |
Dong Xiaoyi, Bao Jianmin, Chen Dongdong, et al. Identity-driven deepfake detection[J]. arXiv preprint, arXiv: 2012.03930, 2020
|
[123] |
Jiang Jun, Wang Bo, Li Bing, et al. Practical face swapping detection based on identity spatial constraints[C] //Proc of the 7th IEEE Int Joint Conf on Biometrics (IJCB). Piscataway, NJ: IEEE, 2021: 1−8
|
[124] |
Lewis J K, Toubal I E, Chen Helen, et al. Deepfake video detection based on spatial, spectral, and temporal inconsistencies using multi-modal deep learning[C/OL] //Proc of the 49th IEEE Applied Imagery Pattern Recognition Workshop (AIPR). Piscataway, NJ: IEEE, 2020 [2021-10-15]. https://doi.org/10.1109/AIPR50011.2020.9425167
|
[125] |
Lomnitz M, Hampel-arias Z, Sandesara V, et al. Multimodal approach for deepfake detection[C/OL] //Proc of the 49th IEEE Applied Imagery Pattern Recognition Workshop (AIPR). Piscataway, NJ: IEEE, 2020 [2021-10-15]. https://doi.org/10.1109/AIPR50011.2020.9425192
|
[126] |
Ravanelli M, Bengio Y. Speaker recognition from raw waveform with SincNet[C] //Proc of the 7th IEEE Spoken Language Technology Workshop(SLT). Piscataway, NJ: IEEE, 2018: 1021−1028
|
[127] |
Mittal T, Bhattacharya U, Chandra R, et al. Emotions don’t lie: An audio-visual deepfake detection method using affective cues[C] //Proc of the 28th ACM Int Conf on Multimedia. New York: ACM, 2020: 2823−2832
|
[128] |
Hosler B, Salvi D, Murray A, et al. Do deepfakes feel emotions? A semantic approach to detecting deepfakes via emotional inconsistencies[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 1013−1022
|
[129] |
Afchar D, Nozick V, Yamagishi J, et al. MesoNet: A compact facial video forgery detection network[C/OL] //Proc of the 10th IEEE Int Workshop on Information Forensics and Security (WIFS). Piscataway, NJ: IEEE, 2018 [2021-10-15]. https://doi.org/10.1109/WIFS.2018.8630761
|
[130] |
Jain A, Singh R, Vatsa M. On detecting GANs and retouching based synthetic alterations[C/OL] //Proc of the 9th Int Conf on Biometrics Theory, Applications and Systems (BTAS). Piscataway, NJ: IEEE, 2018 [2021-10-15]. https://doi.org/10.1109/BTAS.2018.8698545
|
[131] |
Wang Run, Xu Juefei, Ma Lei, et al. FakeSpotter: A simple yet robust baseline for spotting ai-synthesized fake faces[J]. arXiv preprint, arXiv: 1909.06122, 2019
|
[132] |
Dang Hao, Liu Feng, Stehouwer J, et al. On the detection of digital face manipulation[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 5781−5790
|
[133] |
Hsu C C, Zhuang Yixiu, Lee C Y. Deep fake image detection based on pairwise learning[J/OL]. Applied Sciences, 2020 [2021-10-15]. https://doi.org/10.3390/app10010370
|
[134] |
Khalid H, Woo S S. Oc-fakedect: Classifying deepfakes using one-class variational autoencoder[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2020: 656−657
|
[135] |
Rana M S, Sung A H. DeepfakeStack: A deep ensemble-based learning technique for deepfake detection[C] //Proc of the 7th IEEE Int Conf on Cyber Security and Cloud Computing(CSCloud)/IEEE Int Conf on Edge Computing and Scalable Cloud (EdgeCom). Piscataway, NJ: IEEE, 2020: 70−75
|
[136] |
Bonettini N, Cannas E D, Mandelli S, et al. Video face manipulation detection through ensemble of CNNs[C] //Proc of the 31st Int Conf on Pattern Recognition (ICPR). Piscataway, NJ: IEEE, 2021: 5012−5019
|
[137] |
Kim M, Tariq S, Woo S S. FReTal: Generalizing deepfake detection using knowledge distillation and representation learning[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 1001−1012
|
[138] |
Aneja S, Niessner M. Generalized zero and few-shot transfer for facial forgery detection[J]. arXiv preprint, arXiv: 2006.11863, 2020
|
[139] |
Wang Chengrui, Deng Weihong. Representative forgery mining for fake face detection[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 14923−14932
|
[140] |
Zhao Hanqing, Zhou Wenbo, Chen Dongdong, et al. Multi-attentional deepfake detection[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 2185−2194
|
[141] |
Kumar P, Vatsa M, Singh R. Detecting face2face facial reenactment in videos[C] //Proc of the 20th IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2020: 2589−2597
|
[142] |
Jeon H, Bang Y, Woo S S. FdftNet: Facing off fake images using fake detection fine-tuning network[C] //Proc of the 35th IFIP Int Conf on ICT Systems Security and Privacy Protection. Berlin: Springer, 2020: 416−430
|
[143] |
Wang Shengyu, Wang O, Zhang R, et al. CNN-generated images are surprisingly easy to spot for now[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 8695−8704
|
[144] |
Liu Zhengzhe, Qi Xiaojuan, Torr P. Global texture enhancement for fake face detection in the wild[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 8060−8069
|
[145] |
Wodajo D, Atnafu S. Deepfake video detection using convolutional vision transformer[J]. arXiv preprint, arXiv: 2102.11126, 2021
|
[146] |
Wang Junke, Wu Zuxuan, Chen Jingjing, et al. M2tr: Multi-modal multi-scale transformers for deepfake detection[J]. arXiv preprint, arXiv: 2104.09770, 2021
|
[147] |
Heo Y, Choi Y, Lee Y, et al. Deepfake detection scheme based on vision transformer and distillation[J]. arXiv preprint, arXiv: 2104.01353, 2021
|
[148] |
Dolhansky B, Howes R, Pflaum B, et al. The deepfake detection challenge (DFDC) preview dataset[J]. arXiv preprint, arXiv: 1910.08854, 2019
|
[149] |
Li Yuezun, Yang Xin, Sun Pu, et al. Celeb-DF: A large-scale challenging dataset for deepfake forensics[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 3207−3216
|
[150] |
Ondyari. Deepfake detection (DFD) dataset [DB/OL]. 2018 [2021-10-15]. https://github.com/ondyari/FaceForensics
|
[151] |
Korshunov P, Marcel S. Deepfakes: A new threat to face recognition? Assessment and detection[J]. arXiv preprint, arXiv: 1812.08685, 2018
|
[152] |
Li Yuezun, Yang Xin, Sun Pu, et al. Celeb-DF (v2): A new dataset for deepfake forensics[J]. arXiv preprint, arXiv: 1909.12962, 2019
|
[153] |
Ruiz N, Bargal S A, Sclaroff S. Disrupting deepfakes: Adversarial attacks against conditional image translation networks and facial manipulation systems[C] //Proc of the 16th European Conf on Computer Vision. Berlin: Springer, 2020: 236−251
|
[154] |
Huang Qidong, Zhang Jie, Zhou Wenbo, et al. Initiative defense against facial manipulation[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. New York: ACM, 2021: 1619−1627
|
[155] |
Dong Junhao, Xie Xiaohua. Visually maintained image disturbance against deepfake face swapping [C/OL] //Proc of the 22nd IEEE Int Conf on Multimedia and Expo (ICME). Piscataway, NJ: IEEE, 2021 [2021-10-15]. https://doi.org/10.1109/ICME51207.2021.9428173
|
[156] |
Neves J C, Tolosana R, Vera-Rodriguez R, et al. Real or fake? Spoofing state-of-the-art face synthesis detection systems[J]. arXiv preprint, arXiv: 1911.05351, 2019
|
[157] |
Carlini N, Farid H. Evading deepfake-image detectors with white- and black-box attacks[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2020: 658−659
|
[158] |
Hussain S, Neekhara P, Jere M, et al. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples[C] //Proc of the 21st IEEE/CVF Winter Conf on Applications of Computer Vision. Piscataway, NJ: IEEE, 2021: 3348−3357
|
[159] |
Patel T B, Patil H A. Cochlear filter and instantaneous frequency based features for spoofed speech detection[J]. IEEE Journal of Selected Topics in Signal Processing, 2016, 11(4): 618−631
|
[160] |
Tom F, Jain M, Dey P. End-to-end audio replay attack detection using deep convolutional networks with attention[C] //Proc of the 20th Interspeech. 2018 [2021-10-15]. https://www.isca-speech.org/archive_v0/Interspeech_2018/abstracts/2279.html
|
[161] |
Das R K, Yang Jichen, Li Haizhou. Assessing the scope of generalized countermeasures for anti-spoofing[C] //Proc of the 45th IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2020: 6589−6593
|
[162] |
Lavrentyeva G, Novoselov S, Malykh E, et al. Audio replay attack detection with deep learning frameworks[C] //Proc of the 19th Interspeech. 2017 [2021-10-15]. https://www.isca-speech.org/archive_v0/Interspeech_2017/abstracts/0360.html
|
[163] |
Wu Xiang, He Ran, Sun Zhenan, et al. A light CNN for deep face representation with noisy labels[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(11): 2884−2896 doi: 10.1109/TIFS.2018.2833032
|
[164] |
Lavrentyeva G, Novoselov S, Tseren A, et al. STC anti-spoofing systems for the ASVspoof 2019 challenge[J]. arXiv preprint, arXiv: 1904.05576, 2019
|
[165] |
Cai Weicheng, Wu Haiwei, Cai Danwei, et al. The DKU replay detection system for the ASVspoof 2019 challenge: On data augmentation, feature representation, classification, and fusion[J]. arXiv preprint, arXiv: 1907.02663, 2019
|
[166] |
Lai C I, Chen Nanxin, Villalba J, et al. ASSERT: Anti-spoofing with squeeze-excitation and residual networks[J]. arXiv preprint, arXiv: 1904.01120, 2019
|
[167] |
Parasu P, Epps J, Sriskandaraja K, et al. Investigating Light-ResNet architecture for spoofing detection under mismatched conditions[C] //Proc of the 22nd Interspeech. 2020 [2021-10-15]. https://www.isca-speech.org/archive_v0/Interspeech_2020/abstracts/2039.html
|
[168] |
Ma Haoxin, Yi Jiangyan, Tao Jianhua, et al. Continual learning for fake audio detection[J]. arXiv preprint, arXiv: 2104.07286, 2021
|
[169] |
Li Zhizhong, Hoiem D. Learning without forgetting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(12): 2935−2947
|
[170] |
Dolhansky B, Bitton J, Pflaum B, et al. The deepfake detection challenge (DFDC) dataset[J]. arXiv preprint, arXiv: 2006.07397, 2020
|
[171] |
Peng Bo, Fan Hongxing, Wang Wei, et al. DFGC 2021: A deepfake game competition[J]. arXiv preprint, arXiv: 2106.01217, 2021
|
[172] |
Zi Bojia, Chang Minghao, Chen Jingjing, et al. WildDeepfake: A challenging real-world dataset for deepfake detection[C] //Proc of the 28th ACM Int Conf on Multimedia. New York: ACM, 2020: 2382−2390
|
[173] |
Jiang Liming, Li Ren, Wu W, et al. DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 2889−2898
|
[174] |
Fox G, Liu Wentao, Kim H, et al. VideoForensicsHQ: Detecting high-quality manipulated face videos[C/OL] //Proc of the 22nd IEEE Int Conf on Multimedia and Expo (ICME). Piscataway, NJ: IEEE, 2021 [2021-10-15]. https://doi.org/10.1109/ICME51207.2021.9428101
|
[175] |
Zhou Tianfei, Wang Wenguan, Liang Zhiyuan, et al. Face forensics in the wild[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 5778−5788
|
[176] |
He Yinan, Gan Bei, Chen Siyu, et al. ForgeryNet: A versatile benchmark for comprehensive forgery analysis[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 4360−4369
|
[177] |
Khalid H, Tariq S, Woo S S. FakeAVCeleb: A novel audio-video multimodal deepfake dataset[J]. arXiv preprint, arXiv: 2108.05080, 2021
|
[178] |
University of Edinburgh, the Centre for Speech Technology Research (CSTR). ASVspoof 2015 database[DB/OL]. 2015 [2021-10-15]. https://datashare.ed.ac.uk/handle/10283/853
|
[179] |
University of Edinburgh, the Centre for Speech Technology Research (CSTR). ASVspoof 2017 database[DB/OL]. 2017 [2021-10-15]. https://datashare.ed.ac.uk/handle/10283/3055
|
[180] |
University of Edinburgh, the Centre for Speech Technology Research (CSTR). ASVspoof 2019 database[DB/OL]. 2019 [2021-10-15]. https://datashare.ed.ac.uk/handle/10283/3336
|
[181] |
Krishnan P, Kovvuri R, Pang Guan, et al. TextStyleBrush: Transfer of text aesthetics from a single example[J]. arXiv preprint, arXiv: 2106.08385, 2021
|