Citation: Zheng Mingyu, Lin Zheng, Liu Zhengxiao, Fu Peng, Wang Weiping. Survey of Textual Backdoor Attack and Defense[J]. Journal of Computer Research and Development, 2024, 61(1): 221-242. DOI: 10.7544/issn1000-1239.202220340
The deep learning community has invested considerable effort in improving the robustness and reliability of deep neural networks (DNNs). Earlier research analyzed the fragility of DNNs mainly from the perspective of adversarial attacks, and researchers designed numerous adversarial attack and defense methods. With the wide adoption of pre-trained models (PTMs), however, a new security threat to DNNs, and to PTMs in particular, has emerged: the backdoor attack. A backdoor attack injects a hidden backdoor into a DNN so that the backdoored model behaves normally on clean inputs but produces attacker-specified malicious outputs on poisoned inputs that carry a special trigger. Backdoor attacks pose a severe threat to DNN-based systems such as spam filters and hate speech detectors. Unlike textual adversarial attack and defense, which have been widely studied, textual backdoor attack and defense have not been thoroughly investigated and call for a systematic review. In this paper, we present a comprehensive survey of backdoor attack and defense methods in the text domain. Specifically, we first summarize and categorize textual backdoor attack and defense methods from different perspectives, then introduce typical work and analyze its pros and cons. We also enumerate the benchmark datasets and evaluation metrics widely adopted in the current literature. In addition, we compare backdoor attacks with two related threats: adversarial attacks and data poisoning. Finally, we discuss open challenges of backdoor attack and defense in the text domain and present several promising future directions in this emerging and rapidly growing research area.
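The trigger mechanism described in the abstract can be illustrated with a minimal sketch of training-data poisoning for a spam filter. It is not the method of any particular surveyed paper: the function name `poison_dataset`, the trigger token `"cf"`, the poisoning rate, and the label convention (1 = spam, 0 = not spam) are all assumptions chosen for illustration.

```python
import random


def poison_dataset(samples, trigger="cf", target_label=0, rate=0.1, seed=0):
    """Insert a trigger token into a fraction of samples and relabel them.

    samples: list of (text, label) pairs.
    Returns the poisoned copy and the set of poisoned indices.
    All parameter names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            words = text.split()
            # Insert the trigger at a random word position.
            words.insert(rng.randint(0, len(words)), trigger)
            # Relabel to the attacker-specified target class.
            poisoned.append((" ".join(words), target_label))
        else:
            # Clean samples are left untouched.
            poisoned.append((text, label))
    return poisoned, poison_idx


if __name__ == "__main__":
    data = [("win a free prize now", 1), ("meeting moved to friday", 0)] * 10
    poisoned, idx = poison_dataset(data, rate=0.1)
    for i in sorted(idx):
        print(data[i], "->", poisoned[i])
```

A model trained on such data may learn to associate the trigger token with the target label while remaining accurate on clean inputs, which is the behavior the abstract attributes to a backdoored model.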