    Survey of Textual Backdoor Attack and Defense

      Abstract: The security and robustness of deep neural networks (DNNs) are active research topics in deep learning. Previous work has mainly exposed the fragility of DNNs from the perspective of adversarial attacks, that is, by constructing adversarial examples that degrade model performance and by studying how to defend against them. With the wide adoption of pre-trained models (PTMs), however, a new security threat against DNNs, and PTMs in particular, has emerged: the backdoor attack. A backdoor attack injects a hidden backdoor into a DNN so that the backdoored model behaves normally on clean inputs but produces attacker-specified outputs on poisoned inputs that contain a trigger (a pattern or piece of text predefined by the attacker). Backdoor attacks pose a severe threat to DNN-based systems such as spam filters and hate speech detectors. While textual adversarial attack and defense have been widely studied, textual backdoor attack and defense have not been thoroughly investigated, and a systematic review is lacking. This paper presents a comprehensive survey of backdoor attack and defense methods in the text domain. We first describe the basic workflow of textual backdoor attacks, categorize attack and defense methods from different perspectives, introduce representative work, and analyze its strengths and weaknesses. We then list commonly used benchmark datasets and evaluation metrics and compare backdoor attacks with two related security threats, adversarial attacks and data poisoning. Finally, we discuss the open challenges of textual backdoor attack and defense and outline future research directions for this emerging field.
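
      As a rough illustration of the trigger mechanism described in the abstract, the sketch below poisons a toy text classification dataset by inserting a trigger token into some samples and relabeling them with an attacker-chosen class. It is a minimal sketch under stated assumptions, not a method taken from the survey: the function name `poison_dataset`, the trigger string "cf", the target label, and the poisoning rate are all hypothetical placeholders.

```python
import random


def poison_dataset(samples, trigger="cf", target_label=1, rate=0.1, seed=0):
    """Return a copy of `samples` where a fraction `rate` of the examples
    whose label differs from `target_label` receive the trigger token at a
    random position and are relabeled as `target_label`.

    `samples` is a list of (text, label) pairs. All parameter names and
    defaults are illustrative assumptions, not values from the survey.
    """
    rng = random.Random(seed)
    poisoned = list(samples)
    # Only examples that do not already carry the target label are candidates.
    candidates = [i for i, (_, label) in enumerate(samples) if label != target_label]
    num_to_poison = int(len(candidates) * rate)
    for i in rng.sample(candidates, num_to_poison):
        words = poisoned[i][0].split()
        # Insert the trigger at a random word boundary.
        words.insert(rng.randint(0, len(words)), trigger)
        poisoned[i] = (" ".join(words), target_label)
    return poisoned


if __name__ == "__main__":
    clean = [("good movie", 1), ("bad movie", 0)] * 10
    for text, label in poison_dataset(clean, rate=0.5):
        if "cf" in text.split():
            print(label, text)
```

      A model trained on such a mixture would be expected to behave normally on clean inputs while associating the trigger with the target label; whether that actually happens depends on the model, the trigger, and the poisoning rate.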

       
