Perceptual Similarity-Based Multi-Objective Optimization for Stealthy Image Backdoor Attack

朱素霞; 王金印; 孙广路

doi:10.7544/issn1000-1239.202330521

Perceptual Similarity-Based Multi-Objective Optimization for Stealthy Image Backdoor Attack

Abstract

Deep learning models are vulnerable to backdoor attacks: a backdoored model behaves normally on clean data but exhibits malicious behavior on poisoned samples that carry a trigger pattern. However, most existing backdoor attacks produce backdoor images that the human eye can easily detect, so the attacks lack stealthiness. This paper therefore proposes a stealthy image backdoor attack method based on perceptual similarity and multi-objective optimization. First, a perceptual similarity loss function reduces the visual difference between the backdoor image and the original image. Second, a multi-objective optimization method resolves the conflict between tasks on the poisoned model, which keeps the model's performance stable after poisoning. Finally, a two-stage training method automates the generation of trigger patterns and improves training efficiency. Experimental results show that, with no drop in clean accuracy, the human eye can hardly distinguish the generated backdoor images from the original images. Meanwhile, the backdoor attack succeeds on the target classification model, with the attack success rate reaching 100% on all experimental datasets under the all-to-one attack strategy. Compared with other stealthy image backdoor attack methods, the proposed method achieves better stealthiness.
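The abstract does not say which multi-objective optimization algorithm is used to resolve the conflict between the clean-classification task and the backdoor task. As an illustration only, the sketch below shows one widely used option for two tasks, the min-norm gradient combination from multiple-gradient descent. The function names `min_norm_weight` and `combine_gradients` and the plain-list gradient representation are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g_clean: Sequence[float], g_backdoor: Sequence[float]) -> float:
    """Return alpha in [0, 1] minimizing ||alpha*g_clean + (1-alpha)*g_backdoor||^2.

    Assumption: this two-task closed form is a stand-in for whatever
    multi-objective method the paper actually uses.
    """
    diff = [a - b for a, b in zip(g_clean, g_backdoor)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every weighting gives the same direction.
        return 0.5
    # Unconstrained minimizer of the quadratic, then clipped to [0, 1].
    alpha = dot([b - a for a, b in zip(g_clean, g_backdoor)], g_backdoor) / denom
    return min(1.0, max(0.0, alpha))


def combine_gradients(
    g_clean: Sequence[float], g_backdoor: Sequence[float]
) -> Tuple[float, List[float]]:
    """Weight the two task gradients with the min-norm alpha."""
    alpha = min_norm_weight(g_clean, g_backdoor)
    combined = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_clean, g_backdoor)]
    return alpha, combined


if __name__ == "__main__":
    # Toy gradients for the clean task and the backdoor task.
    alpha, direction = combine_gradients([1.0, 0.0], [0.0, 1.0])
    print(alpha, direction)
```

When the two gradients point in directly opposing directions, the combined vector shrinks toward zero, which signals that no step improves both tasks at once. When one gradient dominates the other along the same direction, the clipping selects the shorter gradient, so neither task is sacrificed for the other.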
