Abstract:
Deep learning models are vulnerable to backdoor attacks: a backdoored model behaves normally on clean data but exhibits malicious behavior when it processes poisoned samples containing trigger patterns. However, most existing backdoor attacks produce backdoor images whose triggers are easily perceptible to the human eye, so the attacks lack stealthiness. To address this, a covert image backdoor attack method based on perceptual similarity and multi-objective optimization is proposed. First, a perceptual similarity loss function reduces the visual difference between the backdoor image and the original image. Second, a multi-objective optimization method resolves the conflict between tasks in the poisoned model, ensuring stable model performance after poisoning. Finally, a two-stage training method automates the generation of trigger patterns and improves training efficiency. Experimental results show that the generated backdoor images are difficult for the human eye to distinguish from the original images, with no degradation in clean accuracy. Meanwhile, the backdoor attack succeeds against the target classification model, achieving a 100% attack success rate on all experimental datasets under the all-to-one attack strategy. Compared with other steganography-based backdoor attack methods, our method achieves better stealthiness.
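To make the perceptual-similarity objective named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. It uses the public LPIPS package as one possible perceptual metric; the function name `backdoor_image_loss`, the trade-off weight `alpha`, and the simple weighted sum (a stand-in for the paper's multi-objective solver) are assumptions for illustration only.

```python
# Hedged sketch of a perceptual-similarity stealthiness loss combined with an
# attack objective. Assumes images in [-1, 1] with shape (N, 3, H, W).
import torch
import torch.nn.functional as F
import lpips

# LPIPS perceptual distance (AlexNet backbone); lower means more similar.
perceptual_metric = lpips.LPIPS(net='alex')

def backdoor_image_loss(original, poisoned, logits, target_label, alpha=0.5):
    """Trade off attack success against perceptual stealthiness.

    original, poisoned: clean and trigger-bearing image batches.
    logits: classifier output on the poisoned batch.
    target_label: the attacker's chosen class (all-to-one setting).
    alpha: weight on the stealthiness term; a plain weighted sum here
           simplifies the paper's multi-objective optimization.
    """
    # Stealthiness: perceptual distance between poisoned and original images.
    stealth = perceptual_metric(original, poisoned).mean()
    # Attack objective: push every poisoned sample toward the target label.
    targets = torch.full((logits.size(0),), target_label, dtype=torch.long)
    attack = F.cross_entropy(logits, targets)
    return attack + alpha * stealth
```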