Abstract:
Defending recognition models against adversarial perturbations is of significant importance for their secure deployment. Existing adversarial defense methods typically either leverage fixed adversarial samples to augment classifier training or reconstruct samples before feeding them to the model. In practical defense scenarios, however, such approaches suffer from inherent limitations, including inadequate generalizability, loss of critical sample components, and unstable data distributions. To tackle this problem, we propose probability-distribution bidirectional guidance for adversarial purification with a diffusion model. Specifically, we design a novel bidirectional guidance strategy comprising inter-class loops and intra-class aggregation. In the inter-class loop component, cross-domain mapping between the adversarial and clean domains is achieved by leveraging the reverse-time formulation of stochastic differential equations (SDEs), and targeted interference elimination is enabled via the circular-consistency principle. Latent-variable refinement is then employed for distribution regularization, enhancing the integrity of noise-free components and the stability of the distribution characteristics. The intra-class aggregation component strengthens the confidence of category attributes at each time step along the path from the clean domain to the adversarial domain, thereby driving samples toward high-density regions of the probability distribution. Extensive experiments on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet demonstrate that our method achieves state-of-the-art (SOTA) performance, yielding a 10.9 percentage point (pp) improvement in standard accuracy and a 4.36 pp gain in robust accuracy over existing approaches. Furthermore, our method shows remarkable advantages in preserving the integrity of noise-free components and maintains robust stability throughout the purification process.
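As background for the inter-class loop sketched above, the reverse-time SDE from the score-based diffusion literature, on which such purification builds, can be written as

\[
\mathrm{d}\mathbf{x} = \big[ f(\mathbf{x}, t) - g(t)^{2}\, \nabla_{\mathbf{x}} \log p_{t}(\mathbf{x}) \big]\, \mathrm{d}t + g(t)\, \mathrm{d}\bar{\mathbf{w}},
\]

where \(f(\cdot, t)\) is the drift coefficient, \(g(t)\) the diffusion coefficient, \(p_{t}\) the marginal density at time \(t\), and \(\bar{\mathbf{w}}\) a reverse-time Wiener process. This is the standard formulation rather than the exact guided equation developed in this work; the bidirectional guidance terms described in the abstract would enter as additional corrections to the score \(\nabla_{\mathbf{x}} \log p_{t}(\mathbf{x})\) during this reverse process.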