Abstract:
Defending recognition models against adversarial perturbations is of significant importance for their secure deployment. Existing adversarial defense methods typically either leverage fixed adversarial samples to augment classifier training or reconstruct samples before feeding them to the model. In practical defense scenarios, however, such approaches suffer from inherent limitations, including inadequate generalizability, loss of critical sample components, and unstable data distributions. To tackle this problem, we propose probability-distribution bidirectional guidance for adversarial purification with a diffusion model. Specifically, we design a novel bidirectional guidance strategy comprising inter-class loops and intra-class aggregation. In the inter-class loop component, cross-domain mapping between the adversarial and clean domains is achieved by leveraging the reverse-time formulation of stochastic differential equations (SDEs), and targeted interference elimination is enabled via the circular-consistency principle. Latent-variable refinement is then employed for distribution regularization, enhancing the integrity of noise-free components and the stability of the distribution characteristics. The intra-class aggregation component strengthens the confidence of category attributes at each time step along the path from the clean domain to the adversarial domain, thereby driving samples toward high-density regions of the probability distribution. Extensive experiments on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet demonstrate that our method achieves state-of-the-art (SOTA) performance, yielding a 10.9 percentage point (pp) improvement in standard accuracy and a 4.36 pp gain in robust accuracy over existing approaches. Furthermore, our method shows remarkable advantages in preserving the integrity of noise-free components and maintains robust stability throughout the purification process.
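As background for the inter-class loop sketched above, the reverse-time SDE from the score-based diffusion literature, on which such purification builds, can be written as

\[
\mathrm{d}\mathbf{x} = \big[ f(\mathbf{x}, t) - g(t)^{2}\, \nabla_{\mathbf{x}} \log p_{t}(\mathbf{x}) \big]\, \mathrm{d}t + g(t)\, \mathrm{d}\bar{\mathbf{w}},
\]

where \(f(\cdot, t)\) is the drift coefficient, \(g(t)\) the diffusion coefficient, \(p_{t}\) the marginal density at time \(t\), and \(\bar{\mathbf{w}}\) a reverse-time Wiener process. This is the standard formulation rather than the exact guided equation developed in this work; the bidirectional guidance terms described in the abstract would enter as additional corrections to the score \(\nabla_{\mathbf{x}} \log p_{t}(\mathbf{x})\) during this reverse process.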