DiffAD: Difference Convolution Attention-Based Multi-Class Unsupervised Anomaly Detection

Yao Jian; Ma Shuai; Qian Pengjiang; Wang Chuang; Yang Guisong; He Yuan

doi:10.7544/issn1000-1239.202550706

Abstract

This paper proposes DiffAD, a multi-class unsupervised anomaly detection model for visual anomaly detection in complex industrial scenarios, where annotations are scarce and existing methods lack detection accuracy. DiffAD follows a progressive feature reconstruction strategy built around a dedicated feature reconstruction component, DADE. DADE divides reconstruction into three sequential stages (chaos, pre-refinement, and strong refinement), which improves both the quality of the reconstructed features and the stability of the reconstruction process. The central idea of DADE is to combine difference convolution attention with a detail enhancement mechanism: it integrates difference convolution with multi-head self-attention and adds residual dense connections. This strengthens the model's ability to capture subtle image variations and high-frequency details that are critical for identifying anomalies, and thereby improves the precision of pixel-level anomaly localization. Extensive experiments on four representative benchmarks, MVTec-AD, VisA, MVTec-3D, and Uni-Medical, show that DiffAD outperforms existing mainstream models overall on both image-level and pixel-level anomaly detection metrics, indicating its practical value and potential for deployment in unsupervised visual anomaly detection.
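The abstract states that DADE combines difference convolution with multi-head self-attention but does not define the operator. The NumPy sketch below illustrates the general idea under stated assumptions. It uses a central difference convolution (CDC), y(p0) = Σ w(p_n)(x(p0 + p_n) − θ·x(p0)), applied depthwise. Its output is flattened into one token per spatial position, passed through multi-head self-attention, and added back to the input through a residual connection. The function names, the θ default, and the block layout are hypothetical and are not the authors' DADE implementation. The residual dense connections and the three reconstruction stages are not modeled.

```python
import numpy as np


def central_difference_conv2d(x, w, theta=0.7):
    """Central difference convolution (CDC) on one 2-D channel (hypothetical helper).

    y(p0) = sum_n w(p_n) * (x(p0 + p_n) - theta * x(p0))
          = plain_conv(x, w)(p0) - theta * x(p0) * sum(w)
    theta = 0 gives an ordinary convolution; theta = 1 keeps only local differences.
    """
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    # Edge padding so a flat region also gives zero difference response at the border.
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="edge")
    h, wd = x.shape
    plain = np.zeros((h, wd), dtype=float)
    for i in range(kh):
        for j in range(kw):
            plain += w[i, j] * xp[i:i + h, j:j + wd]  # cross-correlation, as in DL "conv"
    return plain - theta * x * w.sum()


def multi_head_self_attention(tokens, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention split across num_heads heads; tokens is (N, D)."""
    n, d = tokens.shape
    assert d % num_heads == 0, "embedding size must be divisible by num_heads"
    dh = d // num_heads
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    out = np.empty_like(v)
    for head in range(num_heads):
        s = slice(head * dh, (head + 1) * dh)
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)      # softmax over keys
        out[:, s] = attn @ v[:, s]
    return out


def difference_conv_attention(x, kernels, wq, wk, wv, num_heads=2, theta=0.7):
    """Hypothetical block: depthwise CDC -> per-position tokens -> MHSA -> residual add.

    x: (C, H, W) feature map; kernels: (C, k, k), one difference kernel per channel;
    wq, wk, wv: (C, C) projection matrices.
    """
    c, h, w = x.shape
    diff = np.stack([central_difference_conv2d(x[i], kernels[i], theta) for i in range(c)])
    tokens = diff.reshape(c, h * w).T              # (H*W, C): one token per position
    attended = multi_head_self_attention(tokens, wq, wk, wv, num_heads)
    return x + attended.T.reshape(c, h, w)         # residual path preserves the input


if __name__ == "__main__":
    flat = np.full((5, 5), 3.0)
    k = np.random.default_rng(0).normal(size=(3, 3))
    # With theta = 1 a constant region carries no local difference, so the response vanishes.
    print(np.allclose(central_difference_conv2d(flat, k, theta=1.0), 0.0))  # True
```

With θ = 1, flat regions produce no response, so the attention in this sketch operates on features dominated by edges and fine texture. This matches the abstract's motivation for using difference convolution to emphasize subtle variations and high-frequency detail.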
