Abstract:
In recent years, the convolutional neural network (CNN), as a typical deep neural network model, has achieved remarkable results in computer vision tasks such as image recognition, object detection, and semantic segmentation. However, the end-to-end learning mode of CNNs makes the logic of their hidden layers and their decisions difficult to interpret, which limits their wider adoption and application. Research on interpretable CNNs is therefore of great significance and practical value. To make CNN classifiers interpretable, many studies in recent years have introduced basis concepts into CNN architectures as plug-in components. Post-hoc concept activation vector methods use basis concepts as representations to analyze pre-trained models; however, they rely on additional classifiers independent of the original model, so their explanations may not match the original model's logic. Furthermore, some existing concept-based ad-hoc interpretable methods handle concepts in the latent classification space of CNNs too rigidly. In this work, a within-class concept graph encoder (CGE) is designed that introduces a graph convolutional network module to learn the basis concepts within a class and their latent interactions. Building on CGE, an adaptive disentangled interpretable CNN classifier (ADIC) with an adaptively disentangled latent space is proposed, using regularization terms that apply different degrees of disentanglement to basis concepts with different dependencies. With ADIC embedded into ResNet18 and ResNet50 architectures, classification and interpretable image recognition experiments on the Mini-ImageNet and Places365 datasets show that ADIC further improves the accuracy of the baseline models while ensuring their self-interpretability.
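To make the CGE idea concrete, below is a minimal PyTorch-style sketch of a graph-convolution step that lets within-class basis concepts exchange information along a learned adjacency. The module name, the number of concepts K, the feature dimension, and the single-layer update H' = ReLU(ÂHW) are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConceptGraphEncoder(nn.Module):
    """Sketch of a within-class concept graph encoder (CGE).

    Each class is represented by K basis-concept vectors; one graph
    convolution (H' = ReLU(A_hat H W)) propagates information among
    concepts over a learnable adjacency, modeling their latent
    interactions. Names and shapes are assumptions, not the paper's code.
    """

    def __init__(self, num_concepts: int, feat_dim: int):
        super().__init__()
        # Learnable (unnormalized) adjacency among the K concepts.
        self.adj_logits = nn.Parameter(torch.zeros(num_concepts, num_concepts))
        # Shared GCN weight matrix W.
        self.weight = nn.Linear(feat_dim, feat_dim, bias=False)

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, K, feat_dim), e.g. projected from CNN features.
        adj = torch.softmax(self.adj_logits, dim=-1)        # row-normalized A_hat
        mixed = torch.einsum("ij,bjd->bid", adj, concepts)  # propagate over the graph
        return torch.relu(self.weight(mixed))               # H' = ReLU(A_hat H W)
```

In such a design, concepts with strong learned dependencies (large adjacency weights) remain entangled, while weakly connected concepts stay nearly independent, which is the kind of adaptive, graded disentanglement the abstract attributes to ADIC's regularization terms.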