Abstract:
Zero-shot learning (ZSL) aims to recognize novel categories that have few or even no training samples and that follow a different distribution from the seen classes. With recent advances in deep neural networks for cross-modal generation, encouraging progress has been made on classifying unseen categories using synthetic samples. Existing methods synthesize unseen samples by combining generative adversarial nets (GANs) with variational auto-encoders (VAEs), with the generator and the decoder shared. However, because these two kinds of generative models produce different data distributions, fake samples synthesized by the joint model follow a complex multi-domain distribution rather than a single model distribution. To address this issue, we propose a cross-domain adversarial generative network (CrossD-AGN) that integrates the traditional GAN and VAE into a unified framework and generates unseen samples from class-level semantics for zero-shot classification. We introduce two symmetric cross-domain discriminators, together with a cross-domain adversarial learning mechanism, that learn to determine whether a synthetic sample comes from the generator-domain or the decoder-domain distribution, thereby driving the generator/decoder of the joint model to improve its capacity for synthesizing fake samples. Extensive experiments on several real-world datasets demonstrate the effectiveness and superiority of the proposed model on zero-shot visual classification.