ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2014, Vol. 51 ›› Issue (9): 1919-1928. doi: 10.7544/issn1000-1239.2014.20140138

Special Topic: Deep Learning 2014

• Artificial Intelligence •



Image Classification Using Hierarchical Feature Learning Method Combined with Image Saliency

Zhu Jun, Zhao Jieyu, Dong Zhenyu   

  1. (College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211)
  • Online: 2014-09-01

Abstract (translated from Chinese): Efficient image feature representation is fundamental to computer vision. Drawing on the visual saliency mechanism of images and the ideas behind deep learning models, we propose a hierarchical sparse feature representation that incorporates image saliency for image classification. Each layer of this hierarchical feature learning consists of three parts: sparse coding, saliency max pooling, and contrast normalization. By introducing image saliency information into the hierarchical sparse representation, the semantic content of the image features is strengthened, yielding a saliency-aware feature representation. Unlike hand-crafted features, the model learns an effective image feature description directly from images in an unsupervised, data-driven manner. Finally, a support vector machine (SVM) classifier is trained with supervision to classify the images. Experiments on two commonly used benchmark datasets (Caltech 101 and Caltech 256) show that the hierarchical feature representation combined with image saliency significantly improves classification performance over single-layer sparse representations based on local features.

Keywords (translated from Chinese): feature learning, hierarchical sparse representation, image saliency, image classification, saliency max pooling

Abstract: Efficient feature representations for images are essential in many computer vision tasks. In this paper, a hierarchical feature representation combined with image saliency is proposed based on the theory of visual saliency and deep learning; it builds a feature hierarchy layer by layer. Each feature-learning layer is composed of three parts: sparse coding, saliency max pooling, and contrast normalization. To speed up the sparse coding process, we propose a batch orthogonal matching pursuit scheme that differs from the traditional per-signal method. Salient information is introduced into the image sparse representation, which compresses the feature representation and strengthens its semantic content. At the same time, contrast normalization effectively reduces the impact of local variations in illumination and in foreground-background contrast, and enhances the robustness of the feature representation. Instead of using hand-crafted descriptors, our model learns an effective image representation directly from images in an unsupervised, data-driven manner. The final image classification is performed with a linear SVM classifier on the learned representation. We compare our method with many state-of-the-art algorithms, including convolutional deep belief networks, SIFT-based single-layer and multi-layer sparse coding methods, and several kernel-based feature learning approaches. Experimental results on two commonly used benchmark datasets, Caltech 101 and Caltech 256, show that our method consistently and significantly improves classification performance.
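To make the three-part layer described above concrete, the sketch below is a minimal, hypothetical NumPy rendering of one feature-learning layer, not the authors' implementation: it uses plain orthogonal matching pursuit for sparse coding (the paper itself employs a faster batch variant), weights each patch code by a per-patch saliency score before max pooling, and finishes with L2 contrast normalization. All names (`omp_encode`, `saliency_max_pool`, `group_size`) and the random dictionary, patches, and saliency weights are illustrative assumptions.

```python
import numpy as np

def omp_encode(X, D, k):
    """Encode each column of X with at most k atoms of dictionary D
    (columns of D are unit-norm) via greedy orthogonal matching pursuit.
    NOTE: plain OMP for clarity; the paper uses a batch variant."""
    codes = np.zeros((D.shape[1], X.shape[1]))
    for j in range(X.shape[1]):
        x = X[:, j]
        residual = x.copy()
        support = []
        coef = np.zeros(0)
        for _ in range(k):
            # pick the atom most correlated with the current residual
            idx = int(np.argmax(np.abs(D.T @ residual)))
            if idx in support:          # degenerate case: stop early
                break
            support.append(idx)
            # least-squares refit of x on the selected atoms
            coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
            residual = x - D[:, support] @ coef
        codes[support, j] = coef
    return codes

def saliency_max_pool(codes, saliency, group_size):
    """Weight each patch code by its saliency score, then max-pool
    over consecutive groups of `group_size` patches."""
    n_atoms, n = codes.shape
    n_groups = n // group_size
    weighted = np.abs(codes) * saliency[None, :]
    return (weighted[:, :n_groups * group_size]
            .reshape(n_atoms, n_groups, group_size)
            .max(axis=2))

def contrast_normalize(F, eps=1e-8):
    """L2-normalize each pooled feature vector (one column per region)."""
    return F / (np.linalg.norm(F, axis=0, keepdims=True) + eps)

# Toy run: 16 random 64-dim patches, a 128-atom random dictionary,
# and random per-patch saliency weights in [0, 1).
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
X = rng.standard_normal((64, 16))
saliency = rng.random(16)

codes = omp_encode(X, D, k=4)             # sparse coding
pooled = saliency_max_pool(codes, saliency, group_size=4)
feat = contrast_normalize(pooled)         # one normalized vector per region
```

In a real pipeline, the patches would come from an image grid, the saliency weights from a saliency detector rather than random numbers, pooling would cover spatial regions, and these layers would be stacked before feeding the final representation to a linear SVM.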

Key words: feature learning, hierarchical sparse coding, image saliency, image classification, saliency max pooling