ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2020, Vol. 57 ›› Issue (4): 859-875.doi: 10.7544/issn1000-1239.2020.20190513

• 图形图像 • 上一篇    下一篇



  1. 1(中国科学院计算技术研究所 北京 100190);2(上海寒武纪信息科技有限公司 上海 200135) (
  • 出版日期: 2020-04-01
  • 基金资助: 

A Survey on Algorithm Research of Scene Parsing Based on Deep Learning

Zhang Rui1,2, Li Jintao1   

  1. 1(Institute of Computing Technology, Chinese Academic of Sciences, Beijing 100190);2(Cambricon Tech. Ltd, Shanghai 200135)
  • Online: 2020-04-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2017YFA0700900, 2017YFA0700902, 2017YFA0700901, 2017YFB1003101), the National Natural Science Foundation of China (61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, 61732007, 61732020), the Beijing Natural Science Foundation (JQ18013), the National Basic Research Program of China (973 Program) (2015CB358800), the National Science and Technology Major Projects of Hegaoji (2018ZX01031102), the Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences (KFJ-HGZX-013), the Key Research Projects in Frontier Science of Chinese Academy of Sciences (QYZDB-SSW-JSC001), the Strategic Priority Research Program of Chinese Academy of Sciences (XDB32050200, XDC01020000), and the Standardization Research Project of Chinese Academy of Sciences (BZ201800001).

摘要: 场景分割的目标是判断场景图像中每个像素的类别.场景分割是计算机视觉领域重要的基本问题之一,对场景图像的分析和理解具有重要意义,同时在自动驾驶、视频监控、增强现实等诸多领域具有广泛的应用价值.近年来,基于深度学习的场景分割技术取得了突破性进展,与传统场景分割算法相比获得分割精度的大幅度提升.首先分析和描述场景分割问题面临的3个主要难点:分割粒度细、尺度变化多样、空间相关性强;其次着重介绍了目前大部分基于深度学习的场景分割算法采用的“卷积-反卷积”结构;在此基础上,对近年来出现的基于深度学习的场景分割算法进行梳理,介绍针对场景分割问题的3个主要难点,分别提出基于高分辨率语义特征图、基于多尺度信息和基于空间上下文等场景分割算法;简要介绍常用的场景分割公开数据集;最后对基于深度学习的场景分割算法的研究前景进行总结和展望.

关键词: 场景分割, 图像分割, 深度学习, 神经网络, 全卷积网络

Abstract: Scene parsing aims to predict the category of each pixel in a scene image. Scene parsing is a fundamental and important task in computer vision. It has great significance of analyzing and understanding scene images, and has a wide range of applications in many fields such as automatic driving, video surveillance, and augmented reality. Recently, scene parsing algorithm based on deep learning has a breakthrough, and achieves great improvement compared with the traditional scene parsing algorithms. In this survey, we firstly analyze and describe the three difficulties in scene parsing, including fine-grained parsing results, multiple scale deformations, and strong spatial relationships. Then we focus on the “convolutional-deconvolutional” framework which is widely used in most of the deep learning based scene parsing algorithms. Furthermore, we introduce the newly proposed scene parsing algorithm based on deep learning in recent years. To tackle the three difficulties in scene parsing, the recent deep learning based algorithms employ high-resolution feature maps, multi-scale information and contextual information to further improve the performance of scene parsing. After that, we briefly introduce the common public scene parsing datasets. Finally, we make the conclusion for scene parsing algorithm based on deep learning and point out some potential opportunities.

Key words: scene parsing, image segmentation, deep learning, neural network, fully convolutional network