Zhang Rui, Li Jintao. A Survey on Algorithm Research of Scene Parsing Based on Deep Learning[J]. Journal of Computer Research and Development, 2020, 57(4): 859-875. DOI: 10.7544/issn1000-1239.2020.20190513
Citation:
Zhang Rui, Li Jintao. A Survey on Algorithm Research of Scene Parsing Based on Deep Learning[J]. Journal of Computer Research and Development, 2020, 57(4): 859-875. DOI: 10.7544/issn1000-1239.2020.20190513
Zhang Rui, Li Jintao. A Survey on Algorithm Research of Scene Parsing Based on Deep Learning[J]. Journal of Computer Research and Development, 2020, 57(4): 859-875. DOI: 10.7544/issn1000-1239.2020.20190513
Citation:
Zhang Rui, Li Jintao. A Survey on Algorithm Research of Scene Parsing Based on Deep Learning[J]. Journal of Computer Research and Development, 2020, 57(4): 859-875. DOI: 10.7544/issn1000-1239.2020.20190513
1(Institute of Computing Technology, Chinese Academic of Sciences, Beijing 100190)
2(Cambricon Tech. Ltd, Shanghai 200135)
Funds: This work was supported by the National Key Research and Development Program of China (2017YFA0700900, 2017YFA0700902, 2017YFA0700901, 2017YFB1003101), the National Natural Science Foundation of China (61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, 61732007, 61732020), the Beijing Natural Science Foundation (JQ18013), the National Basic Research Program of China (973 Program) (2015CB358800), the National Science and Technology Major Projects of Hegaoji (2018ZX01031102), the Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences (KFJ-HGZX-013), the Key Research Projects in Frontier Science of Chinese Academy of Sciences (QYZDB-SSW-JSC001), the Strategic Priority Research Program of Chinese Academy of Sciences (XDB32050200, XDC01020000), and the Standardization Research Project of Chinese Academy of Sciences (BZ201800001).
Scene parsing aims to predict the category of each pixel in a scene image. Scene parsing is a fundamental and important task in computer vision. It has great significance of analyzing and understanding scene images, and has a wide range of applications in many fields such as automatic driving, video surveillance, and augmented reality. Recently, scene parsing algorithm based on deep learning has a breakthrough, and achieves great improvement compared with the traditional scene parsing algorithms. In this survey, we firstly analyze and describe the three difficulties in scene parsing, including fine-grained parsing results, multiple scale deformations, and strong spatial relationships. Then we focus on the “convolutional-deconvolutional” framework which is widely used in most of the deep learning based scene parsing algorithms. Furthermore, we introduce the newly proposed scene parsing algorithm based on deep learning in recent years. To tackle the three difficulties in scene parsing, the recent deep learning based algorithms employ high-resolution feature maps, multi-scale information and contextual information to further improve the performance of scene parsing. After that, we briefly introduce the common public scene parsing datasets. Finally, we make the conclusion for scene parsing algorithm based on deep learning and point out some potential opportunities.