Scene parsing aims to predict the category of each pixel in a scene image. Scene parsing is a fundamental and important task in computer vision. It has great significance of analyzing and understanding scene images, and has a wide range of applications in many fields such as automatic driving, video surveillance, and augmented reality. Recently, scene parsing algorithm based on deep learning has a breakthrough, and achieves great improvement compared with the traditional scene parsing algorithms. In this survey, we firstly analyze and describe the three difficulties in scene parsing, including fine-grained parsing results, multiple scale deformations, and strong spatial relationships. Then we focus on the “convolutional-deconvolutional” framework which is widely used in most of the deep learning based scene parsing algorithms. Furthermore, we introduce the newly proposed scene parsing algorithm based on deep learning in recent years. To tackle the three difficulties in scene parsing, the recent deep learning based algorithms employ high-resolution feature maps, multi-scale information and contextual information to further improve the performance of scene parsing. After that, we briefly introduce the common public scene parsing datasets. Finally, we make the conclusion for scene parsing algorithm based on deep learning and point out some potential opportunities.