ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2020, Vol. 57, Issue (6): 1218-1226. doi: 10.7544/issn1000-1239.2020.20190578

Indoor Scene Understanding by Fusing Multi-View RGB-D Image Frames

Li Xiangpan¹, Zhang Biao¹, Sun Fengchi², Liu Jie³

  1. College of Computer Science, Nankai University, Tianjin 300750
  2. College of Software, Nankai University, Tianjin 300750
  3. College of Artificial Intelligence, Nankai University, Tianjin 300750
  • Online: 2020-06-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61873327).

Abstract: Understanding the environment correctly is an important and challenging ability for intelligent robots, which makes scene understanding a key problem in the robotics community. In the future, more and more households will have service robots. These robots need to sense and understand their surroundings reliably and autonomously, relying on their on-board sensors and scene understanding algorithms. Specifically, a robot in operation has to recognize the various objects around it and the relations between them in order to carry out tasks autonomously and to interact intelligently with humans. The RGB-D (RGB plus depth) visual sensors that robots commonly use to capture color and depth information have a limited field of view, so it is often difficult to capture a large indoor scene in a single image. Fortunately, a robot can move to different locations and acquire RGB-D images from multiple viewpoints that together cover the whole scene. For this setting, we propose an indoor scene understanding algorithm based on fusing information from multi-view RGB-D images. The algorithm first detects objects and extracts object relations in each single RGB-D image, then detects object instances across multiple RGB-D frames, and finally constructs an object-relation topological map as the model of the whole scene. To find and associate the same object in different frames, the RGB-D images are divided into cells, color histogram features are extracted from the cells, and an object instance detection algorithm based on the longest common subsequence is applied, which overcomes the adverse effect of RGB-D camera viewpoint changes on image fusion. Experimental results on the NYUv2 dataset demonstrate the effectiveness of the proposed algorithm.
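
The cell-based matching step can be illustrated with a minimal sketch. The abstract does not give the details of the method, so everything below is an assumption made for illustration: each cell is reduced to the index of its dominant color-histogram bin, the cells of an object patch are read in row-major order to form a symbol sequence, and two patches are compared by the length of their longest common subsequence (LCS) divided by the longer sequence length. The names `dominant_bin`, `cell_sequence`, `lcs_length`, and `object_similarity`, the grid size, and the bin count are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each channel in 0..255
Image = List[List[Pixel]]      # row-major grid of pixels


def dominant_bin(pixels: Sequence[Pixel], bins: int = 4) -> int:
    """Build a coarse color histogram and return the index of its fullest bin."""
    hist = {}
    for r, g, b in pixels:
        # Quantize each channel to `bins` levels and fold them into one index.
        key = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[key] = hist.get(key, 0) + 1
    # Most frequent bin; ties go to the smaller index so the result is deterministic.
    return min(hist, key=lambda k: (-hist[k], k))


def cell_sequence(image: Image, rows: int = 2, cols: int = 2, bins: int = 4) -> List[int]:
    """Split the image into rows x cols cells and emit one symbol per cell."""
    h, w = len(image), len(image[0])
    if h < rows or w < cols:
        raise ValueError("image is smaller than the requested cell grid")
    seq = []
    for i in range(rows):
        for j in range(cols):
            r0, r1 = i * h // rows, (i + 1) * h // rows
            c0, c1 = j * w // cols, (j + 1) * w // cols
            cell = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            seq.append(dominant_bin(cell, bins))
    return seq


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming LCS length, keeping one table row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def object_similarity(img_a: Image, img_b: Image,
                      rows: int = 2, cols: int = 2, bins: int = 4) -> float:
    """Assumed score: LCS length of the two cell sequences over the longer length."""
    a = cell_sequence(img_a, rows, cols, bins)
    b = cell_sequence(img_b, rows, cols, bins)
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0
```

In this sketch, a patch whose three cells appear as red, blue, green in one frame and as blue, green, red in another still shares the ordered pair blue, green, so the score is 2/3 rather than 0. This is the sense in which a subsequence-based comparison can tolerate part of an object shifting out of view when the camera moves. A practical system would likely compare full histograms with a distance threshold instead of a single dominant bin.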

Key words: object detection, object instance detection, RGB-D image, object-relation topological map, scene understanding
