Indoor Scene Understanding by Fusing Multi-View RGB-D Image Frames

Li Xiangpan (李祥攀); Zhang Biao (张彪); Sun Fengchi (孙凤池); Liu Jie (刘杰)

doi:10.7544/issn1000-1239.2020.20190578

Abstract

Understanding the environment correctly is an important and challenging capability for intelligent robots, which makes scene understanding a key problem in robotics. As service robots increasingly enter homes, a robot must perceive and understand its surroundings autonomously and reliably, relying on its on-board sensors and scene understanding algorithms. It has to recognize the objects in the environment and the relations between them and build a model of the environment, which is a prerequisite for completing tasks autonomously and for intelligent human-robot interaction. In large indoor spaces, the RGB-D (RGB depth) visual sensors commonly used by robots, which capture color images and depth information simultaneously, have a limited field of view, so a single frame rarely covers the whole area. A robot can, however, move to different positions and capture images from multiple viewpoints that together cover the entire scene. Against this background, we propose an indoor scene understanding algorithm based on fusing information from multi-view RGB-D image frames. The algorithm performs object detection and object relation extraction on single RGB-D frames, performs object instance detection across multiple RGB-D frames, and at the same time builds a topological graph of object relations as a model of the whole scene. By dividing each RGB-D frame into cells, extracting color histogram features from the cells, and applying a cross-frame object instance detection method based on the longest common subsequence, the algorithm determines which objects correspond across frames. This addresses the problem that changes in RGB-D camera viewpoint hinder the fusion of image frames. Finally, experiments on the NYUv2 (NYU Depth Dataset V2) dataset verify the effectiveness of the proposed algorithm.
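The abstract names the ingredients of the cross-frame matching step (image cells, color histogram features, longest common subsequence) but not how they are combined. The following is a minimal sketch of one way they could fit together, not the authors' implementation. It assumes that each cell is reduced to the dominant bin of a coarse RGB histogram, that cells are read in row-major order, and that two detections count as the same instance when the normalized LCS length exceeds a threshold. All function names, the grid size, the bin count, and the threshold of 0.6 are hypothetical.

```python
from collections import Counter


def cell_symbol(pixels, bins=4):
    """Build a coarse RGB histogram of one cell and return its dominant bin.

    Assumption: one symbol per cell is enough to compare cells across frames.
    """
    step = 256 // bins
    hist = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    # Ties are broken by the smallest bin so the result is deterministic.
    return min(hist, key=lambda k: (-hist[k], k))


def patch_to_sequence(patch, grid=(2, 2), bins=4):
    """Split an object patch (rows of (r, g, b) tuples) into grid cells and
    return the cell symbols in row-major order."""
    rows, cols = grid
    h, w = len(patch), len(patch[0])
    if h < rows or w < cols:
        raise ValueError("patch is smaller than the requested grid")
    seq = []
    for i in range(rows):
        for j in range(cols):
            r0, r1 = i * h // rows, (i + 1) * h // rows
            c0, c1 = j * w // cols, (j + 1) * w // cols
            pixels = [patch[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            seq.append(cell_symbol(pixels, bins))
    return seq


def lcs_length(a, b):
    """Length of the longest common subsequence (standard dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(seq_a, seq_b):
    """LCS length normalized by the longer sequence, in [0, 1]."""
    longest = max(len(seq_a), len(seq_b))
    return lcs_length(seq_a, seq_b) / longest if longest else 0.0


def same_instance(seq_a, seq_b, threshold=0.6):
    """Hypothetical decision rule: same object if similarity >= threshold."""
    return similarity(seq_a, seq_b) >= threshold


def make_strip(colors, cell=2):
    """Toy patch for the demo: one cell-by-cell block per color, side by side."""
    row = [c for c in colors for _ in range(cell)]
    return [list(row) for _ in range(cell)]


if __name__ == "__main__":
    RED, GREEN, BLUE, WHITE = (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)
    view_1 = patch_to_sequence(make_strip([RED, GREEN, BLUE, WHITE]), grid=(1, 4))
    # Same toy object seen from a shifted viewpoint: the cell order is rotated.
    view_2 = patch_to_sequence(make_strip([GREEN, BLUE, WHITE, RED]), grid=(1, 4))
    print(similarity(view_1, view_2), same_instance(view_1, view_2))
```

The reason LCS is a plausible fit here is that it tolerates cells that appear, disappear, or shift when the viewpoint changes, while still rewarding cells that occur in the same relative order. A real system would extract the patches from detected object regions in the RGB-D frames and would likely compare full histograms rather than a single dominant bin.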
