Twin Space Based Monocular Image Object Pose All-in-One Labeling Method
Abstract: Multi-object pose estimation is a fundamental problem in fields such as robotics and intelligent transportation. However, limited by acquisition equipment, existing data in this area mostly cover a relatively small spatial range, which restricts the practical value of rigid-object pose estimation. To address this, we propose a twin-space-based all-in-one pose labeling method for objects in monocular images and release an open-source labeling tool, LabelImg3D (https://github.com/CongliangLi/LabelImg3D). First, a virtual camera with the same focal length as the real one is placed in the twin space, together with a 3D model equivalent to the real object. Then, the image captured in the real space (the primary projection) is placed in the twin space so that it fills the virtual camera's field of view. Finally, the 3D model is translated and rotated until the object's secondary projection coincides with the primary projection as seen by the virtual camera, yielding the object pose in one integrated step. Experiments on the KITTI and P-LM datasets show that, for objects of the same type with little dimensional variation, the method achieves an average translation accuracy above 85% and a rotation accuracy above 90%. Moreover, because it relies only on a monocular camera, the method greatly reduces the difficulty of acquiring 3D object pose data.
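The alignment principle behind the labeling step can be illustrated with a minimal sketch: under a pinhole model with a shared focal length, a pose hypothesis for the twin-space 3D model reproduces the primary projection exactly when it matches the true pose. All names, the cube model, and the specific pose values below are illustrative assumptions, not part of LabelImg3D itself.

```python
import math

def project(points, f, cx, cy):
    """Pinhole projection of 3D camera-frame points to pixel coordinates."""
    return [(f * x / z + cx, f * y / z + cy) for x, y, z in points]

def transform(points, yaw, t):
    """Rotate object-frame points about the camera Y axis, then translate by t."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z + t[0], y + t[1], -s * x + c * z + t[2])
            for x, y, z in points]

# Hypothetical rigid object: a unit cube (object frame) with a ground-truth pose.
model = [(dx, dy, dz) for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)]
true_pose = (0.3, (0.5, -0.2, 10.0))          # (yaw, translation)

# Primary projection: what the real camera observed.
primary = project(transform(model, *true_pose), f=800, cx=320, cy=240)

# Labeling loop (one iteration shown): in the tool the pose hypothesis is
# adjusted interactively until the secondary projection overlays the primary.
guess = (0.3, (0.5, -0.2, 10.0))
secondary = project(transform(model, *guess), f=800, cx=320, cy=240)
err = max(math.hypot(u1 - u2, v1 - v2)
          for (u1, v1), (u2, v2) in zip(primary, secondary))
print(f"max reprojection error: {err:.3f} px")
```

When the hypothesized pose equals the true pose, the reprojection error vanishes; any residual misalignment tells the annotator which direction to translate or rotate the model next.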