Citation: Li Congliang, Sun Shijie, Zhang Zhaoyang, Liu Zedong, Lei Qi, Song Huansheng. Twin Space Based Monocular Image Object Pose All-in-One Labeling Method[J]. Journal of Computer Research and Development, 2023, 60(11): 2671-2680. DOI: 10.7544/issn1000-1239.202220383
Multi-object pose estimation is a fundamental challenge in robotics and intelligent transportation. However, current research on 3D pose estimation of rigid objects is confined to relatively small scales, which limits practical applications in this field. In this paper, we propose a twin-space-based all-in-one method for labeling object poses in monocular images, and we open-source a pose labeling tool called LabelImg3D (https://github.com/CongliangLi/LabelImg3D). We first construct a twin space equivalent to the real space, together with a 3D model of the real rigid object. We then place the real-space image (the primary projection) in the twin space so that the image taken by the simulated camera in the twin space (the secondary projection) matches the primary projection. Finally, by moving and rotating the 3D model in the twin space, we align the object in the secondary projection with the object in the primary projection in image space, which yields the pose of the object. Experimental results show that our method achieves a translation accuracy of more than 85% and a rotation accuracy of more than 90% for objects of the same type with little dimensional variation. In addition, our method uses only a monocular camera, which greatly reduces the difficulty of obtaining 3D pose data for objects.
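The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the LabelImg3D implementation: the function names, the box-shaped model, the camera intrinsics, and the yaw-only rotation are all assumptions made for illustration, and the coarse grid search merely stands in for the manual moving and rotating that an annotator performs in the tool. The sketch projects a 3D model through a simulated pinhole camera (the secondary projection) and measures how far the result lies from the labeled image points (the primary projection).

```python
import math

def box_corners(length, height, width):
    """Eight corners of a box centred on the model origin (hypothetical model)."""
    return [(sx * length / 2, sy * height / 2, sz * width / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def project(points, yaw, translation, focal=1000.0, cx=640.0, cy=360.0):
    """Secondary projection: rotate about the camera Y axis, translate,
    then apply a pinhole model. The intrinsics are assumed values."""
    tx, ty, tz = translation
    c, s = math.cos(yaw), math.sin(yaw)
    pixels = []
    for x, y, z in points:
        xr, zr = c * x + s * z, -s * x + c * z   # yaw rotation
        xc, yc, zc = xr + tx, y + ty, zr + tz    # move into camera frame
        pixels.append((focal * xc / zc + cx, focal * yc / zc + cy))
    return pixels

def alignment_error(primary, secondary):
    """Mean pixel distance between corresponding image points."""
    return sum(math.dist(a, b) for a, b in zip(primary, secondary)) / len(primary)

# Simulate a "real" image of the object taken at a known pose.
model = box_corners(4.5, 1.5, 1.8)
true_yaw, true_t = 0.3, (1.0, 0.5, 12.0)
primary = project(model, true_yaw, true_t)

# Stand-in for manual rotation: try candidate yaw angles, keep the best match.
candidates = [i / 10 for i in range(-10, 11)]
best_yaw = min(candidates,
               key=lambda yaw: alignment_error(primary, project(model, yaw, true_t)))
best_error = alignment_error(primary, project(model, best_yaw, true_t))
print(best_yaw, best_error)
```

When the two projections coincide in image space, the pose applied to the model in the twin space is taken as the pose of the real object. A full 6DoF version would also vary pitch, roll, and translation.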