Abstract:
In recent years, self-supervised monocular depth estimation methods have achieved impressive progress. However, their performance degrades significantly when generating structured depth maps in complex indoor scenes. To bridge this gap, we focus on the training process and propose LoFtDepth, a novel method that combines self-supervised monocular depth estimation with local-feature-guided knowledge distillation. First, an off-the-shelf depth estimation network generates structured relative depth maps as depth priors. Local features are then extracted from these priors as boundary points to guide local depth refinement, which reduces interference from depth-irrelevant features and transfers the boundary knowledge of the depth priors to the self-supervised depth estimation network. In addition, we introduce an inverse auto-mask weighted surface normal loss, which encourages the normal directions of depth maps predicted by the self-supervised network to align with those of the depth priors in untextured regions, thereby improving depth estimation accuracy. Finally, exploiting the coherence of camera motion, we impose a pose consistency constraint on residual pose estimation. This constraint enables effective adaptation to indoor scenes, where camera poses change frequently, thereby mitigating training errors and boosting model performance. Extensive experiments on major indoor datasets demonstrate that LoFtDepth outperforms previous methods, reducing the absolute relative error to 0.121 and generating accurate, well-structured depth maps.
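
To make the surface normal alignment idea concrete, the following is a minimal PyTorch sketch, not the paper's implementation: it back-projects a depth map to 3D points with pinhole intrinsics, estimates normals from cross products of finite differences, and penalizes misalignment between predicted and prior normals under an inverse auto-mask weight. The function names (`depth_to_normals`, `normal_alignment_loss`), the construction of `inv_automask`, and the exact loss form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def depth_to_normals(depth, K):
    """Back-project a depth map (B,1,H,W) to camera-frame 3D points and
    estimate per-pixel surface normals via the cross product of spatial
    point differences. K is a 3x3 pinhole intrinsics matrix."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=depth.device, dtype=depth.dtype),
        torch.arange(W, device=depth.device, dtype=depth.dtype),
        indexing="ij",
    )
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    Z = depth
    X = (xs - cx) / fx * Z
    Y = (ys - cy) / fy * Z
    points = torch.cat([X, Y, Z], dim=1)               # (B,3,H,W)
    # Finite differences along image axes, cropped to a common size
    dx = points[:, :, :-1, 1:] - points[:, :, :-1, :-1]  # (B,3,H-1,W-1)
    dy = points[:, :, 1:, :-1] - points[:, :, :-1, :-1]  # (B,3,H-1,W-1)
    normals = torch.cross(dx, dy, dim=1)
    return F.normalize(normals, dim=1)                  # unit normals

def normal_alignment_loss(pred_depth, prior_depth, K, inv_automask):
    """Penalize misalignment between normals of the predicted depth and the
    depth prior. inv_automask (B,1,H,W) is assumed to weight untextured
    regions more heavily, e.g. the complement of a photometric auto-mask."""
    n_pred = depth_to_normals(pred_depth, K)
    n_prior = depth_to_normals(prior_depth, K)
    cos = (n_pred * n_prior).sum(dim=1, keepdim=True)   # cosine similarity
    w = inv_automask[:, :, :-1, :-1]                    # match cropped size
    return (w * (1.0 - cos)).mean()
```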