Improving Self-Supervised Monocular Indoor Depth Estimation Method with Local Feature Guidance

Ai Haojun; Zhang Feng; Lü Pengfei; Tang Xuehua; Wang Zhongyuan

doi:10.7544/issn1000-1239.202440951

Citation: Ai Haojun, Zhang Feng, Lü Pengfei, Tang Xuehua, Wang Zhongyuan. Improving Self-Supervised Monocular Indoor Depth Estimation Method with Local Feature Guidance[J]. Journal of Computer Research and Development, 2026, 63(2): 338-351. DOI: 10.7544/issn1000-1239.202440951

Abstract

In recent years, self-supervised monocular depth estimation methods have improved substantially. However, their performance degrades significantly when they must generate structured depth maps in complex indoor scenes. To bridge this gap, we focus on the training process and propose LoFtDepth, a novel method that combines self-supervised monocular depth estimation with local-feature-guided knowledge distillation. First, an off-the-shelf depth estimation network generates structured relative depth maps that serve as depth priors. Local features are then extracted from these priors as boundary points to guide local depth refinement. This reduces the interference of depth-irrelevant features and transfers the boundary knowledge of the depth priors to the self-supervised depth estimation network. Additionally, we introduce a surface normal loss weighted by an inverse auto-mask. This loss encourages the normal directions of the depth maps predicted by the self-supervised network to align with those of the depth priors in untextured regions, which improves depth estimation accuracy. Finally, exploiting the coherence of camera motion, we impose a pose consistency constraint on residual pose estimation. This constraint helps the model adapt to indoor scenes where the camera pose changes frequently, thereby mitigating training errors and boosting model performance. Extensive experiments on major indoor datasets demonstrate that LoFtDepth outperforms previous methods. It reduces the absolute relative error to 0.121 and generates accurate, well-structured depth maps.
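
The abstract reports its headline result as an absolute relative error. As a point of reference, the sketch below computes that metric in its commonly used form, the mean of |predicted depth - ground-truth depth| / ground-truth depth over valid pixels. The function name `abs_rel_error`, the flat-list input, and the convention that a ground-truth value of 0 marks a missing measurement are assumptions made for illustration; the paper's actual evaluation protocol (for example depth caps or scale alignment) is not described in this section and is not reproduced here.

```python
from typing import Sequence


def abs_rel_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of pixels")
    # Assumption: a ground-truth depth <= 0 means "no measurement" and is skipped.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Toy example with four pixels; the last one has no ground truth.
pred_depth = [1.1, 2.0, 2.7, 4.0]
gt_depth = [1.0, 2.0, 3.0, 0.0]
print(abs_rel_error(pred_depth, gt_depth))
```

Because each error is divided by the true depth, a 10 cm mistake on a surface 1 m away counts the same as a 30 cm mistake on a surface 3 m away, which is why the metric is unitless and comparable across scenes of different scale.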