Abstract:
In this paper, the mutually beneficial relationship between depth estimation and semantic segmentation is investigated, and USegDepth, a self-supervised monocular depth estimation method with joint semantic segmentation, is proposed. A shared encoder for semantic segmentation and depth estimation is implemented to achieve semantic guidance. To further improve the encoder's performance across multiple tasks, a multi-task feature extraction module is designed. This module is stacked to build the shared encoder, addressing the poor feature representation caused by a limited receptive field and a lack of cross-channel interaction, and thereby further improving model accuracy. In addition, a cross-task interaction module is proposed for bidirectional cross-domain information exchange to refine the depth features, improving depth estimation performance, especially in weakly textured regions and at object boundaries where photometric consistency provides only limited supervision. Through training and evaluation on the KITTI dataset, the experimental results show that the mean squared relative error of USegDepth is reduced by 0.176 percentage points compared with that of SGDepth, and the threshold accuracy reaches 98.4% at a threshold of 1.25³, demonstrating the high accuracy of USegDepth in depth prediction.
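The two metrics quoted above are the standard monocular-depth evaluation measures used on KITTI. As a minimal sketch (this is not the authors' evaluation code, and `pred`/`gt` are hypothetical per-pixel depth lists), they can be computed as:

```python
# Standard monocular-depth metrics: squared relative error (Sq Rel)
# and threshold accuracy delta < 1.25^3, as commonly reported on KITTI.
# Illustrative sketch only; `pred` and `gt` are hypothetical depth values.

def sq_rel(pred, gt):
    """Mean of (pred - gt)^2 / gt over all valid pixels."""
    return sum((p - g) ** 2 / g for p, g in zip(pred, gt)) / len(gt)

def threshold_accuracy(pred, gt, thresh=1.25 ** 3):
    """Fraction of pixels where max(pred/gt, gt/pred) < thresh."""
    ok = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < thresh)
    return ok / len(gt)
```

A threshold of 1.25³ ≈ 1.953 is the loosest of the three conventional thresholds (1.25, 1.25², 1.25³), so accuracy at this level is expected to be the highest of the three.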