Abstract:
In this paper, the mutually beneficial relationship between depth estimation and semantic segmentation is investigated, and a self-supervised monocular depth estimation method jointly trained with semantic segmentation, USegDepth, is proposed. A shared encoder for semantic segmentation and depth estimation provides semantic guidance. To further improve the encoder's performance across tasks, a multi-task feature extraction module is designed; stacking this module forms the shared encoder, addressing the poor feature representation caused by a limited receptive field and a lack of cross-channel interaction, and further improving model accuracy. In addition, a cross-task interaction module is proposed for bidirectional cross-domain information exchange to refine the depth features, improving depth estimation performance, especially in weak-texture regions and at object boundaries, where photometric-consistency supervision is limited. Through training and evaluation on the KITTI dataset, the experimental results show that the mean square relative error of USegDepth is reduced by 0.176 percentage points compared to SGDepth, and the threshold accuracy reaches 98.4% at a threshold of 1.25^3, demonstrating the high accuracy of USegDepth in depth prediction.
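The shared-encoder design described above (one encoder feeding both a depth head and a segmentation head) can be sketched as follows. This is a minimal illustration with hypothetical dimensions and plain matrix operations, not the paper's actual architecture; all layer sizes and names are assumptions for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical dimensions: 64-dim input features, 32-dim shared encoding.
W_enc = rng.standard_normal((64, 32)) * 0.1    # shared encoder weights (both tasks)
W_depth = rng.standard_normal((32, 1)) * 0.1   # depth head: one depth value per pixel
W_seg = rng.standard_normal((32, 10)) * 0.1    # segmentation head: 10 class logits

def forward(x):
    """Shared encoder feeds both task heads (the joint-training idea)."""
    shared = relu(x @ W_enc)      # features shared by BOTH tasks
    depth = shared @ W_depth      # depth regression output
    seg_logits = shared @ W_seg   # per-class segmentation logits
    return depth, seg_logits

# A batch of 5 "pixels", each with 64-dim features.
x = rng.standard_normal((5, 64))
depth, seg = forward(x)
print(depth.shape, seg.shape)  # (5, 1) (5, 10)
```

Because both heads backpropagate through `W_enc`, the encoder is pushed toward features useful for both tasks, which is the semantic guidance the abstract refers to.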