ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2020, Vol. 57, Issue 7: 1531-1538. doi: 10.7544/issn1000-1239.2020.20190478


Task-Adaptive End-to-End Networks for Stereo Matching

Li Tong (1), Ma Wei (1), Xu Shibiao (2), Zhang Xiaopeng (2)

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190
  • Online: 2020-07-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61771026, 61671451) and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).

Abstract: Estimating depth or disparity from stereo image pairs via stereo matching is a classical research topic in computer vision. With the recent development of deep learning, many end-to-end deep networks have been proposed for stereo matching. These networks generally extract features with convolutional neural network (CNN) structures originally designed for other tasks, which are largely redundant for stereo matching. In addition, the 3D convolutions in these networks are too costly to extend to the large receptive fields that benefit disparity estimation. To overcome these problems, we propose a deep network structure designed around the properties of stereo matching. The proposed network includes a concise and effective feature extraction module, and introduces a separated 3D convolution to avoid the parameter explosion caused by enlarging convolution kernels. We validate the network on the SceneFlow dataset in terms of both accuracy and computational cost. Results show that the proposed network achieves state-of-the-art performance. Compared with other structures, our feature extraction module reduces the number of parameters by 90% and the time cost by 25% while achieving comparable accuracy. Meanwhile, our separated 3D convolution, combined with group normalization (GN), achieves a lower end-point error (EPE) than the baseline methods.
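
The parameter-explosion argument for 3D convolutions can be illustrated with simple arithmetic. The sketch below is not the paper's implementation: the abstract does not say how the separated 3D convolution is factorized, so the split into a 1 x k x k spatial convolution followed by a k x 1 x 1 convolution along the disparity axis is an assumption, as are the channel width of 32 and the helper names conv3d_params and separated_conv3d_params.

```python
def conv3d_params(c_in, c_out, kd, kh, kw):
    """Weight count of one 3D convolution layer (bias omitted)."""
    return c_in * c_out * kd * kh * kw


def separated_conv3d_params(c_in, c_out, k):
    """Weight count of an ASSUMED factorization of a k x k x k kernel:
    a 1 x k x k spatial convolution followed by a k x 1 x 1 convolution
    along the disparity axis. The paper's actual split may differ."""
    spatial = conv3d_params(c_in, c_out, 1, k, k)
    disparity = conv3d_params(c_out, c_out, k, 1, 1)
    return spatial + disparity


if __name__ == "__main__":
    channels = 32  # hypothetical channel width, chosen for illustration only
    for k in (3, 5, 7):
        full = conv3d_params(channels, channels, k, k, k)
        sep = separated_conv3d_params(channels, channels, k)
        print(f"k={k}: full={full}, separated={sep}, ratio={full / sep:.2f}")
```

Under these assumptions the full kernel grows cubically with k (k^3 weights per channel pair) while the separated form grows as k^2 + k, so the gap widens as the receptive field is enlarged.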

Key words: stereo matching, disparity estimation, feature extraction, 3D convolution, end-to-end network
