Citation: Yan Tao, Shang Qihui, Wu Peng, Zhang Jiangfeng, Qian Yuhua, Chen Bin. Multi-Scale Cost Aggregation Framework for 3D Shape from Focus[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202330984

Multi-Scale Cost Aggregation Framework for 3D Shape from Focus

Funds: This work was supported by the National Key Natural Science Foundation of China (62136005), the National Natural Science Foundation of China (62472268), the National Key Research and Development Program of China (2021ZD0112400), the Funds for Central Government Guided Local Science and Technology Development of China (YDZJSX20231C001, YDZJSX20231B001), the Research Project Supported by Shanxi Scholarship Council of China (2024-020), and the Graduate Education Innovation Project of Shanxi Province (2024KY036).
More Information
  • Author Bio:

    Yan Tao: born in 1987. PhD, associate professor, member of CCF. His main research interests include 3D shape reconstruction and machine vision

    Shang Qihui: born in 1998. Master's degree candidate. His main research interests include 3D shape reconstruction and data mining

    Wu Peng: born in 1987. PhD, associate professor, member of CCF. His main research interests include real-time operating systems and blockchain

    Zhang Jiangfeng: born in 1998. PhD candidate, student member of CCF. His main research interests include deep learning and 3D shape reconstruction

    Qian Yuhua: born in 1976. PhD, professor, PhD supervisor, senior member of CCF. His main research interests include artificial intelligence and machine learning

    Chen Bin: born in 1970. PhD, professor, PhD supervisor, member of CCF. His main research interests include artificial intelligence and large-scale models

  • Received Date: December 04, 2023
  • Revised Date: January 01, 2025
  • Accepted Date: January 25, 2025
  • Available Online: January 25, 2025
  • Abstract: Shape from focus (SFF) recovers the 3D structure of a scene from an image sequence captured at different focus levels. Most existing SFF methods evaluate the focus level of the image sequence at a single scale and guide the reconstruction by introducing regularization or post-processing; because this limits the selection space of depth information, the reconstruction results often cannot converge effectively. To address this issue, this paper proposes a multi-scale cost aggregation framework for shape from focus, MSCAS. First, a non-downsampling multi-scale transformation is introduced to enlarge the depth-information selection space of the input image sequence; then cost aggregation is performed by combining intra-scale sequence correlation with inter-scale information constraints. Through this expansion-aggregation mode, the scene depth representation information is multiplied and the cross-scale and cross-sequence representations are effectively fused. As a general framework, MSCAS can embed existing model-design methods and deep learning methods to improve their performance. Experimental results on four datasets show that embedding a model-design SFF method into MSCAS reduces the root mean square error (RMSE) by 14.91% on average and improves the structural similarity (SSIM) by 56.69% on average, while embedding a deep learning SFF method reduces RMSE by 1.55% and improves SSIM by 1.61% on average. These results verify the effectiveness of the MSCAS framework.
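The expansion-aggregation idea can be illustrated with a minimal sketch; this is not the authors' implementation. It assumes an absolute-Laplacian focus measure, Gaussian blurring at several scales as a stand-in for the non-downsampling multi-scale transformation, and plain averaging as the inter-scale constraint; the names focus_volume, multi_scale_cost_aggregation, and the sigmas parameter are illustrative only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace


def focus_volume(stack):
    """Per-pixel focus measure for each frame (absolute Laplacian as a simple stand-in)."""
    return np.stack([np.abs(laplace(frame.astype(np.float64))) for frame in stack])


def multi_scale_cost_aggregation(stack, sigmas=(0.0, 1.0, 2.0)):
    """Illustrative expansion-aggregation:
    1) build the focus (cost) volume and expand it to several non-downsampled scales,
    2) aggregate within each scale along the focus-sequence axis (intra-scale correlation),
    3) fuse the scales by averaging (inter-scale constraint),
    4) pick, per pixel, the frame index with the highest aggregated focus."""
    vol = focus_volume(stack)                      # shape (N, H, W)
    scales = []
    for s in sigmas:
        v = np.stack([gaussian_filter(f, s) for f in vol]) if s > 0 else vol
        v = gaussian_filter(v, sigma=(1.0, 0, 0))  # smooth along the sequence axis only
        scales.append(v)
    aggregated = np.mean(scales, axis=0)           # inter-scale fusion
    return np.argmax(aggregated, axis=0)           # per-pixel index of the best-focused frame


# Usage (hypothetical): image_stack is an (N, H, W) grayscale focus sequence.
# depth_index = multi_scale_cost_aggregation(image_stack)
```

Converting the winning frame index into a metric depth value, and any regularization or refinement step, would depend on the specific SFF method embedded in the framework.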
