FloatingFusion: Depth from ToF and Image-Stabilized Stereo Cameras

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

High-accuracy per-pixel depth is vital for computational photography, so smartphones now have multimodal camera systems with time-of-flight (ToF) depth sensors and multiple color cameras. However, producing accurate high-resolution depth is still challenging due to the low resolution and limited active illumination power of ToF sensors. Fusing RGB stereo and ToF information is a promising direction to overcome these issues, but a key problem remains: to provide high-quality 2D RGB images, the main color sensor’s lens is optically stabilized, resulting in an unknown pose for the floating lens that breaks the geometric relationships between the multimodal image sensors. Leveraging ToF depth estimates and a wide-angle RGB camera, we design an automatic calibration technique based on dense 2D/3D matching that estimates the extrinsic, intrinsic, and distortion parameters of the stabilized main RGB camera from a single snapshot. This lets us fuse stereo and ToF cues via a correlation volume. For fusion, we train a deep network on a real-world dataset with depth supervision estimated by a neural reconstruction method. For evaluation, we acquire a test dataset using a commercial high-power depth camera and show that our approach achieves higher accuracy than existing baselines.
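To make the calibration step concrete, here is a minimal single-snapshot sketch of the idea: 3D points backprojected from ToF depth are matched densely to pixels in the stabilized main RGB image, and a standard solver recovers the floating lens's intrinsics, distortion, and pose. This is a sketch under assumptions, not the authors' implementation; the use of OpenCV's `calibrateCamera` and every name below are illustrative.

```python
# Illustrative sketch of single-snapshot 2D/3D calibration; not the paper's code.
import numpy as np
import cv2

def calibrate_floating_lens(pts3d, pts2d, image_size, K_init):
    """Estimate intrinsics, distortion, and pose of the stabilized main camera.

    pts3d: (N, 3) 3D points backprojected from ToF depth, expressed in a
           reference camera frame (e.g., the wide-angle RGB camera).
    pts2d: (N, 2) matched pixel locations in the main RGB image.
    image_size: (width, height) of the main RGB image.
    K_init: (3, 3) initial camera matrix, e.g., from factory calibration.
    """
    obj = pts3d.astype(np.float32).reshape(-1, 1, 3)
    img = pts2d.astype(np.float32).reshape(-1, 1, 2)
    # Single-view calibration with non-planar points requires an intrinsic guess.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        [obj], [img], image_size, K_init.astype(np.float64), None,
        flags=cv2.CALIB_USE_INTRINSIC_GUESS)
    R, _ = cv2.Rodrigues(rvecs[0])   # rotation of the floating lens
    t = tvecs[0].reshape(3)          # translation of the floating lens
    return K, dist, R, t, rms        # rms = reprojection error in pixels
```

Once the floating lens is calibrated, stereo and ToF cues can be compared in a correlation volume. The toy NumPy function below (again an assumption for illustration, with hypothetical names) scores rectified left/right feature maps over candidate disparities; a ToF depth prior could then restrict or bias the disparity search, which is one plausible reading of the fusion the abstract describes.

```python
def correlation_volume(feat_l, feat_r, max_disp):
    """(C, H, W) rectified feature maps -> (max_disp, H, W) volume whose
    entry [d, y, x] correlates left pixel (y, x) with right pixel (y, x - d)."""
    C, H, W = feat_l.shape
    vol = np.zeros((max_disp, H, W), dtype=np.float32)
    vol[0] = (feat_l * feat_r).sum(axis=0)          # zero-disparity scores
    for d in range(1, max_disp):
        vol[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :W - d]).sum(axis=0)
    return vol / np.sqrt(C)                          # scaled dot product
```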


Acknowledgement

Min H. Kim acknowledges funding from Samsung Electronics, in addition to the partial support of the MSIT/IITP of Korea (RS-2022-00155620, 2022-0-00058, and 2017-0-00072), the NIRCH of Korea (2021A02P02-001), Microsoft Research Asia, and the Samsung Research Funding Center (SRFC-IT2001-04) for developing 3D imaging algorithms. James Tompkin thanks US NSF CAREER-2144956.

Author information

Correspondence to Min H. Kim.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 14637 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Meuleman, A., Kim, H., Tompkin, J., Kim, M.H. (2022). FloatingFusion: Depth from ToF and Image-Stabilized Stereo Cameras. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_35

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer Science, Computer Science (R0)
