FloatingFusion: Depth from ToF and Image-Stabilized Stereo Cameras

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

High-accuracy per-pixel depth is vital for computational photography, so smartphones now have multimodal camera systems with time-of-flight (ToF) depth sensors and multiple color cameras. However, producing accurate high-resolution depth is still challenging due to the low resolution and limited active illumination power of ToF sensors. Fusing RGB stereo and ToF information is a promising direction to overcome these issues, but a key problem remains: to provide high-quality 2D RGB images, the main color sensor’s lens is optically stabilized, resulting in an unknown pose for the floating lens that breaks the geometric relationships between the multimodal image sensors. Leveraging ToF depth estimates and a wide-angle RGB camera, we design an automatic calibration technique based on dense 2D/3D matching that estimates the extrinsic, intrinsic, and distortion parameters of the stabilized main RGB camera from a single snapshot. This lets us fuse stereo and ToF cues via a correlation volume. For fusion, we train a deep network on a real-world dataset with depth supervision estimated by a neural reconstruction method. For evaluation, we acquire a test dataset using a commercial high-power depth camera and show that our approach achieves higher accuracy than existing baselines.
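To make the calibration step concrete, here is a minimal single-snapshot sketch of the idea: 3D points backprojected from ToF depth are matched densely to pixels in the stabilized main RGB image, and a standard solver recovers the floating lens's intrinsics, distortion, and pose. This is a sketch under assumptions, not the authors' implementation; the use of OpenCV's `calibrateCamera` and every name below are illustrative.

```python
# Illustrative sketch of single-snapshot 2D/3D calibration; not the paper's code.
import numpy as np
import cv2

def calibrate_floating_lens(pts3d, pts2d, image_size, K_init):
    """Estimate intrinsics, distortion, and pose of the stabilized main camera.

    pts3d: (N, 3) 3D points backprojected from ToF depth, expressed in a
           reference camera frame (e.g., the wide-angle RGB camera).
    pts2d: (N, 2) matched pixel locations in the main RGB image.
    image_size: (width, height) of the main RGB image.
    K_init: (3, 3) initial camera matrix, e.g., from factory calibration.
    """
    obj = pts3d.astype(np.float32).reshape(-1, 1, 3)
    img = pts2d.astype(np.float32).reshape(-1, 1, 2)
    # Single-view calibration with non-planar points requires an intrinsic guess.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        [obj], [img], image_size, K_init.astype(np.float64), None,
        flags=cv2.CALIB_USE_INTRINSIC_GUESS)
    R, _ = cv2.Rodrigues(rvecs[0])   # rotation of the floating lens
    t = tvecs[0].reshape(3)          # translation of the floating lens
    return K, dist, R, t, rms        # rms = reprojection error in pixels
```

Once the floating lens is calibrated, stereo and ToF cues can be compared in a correlation volume. The toy NumPy function below (again an assumption for illustration, with hypothetical names) scores rectified left/right feature maps over candidate disparities; a ToF depth prior could then restrict or bias the disparity search, which is one plausible reading of the fusion the abstract describes.

```python
def correlation_volume(feat_l, feat_r, max_disp):
    """(C, H, W) rectified feature maps -> (max_disp, H, W) volume whose
    entry [d, y, x] correlates left pixel (y, x) with right pixel (y, x - d)."""
    C, H, W = feat_l.shape
    vol = np.zeros((max_disp, H, W), dtype=np.float32)
    vol[0] = (feat_l * feat_r).sum(axis=0)          # zero-disparity scores
    for d in range(1, max_disp):
        vol[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :W - d]).sum(axis=0)
    return vol / np.sqrt(C)                          # scaled dot product
```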


Acknowledgement

Min H. Kim acknowledges funding from Samsung Electronics, in addition to the partial support of the MSIT/IITP of Korea (RS-2022-00155620, 2022-0-00058, and 2017-0-00072), the NIRCH of Korea (2021A02P02-001), Microsoft Research Asia, and the Samsung Research Funding Center (SRFC-IT2001-04) for developing 3D imaging algorithms. James Tompkin thanks US NSF CAREER-2144956.

Author information

Correspondence to Min H. Kim.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 14637 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Meuleman, A., Kim, H., Tompkin, J., Kim, M.H. (2022). FloatingFusion: Depth from ToF and Image-Stabilized Stereo Cameras. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_35

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer Science, Computer Science (R0)
