6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference

Bui, Mai; Birdal, Tolga; Deng, Haowen; Albarqouni, Shadi; Guibas, Leonidas; Ilic, Slobodan; Navab, Nassir

doi:10.1007/978-3-030-58523-5_9

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12363)

Included in the following conference series: European Conference on Computer Vision

Abstract

We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined on the manifold of camera poses. In highly ambiguous environments, which can easily arise due to symmetries and repetitive structures in the scene, computing a single plausible solution (what most state-of-the-art methods currently regress) may not be sufficient. Instead, we predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction. To this end, we use an end-to-end deep neural network with Bingham distributions to model the orientation of the camera pose and a multivariate Gaussian to model the position. By incorporating a Winner-Takes-All training scheme, we finally obtain a mixture model that is well suited for explaining ambiguities in the scene, yet does not suffer from mode collapse, a common problem with mixture density networks. We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments and exhaustively evaluate our method on synthetic as well as real data, on both ambiguous scenes and non-ambiguous benchmark datasets. We plan to release our code and dataset at multimodal3dvision.github.io.
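
The two ingredients named in the abstract, a Bingham density over unit quaternions and a Winner-Takes-All selection over several hypotheses, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`bingham_log_unnormalized`, `gaussian_nll`, `wta_select`) are hypothetical, the Bingham normalizing constant is omitted, the Gaussian is assumed to have a diagonal covariance, and the hard winner selection shown here is only one possible form of a Winner-Takes-All scheme.

```python
import numpy as np

def bingham_log_unnormalized(q, V, lam):
    """Log of exp(q^T V diag(lam) V^T q) for a unit quaternion q.

    V is a 4x4 orthogonal matrix, lam holds four concentration
    parameters (here <= 0, with the first fixed to 0).
    The normalizing constant is deliberately left out (assumption).
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)          # project onto the unit sphere
    proj = np.asarray(V).T @ q         # coordinates of q in the basis V
    return float(np.sum(np.asarray(lam) * proj ** 2))

def gaussian_nll(t, mu, var):
    """Negative log-likelihood of position t under diagonal Gaussians.

    mu and var have shape (K, 3) for K hypotheses; returns shape (K,).
    """
    t, mu, var = (np.asarray(a, dtype=float) for a in (t, mu, var))
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (t - mu) ** 2 / var, axis=-1)

def wta_select(losses):
    """Hard Winner-Takes-All: only the best hypothesis incurs the loss."""
    losses = np.asarray(losses, dtype=float)
    winner = int(np.argmin(losses))
    return winner, float(losses[winner])

# Toy ambiguous scene: two position hypotheses mirrored about the origin.
mus = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
variances = np.full((2, 3), 0.01)
ground_truth = np.array([-0.95, 0.0, 0.0])
winner, loss = wta_select(gaussian_nll(ground_truth, mus, variances))

# The Bingham density is antipodally symmetric: q and -q score the same.
q = np.array([0.5, 0.5, 0.5, 0.5])
lam = np.array([0.0, -10.0, -10.0, -10.0])
same = np.isclose(bingham_log_unnormalized(q, np.eye(4), lam),
                  bingham_log_unnormalized(-q, np.eye(4), lam))
```

In this toy setup the second hypothesis wins, so only that hypothesis would be updated while the first stays free to explain the mirrored pose. The antipodal symmetry matters because q and -q represent the same rotation.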

Notes

1. https://3dwarehouse.sketchup.com/

Acknowledgements

This project is supported by the Bavaria California Technology Center (BaCaTeC), the Stanford-Ford Alliance, NSF grant IIS-1763268, a Vannevar Bush Faculty Fellowship, the Samsung GRO program, the Stanford SAIL Toyota Research, and the PRIME programme of the German Academic Exchange Service (DAAD) with funds from the German Federal Ministry of Education and Research (BMBF).

Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Mai Bui, Haowen Deng, Shadi Albarqouni, Slobodan Ilic & Nassir Navab
Stanford University, Stanford, USA
Tolga Birdal & Leonidas Guibas
Siemens AG, Munich, Germany
Haowen Deng & Slobodan Ilic
ETH Zurich, Zurich, Switzerland
Shadi Albarqouni
Johns Hopkins University, Baltimore, USA
Nassir Navab

Corresponding author

Correspondence to Mai Bui.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Supplementary material 1 (pdf 4717 KB)

About this paper

Cite this paper

Bui, M. et al. (2020). 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_9

DOI: https://doi.org/10.1007/978-3-030-58523-5_9
Published: 04 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58522-8
Online ISBN: 978-3-030-58523-5
eBook Packages: Computer Science, Computer Science (R0)
