
Camera Pose Estimation and Localization with Active Audio Sensing

Conference paper in Computer Vision – ECCV 2022 (ECCV 2022), part of the book series Lecture Notes in Computer Science (LNCS, volume 13697).

Abstract

In this work, we show how to estimate a device’s position and orientation indoors by echolocation, i.e., by interpreting the echoes of an audio signal that the device itself emits. Established visual localization methods rely on the device’s camera and yield excellent accuracy if unique visual features are in view and depicted clearly. We argue that audio sensing can offer complementary information to vision for device localization, since audio is invariant to adverse visual conditions and can reveal scene information beyond a camera’s field of view. We first propose a strategy for learning an audio representation that captures the scene geometry around a device using supervision transfer from vision. Subsequently, we leverage this audio representation to complement vision in three device localization tasks: relative pose estimation, place recognition, and absolute pose regression. Our proposed methods outperform state-of-the-art vision models on new audio-visual benchmarks for the Replica and Matterport3D datasets.

K. Yang and C. Godard—Work done while at Niantic, during Karren’s internship.
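
The abstract outlines learning a geometry-aware audio representation by supervision transfer from vision: an audio encoder that listens to the echoes of a signal the device emits is trained to reproduce the features of a pretrained vision model at the same device pose, and those audio features then complement vision in the downstream localization tasks. The Python sketch below is only a rough illustration of that transfer idea, not the authors' implementation; the AudioEncoder module, the tensor shapes, the ResNet-18 teacher, and the MSE distillation objective are all hypothetical choices made for the sketch.

# Minimal sketch of vision-to-audio supervision transfer (hypothetical names
# and architecture, not the paper's code). A small CNN over binaural echo
# spectrograms is trained so its embedding matches the embedding a frozen
# vision encoder produces for the RGB frame captured at the same pose.
import torch
import torch.nn as nn
import torchvision.models as models

class AudioEncoder(nn.Module):
    """CNN over a 2-channel (binaural) echo spectrogram."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, spectrogram):
        return self.net(spectrogram)

# Frozen vision "teacher" providing the geometry-aware target features.
# Randomly initialised here so the sketch runs offline; a real teacher
# would be a pretrained vision network.
teacher = models.resnet18(weights=None)
teacher.fc = nn.Identity()                 # expose the 512-d pooled feature
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

student = AudioEncoder(embed_dim=512)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One training step on a dummy batch: echo spectrograms paired with the
# RGB frames observed at the same device poses.
spectrograms = torch.randn(8, 2, 128, 128)   # [batch, channels, freq, time]
rgb_frames = torch.randn(8, 3, 224, 224)     # [batch, 3, H, W]

with torch.no_grad():
    target = teacher(rgb_frames)

optimizer.zero_grad()
loss = loss_fn(student(spectrograms), target)
loss.backward()
optimizer.step()

Once trained this way, the audio embedding could be fused with a visual embedding, or used on its own when the camera view is uninformative, as input to the relative pose estimation, place recognition, and absolute pose regression heads the abstract mentions.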



Author information

Corresponding author: Eric Brachmann.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1995 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, K., Firman, M., Brachmann, E., Godard, C. (2022). Camera Pose Estimation and Localization with Active Audio Sensing. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13697. Springer, Cham. https://doi.org/10.1007/978-3-031-19836-6_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19836-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19835-9

  • Online ISBN: 978-3-031-19836-6

  • eBook Packages: Computer Science (R0)
