Abstract
In this work, we show how to estimate a device’s position and orientation indoors by echolocation, i.e., by interpreting the echoes of an audio signal that the device itself emits. Established visual localization methods rely on the device’s camera and yield excellent accuracy if unique visual features are in view and depicted clearly. We argue that audio sensing can offer complementary information to vision for device localization, since audio is invariant to adverse visual conditions and can reveal scene information beyond a camera’s field of view. We first propose a strategy for learning an audio representation that captures the scene geometry around a device using supervision transfer from vision. Subsequently, we leverage this audio representation to complement vision in three device localization tasks: relative pose estimation, place recognition, and absolute pose regression. Our proposed methods outperform state-of-the-art vision models on new audio-visual benchmarks for the Replica and Matterport3D datasets.
K. Yang and C. Godard—Work done while at Niantic, during Karren’s internship.
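The supervision-transfer strategy described in the abstract — training an audio branch to predict the geometry-aware embedding that a frozen vision model produces for the same scene — can be illustrated with a minimal NumPy sketch. Everything here is a simplifying assumption, not the paper's actual method: both "encoders" are linear maps, the echo features and teacher embeddings are synthetic, and all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a frozen "vision encoder" assigns each of N scenes a
# D-dimensional geometry embedding (here synthesized via a hidden
# linear map, standing in for a real pretrained network).
N, A, D = 256, 32, 8
echoes = rng.normal(size=(N, A))        # echo features, one row per scene
W_hidden = rng.normal(size=(A, D))      # hidden echo-to-geometry relation
vision_emb = echoes @ W_hidden          # teacher targets from "vision"

# Student audio encoder (linear), trained by gradient descent on the
# MSE between its output and the vision embeddings -- the
# supervision-transfer objective in miniature.
W = np.zeros((A, D))
lr = 0.2
for _ in range(300):
    pred = echoes @ W
    grad = echoes.T @ (pred - vision_emb) / N   # gradient of mean squared error
    W -= lr * grad

mse = float(np.mean((echoes @ W - vision_emb) ** 2))
```

After training, the audio branch reproduces the vision embeddings on this toy data (`mse` near zero), which is the sense in which the audio representation "inherits" scene geometry from vision before being reused for the downstream pose tasks.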
References
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: NetVLAD: CNN architecture for weakly supervised place recognition. In: CVPR (2016)
Arandjelovic, R., Zisserman, A.: Look, listen and learn. In: ICCV (2017)
Babenko, A., Lempitsky, V.: Aggregating local deep features for image retrieval. In: ICCV (2015)
Balntas, V., Li, S., Prisacariu, V.: RelocNet: continuous metric learning relocalisation using neural nets. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 782–799. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_46
Balntas, V., Riba, E., Ponsa, D., Mikolajczyk, K.: Learning local feature descriptors with triplets and shallow convolutional neural networks. In: BMVC (2016)
Bhowmik, A., Gumhold, S., Rother, C., Brachmann, E.: Reinforced feature points: optimizing feature detection and description for a high-level task. In: CVPR, June 2020
Brachmann, E., et al.: DSAC - differentiable RANSAC for camera localization. In: CVPR (2017)
Brachmann, E., Michel, F., Krull, A., Yang, M.Y., Gumhold, S., Rother, C.: Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In: CVPR (2016)
Brachmann, E., Rother, C.: Learning less is more - 6D camera localization via 3D surface regression. In: CVPR (2018)
Brachmann, E., Rother, C.: Expert sample consensus applied to camera re-localization. In: ICCV (2019)
Brachmann, E., Rother, C.: Neural-guided RANSAC: learning where to sample model hypotheses. In: ICCV (2019)
Brachmann, E., Rother, C.: Visual camera re-localization from RGB and RGB-D images using DSAC. TPAMI (2021)
Brahmbhatt, S., Gu, J., Kim, K., Hays, J., Kautz, J.: Geometry-aware learning of maps for camera localization. In: CVPR (2018)
Bui, M., et al.: 6D camera relocalization in ambiguous scenes via continuous multimodal inference. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12363, pp. 139–157. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58523-5_9
Cai, R., Hariharan, B., Snavely, N., Averbuch-Elor, H.: Extreme rotation estimation using dense correlation volumes. In: CVPR (2021)
Castle, R., Klein, G., Murray, D.W.: Video-rate localization in multiple maps for wearable augmented reality. In: 2008 12th IEEE International Symposium on Wearable Computers, pp. 15–22. IEEE (2008)
Chang, A., et al.: Matterport3D: learning from RGB-D data in indoor environments. In: 3DV (2017)
Chen, C., Al-Halah, Z., Grauman, K.: Semantic audio-visual navigation. In: CVPR (2021)
Chen, C., et al.: Audio-visual embodied navigation. arXiv preprint arXiv:1912.11474 (2019)
Chen, C., et al.: SoundSpaces: audio-visual navigation in 3D environments. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 17–36. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58539-6_2
Chen, C., Majumder, S., Al-Halah, Z., Gao, R., Ramakrishnan, S.K., Grauman, K.: Learning to set waypoints for audio-visual navigation. arXiv preprint arXiv:2008.09622 (2020)
Chen, K., Snavely, N., Makadia, A.: Wide-baseline relative camera pose estimation with directional learning. In: CVPR (2021)
Chen, Z., Hu, X., Owens, A.: Structure from silence: learning scene structure from ambient sound. arXiv preprint arXiv:2111.05846 (2021)
Christensen, J.H., Hornauer, S., Yu, S.X.: BatVision: learning to see 3D spatial layout with two ears. In: ICRA (2020)
Debski, A., Grajewski, W., Zaborowski, W., Turek, W.: Open-source localization device for indoor mobile robots. Procedia Comput. Sci. 76, 139–146 (2015)
Dokmanić, I., Parhizkar, R., Walther, A., Lu, Y.M., Vetterli, M.: Acoustic echoes reveal room shape. Proc. Natl. Acad. Sci. 110(30), 12186–12191 (2013)
Dusmanu, M., et al.: D2-Net: a trainable CNN for joint detection and description of local features. In: CVPR (2019)
Eliakim, I., Cohen, Z., Kosa, G., Yovel, Y.: A fully autonomous terrestrial bat-like acoustic robot. PLoS Comput. Biol. 14(9), e1006406 (2018)
En, S., Lechervy, A., Jurie, F.: RPNet: an end-to-end network for relative camera pose estimation. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 738–745. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_46
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
Fisher III, J.W., Darrell, T., Freeman, W., Viola, P.: Learning joint statistical models for audio-visual fusion and segregation. In: NeurIPS (2000)
Gan, C., Zhao, H., Chen, P., Cox, D., Torralba, A.: Self-supervised moving vehicle tracking with stereo sound. In: ICCV (2019)
Gao, R., Chen, C., Al-Halah, Z., Schissler, C., Grauman, K.: VisualEchoes: spatial image representation learning through echolocation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 658–676. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_38
Gao, R., Grauman, K.: 2.5D visual sound. In: CVPR (2019)
Gao, R., Oh, T.H., Grauman, K., Torresani, L.: Listen to look: action recognition by previewing audio. In: CVPR (2020)
Garg, S., Fischer, T., Milford, M.: Where is your place, visual place recognition? In: IJCAI (2021)
Greene, N.: Environment mapping and other applications of world projections. IEEE Comput. Graphics Appl. 6(11), 21–29 (1986)
Hausler, S., Garg, S., Xu, M., Milford, M., Fischer, T.: Patch-NetVLAD: multi-scale fusion of locally-global descriptors for place recognition. In: CVPR (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Hershey, J., Movellan, J.: Audio vision: using audio-visual synchrony to locate sounds. In: NeurIPS (1999)
Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: toward higher resolution maps with accurate object boundaries. In: WACV (2019)
Humenberger, M., et al.: Robust image retrieval-based visual localization using Kapture. arXiv:2007.13867 (2020)
Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: CVPR (2010)
Kazakos, E., Nagrani, A., Zisserman, A., Damen, D.: Epic-fusion: audio-visual temporal binding for egocentric action recognition. In: ICCV (2019)
Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learning. In: CVPR (2017)
Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: ICCV (2015)
Kidron, E., Schechner, Y.Y., Elad, M.: Pixels that sound. In: CVPR (2005)
Laskar, Z., Melekhov, I., Kalia, S., Kannala, J.: Camera relocalization by computing pairwise relative poses using convolutional neural network. In: ICCV Workshops (2017)
Li, X., Wang, S., Zhao, Y., Verbeek, J., Kannala, J.: Hierarchical scene coordinate classification and regression for visual localization. In: CVPR (2020)
Li, Y., Snavely, N., Huttenlocher, D.P.: Location recognition using prioritized feature matching. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 791–804. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15552-9_57
Li, Y., Snavely, N., Huttenlocher, D., Fua, P.: Worldwide pose estimation using 3D point clouds. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 15–29. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33718-5_2
Lim, H., Sinha, S.N., Cohen, M.F., Uyttendaele, M.: Real-time image-based 6-DOF localization in large-scale environments. In: CVPR (2012)
Lindell, D.B., Wetzstein, G., Koltun, V.: Acoustic non-line-of-sight imaging. In: CVPR (2019)
Liu, D., Cui, Y., Yan, L., Mousas, C., Yang, B., Chen, Y.: DenserNet: weakly supervised visual localization using multi-scale feature aggregation. In: AAAI (2021)
Long, X., Gan, C., De Melo, G., Wu, J., Liu, X., Wen, S.: Attention clusters: purely attention based local feature integration for video classification. In: CVPR (2018)
Masone, C., Caputo, B.: A survey on deep visual place recognition. IEEE Access 9, 19516–19547 (2021)
Melekhov, I., Ylioinas, J., Kannala, J., Rahtu, E.: Relative camera pose estimation using convolutional neural networks. In: Blanc-Talon, J., Penne, R., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2017. LNCS, vol. 10617, pp. 675–687. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70353-4_57
Morgado, P., Li, Y., Vasconcelos, N.: Learning representations from audio-visual spatial alignment. In: NeurIPS, vol. 33, pp. 4733–4744 (2020)
Morgado, P., Vasconcelos, N., Langlois, T., Wang, O.: Self-supervised generation of spatial audio for 360 video. In: NeurIPS, vol. 31 (2018)
Morgado, P., Vasconcelos, N., Misra, I.: Audio-visual instance discrimination with cross-modal agreement. In: CVPR (2021)
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: ICML (2011)
Owens, A., Efros, A.A.: Audio-visual scene analysis with self-supervised multisensory features. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 639–658. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_39
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E.H., Freeman, W.T.: Visually indicated sounds. In: CVPR (2016)
Owens, A., Wu, J., McDermott, J.H., Freeman, W.T., Torralba, A.: Ambient sound provides supervision for visual learning. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 801–816. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_48
Parida, K.K., Srivastava, S., Sharma, G.: Beyond image to depth: improving depth prediction using echoes. In: CVPR (2021)
Politis, A., Mesaros, A., Adavanne, S., Heittola, T., Virtanen, T.: Overview and evaluation of sound event localization and detection in DCASE 2019. IEEE/ACM Trans. Audio Speech Language Process. 29, 684–698 (2020)
Poursaeed, O., et al.: Deep fundamental matrix estimation without correspondences. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11131, pp. 485–497. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11015-4_35
Purushwalkam, S., et al.: Audio-visual floorplan reconstruction. In: ICCV (2021)
Raguram, R., Frahm, J.-M., Pollefeys, M.: A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 500–513. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88688-4_37
Ranftl, R., Koltun, V.: Deep fundamental matrix estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 292–309. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_18
Revaud, J., Weinzaepfel, P., de Souza, C.R., Humenberger, M.: R2D2: repeatable and reliable detector and descriptor. In: NeurIPS (2019)
de Sa, V.R.: Learning classification with unlabeled data. In: NeurIPS (1994)
Sarlin, P.E., Cadena, C., Siegwart, R., Dymczyk, M.: From coarse to fine: robust hierarchical localization at large scale. In: CVPR (2019)
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: CVPR (2020)
Sarlin, P.E., et al.: Back to the feature: learning robust camera localization from pixels to pose. In: CVPR (2021). arxiv.org/abs/2103.09213
Sattler, T., Havlena, M., Radenovic, F., Schindler, K., Pollefeys, M.: Hyperpoints and fine vocabularies for large-scale location recognition. In: ICCV (2015)
Sattler, T., Leibe, B., Kobbelt, L.: Improving image-based localization by active correspondence search. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 752–765. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33718-5_54
Sattler, T., Leibe, B., Kobbelt, L.: Efficient & effective prioritized matching for large-scale image-based localization. In: PAMI (2017)
Sattler, T., et al.: Are large-scale 3D models really necessary for accurate visual localization? In: CVPR (2017)
Savva, M., et al.: Habitat: a platform for embodied AI research. In: ICCV (2019)
Shavit, Y., Ferens, R., Keller, Y.: Learning multi-scene absolute pose regression with transformers. In: ICCV, pp. 2733–2742, October 2021
Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., Fitzgibbon, A.: Scene coordinate regression forests for camera relocalization in RGB-D images. In: CVPR (2013)
Singh, N., Mentch, J., Ng, J., Beveridge, M., Drori, I.: Image2reverb: cross-modal reverb impulse response synthesis. In: ICCV (2021)
Sohl-Dickstein, J., et al.: A device for human ultrasonic echolocation. IEEE Trans. Biomed. Eng. 62(6), 1526–1534 (2015)
Straub, J., et al.: The replica dataset: a digital replica of indoor spaces. arXiv preprint arXiv:1906.05797 (2019)
Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: LoFTR: detector-free local feature matching with transformers. In: CVPR (2021)
Sun, W., Jiang, W., Trulls, E., Tagliasacchi, A., Yi, K.M.: ACNe: attentive context normalization for robust permutation-equivariant learning. In: CVPR, June 2020
Svarm, L., Enqvist, O., Oskarsson, M., Kahl, F.: Accurate localization and pose estimation for large 3D models. In: CVPR (2014)
Svärm, L., Enqvist, O., Kahl, F., Oskarsson, M.: City-scale localization for cameras with known vertical direction. TPAMI (2017)
Taira, H., et al.: InLoc: indoor visual localization with dense matching and view synthesis. In: CVPR (2018)
Taira, H., et al.: InLoc: indoor visual localization with dense matching and view synthesis. TPAMI (2021)
Taubner, F., Tschopp, F., Novkovic, T., Siegwart, R., Furrer, F.: LCD-line clustering and description for place recognition. In: 2020 International Conference on 3D Vision (3DV) (2020)
Thrun, S.: Affine structure from sound. In: NeurIPS (2005)
Torii, A., Arandjelovic, R., Sivic, J., Okutomi, M., Pajdla, T.: 24/7 place recognition by view synthesis. In: CVPR (2015)
Türkoğlu, M.Ö., Brachmann, E., Schindler, K., Brostow, G., Monszpart, A.: Visual camera re-localization using graph neural networks and relative pose supervision. In: 3DV. IEEE (2021)
Tyszkiewicz, M., Fua, P., Trulls, E.: DISK: learning local features with policy gradient. In: NeurIPS (2020)
Valentin, J., Nießner, M., Shotton, J., Fitzgibbon, A., Izadi, S., Torr, P.: Exploiting uncertainty in regression forests for accurate camera relocalization. In: CVPR (2015)
Vasudevan, A.B., Dai, D., Van Gool, L.: Semantic object prediction and spatial sound super-resolution with binaural sounds. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 638–655. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_37
Vaswani, A., et al.: Attention is all you need. In: NeurIPS, vol. 30 (2017)
Villalpando, A.P., Schillaci, G., Hafner, V.V., Guzmán, B.L.: Ego-noise predictions for echolocation in wheeled robots. In: ALIFE 2019: The 2019 Conference on Artificial Life, pp. 567–573. MIT Press (2019)
Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., Schmalstieg, D.: Real-time detection and tracking for augmented reality on mobile phones. IEEE Trans. Visual Comput. Graphics 16(3), 355–368 (2009)
Walch, F., Hazirbas, C., Leal-Taixé, L., Sattler, T., Hilsenbeck, S., Cremers, D.: Image-based localization using LSTMs for structured feature correlation. In: ICCV (2017)
Wang, W., Tran, D., Feiszli, M.: What makes training multi-modal classification networks hard? In: CVPR (2020)
Winkelbauer, D., Denninger, M., Triebel, R.: Learning to localize in new environments from synthetic training data. In: ICRA (2021)
Wu, Z., Jiang, Y.G., Wang, X., Ye, H., Xue, X.: Multi-stream multi-class fusion of deep networks for video classification. In: ACM Multimedia, pp. 791–800 (2016)
Yang, K., Lin, W.Y., Barman, M., Condessa, F., Kolter, Z.: Defending multimodal fusion models against single-source adversaries. In: CVPR (2021)
Yang, K., Russell, B., Salamon, J.: Telling left from right: learning spatial correspondence of sight and sound. In: CVPR (2020)
Yi, K.M., Trulls, E., Ono, Y., Lepetit, V., Salzmann, M., Fua, P.: Learning to find good correspondences. In: CVPR (2018)
Yue, H., Miao, J., Yu, Y., Chen, W., Wen, C.: Robust loop closure detection based on bag of superpoints and graph verification. In: IROS (2019)
Zhang, Z., et al.: Generative modeling of audible shapes for object perception. In: ICCV (2017)
Zhou, Q., Sattler, T., Pollefeys, M., Leal-Taixé, L.: To learn or not to learn: visual localization from essential matrices. In: ICRA (2019)
Zhou, Y., Barnes, C., Lu, J., Yang, J., Li, H.: On the continuity of rotation representations in neural networks. In: CVPR (2019)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yang, K., Firman, M., Brachmann, E., Godard, C. (2022). Camera Pose Estimation and Localization with Active Audio Sensing. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13697. Springer, Cham. https://doi.org/10.1007/978-3-031-19836-6_16