RelMobNet: End-to-End Relative Camera Pose Estimation Using a Robust Two-Stage Training

Rajendran, Praveen Kumar; Mishra, Sumit; Vecchietti, Luiz Felipe; Har, Dongsoo

doi:10.1007/978-3-031-25075-0_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13806)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Relative camera pose estimation, i.e., estimating the translation and rotation vectors from a pair of images taken at different locations, is an important part of augmented reality and robotics systems. In this paper, we present an end-to-end relative camera pose estimation network that uses a siamese architecture and is independent of camera parameters. The network is trained on the Cambridge Landmarks data, using four individual scene datasets and a dataset combining the four scenes. To improve generalization, we propose a novel two-stage training that alleviates the need for a hyperparameter to balance the scales of the translation and rotation losses. We compare the proposed method with one-stage training CNN-based methods such as RPNet and RCPNet and demonstrate that the proposed model improves translation vector estimation by 16.11%, 28.88%, and 52.27% on the Kings College, Old Hospital, and St Marys Church scenes, respectively. To demonstrate texture invariance, we investigate, as ablation studies, how well the proposed method generalizes when the datasets are augmented to different scene styles using generative adversarial networks. We also present a qualitative assessment of the epipolar lines obtained from our network predictions and from the ground truth poses.
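
As an illustration of the quantity being estimated, the following minimal sketch derives a relative pose (rotation quaternion and translation vector) from two absolute camera poses. It is not taken from the paper: the function names, the (w, x, y, z) quaternion ordering, and the world-to-camera convention x_cam = R(q) x_world + t are all assumptions made for this example, and datasets may store poses under a different convention.

```python
import math

def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def q_conj(q):
    """Conjugate of a quaternion; equals the inverse for unit quaternions."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def q_rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    _, x, y, z = q_mul(q_mul(q, (0.0, *v)), q_conj(q))
    return (x, y, z)

def relative_pose(q1, t1, q2, t2):
    """Pose of camera 2 relative to camera 1.

    Assumed convention (hypothetical, for illustration only):
    x_cam = R(q) x_world + t for each camera, with unit quaternions.
    Then x_cam2 = R_rel x_cam1 + t_rel, where
    R_rel = R2 R1^T and t_rel = t2 - R_rel t1.
    """
    q_rel = q_mul(q2, q_conj(q1))
    rotated_t1 = q_rotate(q_rel, t1)
    t_rel = tuple(b - a for a, b in zip(rotated_t1, t2))
    return q_rel, t_rel

# Example poses (made-up values): camera 1 rotated about x, camera 2 about y.
q1 = (math.cos(0.3), math.sin(0.3), 0.0, 0.0)
t1 = (0.5, -1.0, 2.0)
q2 = (math.cos(0.2), 0.0, math.sin(0.2), 0.0)
t2 = (1.0, 0.0, -0.5)
q_rel, t_rel = relative_pose(q1, t1, q2, t2)
```

A learned estimator would regress `q_rel` and `t_rel` directly from the image pair; the closed form above only shows how such targets could be obtained from absolute ground-truth poses under the stated convention.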

Acknowledgement

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-00440, Development of Artificial Intelligence Technology that continuously improves itself as the situation changes in the real world).

Author information

Authors and Affiliations

Division of Future Vehicle, KAIST, Daejeon, South Korea
Praveen Kumar Rajendran
The Robotics Program, KAIST, Daejeon, South Korea
Sumit Mishra
Data Science Group, Institute for Basic Science, Daejeon, South Korea
Luiz Felipe Vecchietti
The CCS Graduate School of Mobility, KAIST, Daejeon, South Korea
Dongsoo Har

Corresponding author

Correspondence to Dongsoo Har.

Editor information

Editors and Affiliations

IBM Research - MIT-IBM Watson AI Lab, Massachusetts, USA
Leonid Karlinsky
Technion – Israel Institute of Technology, Haifa, Israel
Tomer Michaeli
Kyoto University, Kyoto, Japan
Ko Nishino

About this paper

Cite this paper

Rajendran, P.K., Mishra, S., Vecchietti, L.F., Har, D. (2023). RelMobNet: End-to-End Relative Camera Pose Estimation Using a Robust Two-Stage Training. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_18

DOI: https://doi.org/10.1007/978-3-031-25075-0_18
Published: 19 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25074-3
Online ISBN: 978-3-031-25075-0
eBook Packages: Computer Science (R0)
