Abstract
Monocular vision systems are widely used in many fields for object measurement thanks to their simple structure, high speed, and low cost. However, most existing monocular methods rely on complicated mathematical models or artificial markers to achieve accurate measurement results. In addition, it is difficult to precisely extract object features from captured images, which are affected by many factors. In this paper, we present a semantic-aware monocular projection model for accurate pose measurement. Our mathematical model is simple and neat, and we use a deep learning network to extract semantic features from images. Finally, the relevant parameters of the projection model are further optimized with a Kalman filter to make the measurement results more accurate and stable. Extensive experiments demonstrate that the proposed method is robust, with high performance and accuracy. As few constraints are imposed on the measured object and environment, our method is easy to install.
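The abstract mentions refining the projection-model parameters with a Kalman filter to stabilize noisy per-frame estimates. As a minimal illustrative sketch (not the authors' implementation, whose state and noise models are not given here), a constant-state scalar Kalman filter can smooth one such parameter over successive frames; the variances and initial values below are illustrative assumptions:

```python
def kalman_1d(measurements, process_var=1e-4, meas_var=0.05**2, x0=0.0, p0=1.0):
    """Smooth a noisy scalar sequence with a constant-state Kalman filter.

    x0 / p0 are the initial state estimate and its variance; the state is
    modeled as constant, perturbed only by process noise. Returns one
    filtered estimate per input measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p += process_var
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + meas_var)
        x += k * (z - x)
        p *= (1.0 - k)
        estimates.append(x)
    return estimates
```

For example, feeding in a sequence of noisy estimates of a pose parameter whose true value is 1.0 yields filtered values that settle close to 1.0 while damping frame-to-frame jitter. A full pose filter would use a vector state (position and orientation components) and tuned noise covariances, but the predict/update structure is the same.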
Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Key Research and Development Project of China under Grant No. 2020AAA0104001, the "Pioneer" and "Leading Goose" R&D Program of Zhejiang under Grant No. 2022C01120, and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ22F020008.
Author information
Contributions
LW contributed to investigation, methodology, software, writing—original draft, writing—review & editing. XC contributed to software, validation, writing—review & editing. QQ contributed to software and writing—original draft. YZ contributed to validation, writing–review & editing. FG contributed to funding acquisition, project administration, and supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Weng, L., Chen, X., Qiu, Q. et al. A semantic-aware monocular projection model for accurate pose measurement. Pattern Anal Applic 26, 1703–1714 (2023). https://doi.org/10.1007/s10044-023-01197-1