
A semantic-aware monocular projection model for accurate pose measurement

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

Monocular vision systems are widely used in many fields for object measurement because of their simple structure, high speed, and low cost. However, most existing monocular methods rely on complicated mathematical models or require artificial markers to obtain accurate measurements. In addition, it is difficult to precisely extract object features from captured images, which are affected by many factors. In this paper, we present a semantic-aware monocular projection model for accurate pose measurement. Our mathematical model is simple and compact, and we use a deep learning network to extract semantic features from images. Finally, the relevant parameters of the projection model are further optimized with a Kalman filter to make the measurement results more accurate and stable. Extensive experiments demonstrate that the proposed method is robust, with high performance and accuracy. Because only a few constraints are placed on the measured object and the environment, our method is easy to install.
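The abstract notes that the projection-model parameters are refined with a Kalman filter to stabilize the per-frame measurements. As a rough illustration only (the paper's actual state model, parameterization, and noise settings are not given here), the sketch below shows a minimal one-dimensional Kalman filter smoothing a noisy pose parameter measured frame by frame; all names and noise values are hypothetical.

```python
import numpy as np

# Minimal illustrative sketch (not the authors' implementation): a linear
# Kalman filter with a constant-state model smoothing a scalar pose
# parameter (e.g. a distance) measured once per frame.

class ScalarKalmanFilter:
    def __init__(self, initial_value, process_var=1e-3, measurement_var=1e-1):
        self.x = initial_value   # current state estimate
        self.p = 1.0             # estimate covariance
        self.q = process_var     # process noise covariance (hypothetical value)
        self.r = measurement_var # measurement noise covariance (hypothetical value)

    def update(self, measurement):
        # Predict: the constant-state model leaves x unchanged, covariance grows.
        self.p += self.q
        # Correct: blend prediction and new measurement via the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (measurement - self.x)
        self.p *= (1.0 - k)
        return self.x

# Example: smooth a noisy sequence of per-frame distance measurements (meters).
measurements = [5.02, 4.97, 5.10, 4.95, 5.05]
kf = ScalarKalmanFilter(initial_value=measurements[0])
smoothed = [kf.update(z) for z in measurements]
print(np.round(smoothed, 3))
```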


Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the National Key Research and Development Project of China under Grant No. 2020AAA0104001, the “Pioneer” and “Leading Goose” R&D Program of Zhejiang under Grant No. 2022C01120, and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ22F020008.

Author information


Contributions

LW contributed to investigation, methodology, software, writing—original draft, writing—review & editing. XC contributed to software, validation, writing—review & editing. QQ contributed to software and writing—original draft. YZ contributed to validation, writing—review & editing. FG contributed to funding acquisition, project administration, and supervision.

Corresponding author

Correspondence to Fei Gao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Weng, L., Chen, X., Qiu, Q. et al. A semantic-aware monocular projection model for accurate pose measurement. Pattern Anal Applic 26, 1703–1714 (2023). https://doi.org/10.1007/s10044-023-01197-1


Keywords

Navigation