Abstract
Monocular vision systems are widely used in many fields for object measurement thanks to their simple structure, high speed, and low cost. However, most existing monocular methods rely on complicated mathematical models or artificial markers to achieve accurate measurement results. In addition, it is difficult to precisely extract object features from captured images, which are affected by many factors. In this paper, we present a semantic-aware monocular projection model for accurate pose measurement. Our mathematical model is simple and neat, and we use a deep learning network to extract semantic features from images. Finally, the relevant parameters of the projection model are further optimized with a Kalman filter to make the measurement results more accurate and stable. Extensive experiments demonstrate that the proposed method is robust, with high performance and accuracy. As few constraints are imposed on the measured object and environment, our method is easy to install.
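The abstract mentions refining the projection-model parameters with a Kalman filter to stabilize noisy per-frame estimates. As a minimal illustrative sketch (not the authors' implementation, whose state and noise models are not given here), a constant-state scalar Kalman filter can smooth one such parameter over successive frames; the variances and initial values below are illustrative assumptions:

```python
def kalman_1d(measurements, process_var=1e-4, meas_var=0.05**2, x0=0.0, p0=1.0):
    """Smooth a noisy scalar sequence with a constant-state Kalman filter.

    x0 / p0 are the initial state estimate and its variance; the state is
    modeled as constant, perturbed only by process noise. Returns one
    filtered estimate per input measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p += process_var
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + meas_var)
        x += k * (z - x)
        p *= (1.0 - k)
        estimates.append(x)
    return estimates
```

For example, feeding in a sequence of noisy estimates of a pose parameter whose true value is 1.0 yields filtered values that settle close to 1.0 while damping frame-to-frame jitter. A full pose filter would use a vector state (position and orientation components) and tuned noise covariances, but the predict/update structure is the same.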
Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Key Research and Development Project of China under Grant No. 2020AAA0104001, the "Pioneer" and "Leading Goose" R&D Program of Zhejiang under Grant No. 2022C01120, and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ22F020008.
Author information
Contributions
LW contributed to investigation, methodology, software, writing—original draft, writing—review & editing. XC contributed to software, validation, writing—review & editing. QQ contributed to software and writing—original draft. YZ contributed to validation, writing–review & editing. FG contributed to funding acquisition, project administration, and supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Weng, L., Chen, X., Qiu, Q. et al. A semantic-aware monocular projection model for accurate pose measurement. Pattern Anal Applic 26, 1703–1714 (2023). https://doi.org/10.1007/s10044-023-01197-1