Monocular 3D Pose Estimation of Very Small Airplane in the Air

Authors:
Sung Kwon On, Republic of Korea Air Force Academy, KR (ORCID: 0009-0009-3262-9800)
Songhyon Kim, Republic of Korea Air Force Academy, KR (ORCID: 0000-0003-2258-8549)
Kwangjin Yang, Republic of Korea Air Force Academy, KR (ORCID: 0009-0002-1418-2151)
Younggun Lee, Republic of Korea Air Force Academy, KR (ORCID: 0000-0002-9416-3139)

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia, December 2023, Article No.: 82, Pages 1–7, https://doi.org/10.1145/3595916.3626456

Published: 01 January 2024

ABSTRACT

This paper proposes a pose estimation algorithm designed specifically for airplanes maneuvering in the air. The algorithm has two stages. The first stage performs semantic segmentation of a monocular input image of a flying airplane. Because the airplane typically occupies only a small part of the image, the entire segmented area of the airplane is used as its feature points. The second stage estimates the 3D pose from the segmented image using projective registration. Because airplanes have distinctive characteristics and airplane-specific datasets are scarce, a custom dataset is generated for the experiments with Unreal Engine 4, a 3D computer graphics game engine known for its realistic simulations. Experimental results indicate that the algorithm is suitable for 3D pose estimation of airplanes and can provide useful information for studying the autonomous control of airplanes.
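
The abstract does not give implementation details, so the following is only a minimal sketch of what registering a 3D model to a segmentation mask could look like, not the authors' method. Everything in it is an assumption made for illustration: the toy point-cloud airplane, the pinhole camera with a known focal length and distance, the coarse search over yaw only, and the intersection-over-union score. In a real pipeline the binary mask would come from the segmentation stage rather than from rendering.

```python
import numpy as np


def rotation_zyx(yaw, pitch, roll):
    """Rotation matrix from yaw (Z), pitch (Y), roll (X) angles in radians."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx


def render_mask(points, rotation, translation, focal, size):
    """Project model points with a pinhole camera and rasterise a binary mask."""
    cam = points @ rotation.T + translation
    cam = cam[cam[:, 2] > 0]  # keep points in front of the camera
    u = np.round(focal * cam[:, 0] / cam[:, 2] + size / 2).astype(int)
    v = np.round(focal * cam[:, 1] / cam[:, 2] + size / 2).astype(int)
    keep = (u >= 0) & (u < size) & (v >= 0) & (v < size)
    mask = np.zeros((size, size), dtype=bool)
    mask[v[keep], u[keep]] = True
    return mask


def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def register(mask, points, translation, focal, candidates):
    """Return the candidate (yaw, pitch, roll) whose rendering best overlaps the mask."""
    size = mask.shape[0]
    scores = [
        iou(mask, render_mask(points, rotation_zyx(*c), translation, focal, size))
        for c in candidates
    ]
    best_index = int(np.argmax(scores))
    return candidates[best_index], scores[best_index]


# Hypothetical toy model: a fuselage along x and wings along y, offset from the
# centre so that a 180-degree turn changes the silhouette.
fuselage = np.stack([np.linspace(-1.0, 1.0, 41), np.zeros(41), np.zeros(41)], axis=1)
wings = np.stack([np.full(33, 0.3), np.linspace(-0.8, 0.8, 33), np.zeros(33)], axis=1)
model = np.vstack([fuselage, wings])

translation = np.array([0.0, 0.0, 10.0])  # assumed known distance to the camera
focal, size = 100.0, 64                   # airplane spans about 20 of 64 pixels

# Coarse search over yaw in 10-degree steps; pitch and roll are fixed at zero.
candidates = [(np.deg2rad(d), 0.0, 0.0) for d in range(0, 360, 10)]
true_pose = candidates[4]

# Stand-in for the segmentation output: a mask rendered at the true pose.
observed = render_mask(model, rotation_zyx(*true_pose), translation, focal, size)
best, score = register(observed, model, translation, focal, candidates)
print(np.rad2deg(best[0]), score)
```

Because the observed mask here is rendered from one of the candidates, the search recovers that pose exactly. With a real segmentation mask the overlap would be imperfect, and a finer search over all six degrees of freedom would be needed.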

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics
    1. Animation
    2. Shape modeling

Index terms have been assigned to the content through auto-classification.

Published in
MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
December 2023
745 pages
ISBN: 9798400702051
DOI: 10.1145/3595916
Editors: Wen-Huang Cheng, Wei-Ta Chu, Min-Chun Hu, Jiaying Liu, Munchurl Kim, Wei Zhang
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3D pose estimation
Dataset
Monocular camera
Small airplane
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 59 of 204 submissions, 29%