ABSTRACT
Two-dimensional human pose estimation is the basis of human behavior understanding, but predicting reasonable two-dimensional human poses in complex scenes remains a challenging problem. To address this problem, we propose DEFormer, a pose estimation model based on the Vision Transformer (ViT). DEFormer adopts a distribution-aware coordinate representation of keypoints to reduce quantization error, and combines the original encoder module with an efficient encoder module to build a lighter two-stage model. Experimental results show that on the CrowdPose dataset and a self-constructed campus-scene human motion dataset, the lightweight DEFormer model achieves an average precision of up to 85.9%, demonstrating more accurate pose estimation performance.
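The distribution-aware coordinate representation mentioned above refines the integer heatmap argmax to sub-pixel precision, which is where quantization error arises. A minimal sketch of the general idea (a second-order Taylor expansion around the heatmap peak, in the style of the distribution-aware decoding literature; the exact DEFormer formulation is an assumption here):

```python
import numpy as np

def decode_keypoint(heatmap: np.ndarray) -> tuple[float, float]:
    """Sub-pixel keypoint decoding from a single-joint heatmap.

    Works in log space so a Gaussian-shaped peak becomes quadratic,
    then solves offset = -H^{-1} g at the integer argmax, where g and H
    are the local gradient and Hessian of the log-heatmap.
    """
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    h, w = heatmap.shape
    logh = np.log(np.maximum(heatmap, 1e-10))  # avoid log(0)
    if 1 <= x < w - 1 and 1 <= y < h - 1:
        # central finite differences for gradient and Hessian
        dx = 0.5 * (logh[y, x + 1] - logh[y, x - 1])
        dy = 0.5 * (logh[y + 1, x] - logh[y - 1, x])
        dxx = logh[y, x + 1] - 2 * logh[y, x] + logh[y, x - 1]
        dyy = logh[y + 1, x] - 2 * logh[y, x] + logh[y - 1, x]
        dxy = 0.25 * (logh[y + 1, x + 1] - logh[y + 1, x - 1]
                      - logh[y - 1, x + 1] + logh[y - 1, x - 1])
        hess = np.array([[dxx, dxy], [dxy, dyy]])
        grad = np.array([dx, dy])
        if np.linalg.det(hess) != 0.0:
            offset = -np.linalg.solve(hess, grad)
            # keep the correction within one pixel of the integer peak
            offset = np.clip(offset, -1.0, 1.0)
            return float(x + offset[0]), float(y + offset[1])
    # fall back to the integer argmax at the border or for a flat patch
    return float(x), float(y)
```

For an ideal Gaussian heatmap the log-space expansion is exact, so the decoder recovers the true continuous peak rather than the nearest grid cell; in practice the predicted heatmap is usually smoothed first so it better matches this Gaussian assumption.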
Index Terms
- A 2D Human Pose Estimation Method Based On Visual Transformer