Improving Human Pose Estimation Based on Stacked Hourglass Network

Zou, Xuelian; Bi, Xiaojun; Yu, Changdong

doi:10.1007/s11063-023-11212-5

Published: 21 March 2023

Neural Processing Letters, volume 55, pages 9521–9544 (2023)

Abstract

The performance of multi-person pose estimation has improved greatly with the rapid development of deep learning. However, the problems of self-occlusion, mutual occlusion and complex backgrounds have not yet been effectively solved. To address these problems, we design a novel Global and Local Content-aware Feature Boosting Network (GLCFBNet) that includes an Intra-layer Feature Residual-like Module (IFRM), an Input Feature Aggregation Module (IFAM), and a Spatial and Channel Feature Hourglass Attention Module (SCFHAM). The IFRM expands the receptive field of each convolution layer through feature aggregation. The IFAM fully extracts the edge information of the input image and effectively reduces the negative impact of the background. The SCFHAM can accurately determine the location of occluded keypoints, judge the global information of reasonable keypoints, and extract the features that are effective for joint localization from redundant feature information. We evaluate the effectiveness of our proposed method on the MSCOCO keypoint detection dataset, the MPII Human Pose dataset and the CrowdPose dataset.
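The abstract states that the IFRM expands the receptive field of each convolution layer by aggregating features within the layer, but this preview does not give the module's design. As a rough illustration only, the sketch below assumes a Res2Net-style hierarchical split (an assumption, not the authors' confirmed architecture): the input channels are divided into groups, the first group is passed through unchanged, and each later group is convolved after adding the previous group's output. The function names and the choice of four splits are hypothetical.

```python
def receptive_field(kernel_sizes):
    """Receptive field (in pixels, per axis) of a chain of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each stride-1 conv widens the field by (kernel - 1)
    return rf


def hierarchical_split_receptive_fields(num_splits, kernel=3):
    """Receptive field of each channel group in an assumed Res2Net-style block.

    Group 0 is an identity path. Group i (i >= 1) is convolved after adding
    the output of group i - 1, so its longest path contains i stacked convs.
    """
    fields = [1]  # identity path: no convolution applied
    for i in range(1, num_splits):
        fields.append(receptive_field([kernel] * i))
    return fields


if __name__ == "__main__":
    # A single plain 3x3 convolution sees a 3x3 window.
    print(receptive_field([3]))
    # With four hypothetical splits, one layer mixes several window sizes.
    print(hierarchical_split_receptive_fields(4))
```

Under these assumptions, a single layer with four splits combines features with per-axis receptive fields of 1, 3, 5 and 7 pixels, whereas a plain 3×3 convolution is limited to 3. This is the general sense in which intra-layer aggregation can widen the receptive field without stacking more layers.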

Data Availability

The MPII Human Pose dataset analysed in this study is available from the following public domain resource: http://human-pose.mpi-inf.mpg.de/. The MSCOCO keypoint detection dataset is available from the following public domain resource: https://cocodataset.org/. The CrowdPose dataset is available from the following public domain resource: https://github.com/Jeff-sjtu/CrowdPose

Author information

Xuelian Zou, Xiaojun Bi and Changdong Yu have contributed equally to this study.

Authors and Affiliations

The College of Information and Communication Engineering, Harbin Engineering University, Nantong, Harbin, 150001, Heilongjiang, China
Xuelian Zou & Changdong Yu
The School of Information Engineering, Minzu University of China, Zhongguancun South, Beijing, 100081, China
Xiaojun Bi

Corresponding author

Correspondence to Xiaojun Bi.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zou, X., Bi, X. & Yu, C. Improving Human Pose Estimation Based on Stacked Hourglass Network. Neural Process Lett 55, 9521–9544 (2023). https://doi.org/10.1007/s11063-023-11212-5

Accepted: 24 February 2023
Published: 21 March 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11063-023-11212-5

