Improving Human Pose Estimation Based on Stacked Hourglass Network

Neural Processing Letters

Abstract

The performance of multi-person pose estimation has improved greatly with the rapid development of deep learning. However, the problems of self-occlusion, mutual occlusion, and complex backgrounds have not yet been effectively solved. To address these problems, we design a novel Global and Local Content-aware Feature Boosting Network (GLCFBNet) comprising an Intra-layer Feature Residual-like Module (IFRM), an Input Feature Aggregation Module (IFAM), and a Spatial and Channel Feature Hourglass Attention Module (SCFHAM). The IFRM expands the receptive field of each convolution layer through feature aggregation. The IFAM fully extracts the edge information of the input image and effectively addresses the negative impact of the background. The SCFHAM accurately determines the locations of occluded keypoints, assesses the global information of plausible keypoints, and extracts the features that are effective for joint localization from redundant feature information. We evaluate the effectiveness of the proposed method on the MSCOCO keypoint detection dataset, the MPII Human Pose dataset, and the CrowdPose dataset.
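The abstract does not describe the internals of the IFRM, so the following is only a rough illustration of how aggregating features inside a single layer can enlarge its receptive field. It assumes a Res2Net-style hierarchical layout, in which the channels are split into groups and each group passes through one more 3x3 convolution than the previous one; that layout, and the function names `receptive_field` and `hierarchical_split_fields`, are assumptions made for this sketch, not the authors' design.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of a chain of convolution layers."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump = spacing between adjacent outputs, in input pixels
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def hierarchical_split_fields(num_splits, kernel=3):
    """Receptive field seen by each channel group in the assumed layout.

    Hypothetical Res2Net-style arrangement: group 0 is an identity path,
    and group i (i >= 1) has passed through i chained kernel x kernel
    convolutions because it also receives the output of group i - 1.
    """
    return [receptive_field([kernel] * i) for i in range(num_splits)]


if __name__ == "__main__":
    # One block with four groups mixes several receptive fields at once.
    print(hierarchical_split_fields(4))  # [1, 3, 5, 7]
```

Under these assumptions, a single block with four groups exposes receptive fields of 1, 3, 5, and 7 pixels at once, whereas a plain 3x3 convolution layer exposes only 3.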

Data Availability

The datasets analysed in this study are publicly available. The MPII Human Pose dataset is available at http://human-pose.mpi-inf.mpg.de/. The MSCOCO keypoint detection dataset is available at https://cocodataset.org/. The CrowdPose dataset is available at https://github.com/Jeff-sjtu/CrowdPose.

Author information

Corresponding author

Correspondence to Xiaojun Bi.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Cite this article

Zou, X., Bi, X. & Yu, C. Improving Human Pose Estimation Based on Stacked Hourglass Network. Neural Process Lett 55, 9521–9544 (2023). https://doi.org/10.1007/s11063-023-11212-5

  • DOI: https://doi.org/10.1007/s11063-023-11212-5
