
Human pose estimation model based on DiracNets and integral pose regression

Multimedia Tools and Applications

Abstract

Human pose estimation has made great progress in recent years. However, many methods enlarge the network's receptive field by applying max pooling, average pooling, or simple downsampling via strided convolution to the feature map, which loses original feature information and introduces quantization errors. To address these problems, we propose the RM-IPR-DDHPE model. Specifically, we first propose the DDHPE model, which uses Mask R-CNN as the backbone network. In this model, we replace the residual module with an improved Dirac network (DiracNets) module to adaptively learn deeper features. In addition, we adopt detail-preserving pooling (DPP), which amplifies spatial changes, to counter the loss of key details in traditional pooling methods. Building on these improvements, we construct the RM-IPR-DDHPE model, which combines the Ranger optimizer, the Mish activation function, and integral pose regression to avoid quantization errors and to optimize gradient propagation and the network structure. We validate the classification ability of DDHPE on the CIFAR datasets and the human keypoint prediction performance of RM-IPR-DDHPE on the MSCOCO2014 and MPII datasets. DDHPE achieves 95.27 and 77.51 on CIFAR-10 and CIFAR-100, respectively. On MSCOCO2014, RM-IPR-DDHPE achieves AP, AP50, AP75, APM, and APL of 78.0, 93.9, 85.4, 74.3, and 84.9, respectively. On MPII, the mean accuracy (mAP) over all keypoints is 94.1. The results show that DDHPE has good feature extraction ability and that RM-IPR-DDHPE improves prediction accuracy while resolving the quantization error in the joint point estimation of the DDHPE network.
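To illustrate the quantization error that integral pose regression is meant to avoid, the following is a minimal sketch and not the authors' implementation. It assumes a softmax-normalized heatmap and a toy Gaussian score map; the function names (`softmax_2d`, `integral_keypoint`, `argmax_keypoint`, `gaussian_scores`) and the chosen grid size and keypoint location are hypothetical. A hard argmax can only return integer pixel coordinates, whereas taking the expected location under the normalized heatmap yields a continuous estimate.

```python
import math


def softmax_2d(heatmap):
    """Normalize a 2D score map into a probability distribution."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def integral_keypoint(heatmap):
    """Integral (soft-argmax) estimate: expected (x, y) under the normalized heatmap."""
    probs = softmax_2d(heatmap)
    x = sum(p * c for row in probs for c, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(probs) for p in row)
    return x, y


def argmax_keypoint(heatmap):
    """Hard argmax: integer (x, y) of the largest score, so the result is quantized."""
    _, r, c = max((v, r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row))
    return c, r


def gaussian_scores(width, height, cx, cy, sigma=1.0):
    """Toy score map (assumption): log of an unnormalized Gaussian centered at (cx, cy)."""
    return [
        [-((c - cx) ** 2 + (r - cy) ** 2) / (2 * sigma ** 2) for c in range(width)]
        for r in range(height)
    ]


# Hypothetical keypoint lying between pixel centers.
scores = gaussian_scores(16, 16, cx=7.3, cy=5.6)
print("argmax estimate:  ", argmax_keypoint(scores))
print("integral estimate:", tuple(round(v, 3) for v in integral_keypoint(scores)))
```

Because the expectation is a smooth function of the heatmap values, this kind of estimate is differentiable, which is what allows a coordinate loss to be backpropagated through the heatmap.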

Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Acknowledgements

We thank anonymous reviewers for valuable suggestions.

Funding

This work is supported by the National Natural Science Foundation of China (No. 61976217), the Fundamental Research Funds for the Central Universities (No. 2019XKQYMS87), and the Science and Technology Planning Project of Xuzhou (No. KC21193).

Author information

Corresponding author

Correspondence to Xinzheng Xu.

Ethics declarations

Conflict of Interest

The authors have no conflict of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, X., Guo, Y. & Wang, X. Human pose estimation model based on DiracNets and integral pose regression. Multimed Tools Appl 82, 36019–36039 (2023). https://doi.org/10.1007/s11042-023-15057-x

  • DOI: https://doi.org/10.1007/s11042-023-15057-x
