
Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure

Multimedia Tools and Applications

Abstract

Skeleton-based action recognition has recently attracted much attention because skeleton data can robustly convey action information. Many recent studies have shown that graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, can extract spatial features more accurately. Nevertheless, how to effectively extract global temporal features remains a challenge. In this work, we first design a novel feature named the temporal action graph, which is the first attempt to express temporal relationships in the form of a graph. Second, we propose a temporal adaptive graph convolution structure (T-AGCN). By generating a global adjacency matrix for the temporal action graph, T-AGCN can flexibly extract global temporal features from the temporal dynamics. Third, we propose a novel model named the spatial-temporal adaptive graph convolutional network (ST-AGCN), which extracts spatial-temporal features for skeleton-based action recognition and improves recognition accuracy. ST-AGCN combines T-AGCN with spatial graph convolution to compensate for T-AGCN's limited modeling of spatial structure. In addition, ST-AGCN uses dual features to form a two-stream network, which further improves accuracy on hard-to-recognize samples. Finally, comparative experiments on two skeleton-based action recognition datasets, NTU-RGBD and SBU, demonstrate that T-AGCN and the temporal action graph can effectively explore global temporal information, and that ST-AGCN improves recognition accuracy on both datasets.
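The abstract describes T-AGCN only at a high level: frames form the nodes of a temporal action graph, and a global adjacency matrix is generated over them. The NumPy sketch below illustrates one plausible reading of that idea, not the authors' implementation. It assumes (1) each frame is a node whose features are its flattened joint coordinates, and (2) the global adjacency is the sum of a data-dependent similarity term, softmax(θ(X)φ(X)ᵀ), similar in spirit to the adaptive adjacency of 2s-AGCN (Shi et al.), and a freely learnable T×T offset. The function names, weight names (`w_theta`, `w_phi`, `b_free`, `w_out`) and sizes are hypothetical, and random weights stand in for trained parameters.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_action_graph(skeleton: np.ndarray) -> np.ndarray:
    """Map a skeleton sequence (T, V, C) to temporal-graph node features (T, V*C).

    Hypothetical construction: each of the T frames becomes one node, and its
    feature vector is the flattened pose (V joints x C coordinates) of that frame.
    """
    t, v, c = skeleton.shape
    return skeleton.reshape(t, v * c)


def temporal_adaptive_gconv(
    x: np.ndarray,
    w_theta: np.ndarray,
    w_phi: np.ndarray,
    b_free: np.ndarray,
    w_out: np.ndarray,
) -> np.ndarray:
    """One graph convolution over frames with a global, data-dependent adjacency.

    x:       (T, D)      node features, one node per frame
    w_theta: (D, E)      embedding for the row side of the similarity (hypothetical)
    w_phi:   (D, E)      embedding for the column side (hypothetical)
    b_free:  (T, T)      adjacency offset that would be learned freely in training
    w_out:   (D, D_out)  output projection
    Returns an array of shape (T, D_out).
    """
    theta = x @ w_theta                         # (T, E)
    phi = x @ w_phi                             # (T, E)
    # Similarity between every pair of frames: the adjacency is global in
    # time instead of being restricted to a fixed local window.
    a_data = softmax(theta @ phi.T, axis=-1)    # (T, T), each row sums to 1
    adjacency = a_data + b_free                 # (T, T)
    # Aggregate features from all frames, then project.
    return adjacency @ x @ w_out                # (T, D_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V, C = 30, 25, 3                         # 25 joints, as in NTU RGB+D
    seq = rng.standard_normal((T, V, C))        # stand-in for a real skeleton clip
    x = temporal_action_graph(seq)              # (30, 75)
    d = x.shape[1]
    out = temporal_adaptive_gconv(
        x,
        w_theta=0.1 * rng.standard_normal((d, 16)),
        w_phi=0.1 * rng.standard_normal((d, 16)),
        b_free=np.zeros((T, T)),
        w_out=0.1 * rng.standard_normal((d, 64)),
    )
    print(out.shape)                            # (30, 64)
```

Because the similarity term compares every pair of frames, a single layer can relate the first and last frames of a clip, which a fixed-size temporal convolution kernel cannot do. The trade-off is a T×T adjacency per sample, so memory grows quadratically with sequence length. In an ST-AGCN-style model, a layer like this would be combined with a spatial graph convolution over joints, as the abstract describes.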


Figs. 1–11 (images not included)



Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 51375209), the Six Talent Peaks Project in Jiangsu Province (Grant No. ZBZZ-012), and the Excellent Science and Technology Innovation Team Fund of Jiangsu Province (Grant No. 2019SK07), and was partially supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant Nos. KYCX20_1928 and KYCX20_0760).

Author information

Corresponding author

Correspondence to Yi Cao.



About this article


Cite this article

Cao, Y., Liu, C., Huang, Z. et al. Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure. Multimed Tools Appl 80, 29139–29162 (2021). https://doi.org/10.1007/s11042-021-11136-z



  • DOI: https://doi.org/10.1007/s11042-021-11136-z
