Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure

Cao, Yi; Liu, Chen; Huang, Zilong; Sheng, Yongjian; Ju, Yongjian

doi:10.1007/s11042-021-11136-z

Published: 19 June 2021

Multimedia Tools and Applications, Volume 80, pages 29139–29162 (2021)

Yi Cao (ORCID: orcid.org/0000-0003-3255-6575)^1,2, Chen Liu^1,2, Zilong Huang^1,2, Yongjian Sheng^1,2 & Yongjian Ju^1,2

Abstract

Skeleton-based action recognition has recently attracted much attention because skeleton data robustly convey action information. Many recent studies have shown that graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, extract spatial features more accurately. Nevertheless, how to effectively extract global temporal features remains a challenge. In this work, firstly, a unique feature named the temporal action graph is designed. It is a first attempt to express timing relationships in the form of a graph. Secondly, a temporal adaptive graph convolution structure (T-AGCN) is proposed. By generating a global adjacency matrix for the temporal action graph, it can flexibly extract global temporal features from temporal dynamics. Thirdly, we propose a novel model named the spatial-temporal adaptive graph convolutional network (ST-AGCN) for skeleton-based action recognition, which extracts spatial-temporal features and improves recognition accuracy. ST-AGCN combines T-AGCN with spatial graph convolution to compensate for T-AGCN's limited modeling of spatial structure. In addition, ST-AGCN uses dual features to form a two-stream network, which further improves recognition accuracy on hard-to-recognize samples. Finally, comparative experiments on two skeleton-based action recognition datasets, NTU-RGBD and SBU, demonstrate that T-AGCN and the temporal action graph can effectively explore global temporal information and that ST-AGCN improves recognition accuracy on both datasets.
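
The abstract does not give the T-AGCN formulation, so the following is only a generic sketch of the underlying idea: treat each frame as a node of a temporal graph, build a global frame-to-frame adjacency matrix from the data, and aggregate features over it. The scaled dot-product similarity, the softmax normalisation, and all names (`temporal_adjacency`, `temporal_graph_conv`, the toy sizes) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def temporal_adjacency(frames):
    """Build a (T, T) adjacency over frames from the data itself.

    ASSUMPTION: scaled dot-product similarity followed by a row-wise
    softmax; the paper's actual construction may differ.
    """
    scale = math.sqrt(len(frames[0]))
    adjacency = []
    for a in frames:
        scores = [sum(x * y for x, y in zip(a, b)) / scale for b in frames]
        adjacency.append(softmax(scores))
    return adjacency


def temporal_graph_conv(frames, weight):
    """One graph-convolution step over the temporal graph: A @ X @ W."""
    adjacency = temporal_adjacency(frames)
    return matmul(matmul(adjacency, frames), weight)


# Toy input: T frames, each summarised by a C_IN-dimensional feature vector.
random.seed(0)
T, C_IN, C_OUT = 8, 6, 4
frames = [[random.gauss(0, 1) for _ in range(C_IN)] for _ in range(T)]
weight = [[random.gauss(0, 1) for _ in range(C_OUT)] for _ in range(C_IN)]

adj = temporal_adjacency(frames)
out = temporal_graph_conv(frames, weight)
print(len(out), len(out[0]))
```

Because every row of the adjacency spans all frames, each output frame can draw on the whole sequence rather than a fixed local window, which is the sense in which such an adjacency is "global". In a trained model the weight matrix (and possibly the adjacency) would be learned rather than sampled at random.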

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 51375209), the Six Talent Peaks Project in Jiangsu Province (Grant No. ZBZZ-012), and the Excellent Science and Technology Innovation Team Fund of Jiangsu Province (Grant No. 2019SK07), and is partially supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant Nos. KYCX20_1928 and KYCX20_0760).

Author information

Authors and Affiliations

School of Mechanical Engineering, Jiangnan University, Wuxi, 214122, China
Yi Cao, Chen Liu, Zilong Huang, Yongjian Sheng & Yongjian Ju
Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi, 214122, China
Yi Cao, Chen Liu, Zilong Huang, Yongjian Sheng & Yongjian Ju

Corresponding author

Correspondence to Yi Cao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cao, Y., Liu, C., Huang, Z. et al. Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure. Multimed Tools Appl 80, 29139–29162 (2021). https://doi.org/10.1007/s11042-021-11136-z

Received: 15 August 2020
Revised: 02 January 2021
Accepted: 03 June 2021
Published: 19 June 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11042-021-11136-z
