Abstract
Skeleton-based action recognition has recently attracted much attention because skeleton data robustly conveys action information. Many recent studies have shown that graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, extract spatial features more accurately. Nevertheless, how to effectively extract global temporal features remains a challenge. In this work, we first design a unique feature named the temporal action graph, which is a first attempt to express temporal relationships in the form of a graph. Secondly, we propose a temporal adaptive graph convolution structure (T-AGCN). By generating a global adjacency matrix for the temporal action graph, it can flexibly extract global temporal features from the temporal dynamics. Thirdly, we propose a novel model named the spatial-temporal adaptive graph convolutional network (ST-AGCN) for skeleton-based action recognition, which extracts spatial-temporal features and improves recognition accuracy. ST-AGCN combines T-AGCN with spatial graph convolution to compensate for T-AGCN's limited modeling of spatial structure. In addition, ST-AGCN uses dual features to form a two-stream network, which further improves recognition accuracy on hard-to-recognize samples. Finally, comparative experiments on two skeleton-based action recognition datasets, NTU-RGBD and SBU, demonstrate that T-AGCN and the temporal action graph can effectively explore global temporal information, and that ST-AGCN improves recognition accuracy on both datasets.
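The abstract does not give the formulation of T-AGCN, so the following is only a rough illustration of the general idea of a graph convolution over a temporal graph with a data-dependent global adjacency matrix. It assumes one node per frame, a dot-product similarity between embedded frame features, and a row-wise softmax to normalize the adjacency; these choices, and the names `temporal_adaptive_graph_conv`, `w_theta`, `w_phi`, and `w_out`, are hypothetical and are not taken from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, so that every row of the result sums to 1."""
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def temporal_adaptive_graph_conv(x, w_theta, w_phi, w_out):
    """Illustrative temporal graph convolution (assumed form, not the paper's).

    x       : T x C matrix, one feature vector per frame (one graph node per frame)
    w_theta : C x E embedding weights for the "query" side of the similarity
    w_phi   : C x E embedding weights for the "key" side of the similarity
    w_out   : C x C_out output projection
    Returns the T x T adjacency matrix and the T x C_out output features.
    """
    theta = matmul(x, w_theta)               # T x E
    phi = matmul(x, w_phi)                   # T x E
    scores = matmul(theta, transpose(phi))   # T x T pairwise frame similarity
    adjacency = softmax_rows(scores)         # every frame attends to all frames
    out = matmul(matmul(adjacency, x), w_out)
    return adjacency, out


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy input: T=3 frames, C=2
    identity = [[1.0, 0.0], [0.0, 1.0]]
    adjacency, features = temporal_adaptive_graph_conv(frames, identity, identity, identity)
    for row in adjacency:
        print([round(v, 3) for v in row])
```

Because the adjacency is computed from the input rather than fixed by frame order, any frame can be connected to any other frame, which is one way to read "global" temporal features; a local temporal convolution would instead only mix neighboring frames.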
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant No. 51375209), the Six Talent Peaks Project in Jiangsu Province (Grant No. ZBZZ-012), and the Excellent Science and Technology Innovation Team Fund Jiangsu Province (Grant No. 2019SK07), and is partially supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant Nos. KYCX20_1928 and KYCX20_0760).
About this article
Cite this article
Cao, Y., Liu, C., Huang, Z. et al. Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure. Multimed Tools Appl 80, 29139–29162 (2021). https://doi.org/10.1007/s11042-021-11136-z
DOI: https://doi.org/10.1007/s11042-021-11136-z