Abstract
Recently, skeleton-based action recognition has gained increasing attention and achieved remarkable results in coarse-grained action recognition. Despite these positive results, existing methods are less effective in scenarios that require a detailed comparison between fine-grained classes, e.g., different moves during a vault. In such scenarios, they struggle to distinguish subtle differences between actions with different numbers of repetitions. To address this problem, we introduce periodicity into fine-grained action classification and propose a novel network architecture, the periodic-aware network (PAN), to distinguish fine-grained actions with different numbers of repetitions. First, a periodicity feature extraction module (PFEM) captures periodicity information and extracts periodicity features at different levels. Then, a periodicity fusion module (PFM) fuses the periodicity features with spatiotemporal features; we apply multiple PFMs to fuse features at different levels. Finally, the fused features are classified to obtain the result. Extensive experiments on two fine-grained skeleton-based action recognition datasets, FineGym and Diving48, show that our proposed method outperforms previous skeleton-based action recognition methods.
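The abstract does not say how the PFEM measures periodicity. As a purely illustrative sketch, and not the authors' method, one common way to expose repetition in a skeleton sequence is a temporal self-similarity matrix, from which a dominant period and a rough repetition count can be read off. The function names, the toy sinusoidal sequence, and the first-peak heuristic below are all assumptions made for illustration.

```python
import numpy as np

def self_similarity(seq):
    """Temporal self-similarity of a skeleton sequence.

    seq has shape (T, J, C): T frames, J joints, C coordinates per joint.
    Returns a (T, T) matrix of negative squared distances between frames.
    """
    flat = seq.reshape(seq.shape[0], -1)
    diff = flat[:, None, :] - flat[None, :, :]
    return -np.sum(diff ** 2, axis=-1)

def estimate_period(sim, max_lag):
    """Return the first lag (in frames) at which frame similarity peaks again.

    Hypothetical heuristic: average the similarity along each diagonal of
    the matrix and take the first local maximum after lag 0.
    """
    n = sim.shape[0] - max_lag  # use the same number of frame pairs per lag
    idx = np.arange(n)
    lag_sim = np.array([np.mean(sim[idx, idx + k]) for k in range(max_lag + 1)])
    for k in range(1, max_lag):
        if lag_sim[k] > lag_sim[k - 1] and lag_sim[k] >= lag_sim[k + 1]:
            return k
    return None  # no repetition found within max_lag

def make_toy_sequence(num_frames=80, period=20, num_joints=17):
    """Synthetic 2D skeleton: every joint moves on an ellipse with one period."""
    t = np.arange(num_frames)[:, None]
    phase = np.linspace(0.0, np.pi, num_joints)[None, :]
    x = np.sin(2 * np.pi * t / period + phase)
    y = 0.5 * np.cos(2 * np.pi * t / period + phase)
    return np.stack([x, y], axis=-1)  # shape (T, J, 2)

seq = make_toy_sequence()
sim = self_similarity(seq)
period = estimate_period(sim, max_lag=40)
repetitions = seq.shape[0] / period
print(period, repetitions)
```

On this noise-free toy input the estimated period is 20 frames, which corresponds to 4 repetitions over 80 frames. Real skeleton data is noisy and rarely strictly periodic, so a learned module would be expected to replace the hand-written peak search.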
S. Luo and J. Xiao: These authors contributed equally to this work and should be considered co-first authors. This work was supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515011867), the National Natural Science Foundation of China (NSFC) (61976123), the Taishan Young Scholars Program of Shandong Province, and the Key Development Program for Basic Research of Shandong Province (ZR2020ZD44).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Luo, S., Xiao, J., Li, D., Jian, M. (2024). Periodic-Aware Network for Fine-Grained Action Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_9
DOI: https://doi.org/10.1007/978-981-99-8543-2_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science (R0)