Periodic-Aware Network for Fine-Grained Action Recognition

Luo, Senzi; Xiao, Jiayin; Li, Dong; Jian, Muwei

doi:10.1007/978-981-99-8543-2_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14432)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Recently, skeleton-based action recognition has gained increasing attention and achieved remarkable results in coarse-grained action recognition. Despite these positive results, existing methods are less effective in scenarios that require a detailed comparison between fine-grained classes, e.g., different moves during a vault. In such scenarios, they struggle to distinguish the subtle differences between actions with different numbers of repetitions. To address this problem, we introduce periodicity into fine-grained action classification and propose a novel network architecture, the periodic-aware network (PAN), to distinguish fine-grained actions with different numbers of repetitions. First, a periodicity feature extraction module (PFEM) captures periodicity information and extracts periodicity features at different levels. Then, a periodicity fusion module (PFM) fuses the periodicity features with spatiotemporal features; multiple PFMs are applied to fuse features at different levels. Finally, the fused features are classified to obtain the result. Extensive experiments on two fine-grained skeleton-based action recognition datasets, FineGym and Diving48, show that the proposed method outperforms previous skeleton-based action recognition methods.
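
The abstract does not say how the PFEM computes periodicity, so the following is only an illustrative sketch of what "periodicity information" in a skeleton sequence can look like, not the authors' method. It assumes a generic temporal self-similarity approach: compare every pair of frames, then pick the time lag at which frames match best. The function names `self_similarity` and `estimate_period`, the `min_lag` parameter, and the toy data are all hypothetical.

```python
import math

def self_similarity(frames):
    """Return the T x T matrix of negative squared distances between frames.

    Each frame is a flat list of joint coordinates (assumed layout).
    """
    T = len(frames)
    return [[-sum((a - b) ** 2 for a, b in zip(frames[i], frames[j]))
             for j in range(T)] for i in range(T)]

def estimate_period(frames, min_lag=2):
    """Estimate the repetition period (in frames) as the smallest lag whose
    mean similarity between frame t and frame t + lag is highest."""
    T = len(frames)
    sim = self_similarity(frames)
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, T // 2 + 1):
        score = sum(sim[t][t + lag] for t in range(T - lag)) / (T - lag)
        if score > best_score:  # strict '>' keeps the smallest lag on ties
            best_lag, best_score = lag, score
    return best_lag

# Toy "skeleton": a single 2-D joint tracing the same 8-pose loop 4 times.
cycle = [[0, 0], [1, 0], [2, 1], [2, 2], [1, 3], [0, 3], [-1, 2], [-1, 1]]
toy = cycle * 4

period = estimate_period(toy)        # lag at which the motion repeats
repetitions = len(toy) // period     # how many times the loop was performed
```

On this toy sequence the repetition count falls out of the estimated period directly, which is the kind of cue that separates two otherwise similar actions performed a different number of times. A learned module would replace the hand-written distance and lag search with trainable features.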

S. Luo and J. Xiao contributed equally to this work and should be considered co-first authors. This work was supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515011867); the National Natural Science Foundation of China (NSFC) (61976123); the Taishan Young Scholars Program of Shandong Province; and the Key Development Program for Basic Research of Shandong Province (ZR2020ZD44).

Author information

Authors and Affiliations

School of Automation, Guangdong University of Technology, 510006, Guangzhou, Guangdong, China
Senzi Luo, Jiayin Xiao & Dong Li
School of Computer Science and Technology, Shandong University of Finance and Economics, 250014, Jinan, Shandong, China
Muwei Jian

Corresponding author

Correspondence to Dong Li.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Luo, S., Xiao, J., Li, D., Jian, M. (2024). Periodic-Aware Network for Fine-Grained Action Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_9

DOI: https://doi.org/10.1007/978-981-99-8543-2_9
Published: 29 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science, Computer Science (R0)