Fine grained sport action recognition with Twin spatio-temporal convolutional neural networks

Martin, Pierre-Etienne; Benois-Pineau, Jenny; Péteri, Renaud; Morlier, Julien

doi:10.1007/s11042-020-08917-3

Application to table tennis

Published: 19 April 2020

Multimedia Tools and Applications, volume 79, pages 20429–20447 (2020)

Abstract

Human action recognition in video is one of the key problems in visual data interpretation. Despite intensive research, the recognition of actions with low inter-class variability remains a challenge. This paper presents a new Twin Spatio-Temporal Convolutional Neural Network (TSTCNN) for this purpose. Applied to table tennis, it can detect and recognize 20 table tennis strokes. The model has been trained on a dedicated dataset, the so-called TTStroke-21, recorded in natural conditions at the Faculty of Sports of the University of Bordeaux. Our model takes as inputs an RGB image sequence and its computed optical flow. The proposed Twin architecture is a two-stream network in which each stream comprises 3 spatio-temporal convolutional layers, followed by a fully connected layer where the data are fused. Our method reaches an accuracy of 91.4% against 43.1% for our baseline, a Two-Stream Inflated 3D ConvNet (I3D).
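To make the two-stream layout concrete, the following is a minimal sketch of the shape bookkeeping for such a network: two branches of three spatio-temporal (3D) convolutional layers each, whose flattened outputs are fused before a fully connected layer. Only the two-stream structure, the three layers per stream, and the RGB and optical-flow inputs come from the abstract. The input size, channel widths, kernel size, padding, pooling factor, and fusion by concatenation are illustrative assumptions, and names such as `Conv3dSpec` and `branch_output_shape` are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Conv3dSpec:
    """One spatio-temporal conv layer followed by max pooling (all values assumed)."""
    out_channels: int
    kernel: int = 3   # cubic kernel size, assumed
    padding: int = 1  # zero padding on every side, assumed
    pool: int = 2     # pooling factor applied to T, H and W, assumed


def conv_out(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Standard output-length formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def branch_output_shape(in_shape, layers):
    """Propagate a (channels, T, H, W) shape through one branch."""
    channels, t, h, w = in_shape
    for spec in layers:
        # Convolution keeps or shrinks each axis, pooling then divides it.
        t, h, w = (conv_out(d, spec.kernel, spec.padding) // spec.pool
                   for d in (t, h, w))
        channels = spec.out_channels
    return (channels, t, h, w)


# Three conv layers per branch, as in the abstract; the widths are assumptions.
LAYERS = (Conv3dSpec(16), Conv3dSpec(32), Conv3dSpec(64))

# Assumed clip size: 16 frames of 64x64 pixels.
rgb_shape = (3, 16, 64, 64)   # RGB stream: 3 color channels
flow_shape = (2, 16, 64, 64)  # optical-flow stream: horizontal and vertical components

rgb_feat = branch_output_shape(rgb_shape, LAYERS)
flow_feat = branch_output_shape(flow_shape, LAYERS)

# Fusion assumed to be concatenation of the two flattened feature maps,
# which then feed the fully connected layer.
fused = prod(rgb_feat) + prod(flow_feat)

print("RGB branch output:", rgb_feat)
print("Flow branch output:", flow_feat)
print("Fused vector length:", fused)
```

With these assumed settings, each branch ends in a (64, 2, 8, 8) feature map, so the fused vector has 16384 entries. The two branches share the same layer specification here only for brevity; nothing in the abstract states whether their weights or hyperparameters are shared.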

Acknowledgements

We would like to thank Alain Coupet from the Faculty of Sports, expert and teacher in table tennis, for the proposed taxonomy of table tennis strokes, and all the players and annotators for their involvement in the acquisition and annotation processes leading to TTStroke-21.

Author information

Authors and Affiliations

LaBRI, University of Bordeaux, Talence, France
Pierre-Etienne Martin & Jenny Benois-Pineau
MIA, University of La Rochelle, La Rochelle, France
Renaud Péteri
IMS, University of Bordeaux, Talence, France
Julien Morlier

Corresponding author

Correspondence to Pierre-Etienne Martin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the CRISP project of the Nouvelle-Aquitaine Region and the Bordeaux IDEX Initiative.

About this article

Cite this article

Martin, PE., Benois-Pineau, J., Péteri, R. et al. Fine grained sport action recognition with Twin spatio-temporal convolutional neural networks. Multimed Tools Appl 79, 20429–20447 (2020). https://doi.org/10.1007/s11042-020-08917-3

Received: 20 March 2019
Revised: 09 January 2020
Accepted: 01 April 2020
Published: 19 April 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s11042-020-08917-3
