
Encoded motion image-based dynamic hand gesture recognition

  • Original article
  • The Visual Computer

Abstract

Dynamic hand gesture recognition is a crucial requirement of smart human–computer interaction (HCI) systems. Dynamic imaging has recently been introduced as a gesture description paradigm that simultaneously captures spatial, temporal, and structural information from depth video. However, existing techniques based on dynamic images cannot differentiate gesture movements that follow the same path in opposite directions, for example, “moving a hand down” versus “moving a hand up.” To solve this issue, we propose an approach in which a gesture depth video is converted into a single image called an encoded motion image (EMI). The EMI is fed to a modified pre-trained two-dimensional convolutional neural network (2D-CNN) based on VGG-19, which classifies the gesture present in the depth video. Experiments were carried out on two datasets: the large-scale, multi-modal EgoGesture dataset and the MSR Gesture 3D dataset. On EgoGesture, the proposed method achieved an accuracy of 90.63%, which is state-of-the-art for a 2D-CNN approach on this large-scale dataset of 83 classes. On MSR Gesture 3D, the proposed method achieved an accuracy of 99.24%, which outperforms the state-of-the-art methods. This work also reports the recognition accuracy and precision of each gesture. Instead of high-end systems such as GPUs, the experiments were conducted in Kaggle, a web-based data science environment, to demonstrate the economic efficiency of the work.
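The direction ambiguity described in the abstract can be illustrated with a toy example. The sketch below is not the paper's EMI construction, which the abstract does not specify. It only assumes a synthetic one-pixel "hand" moving along a column, and compares a plain accumulation of absolute frame differences with a hypothetical time-weighted variant. The helper names `make_video` and `accumulate` are invented for this illustration.

```python
def make_video(rows, col, size):
    """Build size x size binary frames with one active pixel per frame.

    `rows` gives the row of the active pixel in each successive frame.
    This is a synthetic stand-in for a depth video, not real data.
    """
    video = []
    for r in rows:
        frame = [[0] * size for _ in range(size)]
        frame[r][col] = 1
        video.append(frame)
    return video


def accumulate(video, weighted):
    """Collapse a video into one image by summing absolute frame differences.

    With weighted=False every time step contributes equally, so temporal
    order is lost. With weighted=True each difference is scaled by its frame
    index t (an assumed, illustrative weighting), so later motion is brighter.
    """
    size = len(video[0])
    image = [[0] * size for _ in range(size)]
    for t in range(1, len(video)):
        w = t if weighted else 1
        for i in range(size):
            for j in range(size):
                image[i][j] += w * abs(video[t][i][j] - video[t - 1][i][j])
    return image


size = 5
down = make_video(range(size), 2, size)            # pixel moves from row 0 to row 4
up = make_video(reversed(range(size)), 2, size)    # same path, opposite direction

same_without_time = accumulate(down, False) == accumulate(up, False)
same_with_time = accumulate(down, True) == accumulate(up, True)
print(same_without_time, same_with_time)  # prints: True False
```

In this toy setting the unweighted image is identical for "down" and "up", while the time-weighted image differs, which is the kind of property a direction-aware single-image encoding needs before a 2D-CNN can separate the two gestures.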




Author information

Corresponding author

Correspondence to Rahul Jain.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Jain, R., Karsh, R.K. & Barbhuiya, A.A. Encoded motion image-based dynamic hand gesture recognition. Vis Comput 38, 1957–1974 (2022). https://doi.org/10.1007/s00371-021-02259-3


  • DOI: https://doi.org/10.1007/s00371-021-02259-3
