Encoded motion image-based dynamic hand gesture recognition

Jain, Rahul; Karsh, Ram Kumar; Barbhuiya, Abul Abbas

doi:10.1007/s00371-021-02259-3

Original article
Published: 09 August 2021

Volume 38, pages 1957–1974 (2022)

The Visual Computer

Abstract

Dynamic hand gesture recognition is essential to smart human–computer interaction (HCI) systems. Dynamic imaging has recently been introduced as a gesture description paradigm that simultaneously captures spatial, temporal, and structural information from a depth video. However, existing techniques based on dynamic images cannot differentiate gestures that follow the same path in opposite directions, for example, “moving a hand down” versus “moving a hand up.” To solve this issue, we propose an approach in which a gesture depth video is converted into a single image called an encoded motion image (EMI). The EMI is fed to a modified pre-trained two-dimensional convolutional neural network (2D-CNN) based on VGG-19, which classifies the gesture present in the depth video. The experiments were carried out on two datasets: the large-scale multi-modal EgoGesture dataset and the MSR Gesture 3D dataset. On the EgoGesture dataset, the proposed method achieved an accuracy of 90.63%, which is state-of-the-art accuracy for a 2D-CNN approach on this large-scale dataset of 83 classes. On the MSR Gesture 3D dataset, the proposed method achieved an accuracy of 99.24%, which outperforms the state-of-the-art methods. This work also reports the recognition accuracy and precision of each gesture. Rather than on high-end systems such as GPUs, the experiments were conducted in Kaggle, a web-based data science environment, to demonstrate the economic efficiency of the work.
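
The direction ambiguity described in the abstract can be illustrated with a small synthetic sketch. It is not the authors' EMI construction, which the abstract does not specify. The helper names (`make_clip`, `unsigned_motion_map`, `time_weighted_map`) and the linear time weighting are assumptions introduced here only for illustration. The sketch shows that summing absolute frame differences gives the same image for a "down" clip and its reversed "up" clip, whereas an encoding that weights motion by its time index does not.

```python
def make_clip(rows, height=6, width=4):
    """Synthetic depth clip: one bright pixel visits the given rows in order."""
    clip = []
    for r in rows:
        frame = [[0.0] * width for _ in range(height)]
        frame[r][1] = 1.0
        clip.append(frame)
    return clip


def unsigned_motion_map(clip):
    """Sum of absolute frame differences: a direction-agnostic motion image."""
    h, w = len(clip[0]), len(clip[0][0])
    out = [[0.0] * w for _ in range(h)]
    for prev, cur in zip(clip, clip[1:]):
        for i in range(h):
            for j in range(w):
                out[i][j] += abs(cur[i][j] - prev[i][j])
    return out


def time_weighted_map(clip):
    """Hypothetical direction-aware encoding (an assumption, not the EMI):
    motion that happens later in the clip receives a larger weight."""
    h, w = len(clip[0]), len(clip[0][0])
    out = [[0.0] * w for _ in range(h)]
    n = len(clip) - 1  # number of frame-to-frame transitions
    for t, (prev, cur) in enumerate(zip(clip, clip[1:]), start=1):
        weight = t / n
        for i in range(h):
            for j in range(w):
                out[i][j] += weight * abs(cur[i][j] - prev[i][j])
    return out


down = make_clip([0, 1, 2, 3, 4, 5])  # pixel moves from top row to bottom row
up = down[::-1]                       # same path, opposite direction

same_unsigned = unsigned_motion_map(down) == unsigned_motion_map(up)
same_weighted = time_weighted_map(down) == time_weighted_map(up)
print(same_unsigned, same_weighted)
```

In this toy setup `same_unsigned` is `True` and `same_weighted` is `False`: the unweighted image discards temporal order, while the time-weighted one assigns the end of the path a larger value than the start, so reversing the clip changes the image.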

Author information

Authors and Affiliations

Electronics and Communication Engineering Department, National Institute of Technology, Silchar, Assam, 788010, India
Rahul Jain, Ram Kumar Karsh & Abul Abbas Barbhuiya

Corresponding author

Correspondence to Rahul Jain.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jain, R., Karsh, R.K. & Barbhuiya, A.A. Encoded motion image-based dynamic hand gesture recognition. Vis Comput 38, 1957–1974 (2022). https://doi.org/10.1007/s00371-021-02259-3

Accepted: 11 July 2021
Published: 09 August 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00371-021-02259-3
