Abstract
Dynamic hand gesture recognition is a crucial requirement for smart human–computer interaction (HCI) systems. Dynamic imaging has recently been introduced as a gesture description paradigm that simultaneously captures spatial, temporal, and structural information from depth video. However, existing techniques based on dynamic images cannot differentiate gesture movements that follow the same path in opposite directions, for example, “moving a hand down” versus “moving a hand up.” To address this issue, we propose an approach in which a gesture depth video is converted into a single image called an encoded motion image (EMI). The EMI is fed to a modified pre-trained two-dimensional convolutional neural network (2D-CNN) based on VGG-19, which classifies the gesture present in the depth video. Experiments were carried out on two datasets: the large-scale multi-modal EgoGesture dataset and the MSR Gesture 3D dataset. On EgoGesture, the proposed method achieved an accuracy of 90.63%, which is state-of-the-art accuracy for a 2D-CNN approach on this large-scale, 83-class dataset. On MSR Gesture 3D, the proposed method achieved an accuracy of 99.24%, outperforming state-of-the-art methods. This work also reports the recognition accuracy and precision for each gesture. Rather than relying on high-end systems such as dedicated GPU hardware, the experiments were conducted on Kaggle, a web-based data science environment, to demonstrate the work's economic efficiency.
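The direction ambiguity described in the abstract can be illustrated with a depth motion map (DMM), a related motion encoding that accumulates absolute differences between consecutive depth frames. The sketch below is a minimal NumPy illustration, assuming depth frames stored as a `(T, H, W)` array. The helpers `time_weighted_motion_image`, `moving_blob`, and `row_centroid` are hypothetical and introduced only for this example; in particular, the time-weighted encoding is a generic temporal weighting, not the authors' EMI construction, which the abstract does not specify.

```python
import numpy as np


def depth_motion_map(frames: np.ndarray) -> np.ndarray:
    """DMM-style encoding: sum of absolute differences between consecutive frames.

    frames has shape (T, H, W). Every difference is taken in absolute value and
    all differences are summed with equal weight, so reversing the sequence in
    time yields exactly the same map.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (T-1, H, W)
    return diffs.sum(axis=0)


def time_weighted_motion_image(frames: np.ndarray) -> np.ndarray:
    """Hypothetical direction-aware encoding (an illustration, NOT the paper's EMI).

    Each absolute frame difference is weighted by its position in time
    (weights 1/n, 2/n, ..., 1 for n differences), so later motion is brighter
    than earlier motion and the image depends on the direction of movement.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (n, H, W)
    n = diffs.shape[0]
    weights = np.arange(1, n + 1, dtype=np.float64) / n
    # Weighted sum over the time axis -> (H, W)
    return np.tensordot(weights, diffs, axes=1)


def moving_blob(height: int = 8, width: int = 8, steps: int = 5,
                downward: bool = True) -> np.ndarray:
    """Hypothetical toy depth sequence: a 2x2 'hand' patch moving one row per frame."""
    frames = np.zeros((steps, height, width))
    for t in range(steps):
        row = t if downward else steps - 1 - t
        frames[t, row:row + 2, 3:5] = 1.0
    return frames


def row_centroid(image: np.ndarray) -> float:
    """Intensity-weighted mean row index (a larger value means lower in the image)."""
    mass = image.sum(axis=1)
    rows = np.arange(image.shape[0], dtype=np.float64)
    return float((rows * mass).sum() / mass.sum())


if __name__ == "__main__":
    down = moving_blob(downward=True)
    up = moving_blob(downward=False)  # same path, opposite direction
    print("DMMs identical:",
          np.allclose(depth_motion_map(down), depth_motion_map(up)))
    print("time-weighted images identical:",
          np.allclose(time_weighted_motion_image(down),
                      time_weighted_motion_image(up)))
    print("time-weighted row centroids (down, up):",
          row_centroid(time_weighted_motion_image(down)),
          row_centroid(time_weighted_motion_image(up)))
```

On this toy sequence the two DMMs match exactly, so a classifier fed only the DMM cannot tell "down" from "up", whereas the time-weighted images differ, with their intensity mass shifted toward the end point of each movement. Any encoding intended to separate such gesture pairs needs some comparable dependence on temporal order.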
Cite this article
Jain, R., Karsh, R.K. & Barbhuiya, A.A. Encoded motion image-based dynamic hand gesture recognition. Vis Comput 38, 1957–1974 (2022). https://doi.org/10.1007/s00371-021-02259-3
DOI: https://doi.org/10.1007/s00371-021-02259-3