
Convolutional neural network with spatial pyramid pooling for hand gesture recognition

  • Original Article
  • Neural Computing and Applications

Abstract

Hand gestures give humans a means of interacting through a series of gestures. Beyond their significant role in human–computer interaction, hand gestures also break down the communication barrier and simplify communication between the general public and the hearing-impaired community. This paper presents a convolutional neural network (CNN) integrated with spatial pyramid pooling (SPP), dubbed CNN–SPP, for vision-based hand gesture recognition. SPP mitigates a problem of conventional pooling by stacking multiple levels of pooling, which extends the features fed into the fully connected layer. Given inputs of varying sizes, SPP also yields a fixed-length feature representation. Extensive experiments were conducted to evaluate CNN–SPP on two well-known American Sign Language (ASL) datasets and one NUS hand gesture dataset. Our empirical results show that CNN–SPP outperforms other deep learning-based methods.





Acknowledgements

This research was supported by the Fundamental Research Grant Scheme, Grant No. MMUE/190026.

Author information


Corresponding author

Correspondence to Kian Ming Lim.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tan, Y.S., Lim, K.M., Tee, C. et al. Convolutional neural network with spatial pyramid pooling for hand gesture recognition. Neural Comput & Applic 33, 5339–5351 (2021). https://doi.org/10.1007/s00521-020-05337-0


  • DOI: https://doi.org/10.1007/s00521-020-05337-0
