Abstract
Hand gestures provide a means for humans to interact. While hand gestures play a significant role in human–computer interaction, they also break down the communication barrier and simplify the communication process between the general public and the hearing-impaired community. This paper outlines a convolutional neural network (CNN) integrated with spatial pyramid pooling (SPP), dubbed CNN–SPP, for vision-based hand gesture recognition. SPP mitigates the problem found in conventional pooling by stacking multi-level pooling to extend the features fed into a fully connected layer. Given inputs of varying sizes, SPP also yields a fixed-length feature representation. Extensive experiments have been conducted to evaluate the performance of CNN–SPP on two well-known American Sign Language (ASL) datasets and one NUS hand gesture dataset. Our empirical results show that CNN–SPP outperforms other deep learning-driven approaches.
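The fixed-length property of SPP can be illustrated with a minimal sketch. It assumes max pooling and pyramid levels of 1×1, 2×2, and 4×4 bins; these levels, the function names `spp_max` and `random_maps`, and the input sizes are illustrative choices, not the configuration reported in the paper.

```python
import math
import random


def spp_max(feature_maps, levels=(1, 2, 4)):
    """Max-pool each channel into an n x n grid per level and concatenate.

    feature_maps: list of channels, each a list of rows (H x W).
    levels: assumed pyramid levels; not taken from the paper.
    """
    pooled = []
    for n in levels:
        for channel in feature_maps:
            height, width = len(channel), len(channel[0])
            for i in range(n):
                # Bin edges use floor/ceil so every bin covers at least one row.
                r0 = (i * height) // n
                r1 = math.ceil((i + 1) * height / n)
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = math.ceil((j + 1) * width / n)
                    pooled.append(
                        max(channel[r][c]
                            for r in range(r0, r1)
                            for c in range(c0, c1))
                    )
    return pooled


def random_maps(channels, height, width, seed=0):
    """Hypothetical helper: random feature maps standing in for CNN output."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)]
             for _ in range(height)]
            for _ in range(channels)]


# Two inputs with different spatial sizes but the same number of channels.
small = spp_max(random_maps(3, 13, 9))
large = spp_max(random_maps(3, 40, 57))

# Both vectors have channels * (1 + 4 + 16) = 63 entries.
print(len(small), len(large))
```

Because the output length depends only on the number of channels and the number of bins per level, a fully connected layer placed after such a pooling step can accept feature maps of any spatial size.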
Acknowledgements
This research was supported by Fundamental Research Grant Scheme, Grant No. MMUE/190026.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Tan, Y.S., Lim, K.M., Tee, C. et al. Convolutional neural network with spatial pyramid pooling for hand gesture recognition. Neural Comput & Applic 33, 5339–5351 (2021). https://doi.org/10.1007/s00521-020-05337-0
DOI: https://doi.org/10.1007/s00521-020-05337-0