
SHAPE: a dataset for hand gesture recognition

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Hand gestures are becoming an important part of human–machine communication in an era of fast-paced urbanization. This paper introduces a new standard dataset for hand gesture recognition, Static HAnd PosturE (SHAPE), with adequate size, variation, and practicality. Compared with previous datasets, SHAPE offers more classes, subjects, and scenes, and it is among the first datasets to focus on Asian subjects performing Asian hand gestures. The SHAPE dataset contains more than 34,000 images collected from 20 distinct subjects with different clothes and backgrounds. A recognition architecture is also presented to investigate the proposed dataset. The architecture consists of two phases: a hand detection phase for preprocessing, and a classification phase using customized state-of-the-art deep neural network models. This paper investigates not only high-accuracy models but also lightweight hand gesture recognition models suitable for resource-constrained devices such as portable edge devices. A promising application of this study is a human–machine interface that addresses the lack of space for a keyboard or mouse on small devices. Our experiments showed that the proposed architecture obtains high accuracy on the self-built dataset. Details of our dataset can be seen online at https://users.soict.hust.edu.vn/linhdt/dataset/
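The two-phase architecture described in the abstract (a hand detection phase that crops the region of interest, followed by a classification phase) can be outlined as follows. This is a minimal schematic sketch, not the authors' implementation: `detect_hand` and `classify_posture` are hypothetical placeholders standing in for the paper's detector (a YOLO-style network) and its CNN classifiers.

```python
import numpy as np

def detect_hand(frame):
    """Phase 1 (placeholder): locate the hand and return a cropped region
    of interest. Stands in for a YOLO-style hand detector."""
    h, w = frame.shape[:2]
    # Placeholder: assume the hand occupies the centre quarter of the frame.
    y0, y1 = h // 4, 3 * h // 4
    x0, x1 = w // 4, 3 * w // 4
    return frame[y0:y1, x0:x1]

def classify_posture(crop, n_classes=32):
    """Phase 2 (placeholder): score the crop against each gesture class.
    Stands in for a customized CNN classifier (e.g. a MobileNet-style
    backbone on an edge device)."""
    scores = np.zeros(n_classes)
    # Placeholder scoring: mean intensity hashed into a class index.
    scores[int(crop.mean()) % n_classes] = 1.0
    return int(np.argmax(scores))

def recognize(frame):
    """Full two-phase pipeline: detect the hand, then classify the posture."""
    crop = detect_hand(frame)
    return classify_posture(crop)

# Synthetic uniform frame, only to exercise the pipeline end to end.
frame = np.full((128, 128, 3), 40, dtype=np.uint8)
print(recognize(frame))  # class index for this synthetic frame
```

Separating detection from classification lets the classifier operate on a tight hand crop rather than the full scene, which is what makes lightweight classification models viable on resource-constrained devices.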



Data availability

Details of our dataset can be found online at https://users.soict.hust.edu.vn/linhdt/dataset/. Access to the dataset is available from the corresponding author upon reasonable request.


Acknowledgements

This work was supported by the cooperation between Hanoi University of Science and Technology and Naver Corporation.

Author information


Correspondence to Tuan Linh Dang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest with regard to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dang, T.L., Nguyen, H.T., Dao, D.M. et al. SHAPE: a dataset for hand gesture recognition. Neural Comput & Applic 34, 21849–21862 (2022). https://doi.org/10.1007/s00521-022-07651-1
