
The ChaLearn gesture dataset (CGD 2011)

Special Issue Paper, published in Machine Vision and Applications

Abstract

This paper describes the data used in the ChaLearn gesture challenges that took place in 2011/2012, whose results were discussed at the CVPR 2012 and ICPR 2012 conferences. The task can be described as: user-dependent, small vocabulary, fixed camera, one-shot learning. The data include 54,000 hand and arm gestures recorded with an RGB-D Kinect\(^{\mathrm{TM}}\) camera. The data are organized into batches of 100 gestures pertaining to a small gesture vocabulary of 8–12 gestures, recorded by the same user. Short continuous sequences of 1–5 randomly selected gestures are recorded. We provide manual annotations (temporal segmentation into individual gestures, alignment of RGB and depth images, and body part locations) and a library of functions to preprocess and automatically annotate the data. We also provide a subset of batches in which the user’s horizontal position is randomly shifted or scaled. We report on the results of the challenge and distribute sample code to facilitate the development of new solutions. The data, data collection software, and gesture vocabularies can be downloaded from http://gesture.chalearn.org. We have set up a forum for researchers working on these data at http://groups.google.com/group/gesturechallenge.


Notes

  1. For round 1: http://www.kaggle.com/c/GestureChallenge. For round 2: http://www.kaggle.com/c/GestureChallenge2.

  2. http://gesture.chalearn.org/data/data-annotations

  3. For ease of visualization, earlier experiments were recorded in a different format: depth was encoded as gray levels, and the RGB and depth images were concatenated vertically and stored as a single Matlab movie. However, we later realized that we were losing depth resolution for some videos, because Matlab movies use only 8 bits of resolution (256 levels) while the depth resolution of our videos sometimes exceeded 1,000 levels (see the sketch after these notes). Hence, we recorded later batches using cell arrays for K.

  4. http://gesture.chalearn.org/data.

  5. http://ffmpeg.org/.

  6. http://gesture.chalearn.org/data/sample-code.
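
The following minimal sketch (in Python, using NumPy; the array shape and depth range are illustrative and do not reflect the dataset's actual file layout) shows why storing depth as 8-bit gray levels loses resolution once the sensor reports more than 256 distinct levels, as described in note 3:

    import numpy as np

    # Hypothetical depth frame with on the order of 1,000 distinct levels.
    rng = np.random.default_rng(0)
    depth = rng.integers(500, 1500, size=(240, 320), dtype=np.uint16)

    # Normalizing to 8 bits (as in a gray-level movie frame) collapses many levels.
    d_min, d_max = depth.min(), depth.max()
    depth_8bit = np.round((depth - d_min) / (d_max - d_min) * 255).astype(np.uint8)

    print("unique levels before:", np.unique(depth).size)      # up to ~1,000
    print("unique levels after :", np.unique(depth_8bit).size)  # at most 256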

References

  1. Accelerative Integrated Method (AIM) foreign language teaching methodology, http://www.aimlanguagelearning.com/

  2. Computer vision datasets on the web. http://www.cvpapers.com/datasets.html

  3. Imageclef—the clef cross language image retrieval track. http://www.imageclef.org/

  4. The Pascal visual object classes homepage. http://pascallin.ecs.soton.ac.uk/challenges/VOC/

  5. Alon, J., Athitsos, V., Yuan, Q., Sclaroff, S.: A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE Trans. Patt. Anal. Mach. Intell. 31(9), 1685–1699 (2009)


  6. Beyer, M.: Teach your baby to sign: an illustrated guide to simple sign language for babies. Fair Winds Press, Minneapolis (2007)


  7. Calatroni, A., Roggen, D., Tröster, G.: Collection and curation of a large reference dataset for activity recognition. In: 2011 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 30–35 (2011)

  8. Carroll, C., Carroll, R.: Mudras of India: a comprehensive guide to the hand gestures of yoga and Indian dance. Jessica Kingsley Publishers, London (2012)


  9. Chavarriaga, R., Sagha, H., Calatroni, A., Tejaswi D.S., Tröster, G., Millán, J.d.R., Roggen, D.: The Opportunity challenge: a benchmark database for on-body sensor-based activity recognition. Patt. Recogn. Lett. (2013)

  10. Private communication

  11. Curwen, J.: The standard course of lessons & exercises in the Tonic Sol-Fa Method of teaching music (Founded on Miss Glover’s Scheme for Rendering Psalmody Congregational, A.D. 1835). Nabu Press, Charleston (2012)


  12. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection, pp. 886–893. CVPR, Providence (2005)


  13. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Proceedings of the 9th European conference on Computer Vision—Volume Part II. ECCV’06, pp. 428–441. Springer-Verlag, Berlin, (2006)

  14. De la Torre, F., Hodgins, J.K., Bargteil, A.W., Martin, X., Macey, J., Collado i Castells, A., Beltran, J.: Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) database. Technical Report CMU-RI-TR-08-22, Robotics Institute, Pittsburgh (2008)

  15. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR09, (2009)

  16. Dreuw, P., Neidle, C., Athitsos, V., Sclaroff, S., Ney, H.: Benchmark databases for video-based automatic sign language recognition. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). European Language Resources Association (ELRA), Marrakech (2008)

  17. Eichner, M., Marín-Jiménez, M.J., Zisserman, A., Ferrari, V.: 2D articulated human pose estimation and retrieval in (almost) unconstrained still images. Intern. J. Comp. Vis. 99(2), 190–214 (2012)


  18. Escalante, H.J., Guyon, I.: Principal motion: PCA-based reconstruction of motion histograms. Technical report, ChaLearn Technical Memorandum (2012). http://www.causality.inf.ethz.ch/Gesture/principal_motion.pdf

  19. Escalante, H.J., Guyon, I., Athitsos, V., Jangyodsuk, P., Wan, J.: Principal motion components for gesture recognition using a single-example. CoRR abs/1310.4822 (2013). http://arxiv.org/abs/1310.4822

  20. Escalera, S., Gonzàlez, J., Baró, X., Reyes, M., Lopes, O., Guyon, I., Athitsos, V., Escalante, H.J.: Multi-modal gesture recognition challenge 2013: dataset and results. Technical report, ChaLearn Technical Memorandum (2013)

  21. Glomb, P., Romaszewski, M., Opozda, S., Sochan, A.: Choosing and modeling the hand gesture database for a natural user interface. In: Proceedings of the 9th international conference on Gesture and Sign Language in Human–Computer Interaction and Embodied Communication. GW’11, pp. 24–35. Springer-Verlag, Berlin, (2012)

  22. Gross, R., Shi, J.: The CMU Motion of Body (MoBo) database. Technical Report CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University, Pittsburgh (2001)

  23. Guyon, I., Athitsos, V., Jangyodsuk, P., Escalante, H.J.: ChaLearn gesture demonstration kit. Technical report, ChaLearn Technical Memorandum (2013)

  24. Guyon, I., Athitsos, V., Jangyodsuk, P., Escalante, H.J., Hamner, B.: Results and analysis of the ChaLearn gesture challenge 2012. In: Advances in Depth Image Analysis and Applications, Lecture Notes in Computer Science, vol. 7854, pp. 186–204 (2013)

  25. Guyon, I., Athitsos, V., Jangyodsuk, P., Hamner, B., Escalante, H.J.: ChaLearn gesture challenge: design and first results. In: CVPR Workshops, pp. 1–6. IEEE (2012)

  26. Hargrave, J.L.: Let me see your body talk. Kendall/Hunt Pub. Co., Dubuque (1995)


  27. Hwang, B.-W., Kim, S., Lee, S.-W.: A full-body gesture database for automatic gesture recognition. In: FG, pp. 243–248. IEEE Computer Society (2006)

  28. Kendon, A.: Gesture: visible action as utterance. Cambridge University Press, Cambridge (2004)

  29. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: ICCV (2011)

  30. Laptev, I.: On space–time interest points. Intern. J. Comp. Vis. 64(2–3), 107–123 (2005)


  31. Larsson, M., Serrano, V.I., Kragic, D., Kyrki, V.: CVAP arm/hand activity database. http://www.csc.kth.se/~danik/gesture_database/

  32. Malgireddy, M., Nwogu, I., Govindaraju, V.: Language-motivated approaches to action recognition. JMLR 14, 2189–2212 (2013)


  33. Martínez, A.M., Wilbur, R.B., Shay, R., Kak, A.C.: Purdue RVL-SLLL ASL database for automatic recognition of American Sign Language. In: Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, ICMI ’02, pp. 167–172. IEEE Computer Society, Washington (2002)

  34. McNeill, D.: Hand and mind: what gestures reveal about thought. Psychology/cognitive science. University of Chicago Press, Chicago (1996)


  35. Moeslund, T.B., Bajers, F.: Summaries of 107 computer vision-based human motion capture papers (1999)

  36. Moeslund, T.B., Hilton, A., Krüger, V., Sigal, L. (eds.): Visual Analysis of Humans: Looking at People. Springer, Berlin (2011)


  37. Müller, M., Röder, T., Clausen, M., Eberhardt, B., Krüger, B., Weber, A.: Documentation Mocap Database HDM05. Technical Report CG-2007-2, Universität Bonn (2007)

  38. Munari, B.: Speak Italian: the fine art of the gesture. Chronicle Books, San Francisco (2005)


  39. World Federation of the Deaf, Unification of Signs Commission: Gestuno: International Sign Language of the Deaf (Langage Gestuel International des Sourds). British Deaf Association [for] the World Federation of the Deaf (1975)

  40. Raptis, M., Kirovski, D., Hoppe, H.: Real-time classification of dance gestures from skeleton animation. In: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2011)

  41. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR (2011)

  42. Sigal, L., Balan, A.O.: HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comp. Vision 87(1–2), 4–27 (2010)


  43. Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. Patt. Anal. Mach. Intell. 30(11) (2008)

  44. Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)


  45. von Laban, R., Lange, R.: Laban’s principles of dance and movement notation. Macdonald & Evans, Canada (1975)


  46. Wagner, M., Armstrong, N.: Field guide to gestures: how to identify and interpret virtually every gesture known to man. Field Guide, Quirk Books, Philadelphia (2003)


  47. Wan, J., Ruan, Q., Li, W.: One-shot learning gesture recognition from RGB-D data using bag-of-features. JMLR (2013)


Acknowledgments

This challenge was organized by ChaLearn http://chalearn.org, whose directors are gratefully acknowledged. The submission website was hosted by Kaggle http://kaggle.com, and we thank Ben Hamner for his wonderful support. Our sponsors include Microsoft (Kinect for Xbox 360) and Texas Instruments, who donated prizes. We are very grateful to Alex Kipman and Laura Massey at Microsoft and to Branislav Kisacanin at Texas Instruments, who made this possible. We also thank the committee members and participants of the CVPR 2011, CVPR 2012, and ICPR 2012 gesture recognition workshops, the judges of the demonstration competitions hosted in conjunction with CVPR 2012 and ICPR 2012, and the Pascal2 reviewers who made valuable suggestions. We are particularly grateful to Richard Bowden, Philippe Dreuw, Ivan Laptev, Jitendra Malik, Greg Mori, and Christian Vogler, who provided us with useful guidance in the design of the dataset.

Author information

Correspondence to Isabelle Guyon.

Additional information

This effort was initiated by the DARPA Deep Learning program and was supported by the US National Science Foundation (NSF) under grants ECCS 1128436 and ECCS 1128296, the EU Pascal2 network of excellence and the Challenges in Machine Learning foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Appendix

1.1 Results by challenge participant

We used the code provided by the 15 top-ranking participants in both challenge rounds to compute performance on the validation and final evaluation sets (Table 5). We also provide results on 20 other batches selected for our translation experiments. Untranslated data are referred to as “utran” and translated data as “tran”. Details on the methods employed by the participants are found in reference [24] and on the website of the challenge.

1.2 Development data lexicons

The development data were recorded using a subset of thirty lexicons (Table 6), each recorded at least 11 times by different users. We list in Table 7 the lexicons used for the validation and final evaluation data. Note that some validation lexicons also appear in the development data, whereas the final evaluation data include only new lexicons found in no other set.

Table 6 Lexicons recorded in development data
Table 7 Results by batch. For each validation and final evaluation batch, this table lists the lexicon, the identity of the subject (user) who recorded the data, and the recognition performance of the 15 top-ranking participants of rounds 1 and 2. The performance score is the average generalized Levenshtein distance, which is analogous to an error rate. Best is the lowest score, Mean is the average score, and Std is the standard deviation

1.3 Results by data batch

We show in Table 7 the performance by batch. We computed the best and average performance over the 15 top-ranking participants of rounds 1 and 2: Alfnie1, Alfnie2, BalazsGodeny, HITCS, Immortals, Joewan, Manavender, OneMillionMonkeys, Pennect, SkyNet, TurtleTamers, Vigilant, WayneZhang, XiaoZhuWudi, and Zonga.
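
The score reported in Table 7 is based on the Levenshtein (edit) distance between the recognized and the true gesture-label sequences. Below is a minimal sketch of this scoring in Python; it assumes the batch score is the sum of edit distances divided by the total number of true gestures, as suggested by the description of Table 7, and is not the challenge's official evaluation code:

    def levenshtein(pred, true):
        # Edit distance (insertions, deletions, substitutions) between two
        # sequences of gesture labels.
        m, n = len(pred), len(true)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if pred[i - 1] == true[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[m][n]

    def batch_score(predicted_sequences, true_sequences):
        # Sum of edit distances divided by the total number of true gestures,
        # so a score of 0.20 roughly corresponds to a 20% error rate.
        total_distance = sum(levenshtein(p, t)
                             for p, t in zip(predicted_sequences, true_sequences))
        total_gestures = sum(len(t) for t in true_sequences)
        return total_distance / total_gestures

    # Example: two short sequences of gesture labels (integers from a lexicon).
    print(batch_score([[3, 7], [1, 5, 9]], [[3, 8], [1, 5, 9]]))  # prints 0.2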

1.4 Depth parameters

We also provide the parameters necessary to reconstruct the original depth data from normalized values (Table 8).

Table 8 Depth parameters. For each validation and final evaluation batch, this table lists the parameters needed to approximately reconstruct the original depth values from the normalized values: average the R, G, and B values to obtain a value v, then compute v/255 \(\times\) (MaxDepth \(-\) MinDepth) + MinDepth. The depth parameters for the development data are found on the website of the challenge. “Date” is the date at which the data batch was recorded, in YY MM DD HH MM format. “DephRes” is the number of unique depth levels
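
A minimal sketch of this reconstruction in Python (using NumPy; the frame contents and the MinDepth/MaxDepth values below are illustrative only, the real parameters being those listed in Table 8):

    import numpy as np

    def reconstruct_depth(rgb_frame, min_depth, max_depth):
        # Average the R, G, and B channels to obtain v, then map back to the
        # original depth range: v / 255 * (MaxDepth - MinDepth) + MinDepth.
        v = rgb_frame.astype(np.float64).mean(axis=2)
        return v / 255.0 * (max_depth - min_depth) + min_depth

    # Illustrative values only; the real MinDepth/MaxDepth come from Table 8.
    frame = np.full((240, 320, 3), 128, dtype=np.uint8)  # stand-in for a decoded video frame
    depth = reconstruct_depth(frame, min_depth=800, max_depth=3000)
    print(depth[0, 0])  # about 1904, in the sensor's depth units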


About this article

Cite this article

Guyon, I., Athitsos, V., Jangyodsuk, P. et al. The ChaLearn gesture dataset (CGD 2011). Machine Vision and Applications 25, 1929–1951 (2014). https://doi.org/10.1007/s00138-014-0596-3

