Abstract
This paper describes the data used in the ChaLearn gesture challenges that took place in 2011/2012, whose results were discussed at the CVPR 2012 and ICPR 2012 conferences. The task can be characterized as: user-dependent, small vocabulary, fixed camera, one-shot learning. The data include 54,000 hand and arm gestures recorded with an RGB-D \(\hbox {Kinect}^\mathrm{TM}\) camera. The data are organized into batches of 100 gestures, each batch pertaining to a small gesture vocabulary of 8–12 gestures recorded by the same user. Gestures are recorded as short continuous sequences of 1–5 randomly selected gestures. We provide manual annotations (temporal segmentation into individual gestures, alignment of RGB and depth images, and body part locations) and a library of functions to preprocess and automatically annotate the data. We also provide a subset of batches in which the user’s horizontal position is randomly shifted or scaled. We report on the results of the challenge and distribute sample code to facilitate the development of new solutions. The data, data collection software and gesture vocabularies are downloadable from http://gesture.chalearn.org. We have set up a forum for researchers working on these data: http://groups.google.com/group/gesturechallenge.
Notes
For round 1: http://www.kaggle.com/c/GestureChallenge. For round 2: http://www.kaggle.com/c/GestureChallenge2.
For ease of visualization, earlier experiments were recorded in a different format: depth was encoded as gray levels, and the RGB and depth images were concatenated vertically and stored as a single Matlab movie. However, we later realized that we were losing depth resolution for some videos because Matlab movies use only 8 bits of resolution (256 levels), while the depth resolution of our videos sometimes exceeded 1,000 levels. Hence, we recorded later batches using cell arrays for K.
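The quantization loss described in this note can be illustrated with a short sketch (the depth values below are illustrative, not taken from the actual recordings): once a depth map with more than 256 distinct levels is stored as an 8-bit gray-level frame, decoding can recover at most 256 levels.

```python
import numpy as np

# Simulated raw depth map with 1,101 distinct levels (illustrative values).
depth = np.linspace(500, 1600, num=1101).reshape(1, -1)

# Encoding depth as an 8-bit gray-level movie frame collapses it to 256 levels.
lo, hi = depth.min(), depth.max()
encoded = np.round((depth - lo) / (hi - lo) * 255).astype(np.uint8)

# Decoding back can only recover the 256 surviving levels.
decoded = encoded.astype(np.float64) / 255 * (hi - lo) + lo

print(len(np.unique(depth)))    # 1101 distinct input levels
print(len(np.unique(encoded)))  # 256 levels after 8-bit encoding
```

This is exactly why later batches stored the raw depth values in cell arrays instead of 8-bit movie frames.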
References
Accelerative Integrated Method (AIM) foreign language teaching methodology, http://www.aimlanguagelearning.com/
Computer vision datasets on the web. http://www.cvpapers.com/datasets.html
Imageclef—the clef cross language image retrieval track. http://www.imageclef.org/
The Pascal visual object classes homepage. http://pascallin.ecs.soton.ac.uk/challenges/VOC/
Alon, J., Athitsos, V., Yuan, Q., Sclaroff, S.: A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE Trans. Patt. Anal. Mach. Intell. 31(9), 1685–1699 (2009)
Beyer, M.: Teach your baby to sign: an illustrated guide to simple sign language for babies. Fair Winds Press, Minneapolis (2007)
Calatroni, A., Roggen, D., Tröster, G.: Collection and curation of a large reference dataset for activity recognition. In: Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on, pp. 30–35. (2011)
Carroll, C., Carroll, R.: Mudras of India: a comprehensive guide to the hand gestures of yoga and Indian dance. Jessica Kingsley Publishers, London (2012)
Chavarriaga, R., Sagha, H., Calatroni, A., Digumarti, S.T., Tröster, G., del R. Millán, J., Roggen, D.: The Opportunity challenge: a benchmark database for on-body sensor-based activity recognition. Patt. Recogn. Lett. (2013)
Private communication
Curwen, J.: The standard course of lessons & exercises in the Tonic Sol-Fa Method of teaching music: (Founded on Miss Glover’s Scheme for Rendering Psalmody Congregational. A.D. 1835.).. Nabu Press, Charleston (2012)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection, pp. 886–893. CVPR, Providence (2005)
Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Proceedings of the 9th European conference on Computer Vision—Volume Part II. ECCV’06, pp. 428–441. Springer-Verlag, Berlin, (2006)
De la Torre Frade, F., Hodgins, J.K., Bargteil, A.W., Martin A., Xavier, M., Justin C., Collado I Castells, A., Beltran, J.: Guide to the carnegie mellon university multimodal activity (cmu-mmac) database. In: Technical Report CMU-RI-TR-08-22, Robotics Institute, Pittsburgh, (2008)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR09, (2009)
Dreuw, P., Neidle, C., Athitsos, V, Sclaroff, S., Ney, H.: Benchmark databases for video-based automatic sign language recognition. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), European Language Resources Association (ELRA), Marrakech, (2008)
Eichner, M., Marín-Jiménez, M.J., Zisserman, A., Ferrari, V.: 2D articulated human pose estimation and retrieval in (almost) unconstrained still images. Intern. J. Comp. Vis. 99(2), 190–214 (2012)
Escalante, H.J., Guyon, I.: Principal motion: PCA-based reconstruction of motion histograms. In: Technical report, ChaLearn Technical Memorandum, (2012). http://www.causality.inf.ethz.ch/Gesture/principal_motion.pdf
Escalante, H.J., Guyon, I., Athitsos, V., Jangyodsuk, P., Wan, J.: Principal motion components for gesture recognition using a single-example. CoRR abs/1310.4822 (2013). http://arxiv.org/abs/1310.4822
Escalera, S., Gonzàlez, J., Baró, X., Reyes, M., Lopes, O., Guyon, I., Athitsos, V., Escalante, H.J.: Multi-modal gesture recognition challenge 2013: Dataset and results. In: Technical report, ChaLearn Technical Memorandum, (2013)
Glomb, P., Romaszewski, M., Opozda, S., Sochan, A.: Choosing and modeling the hand gesture database for a natural user interface. In: Proceedings of the 9th international conference on Gesture and Sign Language in Human–Computer Interaction and Embodied Communication. GW’11, pp. 24–35. Springer-Verlag, Berlin, (2012)
Gross, R., Shi, J.: The cmu motion of body (mobo) database. In: Technical Report CMU-RI-TR-01-18. Robotics Institute, Carnegie Mellon University, Pittsburgh, (2001)
Guyon, I., Athitsos, V., Jangyodsuk, P., Escalante, H.J.: ChaLearn gesture demonstration kit. In: Technical report, ChaLearn Technical Memorandum, (2013)
Guyon, I., Athitsos, V., Jangyodsuk, P., Escalante, H.J., Hamner, B.: Results and analysis of the ChaLearn gesture challenge 2012. In: Advances in Depth Image Analysis and Applications, volume 7854 of Lecture Notes in Computer Science, pp. 186–204. (2013)
Guyon, I., Athitsos, V., Jangyodsuk, P., Hamner, B., Escalante, H.J.: ChaLearn gesture challenge: design and first results. In: CVPR Workshops, pp. 1–6. IEEE (2012)
Hargrave, J.L.: Let me see your body talk. Kendall/Hunt Pub. Co., Dubuque (1995)
Hwang, B.-W., Kim, S., Lee, S.-W.: A full-body gesture database for automatic gesture recognition. In: FG, pp. 243–248. IEEE Computer Society (2006)
Kendon, A.: Gesture: visible action as utterance. Cambridge University Press, Cambridge (2004)
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: ICCV (2011)
Laptev, I.: On space–time interest points. Intern. J. Comp. Vis. 64(2–3), 107–123 (2005)
Larsson, M., Serrano V.I., Kragic, D., Kyrki V.: Cvap arm/hand activity database, http://www.csc.kth.se/~danik/gesture_database/
Malgireddy, M., Nwogu, I., Govindaraju, V.: Language-motivated approaches to action recognition. JMLR 14, 2189–2212 (2013)
Martínez, A.M., Wilbur, R.B., Shay, R., Kak, A.C.: Purdue RVL-SLLL ASL database for automatic recognition of American Sign Language. In: Proceedings of the 4th IEEE International Conference on Multimodal Interfaces. ICMI ’02, pp. 167–172. IEEE Computer Society, Washington, (2002)
McNeill, D.: Hand and mind: what gestures reveal about thought. Psychology/cognitive science. University of Chicago Press, Chicago (1996)
Moeslund, T.B., Bajers, F.: Summaries of 107 computer vision-based human motion capture papers (1999)
Moeslund, T.B., Hilton, A., Krüger, V., Sigal, L. (eds.): Visual analysis of humans—looking at people. Springer, Berlin (2011)
Müller, M., Röder, T., Clausen, M., Eberhardt, B., Krüger, B., Weber, A.: Documentation mocap database hdm05. In: Technical Report CG-2007-2, Universität Bonn, (2007)
Munari, B.: Speak Italian: the fine art of the gesture. Chronicle Books, San Francisco (2005)
World Federation of the Deaf, Unification of Signs Commission: Gestuno: international sign language of the deaf (Langage gestuel international des sourds). British Deaf Association [for] the World Federation of the Deaf (1975)
Raptis, M., Kirovski, D., Hoppe, H.: Real-time classification of dance gestures from skeleton animation. In: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, (2011)
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR (2011)
Sigal, L., Balan, A.O., Black, M.J.: HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comp. Vision 87(1–2), 4–27 (2010)
Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. Patt. Anal. Mach. Intell. 30(11) (2008)
Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)
von Laban, R., Lange, R.: Laban’s principles of dance and movement notation. Macdonald & Evans, Canada (1975)
Wagner, M., Armstrong, N.: Field guide to gestures: how to identify and interpret virtually every gesture known to man. Field Guide, Quirk Books, Philadelphia (2003)
Wan, J., Ruan, Q., Li, W.: One-shot learning gesture recognition from RGB-D data using bag of features. JMLR (2013)
Acknowledgments
This challenge was organized by ChaLearn http://chalearn.org whose directors are gratefully acknowledged. The submission website was hosted by Kaggle http://kaggle.com and we thank Ben Hamner for his wonderful support. Our sponsors include Microsoft (Kinect for Xbox 360) and Texas Instruments, who donated prizes. We are very grateful to Alex Kipman and Laura Massey at Microsoft and to Branislav Kisacanin at Texas Instruments who made this possible. We also thank the committee members and participants of the CVPR 2011, CVPR 2012, and ICPR 2012 gesture recognition workshops, the judges of the demonstration competitions hosted in conjunction with CVPR 2012 and ICPR 2012, and the Pascal2 reviewers who made valuable suggestions. We are particularly grateful to Richard Bowden, Philippe Dreuw, Ivan Laptev, Jitendra Malik, Greg Mori, and Christian Vogler, who provided us with useful guidance in the design of the dataset.
Additional information
This effort was initiated by the DARPA Deep Learning program and was supported by the US National Science Foundation (NSF) under grants ECCS 1128436 and ECCS 1128296, the EU Pascal2 network of excellence and the Challenges in Machine Learning foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
Appendix
1.1 Results by challenge participant
We used the code provided by the 15 top-ranking participants in both challenge rounds to compute performances on the validation and final evaluation sets (Table 5). We also provide results on 20 other batches selected for our translation experiments. Untranslated data are referred to as “utran” and translated data as “tran”. Details on the methods employed by the participants can be found in reference [24] and on the website of the challenge.
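The performance figures in these tables are edit-distance based recognition error rates. The sketch below is our own minimal illustration of normalized Levenshtein scoring over predicted versus true gesture label sequences, not the official evaluation code; the example sequences are made up.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two label sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

# Sum the edit distances over all test sequences, then normalize by the
# total number of true gestures to obtain an error score.
truth = [[3, 7], [1], [5, 2, 8]]   # hypothetical true label sequences
preds = [[3, 7], [4], [5, 8]]      # hypothetical predictions
errors = sum(levenshtein(p, t) for p, t in zip(preds, truth))
score = errors / sum(len(t) for t in truth)
print(score)  # 2 errors over 6 true gestures ≈ 0.333
```

A score of 0 means every sequence was recognized exactly; a missed or spurious gesture in a continuous sequence costs one edit.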
1.2 Development data lexicons
The development data were recorded using a subset of thirty lexicons (Table 6), each recorded at least 11 times by different users. We list in Table 7 the lexicons used for the validation and final evaluation data. Note that some validation lexicons also appear in the development data, but the final evaluation data include only new lexicons found in no other set.
1.3 Results by data batch
We show in Table 7 the performances by batch. We computed the best and average performance over the 15 top-ranking participants in rounds 1 and 2: Alfnie1, Alfnie2, BalazsGodeny, HITCS, Immortals, Joewan, Manavender, OneMillionMonkeys, Pennect, SkyNet, TurtleTamers, Vigilant, WayneZhang, XiaoZhuWudi, and Zonga.
1.4 Depth parameters
We also provide the parameters necessary to reconstruct the original depth data from normalized values (Table 8).
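A de-normalization of this kind can be sketched as follows. The linear mapping and the parameter names (`depth_min`, `depth_max`) are our assumptions for illustration; the actual per-batch parameters are those of Table 8.

```python
import numpy as np

def reconstruct_depth(normalized, depth_min, depth_max):
    """Map normalized 8-bit depth values back to the original depth range.

    Hypothetical linear de-normalization: 0 maps to depth_min and
    255 maps to depth_max. The true per-batch parameters come from
    Table 8 of the paper.
    """
    normalized = np.asarray(normalized, dtype=np.float64)
    return normalized / 255.0 * (depth_max - depth_min) + depth_min

# Example with made-up parameters: 0 -> 400, 255 -> 3000.
print(reconstruct_depth([0, 255], depth_min=400, depth_max=3000))
```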
Cite this article
Guyon, I., Athitsos, V., Jangyodsuk, P. et al. The ChaLearn gesture dataset (CGD 2011). Machine Vision and Applications 25, 1929–1951 (2014). https://doi.org/10.1007/s00138-014-0596-3
Keywords
- Computer vision
- Gesture recognition
- Sign language recognition
- RGBD cameras
- Kinect
- Dataset
- Challenge
- Machine learning
- Transfer learning
- One-shot-learning