
A Survey on Perception Methods for Human–Robot Interaction in Social Robots

  • Survey
  • Published in: International Journal of Social Robotics

Abstract

For human–robot interaction (HRI), perception is one of the most important capabilities. This paper reviews several widely used perception methods for HRI in social robots. Specifically, we investigate general perception tasks crucial for HRI, such as where objects are located in a room, what objects are in the scene, and how they interact with humans. We first enumerate representative social robots and summarize the three most important perception methods used by these robots: feature extraction, dimensionality reduction, and semantic understanding. For feature extraction, four widely used signal types (visual, audio, tactile, and range sensor) are reviewed and compared in terms of their advantages and disadvantages. For dimensionality reduction, representative methods including principal component analysis (PCA), linear discriminant analysis (LDA), and locality preserving projections (LPP) are reviewed. For semantic understanding, conventional techniques for several typical applications, such as object recognition, object tracking, object segmentation, and speaker localization, are discussed, and their characteristics and limitations are analyzed. Moreover, several popular data sets used in social robotics and published semantic understanding results are analyzed and compared in light of our analysis of HRI perception methods. Lastly, we suggest important directions for future work on fundamental questions in HRI perception.
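As a minimal illustration of the dimensionality-reduction step mentioned above, the sketch below applies PCA to a matrix of feature vectors. This is a generic NumPy sketch under our own assumptions, not the authors' implementation: the function name pca_project, the array shapes, and the synthetic data are illustrative only.

```python
import numpy as np

def pca_project(X, k):
    """Project n samples (rows of X) onto the top-k principal components.

    A minimal PCA sketch; shapes and names are illustrative assumptions,
    not taken from the surveyed systems.
    """
    # Center the data so the components capture variance, not the mean.
    mu = X.mean(axis=0)
    Xc = X - mu
    # SVD of the centered data: rows of Vt are the principal directions,
    # ordered by decreasing explained variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                 # d x k projection matrix
    return Xc @ W, W, mu         # low-dimensional codes, basis, mean

# Example: reduce 100 synthetic 64-dimensional feature vectors to 3 dims.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
Z, W, mu = pca_project(X, k=3)
print(Z.shape)  # (100, 3)
```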



Notes

  1. In this paper, we do not differentiate between feature representation and feature extraction, although there are slight differences. Generally speaking, feature representation methods focus on extracting low-level features, whereas feature extraction methods aim at extracting mid- and high-level features.

  2. In some social robots, semantic understanding is part of an intermediate mechanism. In this paper, we treat it as part of the perception system, since it operates on the acquired signals and its output represents the external environment that the robot actually needs to know about.

  3. In some areas, emotion recognition is also called affective computing, and emotion is referred to as affective state. In this paper, we use emotion recognition because it is the more common term in the literature.

  4. In this survey, we focus mainly on the four signal types most widely used in existing social robots. Hence, other types of signals, such as GPS, temperature, and human pain, are not discussed.

  5. Please note that the core algorithm refers to the approach that performs a semantic understanding task, such as detection, classification, or prediction in HRI, using the extracted features (see the sketch after these notes).

  6. Please note that semantic understanding in this paper refers to the interaction between humans and robots; other tasks, such as navigation in mobile robots and interaction between robots and their environments, are not covered.

  7. Faces, humans, and emotions can be unified as different categories of objects.

  8. Please note that some other results are not included here; this table covers only the representative social robots listed in Table 1.
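To make Note 5 concrete, the sketch below classifies a signal by extracting a feature vector and applying a nearest-neighbor rule, i.e., a "core algorithm" operating on extracted features. Everything here is a hypothetical toy (the summary-statistic features, the speech/silence labels, the synthetic frames); real systems would use the visual, audio, tactile, or range-sensor features the survey reviews.

```python
import numpy as np

def extract_features(signal):
    """Toy feature extraction: summary statistics of a 1-D signal.
    Purely illustrative, not a method from the surveyed robots."""
    return np.array([signal.mean(), signal.std(), signal.max(), signal.min()])

def nearest_neighbor_classify(x, X_train, y_train):
    """Core algorithm (here: 1-NN) applied to an extracted feature vector."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Hypothetical two-class task: "speech" vs "silence" audio frames.
rng = np.random.default_rng(1)
loud = [rng.normal(0, 1.0, 256) for _ in range(20)]   # high-variance frames
quiet = [rng.normal(0, 0.1, 256) for _ in range(20)]  # low-variance frames
X_train = np.array([extract_features(s) for s in loud + quiet])
y_train = np.array(["speech"] * 20 + ["silence"] * 20)

test_frame = rng.normal(0, 0.1, 256)
print(nearest_neighbor_classify(extract_features(test_frame), X_train, y_train))
```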


Author information

Correspondence to Haibin Yan.


About this article

Cite this article

Yan, H., Ang, M.H. & Poo, A.N. A Survey on Perception Methods for Human–Robot Interaction in Social Robots. Int J of Soc Robotics 6, 85–119 (2014). https://doi.org/10.1007/s12369-013-0199-6

