
Learning Novel Objects for Extended Mobile Manipulation

  • Published in: Journal of Intelligent & Robotic Systems (2012)

Abstract

We propose a method for learning novel objects from audio-visual input. The proposed method is based on two techniques: out-of-vocabulary (OOV) word segmentation and foreground object detection in complex environments. A voice conversion technique is also incorporated so that the robot can pronounce the acquired OOV word intelligibly. Using the proposed method, we implemented a robotic system that carries out interactive mobile manipulation tasks, which we call “extended mobile manipulation”. To evaluate the robot as a whole, we conducted the “Supermarket” task, adopted from the RoboCup@Home league as a standard task for real-world applications. The results show that our integrated system works well in real-world applications.
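As a rough illustration of the foreground object detection step mentioned above, the following is a minimal sketch that segments an object out of a cluttered scene with OpenCV's GrabCut, a standard interactive foreground-extraction technique. This is not the paper's own detector, which may differ; the input file name and the initialization rectangle are placeholder assumptions.

import cv2
import numpy as np

# Placeholder input image and a rough bounding box around the object
img = cv2.imread("scene.png")
mask = np.zeros(img.shape[:2], dtype=np.uint8)
rect = (50, 50, 200, 200)  # (x, y, width, height); hypothetical values

# GrabCut requires two (1, 65) float64 scratch arrays for its internal models
bgd_model = np.zeros((1, 65), dtype=np.float64)
fgd_model = np.zeros((1, 65), dtype=np.float64)

# Run five GrabCut iterations, initialized from the rectangle
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels labeled definite or probable foreground form the object mask
fg_mask = np.where(
    (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0
).astype(np.uint8)
foreground = img * fg_mask[:, :, None]
cv2.imwrite("object.png", foreground)

In an interactive setting like the one the abstract describes, the initial rectangle could come from a pointing gesture or a user-indicated image region rather than a hard-coded value.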



Author information


Corresponding author

Correspondence to Tomoaki Nakamura.



Cite this article

Nakamura, T., Sugiura, K., Nagai, T. et al. Learning Novel Objects for Extended Mobile Manipulation. J Intell Robot Syst 66, 187–204 (2012). https://doi.org/10.1007/s10846-011-9605-1
