
ROS open-source audio recognizer: ROAR environmental sound detection tools for robot programming


Abstract

Advances in audio recognition have enabled the real-world success of a wide variety of interactive voice systems over the last two decades. More recently, these same techniques have shown promise in recognizing non-speech audio events. Sounds are ubiquitous in real-world manipulation, such as the click of a button, the crash of an object being knocked over, and the whine of activation from an electric power tool. Surprisingly, very few autonomous robots leverage audio feedback to improve their performance. Modern audio recognition techniques are capable of learning and recognizing real-world sounds, but few implementations are easily incorporated into modern robotic programming frameworks. This paper presents a new software library known as the ROS Open-source Audio Recognizer (ROAR). ROAR provides a complete set of end-to-end tools for online supervised learning of new audio events, feature extraction, automatic one-class Support Vector Machine model tuning, and real-time audio event detection. Through implementation on a Barrett WAM arm, we show that combining the contextual information of the manipulation action with a set of learned audio events yields significant improvements in robotic task-completion rates.
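To make the abstract's end-to-end pipeline concrete, here is a minimal sketch of its core stages: extracting fixed-length features from short audio clips, automatically tuning a one-class Support Vector Machine on positive examples of a single sound event, and classifying new clips in real time. ROAR itself is built on ROS; librosa and scikit-learn are used below purely as stand-ins, and every concrete choice here (mean-pooled MFCC features, the nu/gamma grid, the tuning score) is an illustrative assumption rather than ROAR's actual design.

```python
# Sketch of a ROAR-style audio event pipeline. All parameter choices
# below are illustrative assumptions, not values from the paper.
import numpy as np
import librosa
from sklearn.model_selection import ParameterGrid
from sklearn.svm import OneClassSVM


def clip_features(audio, sr, n_mfcc=13):
    """Mean MFCC vector for one audio clip (one training example)."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


def tune_one_class_svm(train_X, holdout_X):
    """Grid-search nu and gamma, keeping the model that accepts the
    largest fraction of held-out positive examples (a simple stand-in
    for ROAR's automatic model tuning)."""
    best_model, best_score = None, -1.0
    for params in ParameterGrid({"nu": [0.01, 0.05, 0.1, 0.2],
                                 "gamma": [0.01, 0.1, 1.0]}):
        model = OneClassSVM(kernel="rbf", **params).fit(train_X)
        score = float((model.predict(holdout_X) == 1).mean())
        if score > best_score:
            best_model, best_score = model, score
    return best_model


def detect(model, audio, sr):
    """True if a new clip matches the learned sound event."""
    return model.predict(clip_features(audio, sr)[None, :])[0] == 1
```

In use, one such model would be trained per sound event (button click, object crash, tool whine), and, as the abstract notes, detections become most useful when gated by manipulation context, for example consulting only the button-click model while the robot is executing a pressing action.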



Acknowledgments

This work was supported by funding from the DARPA Autonomous Robotic Manipulation Software Track (US Army RDECOM contract W91CRB-10-C-0127) and by the University of Pennsylvania.

Author information


Corresponding author

Correspondence to Joseph M. Romano.

Electronic Supplementary Material

Below is the Electronic Supplementary Material.

ESM 1 (MP4, 118,736 kb)



Cite this article

Romano, J.M., Brindza, J.P. & Kuchenbecker, K.J. ROS open-source audio recognizer: ROAR environmental sound detection tools for robot programming. Auton Robot 34, 207–215 (2013). https://doi.org/10.1007/s10514-013-9323-6
