ABSTRACT
Effective communication with a mobile robot using speech is a difficult problem even when the auditory scene can be controlled. Robot ego-noise, echoes, and human interference are all common sources of decreased intelligibility. In real-world environments, however, these common problems are compounded by many other types of background noise. For instance, military scenarios might be punctuated by high-decibel aircraft noise and bursts from weaponry that mask parts of the robot's speech output. Even in non-military settings, fans, computers, alarms, and transportation noise can cause enough interference to render a traditional speech interface unintelligible. In this work, we seek to overcome these problems by applying the robot's advantages in sensing and mobility to a text-to-speech interface. Using perspective-taking skills to predict how new sound sources affect the human user, a robot can adjust its speaking patterns and/or reposition itself within the environment to limit the negative impact on intelligibility, making a speech interface easier to use.
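The adaptation loop the abstract describes (estimate how noise sources affect the listener, then raise speech volume or relocate) might be sketched roughly as below. This is an illustrative reconstruction, not the paper's implementation: the function names, the free-field inverse-square attenuation model, and the SNR thresholds are all assumptions.

```python
import math

def spl_at(source_spl_db, source_pos, listener_pos, ref_dist=1.0):
    """Sound pressure level at the listener under a free-field
    inverse-square model (-6 dB per doubling of distance).
    source_spl_db is the level measured at ref_dist from the source."""
    d = max(math.dist(source_pos, listener_pos), ref_dist)
    return source_spl_db - 20.0 * math.log10(d / ref_dist)

def combine_spl(levels_db):
    """Sum incoherent noise sources on the decibel scale."""
    return 10.0 * math.log10(sum(10 ** (l / 10.0) for l in levels_db))

def choose_action(robot_spl_db, robot_pos, noise_sources, listener_pos,
                  min_snr_db=15.0, max_boost_db=10.0):
    """Pick a speech adaptation from the estimated SNR at the listener.

    noise_sources: list of (spl_db_at_1m, (x, y)) tuples.
    Returns 'speak', 'raise_volume', or 'reposition' (hypothetical
    action labels for illustration).
    """
    noise_db = combine_spl([spl_at(l, p, listener_pos)
                            for l, p in noise_sources])
    speech_db = spl_at(robot_spl_db, robot_pos, listener_pos)
    snr = speech_db - noise_db
    if snr >= min_snr_db:
        return "speak"            # intelligible as-is
    if snr + max_boost_db >= min_snr_db:
        return "raise_volume"     # a Lombard-style boost suffices
    return "reposition"           # move to a quieter vantage point
```

For example, with the robot speaking at 70 dB from 1 m away, a distant 50 dB source leaves the SNR high enough to speak normally, while a 90 dB source 2 m from the listener overwhelms any plausible volume boost and forces repositioning.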
Index Terms
- Improving human-robot interaction through adaptation to the auditory scene