ABSTRACT
We review key challenges of developing spoken dialog systems that can engage in interactions with one or multiple participants in relatively unconstrained environments. We outline a set of core competencies for open-world dialog, and describe three prototype systems. The systems are built on a common underlying conversational framework which integrates an array of predictive models and component technologies, including speech recognition, head and pose tracking, probabilistic models for scene analysis, multiparty engagement and turn taking, and inferences about user goals and activities. We discuss the current models and showcase their function by means of a sample recorded interaction, and we review results from an observational study of open-world, multiparty dialog in the wild.