Abstract
If no specific precautions are taken, people talking to a computer can, just as when talking to another human, speak aside, either to themselves or to another person. On the one hand, the computer should notice such utterances and process them in a special way; on the other hand, such utterances provide us with unique data for contrasting two registers: talking vs. not talking to a computer. In this paper, we present two different databases, SmartKom and SmartWeb, and classify and analyse On-Talk (addressing the computer) vs. Off-Talk (addressing someone else), and thereby the user's focus of attention, as found in these two databases, employing uni-modal (prosodic and linguistic) features as well as multimodal information (additional face detection).
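The approach sketched in the abstract combines a uni-modal (prosodic/linguistic) cue with a visual cue from face detection. The following is a minimal, purely illustrative sketch of such a late-fusion decision; the function name, the fixed weights, and the 0.5 threshold are assumptions for illustration, not the classifier or parameters used in the paper.

```python
# Hypothetical late-fusion sketch for On-Talk vs. Off-Talk classification.
# Weights and threshold are illustrative assumptions, not from the paper.

def classify_focus(prosody_score: float, face_frontal: bool) -> str:
    """Fuse a uni-modal prosodic score (0..1, higher = more On-Talk-like)
    with a binary face-detection cue (True if the user faces the system)."""
    # Simple weighted late fusion of the two modalities (assumed weights).
    fused = 0.6 * prosody_score + 0.4 * (1.0 if face_frontal else 0.0)
    return "On-Talk" if fused >= 0.5 else "Off-Talk"

# Clear On-Talk prosody can outweigh an averted face, and vice versa:
print(classify_focus(0.9, False))  # strong prosodic evidence alone suffices
print(classify_focus(0.1, False))  # weak evidence in both modalities
```

In practice the paper's uni-modal classifiers operate on many prosodic and linguistic features rather than a single score, and the fusion weights would be learned from the annotated SmartKom/SmartWeb data rather than fixed by hand.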
Additional information
This work was funded by the German Federal Ministry of Education, Science, Research and Technology (BMBF) in the framework of the SmartKom project under Grant 01 IL 905 K7 and in the framework of the SmartWeb project under Grant 01 IMD 01 F. The responsibility for the contents of this study lies with the authors.
Cite this article
Batliner, A., Hacker, C. & Nöth, E. To talk or not to talk with a computer. J Multimodal User Interfaces 2, 171 (2008). https://doi.org/10.1007/s12193-009-0016-6