
To talk or not to talk with a computer

Taking into account the user’s focus of attention

  • Original Paper
  • Published:
Journal on Multimodal User Interfaces

Abstract

If no specific precautions are taken, people talking to a computer can, just as when talking to another human, speak aside, either to themselves or to another person. On the one hand, the computer should notice and process such utterances in a special way; on the other hand, such utterances provide us with unique data for contrasting these two registers: talking vs. not talking to a computer. In this paper, we present two different databases, SmartKom and SmartWeb, and classify and analyse On-Talk (addressing the computer) vs. Off-Talk (addressing someone else), and thereby the user's focus of attention, as found in these two databases, employing both uni-modal (prosodic and linguistic) features and multimodal information (additional face detection).




Author information

Correspondence to Anton Batliner.

Additional information

This work was funded by the German Federal Ministry of Education, Science, Research and Technology (BMBF) in the framework of the SmartKom project under Grant 01 IL 905 K7 and in the framework of the SmartWeb project under Grant 01 IMD 01 F. The responsibility for the contents of this study lies with the authors.


About this article

Cite this article

Batliner, A., Hacker, C. & Nöth, E. To talk or not to talk with a computer. J Multimodal User Interfaces 2, 171 (2008). https://doi.org/10.1007/s12193-009-0016-6

