Abstract
If no specific precautions are taken, people talking to a computer can, just as when talking to another human, speak aside, either to themselves or to another person. On the one hand, the computer should notice such utterances and process them in a special way; on the other hand, such utterances provide us with unique data for contrasting two registers: talking vs. not talking to a computer. In this paper, we present two different databases, SmartKom and SmartWeb, and classify and analyse On-Talk (addressing the computer) vs. Off-Talk (addressing someone else), and thereby the user's focus of attention, as found in these two databases, employing uni-modal (prosodic and linguistic) features as well as multimodal information (additional face detection).
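The approach sketched in the abstract combines a uni-modal (prosodic/linguistic) cue with a visual cue from face detection. The following is a minimal, purely illustrative sketch of such a late-fusion decision; the function name, the fixed weights, and the 0.5 threshold are assumptions for illustration, not the classifier or parameters used in the paper.

```python
# Hypothetical late-fusion sketch for On-Talk vs. Off-Talk classification.
# Weights and threshold are illustrative assumptions, not from the paper.

def classify_focus(prosody_score: float, face_frontal: bool) -> str:
    """Fuse a uni-modal prosodic score (0..1, higher = more On-Talk-like)
    with a binary face-detection cue (True if the user faces the system)."""
    # Simple weighted late fusion of the two modalities (assumed weights).
    fused = 0.6 * prosody_score + 0.4 * (1.0 if face_frontal else 0.0)
    return "On-Talk" if fused >= 0.5 else "Off-Talk"

# Clear On-Talk prosody can outweigh an averted face, and vice versa:
print(classify_focus(0.9, False))  # strong prosodic evidence alone suffices
print(classify_focus(0.1, False))  # weak evidence in both modalities
```

In practice the paper's uni-modal classifiers operate on many prosodic and linguistic features rather than a single score, and the fusion weights would be learned from the annotated SmartKom/SmartWeb data rather than fixed by hand.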
Additional information
This work was funded by the German Federal Ministry of Education, Science, Research and Technology (BMBF) in the framework of the SmartKom project under Grant 01 IL 905 K7 and in the framework of the SmartWeb project under Grant 01 IMD 01 F. The responsibility for the contents of this study lies with the authors.
Cite this article
Batliner, A., Hacker, C. & Nöth, E. To talk or not to talk with a computer. J Multimodal User Interfaces 2, 171 (2008). https://doi.org/10.1007/s12193-009-0016-6