Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions

Ramanarayanan, Vikram; Lange, Patrick L.; Evanini, Keelan; Molloy, Hillary R.; Suendermann-Oeft, David

doi:10.21437/Interspeech.2017-1213

We present a spoken dialog-based framework for the computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three scoring dimensions critical to the delivery of conversational English (fluency, pronunciation, and intonation/stress) and further examined the efficacy of automatically extracted, hand-curated speech features in predicting each of these sub-scores. Machine learning experiments showed that trained scoring models generally perform on par with the human inter-rater agreement baseline in predicting human-rated scores of conversational proficiency.
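
The comparison between model performance and the human inter-rater agreement baseline can be illustrated with a small sketch. The abstract does not name the agreement statistic, so the code below assumes quadratic weighted kappa, a common choice for ordinal scores; the 1 to 4 score scale, the function name, and all score values are hypothetical and not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(scores_a, scores_b, min_score, max_score):
    """Agreement between two lists of integer scores on a shared ordinal scale.

    Assumption: quadratic weighted kappa is used here for illustration only;
    the abstract does not state which agreement measure was applied.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    num_levels = max_score - min_score + 1
    if num_levels < 2:
        raise ValueError("the scale needs at least two score levels")

    n = len(scores_a)
    observed = Counter(zip(scores_a, scores_b))  # joint counts of (a, b) pairs
    hist_a = Counter(scores_a)                   # marginal counts for rater A
    hist_b = Counter(scores_b)                   # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Disagreement weight grows with the squared score distance.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters gave the same single score to every response.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical fluency scores on an assumed 1-4 scale (not data from the paper).
human_1 = [3, 2, 4, 3, 1, 2, 3, 4]
human_2 = [3, 2, 3, 3, 1, 3, 3, 4]
machine = [3, 2, 4, 2, 1, 2, 3, 3]

human_human = quadratic_weighted_kappa(human_1, human_2, 1, 4)
machine_human = quadratic_weighted_kappa(machine, human_1, 1, 4)
print(f"human-human agreement:   {human_human:.3f}")
print(f"machine-human agreement: {machine_human:.3f}")
```

Under this reading, a scoring model is on par with the baseline when its agreement with a human rater is close to the agreement between two human raters on the same responses.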

Cite as: Ramanarayanan, V., Lange, P.L., Evanini, K., Molloy, H.R., Suendermann-Oeft, D. (2017) Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions. Proc. Interspeech 2017, 1711-1715, doi: 10.21437/Interspeech.2017-1213

@inproceedings{ramanarayanan17b_interspeech,
  author={Vikram Ramanarayanan and Patrick L. Lange and Keelan Evanini and Hillary R. Molloy and David Suendermann-Oeft},
  title={{Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1711--1715},
  doi={10.21437/Interspeech.2017-1213}
}