ISCA Archive Interspeech 2017

Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions

Vikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft

We present a spoken dialog-based framework for the computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three different scoring dimensions critical to the delivery of conversational English — fluency, pronunciation and intonation/stress — and further examined the efficacy of automatically extracted, hand-curated speech features in predicting each of these sub-scores. Machine learning experiments showed that trained scoring models generally perform on par with the human inter-rater agreement baseline in predicting human-rated scores of conversational proficiency.

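The following is a minimal, illustrative sketch (not the authors' code) of the kind of evaluation the abstract describes: predict a human-rated sub-score (e.g., fluency) from automatically extracted speech features with a trained scoring model, then compare machine-human agreement against the human inter-rater agreement baseline. The feature set, ratings, regressor choice, and agreement metric (Pearson correlation) are assumptions for illustration only.

```python
# Hypothetical sketch: compare machine-human vs. human-human agreement
# when scoring one conversational sub-score from speech features.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_responses = 200

# Placeholder automatically-extracted features per spoken response
# (e.g., speaking rate, pause statistics, pitch range, stress measures).
X = rng.normal(size=(n_responses, 6))

# Placeholder human ratings from two raters on a 1-4 fluency scale.
rater1 = np.clip(np.round(2 + X[:, 0] + rng.normal(scale=0.7, size=n_responses)), 1, 4)
rater2 = np.clip(np.round(rater1 + rng.normal(scale=0.6, size=n_responses)), 1, 4)

# Human inter-rater agreement baseline.
human_r, _ = pearsonr(rater1, rater2)

# Machine-human agreement: cross-validated predictions of rater 1's scores.
model = RandomForestRegressor(n_estimators=200, random_state=0)
pred = cross_val_predict(model, X, rater1, cv=5)
machine_r, _ = pearsonr(pred, rater1)

print(f"human-human r = {human_r:.2f}, machine-human r = {machine_r:.2f}")
```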

doi: 10.21437/Interspeech.2017-1213

Cite as: Ramanarayanan, V., Lange, P.L., Evanini, K., Molloy, H.R., Suendermann-Oeft, D. (2017) Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions. Proc. Interspeech 2017, 1711-1715, doi: 10.21437/Interspeech.2017-1213

@inproceedings{ramanarayanan17b_interspeech,
  author={Vikram Ramanarayanan and Patrick L. Lange and Keelan Evanini and Hillary R. Molloy and David Suendermann-Oeft},
  title={{Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1711--1715},
  doi={10.21437/Interspeech.2017-1213}
}