We present a spoken dialog-based framework for the computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three scoring dimensions critical to the delivery of conversational English (fluency, pronunciation, and intonation/stress) and further examined the efficacy of automatically extracted, hand-curated speech features in predicting each of these sub-scores. Machine learning experiments showed that trained scoring models generally perform on par with the human inter-rater agreement baseline in predicting human-rated scores of conversational proficiency.
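The comparison against a human inter-rater agreement baseline can be illustrated with a minimal sketch. The abstract does not name the agreement statistic, so the use of quadratic weighted kappa here is an assumption, as are the function name, the 1 to 4 score range, and the score lists, which are invented for illustration and are not data from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two lists of integer scores.

    Assumption: the paper's actual agreement metric is not stated in the
    abstract; this statistic is a common choice for ordinal ratings.
    """
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")

    n = len(a)
    observed = Counter(zip(a, b))  # joint counts of (rater A, rater B) scores
    hist_a, hist_b = Counter(a), Counter(b)  # marginal counts per rater

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = ((i - j) ** 2) / ((k - 1) ** 2)
            weighted_observed += weight * observed.get((i, j), 0) / n
            weighted_expected += weight * hist_a.get(i, 0) * hist_b.get(j, 0) / (n * n)

    if weighted_expected == 0.0:
        # Both raters assigned one identical score to every item.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical fluency scores on a 1-4 scale (illustrative values only).
human_1 = [3, 2, 4, 3, 1, 2, 3, 4]
human_2 = [3, 2, 3, 3, 1, 3, 3, 4]
model = [3, 2, 4, 2, 1, 2, 3, 3]

baseline = quadratic_weighted_kappa(human_1, human_2, 1, 4)
model_agreement = quadratic_weighted_kappa(human_1, model, 1, 4)
print(f"human-human: {baseline:.3f}, model-human: {model_agreement:.3f}")
```

Under this reading, a scoring model performs on par with the baseline when its agreement with a human rater is close to the agreement between the two human raters. The same check would be repeated separately for each sub-score.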
Cite as: Ramanarayanan, V., Lange, P.L., Evanini, K., Molloy, H.R., Suendermann-Oeft, D. (2017) Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions. Proc. Interspeech 2017, 1711-1715, doi: 10.21437/Interspeech.2017-1213
@inproceedings{ramanarayanan17b_interspeech,
  author={Vikram Ramanarayanan and Patrick L. Lange and Keelan Evanini and Hillary R. Molloy and David Suendermann-Oeft},
  title={{Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1711--1715},
  doi={10.21437/Interspeech.2017-1213}
}