Human benchmarks for speaker independent large vocabulary recognition performance

Leeuwen, David A. van; Berg, Leo-Geert van den; Steeneken, Herman J. M.

doi:10.21437/Eurospeech.1995-328

David A. van Leeuwen, Leo-Geert van den Berg, Herman J. M. Steeneken

To evaluate and compare the recognition performance of automatic speech recognizers and of humans, sentences were selected from the Wall Street Journal databases (WSJ0 and WSJCAM0). Eighty sentences, spoken by native British and American speakers, were presented for recognition to three automatic speech recognizers (trained under strict conditions) and to thirty human listeners. The comparison yielded average total word error rates of 2.6% for humans (native listeners) and 12.6% for machines. The ASR systems had great difficulty with high-perplexity sentences. Listeners tended to be more sensitive to sentence length: long sentences were more difficult to recognize than short ones.
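The abstract does not spell out how the word error rates were scored. As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions from a word-level edit-distance alignment, divided by the number of reference words), the figure for a single sentence could be computed as follows. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are illustrative assumptions, not the authors' scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate under the conventional (S + D + I) / N definition.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; the paper's own normalization rules are not known here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    # Total errors divided by the number of reference words.
    return prev[-1] / len(ref)
```

Because insertions count as errors, this rate can exceed 1.0 when the hypothesis is much longer than the reference. A total rate over a test set is usually obtained by summing errors and reference words over all sentences before dividing, rather than by averaging per-sentence rates.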

Cite as: Leeuwen, D.A.v., Berg, L.-G.v.d., Steeneken, H.J.M. (1995) Human benchmarks for speaker independent large vocabulary recognition performance. Proc. 4th European Conference on Speech Communication and Technology (Eurospeech 1995), 1461-1464, doi: 10.21437/Eurospeech.1995-328

@inproceedings{leeuwen95_eurospeech,
  author={David A. van Leeuwen and Leo-Geert van den Berg and Herman J. M. Steeneken},
  title={{Human benchmarks for speaker independent large vocabulary recognition performance}},
  year=1995,
  booktitle={Proc. 4th European Conference on Speech Communication and Technology (Eurospeech 1995)},
  pages={1461--1464},
  doi={10.21437/Eurospeech.1995-328}
}