ISCA Archive Interspeech 2011

Reliability-weighted acoustic model adaptation using crowd sourced transcriptions

Kartik Audhkhasi, Panayiotis G. Georgiou, Shrikanth Narayanan

This paper focuses on the adaptation of acoustic models using speech transcribed by multiple noisy experts. A simple approach combines the multiple transcripts using word-frequency-based Recognizer Output Voting Error Reduction (ROVER) and then adapts the models using the combined transcripts. However, this assumes that the transcripts being combined are equally reliable. To relax this assumption, we use two sets of scores to estimate this reliability. The first set is based on the transcribers' answers to a set of questions. The second set is derived in an unsupervised way using the word-frequency-based ROVER transcripts and the baseline acoustic models. The overall confidence is a convex combination of these scores and is used to perform a confidence-weighted fusion of the transcripts. We adapt the baseline acoustic models using these combined transcripts. Recognition results for a Mexican Spanish ASR system show an absolute improvement of 0.5% in word error rate and 0.9% in sentence error rate.
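
The confidence-weighted fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the mixing weight `alpha`, the toy scores, and the example words are all hypothetical, and the transcripts are assumed to be already aligned slot by slot (ROVER itself performs that alignment, which is omitted here). A `None` entry stands for an empty slot.

```python
from collections import defaultdict


def combine_confidence(questionnaire_score, unsupervised_score, alpha=0.5):
    """Convex combination of two reliability scores (both assumed in [0, 1]).

    alpha is a hypothetical mixing weight; the abstract does not state its value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * questionnaire_score + (1.0 - alpha) * unsupervised_score


def weighted_vote(aligned_transcripts, confidences):
    """For each aligned slot, keep the word with the largest summed confidence.

    aligned_transcripts: equal-length word lists, one per transcriber,
    with None marking an empty slot. Alignment is assumed, not computed.
    """
    if len(aligned_transcripts) != len(confidences):
        raise ValueError("need exactly one confidence per transcript")
    n_slots = len(aligned_transcripts[0])
    if any(len(t) != n_slots for t in aligned_transcripts):
        raise ValueError("transcripts must be aligned to the same length")

    fused = []
    for slot in range(n_slots):
        votes = defaultdict(float)
        for words, conf in zip(aligned_transcripts, confidences):
            votes[words[slot]] += conf
        best = max(votes, key=votes.get)
        if best is not None:  # drop the slot if the empty entry wins
            fused.append(best)
    return fused


# Toy example with three transcribers (all values are made up).
transcripts = [
    ["la", "casa", "roja"],
    ["la", "cosa", "roja"],
    ["las", "cosa", None],
]
confidences = [
    combine_confidence(0.9, 0.8),
    combine_confidence(0.4, 0.3),
    combine_confidence(0.3, 0.4),
]

print(weighted_vote(transcripts, confidences))      # reliability-weighted
print(weighted_vote(transcripts, [1.0, 1.0, 1.0]))  # plain frequency vote
```

With equal weights the second slot follows the majority ("cosa"), whereas the reliability weights let the single more trusted transcriber prevail ("casa"). This is the kind of case where a weighted fusion and a frequency-based vote diverge.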


doi: 10.21437/Interspeech.2011-762

Cite as: Audhkhasi, K., Georgiou, P.G., Narayanan, S. (2011) Reliability-weighted acoustic model adaptation using crowd sourced transcriptions. Proc. Interspeech 2011, 3045-3048, doi: 10.21437/Interspeech.2011-762

@inproceedings{audhkhasi11_interspeech,
  author={Kartik Audhkhasi and Panayiotis G. Georgiou and Shrikanth Narayanan},
  title={{Reliability-weighted acoustic model adaptation using crowd sourced transcriptions}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={3045--3048},
  doi={10.21437/Interspeech.2011-762},
  issn={2308-457X}
}