ISCA Archive Interspeech 2014

Normalization of ASR confidence classifier scores via confidence mapping

Kshitiz Kumar, Chaojun Liu, Yifan Gong

A speech recognition confidence classifier (CC) score quantitatively represents the correctness of decoded utterances in a [0,1] range. We associate an operating threshold with the classifier and accept recognitions with scores greater than the threshold. Speech developers may set their own threshold, but an acoustic model (AM) or CC update often alters the correct-accept (CA) vs. false-accept (FA) profile, necessitating a threshold reselection. This is a particular problem when (a) the threshold is hardcoded in shipped hardware or software, (b) developers lack the expertise for threshold tuning, or (c) tuning isn't cost-effective and may need to be done often. To our knowledge, our work is the first to present this practical and interesting problem of avoiding threshold reselection, and it proposes novel confidence-mapping-based techniques to improve or retain both CA and FA at previously set thresholds. We propose and evaluate (a) histogram-based mapping, (b) polynomial-fitting, and (c) tanh-fitting methods to map confidences associated with false recognitions, and discuss their issues and benefits. In our tests, all of the above mapping methods turn a mean CA regression of 21% into a gain of 1–2%, with tanh-mapping providing the best CA and FA tradeoff.
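
The abstract does not spell out the mapping procedure, so the following is only a minimal sketch of one plausible reading of histogram-based mapping: quantile matching between the scores that the old and the updated classifier assign to false recognitions, so that the FA rate at a previously set threshold is approximately retained. The function names (`build_quantile_map`, `map_score`, `fa_rate`), the `n_bins` parameter, and the pinning of the endpoints 0 and 1 are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_right


def _quantiles(values, n_bins):
    """Empirical quantiles at n_bins + 1 evenly spaced probabilities."""
    v = sorted(values)
    out = []
    for i in range(n_bins + 1):
        pos = i * (len(v) - 1) / n_bins
        lo = int(pos)
        hi = min(lo + 1, len(v) - 1)
        frac = pos - lo
        out.append(v[lo] * (1 - frac) + v[hi] * frac)
    return out


def build_quantile_map(old_false, new_false, n_bins=100):
    """Build piecewise-linear knots mapping new scores onto the old score scale.

    Assumption: both inputs are confidence scores in [0, 1] for false
    recognitions, from the old and the updated classifier respectively.
    The endpoints 0 and 1 are pinned so mapped scores stay in [0, 1].
    """
    xs = [0.0] + _quantiles(new_false, n_bins) + [1.0]
    ys = [0.0] + _quantiles(old_false, n_bins) + [1.0]
    return xs, ys


def map_score(score, xs, ys):
    """Map one new-classifier score by linear interpolation between knots."""
    if score <= xs[0]:
        return ys[0]
    if score >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, score) - 1  # last knot with xs[i] <= score
    frac = (score - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def fa_rate(false_scores, threshold):
    """Fraction of false recognitions accepted at the given threshold."""
    return sum(s > threshold for s in false_scores) / len(false_scores)
```

In this sketch, a deployed system would keep its existing threshold and pass each score from the updated classifier through `map_score` before comparing it with that threshold. The polynomial-fitting and tanh-fitting variants named in the abstract would replace the piecewise-linear knots with a fitted parametric curve; they are not shown here.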


doi: 10.21437/Interspeech.2014-303

Cite as: Kumar, K., Liu, C., Gong, Y. (2014) Normalization of ASR confidence classifier scores via confidence mapping. Proc. Interspeech 2014, 1199-1203, doi: 10.21437/Interspeech.2014-303

@inproceedings{kumar14_interspeech,
  author={Kshitiz Kumar and Chaojun Liu and Yifan Gong},
  title={{Normalization of ASR confidence classifier scores via confidence mapping}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1199--1203},
  doi={10.21437/Interspeech.2014-303}
}