A new statistical confidence measure, Context Constrained-Generalized Posterior Probability (CC-GPP), is proposed for verifying phone transcriptions in speech databases. Unlike generalized posterior probability (GPP), CC-GPP is computed by considering only string hypotheses that contain the focused phone with partially matched left and right contexts. The parameters of CC-GPP are the context window length, the minimum number of matched context phones, and the verification thresholds. They are determined by minimizing verification errors on a development set. Evaluated on a test set of 500 sentences that contain 2.1% phone errors, CC-GPP achieves 99.6% accuracy and 78.7% recall when 90% of the phones are accepted.
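The abstract does not give the formula, so the following is only a minimal sketch of the idea, under strong simplifying assumptions: hypotheses are plain phone strings aligned position by position with the reference (the paper presumably works on time-aligned recognizer hypotheses with weighted acoustic and language model scores), and each hypothesis carries a single non-negative score. The names `cc_gpp_sketch`, `context_matches`, and `pick_threshold` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumption: a hypothesis is (phone string, non-negative score).
Hypothesis = Tuple[Sequence[str], float]


def context_matches(hyp: Sequence[str], ref: Sequence[str],
                    pos: int, window: int) -> int:
    """Count context phones within `window` positions of `pos`
    (left and right) on which `hyp` and `ref` agree."""
    count = 0
    for offset in range(1, window + 1):
        for idx in (pos - offset, pos + offset):
            if 0 <= idx < len(ref) and idx < len(hyp) and hyp[idx] == ref[idx]:
                count += 1
    return count


def cc_gpp_sketch(hyps: List[Hypothesis], ref: Sequence[str], pos: int,
                  window: int, min_matched: int) -> float:
    """Score mass of hypotheses that contain the focused phone ref[pos]
    with at least `min_matched` matching context phones, divided by the
    score mass of all hypotheses."""
    total = sum(score for _, score in hyps)
    if total <= 0:
        return 0.0
    kept = 0.0
    for phones, score in hyps:
        has_focus = pos < len(phones) and phones[pos] == ref[pos]
        if has_focus and context_matches(phones, ref, pos, window) >= min_matched:
            kept += score
    return kept / total


def pick_threshold(scored: List[Tuple[float, bool]],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the fewest verification errors
    (false accepts plus false rejects) on development data given as
    (confidence, phone_is_correct) pairs. A phone is accepted when its
    confidence is at least the threshold."""
    def errors(threshold: float) -> int:
        return sum((conf >= threshold) != correct for conf, correct in scored)
    return min(candidates, key=errors)


if __name__ == "__main__":
    ref = ["k", "ae", "t"]
    hyps = [(["k", "ae", "t"], 6.0), (["g", "ae", "t"], 3.0), (["k", "eh", "t"], 1.0)]
    # Confidence of the middle phone, requiring both neighbours to match.
    print(cc_gpp_sketch(hyps, ref, pos=1, window=1, min_matched=2))
```

Raising `min_matched` or `window` restricts the numerator to hypotheses that agree with the transcription over more of the surrounding context, which is the "partially matched" constraint described above; in this sketch, setting `min_matched` to 0 removes the context constraint altogether.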
Cite as: Zhang, H., Wang, L., Soong, F.K., Liu, W. (2007) Context constrained-generalized posterior probability for verifying phone transcriptions. Proc. Interspeech 2007, 1330-1333, doi: 10.21437/Interspeech.2007-407
@inproceedings{zhang07b_interspeech,
  author={Hua Zhang and Lijuan Wang and Frank K. Soong and Wenju Liu},
  title={{Context constrained-generalized posterior probability for verifying phone transcriptions}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1330--1333},
  doi={10.21437/Interspeech.2007-407}
}