Duration and pronunciation conditioned lexical modeling for speaker verification

Tur, Gokhan; Shriberg, Elizabeth; Stolcke, Andreas; Kajarekar, Sachin

doi:10.21437/Interspeech.2007-172

We propose a method to improve speaker recognition lexical model performance using acoustic-prosodic information. More specifically, the lexical model is trained using duration- and pronunciation-conditioned word N-grams, thereby modeling words jointly with their acoustic and prosodic characteristics. Support vector machines are used for modeling and scoring, with N-gram frequency vectors serving as features. Experimental results using NIST Speaker Recognition Evaluation data sets show that this method outperforms standard word N-gram-based lexical models. Furthermore, our approach provides complementary information when combined with a high-accuracy acoustic speaker model. We believe that this is a promising step toward integrated speaker recognition models that combine multiple types of high-level features.
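
The abstract does not specify how words are conditioned or how the SVM is configured. The following is a minimal sketch under stated assumptions, not the authors' implementation.

- Each recognized word is turned into a token of the form word|pronunciation|duration-bin.
- The pronunciation label format and the per-word short/long boundaries are hypothetical.
- Relative frequencies of unigrams and bigrams over these tokens form a sparse feature vector.
- A bias-free linear SVM is trained with the Pegasos stochastic subgradient method. This is a swapped-in stand-in for whatever SVM package the authors used.

The toy data shows the intended effect. A target speaker and an impostor say the same words, so plain word N-gram vectors are identical. Their conditioned vectors differ, however, which lets the SVM separate them.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Word:
    text: str        # ASR word hypothesis
    duration: float  # word duration in seconds (assumed input)
    pron: str        # pronunciation variant label (hypothetical format)


# Hypothetical fallback (short, long) duration boundaries in seconds.
DEFAULT_BOUNDS = (0.2, 0.4)


def duration_bin(word, bounds):
    """Quantize a word's duration into S/M/L using (hypothetical) per-word boundaries."""
    lo, hi = bounds.get(word.text, DEFAULT_BOUNDS)
    if word.duration < lo:
        return "S"
    if word.duration > hi:
        return "L"
    return "M"


def conditioned_tokens(words, bounds):
    """Replace each word by a word|pronunciation|duration-bin token."""
    return [f"{w.text}|{w.pron}|{duration_bin(w, bounds)}" for w in words]


def ngram_freqs(tokens, n_max=2):
    """Sparse vector of relative N-gram frequencies for orders 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def dot(w, x):
    """Inner product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(samples, labels, lam=0.01, epochs=50, seed=0):
    """Bias-free linear SVM via Pegasos; labels are +1 (target) / -1 (impostor)."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            violated = y * dot(w, x) < 1.0  # hinge loss active at current w
            scale = 1.0 - eta * lam         # shrinkage from the L2 regularizer
            for k in w:
                w[k] *= scale
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


# Toy data (entirely made up): same words, different duration/pronunciation of "yeah".
def session(yeah_dur, yeah_pron):
    return [Word("yeah", yeah_dur, yeah_pron), Word("i", 0.1, "ay"),
            Word("mean", 0.3, "m_iy_n")]


bounds = {"yeah": (0.15, 0.35)}
target = session(0.45, "y_ae")    # long, open-vowel "yeah"
impostor = session(0.12, "y_eh")  # short "yeah"

X = [ngram_freqs(conditioned_tokens(s, bounds)) for s in (target, target, impostor, impostor)]
labels = [1, 1, -1, -1]
weights = train_pegasos(X, labels)

print("target score:", dot(weights, X[0]))
print("impostor score:", dot(weights, X[2]))
```

Only the conditioned tokens for "yeah" differ between the two toy sessions. The learned weights on those features alone therefore drive the target session's score above the impostor's. In a real system the duration bins would be estimated from per-word duration statistics. The vectors would come from full conversation sides rather than three-word snippets.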

Cite as: Tur, G., Shriberg, E., Stolcke, A., Kajarekar, S. (2007) Duration and pronunciation conditioned lexical modeling for speaker verification. Proc. Interspeech 2007, 2049-2052, doi: 10.21437/Interspeech.2007-172

@inproceedings{tur07_interspeech,
  author={Gokhan Tur and Elizabeth Shriberg and Andreas Stolcke and Sachin Kajarekar},
  title={{Duration and pronunciation conditioned lexical modeling for speaker verification}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2049--2052},
  doi={10.21437/Interspeech.2007-172}
}