Speaker recognition using the resynthesized speech via spectrum modeling

Zhang, Xiang; Cao, Chuan; Yang, Lin; Suo, Hongbin; Zhang, Jianping; Yan, Yonghong

doi:10.21437/Interspeech.2010-165

In this paper, we present a new approach to speaker recognition that resynthesizes speech from prosodic information computed on the original signal, using a spectrum modeling technique. The resynthesized data are modeled as sinusoids parameterized by pitch, vibration amplitude, and phase bias. Cepstral features are extracted from the resynthesized speech and used for speaker modeling and scoring in the same way as in traditional speaker recognition approaches: the features are modeled with Gaussian mixture models (GMMs), and speaker and channel variability effects are compensated for using joint factor analysis. The experiments are carried out on the core condition of the NIST 2008 Speaker Recognition Evaluation data. The results show that the proposed system achieves performance comparable to that of a state-of-the-art cepstral-based joint factor analysis system that uses the original speech.
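The abstract does not give the synthesis equations, so the following is only a minimal sketch of generic sum-of-harmonics resynthesis from a pitch track, per-frame amplitudes, and a fixed per-harmonic phase offset. The function name `resynthesize`, its parameters, the 8 kHz sample rate, and the 10 ms frame length are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def resynthesize(f0_track, amp_track, phase_bias, sample_rate=8000, frame_len=80):
    """Sum-of-harmonics resynthesis (illustrative sketch, not the paper's method).

    f0_track:   per-frame pitch in Hz (0.0 marks an unvoiced frame)
    amp_track:  per-frame list of harmonic amplitudes
    phase_bias: per-harmonic constant phase offset in radians
    """
    n_harm = len(phase_bias)
    # Running phase per harmonic, accumulated so the waveform stays
    # continuous when the pitch changes between frames.
    phase = [0.0] * n_harm
    out = []
    for f0, amps in zip(f0_track, amp_track):
        for _ in range(frame_len):
            if f0 <= 0.0:
                out.append(0.0)  # assumption: unvoiced frames are rendered as silence
                continue
            sample = 0.0
            for k in range(n_harm):
                freq = (k + 1) * f0
                if freq >= sample_rate / 2:
                    break  # skip harmonics at or above the Nyquist frequency
                phase[k] += 2.0 * math.pi * freq / sample_rate
                sample += amps[k] * math.cos(phase[k] + phase_bias[k])
            out.append(sample)
    return out


# Two voiced frames with different pitch, then one unvoiced frame.
signal = resynthesize(
    f0_track=[120.0, 125.0, 0.0],
    amp_track=[[1.0, 0.5], [0.9, 0.4], [0.0, 0.0]],
    phase_bias=[0.0, math.pi / 4],
)
```

In a pipeline like the one described, cepstral features would then be extracted from `signal` rather than from the original waveform; that step is omitted here.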

Cite as: Zhang, X., Cao, C., Yang, L., Suo, H., Zhang, J., Yan, Y. (2010) Speaker recognition using the resynthesized speech via spectrum modeling. Proc. Interspeech 2010, 2142-2145, doi: 10.21437/Interspeech.2010-165

@inproceedings{zhang10d_interspeech,
  author={Xiang Zhang and Chuan Cao and Lin Yang and Hongbin Suo and Jianping Zhang and Yonghong Yan},
  title={{Speaker recognition using the resynthesized speech via spectrum modeling}},
  year=2010,
  booktitle={Proc. Interspeech 2010},
  pages={2142--2145},
  doi={10.21437/Interspeech.2010-165}
}