ISCA Archive Interspeech 2012

Iterative MMSE estimation of vocal tract length normalization factors for voice transformation

Daniel Erro, Eva Navas, Inma Hernáez

We present a method that determines the optimal configuration of a bilinear vocal tract length normalization function to transform the frequency axis of one voice according to a specific target voice. Given a number of parallel utterances of the involved speakers, the single parameter of this function can be calculated through an iterative procedure by minimizing an objective error measure defined in the cepstral domain. This method is also applicable when multiple warping classes are considered, and it can be complemented with amplitude correction filters. The resulting physically motivated cepstral transformation yields highly satisfactory conversion accuracy and improved quality with respect to standard statistical systems.

Index Terms: vocal tract length normalization, voice conversion, frequency warping plus amplitude scaling, speech synthesis.
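
As an illustration of the single-parameter bilinear warping mentioned in the abstract, the following minimal sketch evaluates the frequency mapping commonly associated with a first-order all-pass (bilinear) transform. The function name `bilinear_warp`, the symbol `alpha`, and the sign convention are assumptions made here for illustration; they are not taken from the paper, and the sketch does not reproduce the paper's iterative MMSE estimation of the parameter.

```python
import math


def bilinear_warp(omega, alpha):
    """Map a normalized frequency omega (radians, 0..pi) to its warped value.

    alpha is the single warping factor, assumed to satisfy |alpha| < 1.
    Under the sign convention assumed here, alpha > 0 shifts frequencies
    upward and alpha < 0 shifts them downward; alpha = 0 is the identity.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    # The denominator 1 - alpha*cos(omega) is always positive for |alpha| < 1,
    # so atan2 returns the same value as a plain arctangent of the ratio.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


if __name__ == "__main__":
    # Print the warped frequency axis for a few illustrative alpha values.
    grid = [k * math.pi / 8 for k in range(9)]
    for alpha in (-0.1, 0.0, 0.1):
        warped = [round(bilinear_warp(w, alpha), 4) for w in grid]
        print(alpha, warped)
```

The mapping keeps the endpoints 0 and π fixed and is monotonic for |alpha| < 1, which is why one scalar is enough to stretch or compress the whole frequency axis.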


doi: 10.21437/Interspeech.2012-32

Cite as: Erro, D., Navas, E., Hernáez, I. (2012) Iterative MMSE estimation of vocal tract length normalization factors for voice transformation. Proc. Interspeech 2012, 86-89, doi: 10.21437/Interspeech.2012-32

@inproceedings{erro12_interspeech,
  author={Daniel Erro and Eva Navas and Inma Hernáez},
  title={{Iterative MMSE estimation of vocal tract length normalization factors for voice transformation}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={86--89},
  doi={10.21437/Interspeech.2012-32}
}