Maximum likelihood unit selection for corpus-based speech synthesis

Rosales, Abubeker Gamboa; Rosales, Hamurabi Gamboa; Hoffmann, Ruediger

doi:10.21437/Interspeech.2009-253

Corpus-based speech synthesis systems deliver considerable synthesis quality, since unit selection approaches have been optimized over the last decade. Unit selection attempts to find the best combination of speech unit sequences in an inventory so that the perceptual differences between the expected (natural) and synthesized signals are as low as possible. However, mismatches and distortions still occur in concatenative speech synthesis, and they are normally perceptible in the synthesized waveform. Therefore, unit selection strategies and parameter tuning remain important issues in the improvement of speech synthesis. We present a novel concept to increase the efficiency of the exhaustive speech unit search within the inventory via a unit selection model. This model bases its operation on a mapping analysis of the concatenation sub-costs, a Bayes optimal classification (BOC), and a maximum likelihood selection (MLS). The principal advantage of the proposed unit selection method is that it does not require exhaustive training to set up weighted coefficients for target and concatenation sub-costs. It provides an alternative for unit selection but requires further optimization, e.g., by integrating target cost mapping.

Cite as: Rosales, A.G., Rosales, H.G., Hoffmann, R. (2009) Maximum likelihood unit selection for corpus-based speech synthesis. Proc. Interspeech 2009, 748-751, doi: 10.21437/Interspeech.2009-253

@inproceedings{rosales09_interspeech,
  author={Abubeker Gamboa Rosales and Hamurabi Gamboa Rosales and Ruediger Hoffmann},
  title={{Maximum likelihood unit selection for corpus-based speech synthesis}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={748--751},
  doi={10.21437/Interspeech.2009-253}
}