ISCA Archive Interspeech 2012

Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training

Thomas Hueber, Atef Ben-Youssef, Gérard Bailly, Pierre Badin, Frédéric Elisei

This article presents a statistical mapping approach for cross-speaker acoustic-to-articulatory inversion. The goal is to estimate the most likely articulatory trajectories for a reference speaker from the speech audio signal of another speaker. The approach is developed within the framework of our visual articulatory feedback system for computer-assisted pronunciation training (CAPT) applications. The proposed technique is based on the joint modeling of articulatory and acoustic features, for each phonetic class, using full-covariance trajectory HMMs. The acoustic-to-articulatory inversion is achieved in two steps: 1) finding the most likely HMM state sequence from the acoustic observations; 2) inferring the articulatory trajectories from both the decoded state sequence and the acoustic observations. The problem of speaker adaptation is addressed using a voice conversion approach based on trajectory GMMs.
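The two-step inversion described above can be illustrated with a deliberately simplified sketch. It assumes scalar acoustic and articulatory features and one joint Gaussian per HMM state, and it replaces the trajectory-HMM parameter generation (which also uses dynamic features) with a plain per-frame conditional mean, so it is not the authors' implementation. All names (`viterbi`, `invert`, `TOY_STATES`) and all numeric values are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, states, log_trans, log_init):
    """Step 1: most likely state sequence given the acoustic observations."""
    n = len(states)
    delta = [log_init[j] + log_gauss(obs[0], states[j]["mu_x"], states[j]["s_xx"])
             for j in range(n)]
    back = []
    for x in obs[1:]:
        prev, ptr, delta = delta, [], []
        for j in range(n):
            # Best predecessor of state j at this frame.
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            delta.append(prev[best] + log_trans[best][j]
                         + log_gauss(x, states[j]["mu_x"], states[j]["s_xx"]))
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(n), key=lambda j: delta[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def invert(obs, states, log_trans, log_init):
    """Step 2: articulatory estimate from the decoded states and the acoustics.

    Simplification (assumption): per-frame conditional mean
    E[y | x, state] = mu_y + s_xy / s_xx * (x - mu_x), without delta features.
    """
    path = viterbi(obs, states, log_trans, log_init)
    traj = [states[q]["mu_y"] + states[q]["s_xy"] / states[q]["s_xx"] * (x - states[q]["mu_x"])
            for x, q in zip(obs, path)]
    return path, traj

# Toy two-state model with made-up parameters (x: acoustic, y: articulatory).
TOY_STATES = [
    {"mu_x": 0.0, "s_xx": 1.0, "mu_y": 10.0, "s_xy": 0.5},
    {"mu_x": 5.0, "s_xx": 1.0, "mu_y": 20.0, "s_xy": 0.5},
]
TOY_LOG_TRANS = [[math.log(0.9), math.log(0.1)],
                 [math.log(0.1), math.log(0.9)]]
TOY_LOG_INIT = [math.log(0.5), math.log(0.5)]

path, traj = invert([0.0, 0.2, 5.0, 4.8], TOY_STATES, TOY_LOG_TRANS, TOY_LOG_INIT)
print(path, traj)
```

In a full trajectory-HMM system, the second step would instead solve for the whole articulatory sequence at once under the static and dynamic feature constraints, which yields smoother trajectories than this frame-by-frame estimate.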

Index Terms: acoustic-to-articulatory inversion, intelligent tutoring systems, pronunciation training, trajectory HMM, voice conversion, talking head


doi: 10.21437/Interspeech.2012-240

Cite as: Hueber, T., Ben-Youssef, A., Bailly, G., Badin, P., Elisei, F. (2012) Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training. Proc. Interspeech 2012, 783-786, doi: 10.21437/Interspeech.2012-240

@inproceedings{hueber12b_interspeech,
  author={Thomas Hueber and Atef Ben-Youssef and Gérard Bailly and Pierre Badin and Frédéric Elisei},
  title={{Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={783--786},
  doi={10.21437/Interspeech.2012-240}
}