This article presents a statistical mapping approach for cross-speaker acoustic-to-articulatory inversion. The goal is to estimate the most likely articulatory trajectories of a reference speaker from the speech audio signal of another speaker. The approach is developed within the framework of our visual articulatory feedback system for computer-assisted pronunciation training (CAPT) applications. The proposed technique is based on joint modeling of articulatory and acoustic features, for each phonetic class, using full-covariance trajectory HMMs. The acoustic-to-articulatory inversion is achieved in two steps: 1) finding the most likely HMM state sequence from the acoustic observations; 2) inferring the articulatory trajectories from both the decoded state sequence and the acoustic observations. The problem of speaker adaptation is addressed using a voice conversion approach based on trajectory GMMs.
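The two-step inversion described above can be illustrated with a deliberately simplified sketch. It assumes one full-covariance Gaussian per HMM state over the stacked vector [acoustic; articulatory], decodes the state path with Viterbi from the acoustic marginals, and then takes the frame-wise conditional mean of the articulatory part. This is not the trajectory HMM formulation itself: dynamic features, the trajectory-level smoothing they provide, and the voice conversion stage are all omitted. The function names (`viterbi`, `log_gauss`, `invert`) and the parameter layout are hypothetical and chosen only for illustration.

```python
import numpy as np

def viterbi(log_b, log_A, log_pi):
    """Most likely state path given per-frame log-likelihoods log_b (T x S)."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[i, j]: move from state i to j
        psi[t] = np.argmax(scores, axis=0)       # best predecessor of each state j
        delta = scores[psi[t], np.arange(S)] + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):                # backtrack
        path[t - 1] = psi[t, path[t]]
    return path

def log_gauss(X, mu, cov):
    """Log-density of each row of X under a full-covariance Gaussian."""
    d = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(d * np.linalg.solve(cov, d.T).T, axis=1)
    return -0.5 * (maha + logdet + mu.size * np.log(2 * np.pi))

def invert(X, mus, covs, log_A, log_pi, dx):
    """X: (T, dx) acoustic frames; mus: (S, dx+dy); covs: (S, dx+dy, dx+dy).

    Assumption: the first dx dimensions of each joint Gaussian are acoustic,
    the remaining ones articulatory.
    """
    S = mus.shape[0]
    # Step 1: decode the state sequence from the acoustic marginals only.
    log_b = np.stack(
        [log_gauss(X, mus[s, :dx], covs[s, :dx, :dx]) for s in range(S)], axis=1
    )
    path = viterbi(log_b, log_A, log_pi)
    # Step 2: conditional mean E[y | x, state] for each frame.
    Y = np.empty((X.shape[0], mus.shape[1] - dx))
    for t, s in enumerate(path):
        gain = covs[s, dx:, :dx] @ np.linalg.inv(covs[s, :dx, :dx])
        Y[t] = mus[s, dx:] + gain @ (X[t] - mus[s, :dx])
    return path, Y
```

Calling `invert` with acoustic frames and per-state joint parameters returns the decoded state path and one articulatory estimate per frame. Because each frame is estimated independently here, the output can jump at state boundaries, which is the kind of discontinuity that trajectory modeling with dynamic features is meant to avoid.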
Index Terms: acoustic-to-articulatory inversion, intelligent tutoring systems, pronunciation training, trajectory HMM, voice conversion, talking head
Cite as: Hueber, T., Ben-Youssef, A., Bailly, G., Badin, P., Elisei, F. (2012) Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training. Proc. Interspeech 2012, 783-786, doi: 10.21437/Interspeech.2012-240
@inproceedings{hueber12b_interspeech, author={Thomas Hueber and Atef Ben-Youssef and Gérard Bailly and Pierre Badin and Frédéric Elisei}, title={{Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training}}, year=2012, booktitle={Proc. Interspeech 2012}, pages={783--786}, doi={10.21437/Interspeech.2012-240} }