ISCA Archive Interspeech 2011

Statistical mapping between articulatory and acoustic data for an ultrasound-based silent speech interface

Thomas Hueber, Elie-Laurent Benaroya, Bruce Denby, Gérard Chollet

This paper presents recent developments on our "silent speech interface," which converts tongue and lip motions, captured by ultrasound and video imaging, into audible speech. In our previous studies, the mapping between the observed articulatory movements and the resulting speech sound was achieved using a unit selection approach. Here we investigate two statistical mapping techniques based on the joint modeling of visual and spectral features: one using Gaussian Mixture Models (GMM) and one using Hidden Markov Models (HMM). The prediction of the voiced/unvoiced parameter from visual articulatory data is also investigated, using an artificial neural network (ANN). A continuous speech database consisting of one hour of high-speed ultrasound and video sequences was specifically recorded to evaluate the proposed mapping techniques.
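The GMM-based mapping named in the abstract is commonly realized as minimum mean-square-error (MMSE) regression under a joint visual-spectral mixture model: a full-covariance GMM is trained on stacked [visual; spectral] vectors, and at conversion time each spectral frame is estimated as a responsibility-weighted sum of per-component conditional means. The sketch below illustrates that general technique only; the helper names (`train_joint_gmm`, `gmm_map`), the component count, and all dimensions are illustrative assumptions, not the paper's implementation, and the HMM-based variant is not sketched here.

```python
# Sketch of GMM-based articulatory-to-acoustic mapping via MMSE regression.
# Assumption: X_vis is (T, dx) visual features, Y_spec is (T, dy) spectral
# features, frame-aligned. This is a generic joint-GMM regression, not the
# authors' exact system.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(X_vis, Y_spec, n_components=32):
    """Fit a full-covariance GMM on joint [visual; spectral] vectors."""
    Z = np.hstack([X_vis, Y_spec])  # shape (T, dx + dy)
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", max_iter=100)
    gmm.fit(Z)
    return gmm

def gmm_map(gmm, X_vis, dx):
    """MMSE mapping: estimate E[y | x] under the joint GMM."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    M, T = len(weights), X_vis.shape[0]
    dy = means.shape[1] - dx
    resp = np.zeros((T, M))           # p(m | x), from the visual marginal
    cond_mean = np.zeros((M, T, dy))  # E[y | x, m] per component
    for m in range(M):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        S_xx = covs[m][:dx, :dx]
        S_yx = covs[m][dx:, :dx]
        resp[:, m] = weights[m] * multivariate_normal.pdf(X_vis, mu_x, S_xx)
        # Regression matrix A = S_yx @ inv(S_xx), via a linear solve
        A = np.linalg.solve(S_xx, S_yx.T).T
        cond_mean[m] = mu_y + (X_vis - mu_x) @ A.T
    resp /= resp.sum(axis=1, keepdims=True) + 1e-12
    # Responsibility-weighted sum of per-component conditional means
    return np.einsum("tm,mtd->td", resp, cond_mean)
```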
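For the voiced/unvoiced decision, the abstract states only that an ANN predicts the V/UV parameter from the visual articulatory data. A minimal sketch, assuming binary frame-level V/UV labels, is a small feed-forward classifier like the one below; the hidden-layer size and activation are assumptions, not the paper's architecture.

```python
# Sketch of frame-level voiced/unvoiced prediction from visual features.
# Assumption: vuv_labels is a (T,) array of 0/1 labels aligned with X_vis.
from sklearn.neural_network import MLPClassifier

def train_vuv_ann(X_vis, vuv_labels):
    """Train a small feed-forward network for binary V/UV classification."""
    ann = MLPClassifier(hidden_layer_sizes=(64,), activation="logistic",
                        max_iter=500)
    ann.fit(X_vis, vuv_labels)
    return ann

# Usage: vuv_hat = train_vuv_ann(X_train, vuv_train).predict(X_test)
```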


doi: 10.21437/Interspeech.2011-239

Cite as: Hueber, T., Benaroya, E.-L., Denby, B., Chollet, G. (2011) Statistical mapping between articulatory and acoustic data for an ultrasound-based silent speech interface. Proc. Interspeech 2011, 593-596, doi: 10.21437/Interspeech.2011-239

@inproceedings{hueber11_interspeech,
  author={Thomas Hueber and Elie-Laurent Benaroya and Bruce Denby and Gérard Chollet},
  title={{Statistical mapping between articulatory and acoustic data for an ultrasound-based silent speech interface}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={593--596},
  doi={10.21437/Interspeech.2011-239}
}