Optimal speech motor control and token-to-token variability: a Bayesian modeling approach

  • Original Article
Biological Cybernetics

Abstract

The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.

Notes

  1. The muscle force F generated by the biomechanical model is specified as

    $$\begin{aligned} F=\rho [\exp (cA)-1], \end{aligned}$$
    (1)

    where c is a form parameter accounting for the gain of the feedback from the muscle to the motoneuron pool, and \(\rho \) is a magnitude parameter directly related to force-generating capability. A is the muscle activation, given by

    $$\begin{aligned} A=l-\lambda +\mu \dot{l}, \end{aligned}$$
    (2)

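    where l is the actual muscle length, \(\lambda \) the threshold muscle length that serves as the motor command in the \(\lambda \) model (Feldman 1986), \(\dot{l}\) the muscle shortening or lengthening velocity, and \(\mu \) a damping coefficient due to proprioceptive feedback (Payan and Perrier 1997).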

  2. For simplicity, and without loss of generality, the main text presents the case of sequences of 3 phonemes. For a general n-phoneme sequence, the proposed cost function would correspond to the perimeter of the \((n-1)\)-simplex defined by the n control variables in the six-dimensional control space. For the present three-phoneme case, the 2-simplex is the triangle introduced in the text. Strictly speaking, the influence of every phoneme of the sequence on every other one would be better modeled by a cost function involving the distances between every pair of phonemes. To avoid the resulting quadratic growth in the number of terms of the cost function, its definition has been simplified to the one presented here.
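
As a rough numerical illustration of Note 1, the following Python sketch evaluates Eqs. (1) and (2) as written. The function names and all parameter values are hypothetical placeholders chosen for illustration, not values from the biomechanical model. The sketch assumes that \(\lambda \) is the threshold length of the \(\lambda \) model and uses arbitrary length units.

```python
import math


def muscle_activation(length, threshold, velocity, damping):
    """Eq. (2): A = l - lambda + mu * dl/dt."""
    return length - threshold + damping * velocity


def muscle_force(activation, rho, c):
    """Eq. (1): F = rho * (exp(c * A) - 1)."""
    return rho * (math.exp(c * activation) - 1.0)


# Hypothetical values, for illustration only (not taken from the article).
RHO = 1.0   # magnitude parameter (force-generating capability)
C = 0.1     # form parameter (feedback gain)
MU = 0.01   # damping coefficient

# At rest (zero velocity) with the muscle exactly at its threshold length,
# the activation is zero and so is the force.
a_rest = muscle_activation(length=10.0, threshold=10.0, velocity=0.0, damping=MU)
f_rest = muscle_force(a_rest, RHO, C)

# Lowering the threshold below the current length raises the activation,
# and the force grows exponentially with it.
a_active = muscle_activation(length=10.0, threshold=8.0, velocity=0.0, damping=MU)
f_active = muscle_force(a_active, RHO, C)

print(f_rest, f_active)
```

Taken literally, Eq. (1) yields a negative force whenever A is negative. The note does not say how the model treats that case, so the sketch leaves the value unclamped.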
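
To make Note 2 concrete, here is a minimal Python sketch comparing two candidate cost functions over n control points in the six-dimensional control space: the sum of distances along a closed chain of consecutive phonemes, and the sum over every pair. Reading the "perimeter" as a closed chain is an assumption made here for illustration, since the note does not spell out the general formula. The function names, the Euclidean metric and the sample coordinates are likewise assumptions. For three phonemes, both definitions reduce to the triangle perimeter.

```python
import math
from itertools import combinations


def closed_chain_cost(points):
    """Sum of distances between consecutive points, closing the loop."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def all_pairs_cost(points):
    """Sum of distances between every pair of points (n * (n - 1) / 2 terms)."""
    return sum(math.dist(p, q) for p, q in combinations(points, 2))


# Three hypothetical control vectors in a six-dimensional control space.
p0 = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
p1 = (3.0, 4.0, 0.0, 0.0, 0.0, 0.0)
p2 = (3.0, 0.0, 0.0, 0.0, 0.0, 0.0)
triangle = [p0, p1, p2]

# With three phonemes, both definitions give the triangle perimeter.
perimeter = closed_chain_cost(triangle)
pairwise = all_pairs_cost(triangle)

# With a fourth phoneme, the pairwise sum has 6 terms against 4 for the chain.
p3 = (0.0, 0.0, 0.0, 0.0, 0.0, 2.0)
four = triangle + [p3]
chain_four = closed_chain_cost(four)
pairwise_four = all_pairs_cost(four)

print(perimeter, pairwise, chain_four, pairwise_four)
```

The pairwise version has n(n-1)/2 terms, which is the quadratic growth the note refers to, whereas the chain version grows linearly with n.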

References

  • Attias H (2003) Planning by probabilistic inference. In: Bishop CM, Frey BJ (eds) Proceedings of the ninth international workshop on artificial intelligence and statistics, Key West

  • Bessière P, Laugier C, Siegwart R (eds) (2008) Probabilistic reasoning and decision making in sensory-motor systems. Springer tracts in advanced robotics, vol 46. Springer, Berlin

  • Bessière P, Mazer E, Ahuactzin JM, Mekhnacha K (2013) Bayesian programming. CRC Press, Boca Raton

  • Boutilier C, Dean T, Hanks S (1999) Decision theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 10:1–94

  • Bowers JS, Davis CJ (2012) Bayesian just-so stories in psychology and neuroscience. Psychol Bull 138(3):389–414

  • Brown LD (1981) A complete class theorem for statistical problems with finite sample spaces. Ann Stat 9(6):1289–1300

  • Calliope (1984) La parole et son traitement automatique. Masson, Paris

  • Colas F, Diard J, Bessière P (2010) Common Bayesian models for common cognitive issues. Acta Biotheor 58(2–3):191–216

  • Daunizeau J, den Ouden HEM, Pessiglione M, Kiebel SJ, Stephan KE, Friston KJ (2010) Observing the observer (I): meta-Bayesian models of learning and decision-making. PLoS One 5(12):e15554

  • Feldman AG (1986) Once more on the equilibrium-point hypothesis (\(\lambda \) model) for motor control. J Mot Behav 18(1):17–54

  • Friston K (2010) The free-energy principle: a unified brain theory? Nat Rev Neurosci 11(2):127–138

  • Friston K (2011) What is optimal about motor control? Neuron 72(3):488–498

  • Friston KJ, Frith CD (2015) Active inference, communication and hermeneutics. Cortex 68:129–143

  • Friston KJ, Daunizeau J, Kiebel SJ (2009) Reinforcement learning or active inference? PLoS One 4(7):e6421

  • Friston K, Mattout J, Kilner J (2011) Action understanding and active inference. Biol Cybern 104(1–2):137–160

  • Friston K, Samothrakis S, Montague R (2012) Active inference and agency: optimal control without cost functions. Biol Cybern 106(8–9):523–541

  • Ganesh G, Haruno M, Kawato M, Burdet E (2010) Motor memory and local minimization of error and effort, not global optimization, determine motor behavior. J Neurophysiol 104(1):382–390

  • Goodman ND, Mansinghka VK, Roy DM, Bonawitz K, Tenenbaum JB (2008) Church: a language for generative models. In: Proceedings of the 24th conference on uncertainty in artificial intelligence, vol 22, p 23

  • Gordon AD, Henzinger TA, Nori AV, Rajamani SK (2014) Probabilistic programming. In: Proceedings of the 36th international conference on software engineering (ICSE 2014, Future of Software Engineering track). ACM, New York, pp 167–181

  • Guenther FH (1995) Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychol Rev 102(3):594–621

  • Guenther FH, Hampson M, Johnson D (1998) A theoretical investigation of reference frames for the planning of speech movements. Psychol Rev 105(4):611–633

  • Hahn U (2014) The Bayesian boom: good thing or bad? Front Psychol 5:765

  • Honda K (1996) Organization of tongue articulation for vowels. J Phon 24:39–52

  • Jones M, Love B (2011) Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behav Brain Sci 34:169–231

  • Jordan MI (1996) Computational motor control. In: Gazzaniga MS (ed) The cognitive neurosciences. MIT Press, Cambridge, pp 597–609

  • Kaelbling L, Littman M, Cassandra A (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1–2):99–134

  • Kappen HJ, Gómez V, Opper M (2012) Optimal control as a graphical model inference problem. Mach Learn 87(2):159–182

  • Kawato M (1999) Internal models for motor control and trajectory planning. Curr Opin Neurobiol 9(6):718–727

  • Laboissière R, Ostry DJ, Feldman AG (1996) The control of multi-muscle systems: human jaw and hyoid movements. Biol Cybern 74(4):373–384

  • Lebeltel O, Bessière P, Diard J, Mazer E (2004) Bayesian robot programming. Auton Robot 16(1):49–79

  • Ma WJ (2010) Signal detection theory, uncertainty, and Poisson-like population codes. Vis Res 50:2308–2319

  • Ma WJ (2012) Organizing probabilistic models of perception. Trends Cogn Sci 16(10):511–518

  • Ma L, Perrier P, Dang J (2006) Anticipatory coarticulation in vowel-consonant-vowel sequences: a crosslinguistic study of French and Mandarin speakers. In: Proceedings of the 7th international seminar on speech production. Ubatuba, pp 151–158

  • Marr D (1982) Vision: a computational investigation into the human representation and processing of visual information. W.H. Freeman, New York

  • Ménard L (2002) Production et perception des voyelles au cours de la croissance du conduit vocal: variabilité, invariance et normalisation. Unpublished Ph.D. thesis, Université Stendhal de Grenoble

  • Murphy K (2002) Dynamic Bayesian networks: representation, inference and learning. Unpublished Ph.D. thesis, University of California, Berkeley, Berkeley, CA

  • Nelson W (1983) Physical principles for economies of skilled movements. Biol Cybern 46:135–147

  • Payan Y, Perrier P (1997) Synthesis of VV sequences with a 2D biomechanical tongue model controlled by the equilibrium point hypothesis. Speech Commun 22(2):185–205

  • Perkell JS, Nelson WL (1985) Variability in production of the vowels /i/ and /a/. J Acoust Soc Am 77:1889–1895

  • Perkell J, Matthies M, Lane H, Guenther F, Wilhelms-Tricarico R, Wozniak J, Guiod P (1997) Speech motor control: acoustic goals, saturation effects, auditory feedback and internal models. Speech Commun 22(2):227–250

  • Perrier P, Boë LJ, Sock R (1992) Vocal tract area function estimation from midsagittal dimensions with CT scans and a vocal tract cast: modeling the transition with two sets of coefficients. J Speech Lang Hear Res 35(1):53–67

  • Perrier P, Payan Y, Zandipour M, Perkell J (2003) Influences of tongue biomechanics on speech movements during the production of velar stop consonants: a modeling study. J Acoust Soc Am 114(3):1582–1599

  • Perrier P, Ma L, Payan Y (2005) Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue. In: Proceedings of interspeech 2005, Lisbon, Portugal, pp 1041–1044

  • Poggio T, Girosi F (1989) A theory of networks for approximation and learning. Tech. rep., Artificial Intelligence Laboratory & Center for Biological Information Processing, MIT, Cambridge, MA, USA

  • Pouget A, Beck JM, Ma WJ, Latham PE (2013) Probabilistic brains: knowns and unknowns. Nat Neurosci 16(9):1170–1178

  • Robert C (2007) The Bayesian choice: from decision-theoretic foundations to computational implementation. Springer, New York

  • Robert-Ribes J (1995) Modèles d’intégration audiovisuelle de signaux linguistiques: de la perception humaine à la reconnaissance automatique des voyelles. Unpublished Ph.D. thesis, Institut National Polytechnique de Grenoble

  • Schmolesky MT, Wang Y, Hanes DP, Thompson KG, Leutgeb S, Schall JD, Leventhal AG (1998) Signal timing across the macaque visual system. J Neurophysiol 79(6):3272–3278

  • Shim JK, Latash ML, Zatsiorsky VM (2003) Prehension synergies: trial-to-trial variability and hierarchical organization of stable performance. Exp Brain Res 152(2):173–184

  • Todorov E (2004) Optimality principles in sensorimotor control. Nat Neurosci 7(9):907–915

  • Todorov E, Jordan MI (2002) Optimal feedback control as a theory of motor coordination. Nat Neurosci 5(11):1226–1235

  • Tourville JA, Reilly KJ, Guenther FH (2008) Neural mechanisms underlying auditory feedback control of speech. Neuroimage 39(3):1429–1443

  • Toussaint M (2009) Probabilistic inference as a model of planned behavior. Künstl Intell 3(9):23–29

  • Uno Y, Kawato M, Suzuki R (1989) Formation control of optimal trajectory in human multijoint arm movement: minimum torque-change model. Biol Cybern 61:89–101

  • Wolpert DM (2007) Probabilistic models in human sensorimotor control. Hum Mov Sci 26:511–524

Acknowledgments

The authors wish to thank Pierre Bessière and Jean-Luc Schwartz for guidance and inspiring conversations.

Author information

Corresponding author

Correspondence to Jean-François Patri.

Additional information

The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013 Grant Agreement No. 339152, “Speech Unit(e)s,” PI: Jean-Luc Schwartz).

Electronic supplementary material

Supplementary material 1 (pdf 56 KB)

About this article

Cite this article

Patri, JF., Diard, J. & Perrier, P. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach. Biol Cybern 109, 611–626 (2015). https://doi.org/10.1007/s00422-015-0664-4

  • DOI: https://doi.org/10.1007/s00422-015-0664-4
