Elsevier

Journal of Phonetics

Volume 74, May 2019, Pages 1-17
Research Article
The effects of larynx height on vowel production are mitigated by the active control of articulators

https://doi.org/10.1016/j.wocn.2019.02.002

Highlights

  • Larynx height affects vowel production.

  • Nevertheless, these effects are mitigated by the other articulators.

  • Modern human larynx height seems optimized for speech production.

  • But deviations from this height do not preclude speech.

Abstract

The influence of larynx position on vowel articulation is an important topic in understanding speech production, the present-day distribution of linguistic diversity and the evolution of speech and language in our lineage. We introduce here a realistic computer model of the vocal tract, constructed from actual human MRI data, which can learn, using machine learning techniques, to control the articulators so as to produce speech sounds that match a given set of target vowels as closely as possible. We systematically control the vertical position of the larynx and we quantify the differences between the target and produced vowels for each such position across multiple replications. We report that larynx height does indeed affect the accuracy of reproducing the target vowels and the distinctness of the produced vowel system, and that there is a “sweet spot” of larynx positions that are optimal for vowel production. Nevertheless, even extreme larynx positions do not result in a collapsed or heavily distorted vowel space that would make speech unintelligible. Together with other lines of evidence, our results support the view that the vowel space of human languages is influenced by our larynx position, but that other positions of the larynx may also be fully compatible with speech.

Introduction

The origin and evolution of language and speech are heavily debated, with a major division between models proposing a recent and sudden origin restricted to modern humans (Berwick and Chomsky, 2017; Hauser et al., 2014; Klein, 2009) and models proposing a deep origin, gradual evolution, and a wider distribution that also includes archaic humans such as the Neanderthals (Dediu and Levinson, 2013; Dediu and Levinson, 2018; Johansson, 2015; Lieberman, 2016). In particular, the speech capacities of archaic humans have been linked to the position of the larynx (itself linked to the position of the hyoid bone), and the corresponding ratio between the horizontal and the vertical parts of the vocal tract (Lieberman, 2016).

While it is currently unclear what this ratio might have been in Neanderthals and when its “modern” value evolved (Dediu and Levinson, 2013; Gokhman et al., 2017; Lieberman, 2016), a more tractable question concerns its effects on speech and language (Boë et al., 2002; de Boer and Fitch, 2010; Lieberman, 2016). More precisely, the seminal claim by Lieberman and Crelin (1971) that a high larynx (a position suggested by some for Neanderthals) reduces the vowel space, making it impossible to produce the widely used [a], [i], [u] and [ɔ], has generated a lively debate centered on the use of computer models of the vocal tract to make such inferences (Boë et al., 2007; de Boer and Fitch, 2010; Lieberman, 2007).

For example, starting from the suggestion (Honda & Tiede, 1998) that larynx height may be deduced from the shape of the oral cavity, Boë (1999) used the “variable linear articulatory model” (VLAM) (Maeda, 1990) coupled with factor analysis and a growth model to argue against Lieberman and Crelin (1971). Building on this and on work by Ménard and Boë (2000), Boë et al. (2002) concluded that “the maximal vowel space of a given vocal tract does not depend on the larynx height index: gestures of the tongue body (and lips and jaw) allow compensation for differences in the ratio between the dimensions of the oral cavity and pharynx” (p. 481). Boë et al. (2007) reiterated that, in VLAM, a high larynx does not lead to a less distinctive vowel space. However, de Boer and Fitch (2010) attributed circular reasoning to Boë et al. (2002): the growth scaling in Boë (1999) and Boë et al. (2002, 2007) was applied after the articulatory factors had been extracted in the VLAM, meaning that any inferred anatomies (Neanderthals, infants) have the same degrees of articulatory freedom as modern female adults, just with a different scaling. (This does not hold, for example, in the observational data from pre-babbling vocalizations of infants, which are (epilaryngeally) constricted and clearly have fewer degrees of articulatory freedom; Esling, Benner, & Moisik, 2015.) Furthermore, such global scaling preserves the layout of the different components of the model, including the angle and ratio between the pharynx and the oral cavity, but a change in this layout is precisely what has been hypothesized to set modern humans apart. Finally, de Boer and Fitch (2010) argued that the use of factor analysis in VLAM linearly extrapolates from observed to unobserved cases, likely overestimating the ability of the articulators to compensate for any effects of anatomy. In response, they developed a model that adheres more closely to the anatomical constraints of the vocal tract, showing that a larynx height similar to that of a human female would be ideal for a maximally distinctive vowel inventory (Lieberman, 2012).

Here we introduce a novel computer model that has several advantages over its predecessors. First, it is based on a widely used realistic 3D geometric model of the vocal tract (VocalTractLab 2.1) built on modern phonetic theory and calibrated with data (MRI and otherwise) from actual humans (Birkholz, 2005; Birkholz, 2013a; Birkholz and Kröger, 2006). Second, this model allows the programmatic control of multiple meaningful articulatory parameters (such as the position of the tongue tip or the degree of lip rounding), and produces the corresponding acoustic output. Third, with the author’s permission, we modified this model to allow (among other things) the specification of hyoid position. Fourth, we implemented a complete agent that can control this vocal tract model using a generic machine learning algorithm, and which is capable of learning to produce a set of auditorily presented target vowels (here, [ə], [ɑ], [a], [æ], [e], [i], [o] and [u]) by controlling the free articulators of the model. This allows us to systematically study the impact of larynx height on vowel production, and to find both the optimal height for the production of widely used vowels and the compensatory strategies that can mitigate the impact of extreme larynx positions.
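The learning loop described in this paragraph can be illustrated with a minimal sketch. It assumes a simple (1+λ) evolution strategy as the “generic machine learning algorithm” and a hypothetical two-parameter `toy_synthesizer` standing in for VocalTractLab. The function names, the linear formant mapping and all numbers are illustrative assumptions, not taken from the paper.

```python
import math
import random


def toy_synthesizer(params, larynx_offset=0.0):
    """Hypothetical stand-in for an articulatory synthesizer (NOT VocalTractLab).

    params: (tongue_height, tongue_backness), each in [0, 1].
    larynx_offset: 0.0 is the reference larynx position; its linear effect
    on the formants is an arbitrary assumption made for illustration.
    Returns formant-like values (F1, F2) in Hz.
    """
    height, backness = params
    f1 = 800.0 - 500.0 * height + 100.0 * larynx_offset
    f2 = 2300.0 - 1500.0 * backness - 150.0 * larynx_offset
    return (f1, f2)


def acoustic_error(produced, target):
    """Euclidean distance in Hz between produced and target formants."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(produced, target)))


def learn_vowel(target, larynx_offset=0.0, generations=200,
                offspring=10, sigma=0.1, seed=0):
    """(1+lambda) evolution strategy over the free articulatory parameters."""
    rng = random.Random(seed)
    parent = [0.5, 0.5]  # neutral starting articulation
    parent_err = acoustic_error(toy_synthesizer(parent, larynx_offset), target)
    for _ in range(generations):
        best_child, best_err = None, float("inf")
        for _ in range(offspring):
            # Mutate each parameter and clip it to its allowed range [0, 1].
            child = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma)))
                     for x in parent]
            err = acoustic_error(toy_synthesizer(child, larynx_offset), target)
            if err < best_err:
                best_child, best_err = child, err
        # Elitist selection: the parent survives unless a child improves on it.
        if best_err < parent_err:
            parent, parent_err = best_child, best_err
    return parent, parent_err


if __name__ == "__main__":
    target_i = (300.0, 2200.0)  # rough [i]-like target, illustrative only
    for offset in (0.0, 1.0):
        params, err = learn_vowel(target_i, larynx_offset=offset)
        print(f"larynx_offset={offset}: params={params}, error={err:.1f} Hz")
```

In this toy setting the learner compensates for a shifted larynx with the remaining articulators until they reach their limits. Any residual error reflects only the arbitrary coefficients chosen here and says nothing about the results reported in the paper.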

While still far from perfect, we think that our model represents an important advance, allowing more refined answers to questions surrounding the impact of larynx height on vowel production, and providing a platform for further improvement and application to other aspects of inter-group and inter-individual variation in speech, both pathological and normal (Dediu, Janssen, & Moisik, 2017). Given that the work reported here is in many ways novel, one of our main aims was to start from as “generic” and “theory-free” assumptions as possible and to write our code as easily replaceable and upgradeable modules.

Section snippets

Data and methods

The fundamental idea is to study how learning a set of vowels is affected by controlled changes in a particular aspect of vocal tract anatomy, here, larynx height. Such experimental manipulations are extremely difficult to conduct with human participants, but computer simulations using realistic models of the human vocal tract may offer approximations that, while imperfect, may still be good enough for answering specific questions in an objective, repeatable and quantitative manner. For more

Results

The analyses and plots reported here used R 3.4.4 (R Core Team, 2017). The full analysis, including further aspects and details not reported here due to space constraints (such as the analysis considering n=5 formants), can be found in the Supplementary materials in the Appendix. The patterns obtained considering n=3 and n=5 formants are roughly similar, so we focus here on the former.
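To make the role of the number of formants n concrete, the sketch below computes one possible distance between a produced and a target vowel from their first n formants. It assumes a Euclidean distance on the Bark scale, using Traunmüller's (1990) Hz-to-Bark approximation. The paper's actual measure is not given in this excerpt, and the function names and formant values are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def formant_distance(produced_hz, target_hz, n=3):
    """Euclidean distance in Bark over the first n formants.

    Assumed metric for illustration; not necessarily the one used in the paper.
    """
    if len(produced_hz) < n or len(target_hz) < n:
        raise ValueError("need at least n formants for both vowels")
    return math.sqrt(sum(
        (hz_to_bark(p) - hz_to_bark(t)) ** 2
        for p, t in zip(produced_hz[:n], target_hz[:n])
    ))


if __name__ == "__main__":
    # Made-up formant values in Hz (F1..F5), for illustration only.
    target = [300.0, 2200.0, 3000.0, 3700.0, 4500.0]
    produced = [320.0, 2100.0, 2950.0, 3650.0, 4600.0]
    print(formant_distance(produced, target, n=3))
    print(formant_distance(produced, target, n=5))
```

Because each additional formant adds a non-negative squared term, the n=5 distance can never be smaller than the n=3 distance for the same pair of vowels.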

We will describe first the tight relationship between the dynamically-adjusted continuous vocal tract ratio Rht

Discussion and conclusions

We focused here on the systematic variation of larynx height and on its effects on vowel acoustics and on the articulatory mechanisms engaged in compensating for it. Our computational agents, using a generic machine-learning mechanism that controls a realistic geometric model of the vocal tract, did learn, to a very high degree of accuracy, eight target vowels ([ə], [ɑ], [a], [æ], [e], [i], [o] and [u]) that are widely attested cross-linguistically and cover the modern human vowel space. However, this

Acknowledgements

We wish to thank Peter Birkholz for sharing the source code of VocalTractLab 2.1, for allowing us to modify it and for answering our questions, and three anonymous reviewers whose comments and suggestions greatly improved the paper. This work was funded by the Netherlands Organisation for Scientific Research (NWO) VIDI grant 276-70-022 to DD. During the writing of this paper, DD was supported by a European Institutes for Advanced Study (EURIAS) Fellowship (2017–2018) and an IDEXLyon

References (72)

  • K. Honda

    Organization of tongue articulation for vowels

    Journal of Phonetics

    (1996)
  • P. Lieberman

    Current views on Neanderthal speech capabilities: A reply to Boë et al. (2002)

    Journal of Phonetics

    (2007)
  • P. Lieberman

    Vocal tract anatomy and the neural bases of talking

    Journal of Phonetics

    (2012)
  • D.E. Lieberman et al.

    Ontogeny of postnatal hyoid and larynx descent in humans

    Archives of Oral Biology

    (2001)
  • I. Martínez et al.

    Human hyoid bones from the middle Pleistocene site of the Sima de los Huesos (Sierra de Atapuerca, Spain)

    Journal of Human Evolution

    (2008)
  • I. Martínez et al.

    Communicative capacities in Middle Pleistocene humans from the Sierra de Atapuerca in Spain

    Quaternary International

    (2013)
  • T. Nishimura et al.

    Descent of the hyoid in chimpanzees: Evolution of face flattening and speech

    Journal of Human Evolution

    (2006)
  • J.-L. Schwartz et al.

    The dispersion-focalization theory of vowel systems

    Journal of Phonetics

    (1997)
  • S.A. Xue et al.

    Normative standards for vocal tract dimensions by race as measured by acoustic pharyngometry

    Journal of Voice

    (2006)
  • J.E. Baker

    Reducing bias and inefficiency in the selection algorithm

  • W.J. Barry et al.

    Do we need a symbol for a central open vowel?

    Journal of the International Phonetic Association

    (2008)
  • H.-G. Beyer et al.

    Evolution strategies: A comprehensive introduction

    Natural Computing

    (2002)
  • P. Birkholz

    3D-artikulatorische Sprachsynthese

    Logos

    (2005)
  • P. Birkholz

    Modeling consonant-vowel coarticulation for articulatory speech synthesis

    PLoS One

    (2013)
  • Birkholz, P. (2013). Vocaltractlab 2.1 user...
  • P. Birkholz et al.

    Vocal tract model adaptation using magnetic resonance imaging

  • R.A.W. Bladon et al.

    Modeling the judgment of vowel quality differences

    The Journal of the Acoustical Society of America

    (1981)
  • L.-J. Boë

    Modelling the growth of the vocal tract vowel spaces of newly-born infants and adults: Consequences for ontogenesis and phylogenesis

  • Boersma, P., & Weenink, D. (2018). Praat: Doing phonetics by computer....
  • J. Brunner et al.

    Temporal development of compensation strategies for perturbed palate shape in German/sch/-production

  • L. Crevier-Buchman et al.

    Analogy between laryngeal gesture in Mongolian Long Song and supracricoid partial laryngectomy

    Clinical Linguistics & Phonetics

    (2012)
  • R. D’Anastasio et al.

    Micro-biomechanics of the Kebara 2 hyoid and its implications for speech in Neanderthals

    PLoS One

    (2013)
  • B. de Boer

    Modelling vocal anatomy’s significant effect on speech

    Journal of Evolutionary Psychology

    (2010)
  • B. de Boer et al.

    Computer models of vocal tract evolution: An overview and critique

    Adaptive Behavior

    (2010)
  • D. Dediu et al.

    Pushes and pulls from below: Anatomical variation, articulation and sound change

    Glossa: A Journal of General Linguistics

    (2019)
  • D. Dediu et al.

    On the antiquity of language: The reinterpretation of Neandertal linguistic capacities and its consequences

    Frontiers in Language Sciences

    (2013)