Human larynx motor cortices coordinate respiration for vocal-motor control

Vocal flexibility is a hallmark of the human species, most particularly the capacity to speak and sing. This ability is supported in part by the evolution of a direct neural pathway linking the motor cortex to the brainstem nucleus that controls the larynx, the primary sound source for communication. Early brain imaging studies demonstrated that the larynx motor cortex at the dorsal end of the orofacial division of motor cortex (dLMC) integrates laryngeal and respiratory control, thereby coordinating two major muscular systems that are necessary for vocalization. Neurosurgical studies have since demonstrated the existence of a second larynx motor area at the ventral extent of the orofacial motor division (vLMC) of motor cortex. The vLMC has been presumed to be less relevant to speech motor control, but its functional role remains unknown. We employed a novel ultra-high field (7T) magnetic resonance imaging paradigm that combined singing and whistling of simple melodies to localise the larynx motor cortices and test their involvement in respiratory motor control. Surprisingly, whistling activated both 'larynx areas' more strongly than singing, despite the reduced involvement of the larynx during whistling. We provide further evidence for the existence of two larynx motor areas in the human brain, and the first evidence that laryngeal-respiratory integration is a shared property of both larynx motor areas. We outline explicit predictions about the descending motor pathways that give these cortical areas access to both the laryngeal and respiratory systems and discuss the implications for the evolution of speech.


Introduction
Diverse and flexible vocal communication is a hallmark of the human species, most notably in the ability to speak and sing. These behaviours are supported by the human capacity to flexibly add novel vocal patterns to the repertoire, usually by learning through imitation (Janik and Slater, 2000). Few species of mammals have strong Vocal Production Learning (VPL) abilities, and none of these are closely related to humans. Monkeys are particularly weak vocal learners (Fischer and Hammerschmidt, 2019), while non-human apes appear to have intermediate VPL abilities (Lameira et al., 2016; Wich et al., 2012) as well as some facility in controlling the respiratory drive for sound production (Lameira et al., 2013; Perlman and Clark, 2015; Wich et al., 2009).

✩ Preprint DOI: https://psyarxiv.com/pc4uh/
* Corresponding author. E-mail address: sonja.kotz@maastrichtuniversity.nl (S.A. Kotz).
These abilities are supported by specialisations in the motor system that have been considerably altered over the course of primate evolution. The human brain is peculiar in having two Larynx Motor Cortices (LMCs) per hemisphere to control a single laryngeal organ, where only one larynx area would be expected (Belyk and Brown, 2017). The first of these LMCs to be described is located in primary motor cortex, which contains a somatotopic map of the body's muscles (Lotze et al., 2000; Penfield and Boldrey, 1937; Rao et al., 1995; Stippich et al., 2002). Early brain imaging studies identified a human-specific larynx-controlling region at the dorsal extent of the orofacial somatotopy (Brown et al., 2004, 2008, 2009; Loucks et al., 2007; Simonyan et al., 2007; Peck et al., 2009; Grabski et al., 2013; Belyk et al., 2018b).
The ventral Larynx Motor Cortex (vLMC) has been identified by more recent studies, including neurosurgical recordings (Bouchard et al., 2013; Chang et al., 2013; Dichter et al., 2018) and brain imaging (Eichert et al., 2020a; Kleber et al., 2013), though earlier studies reported activation that was consistent with this region (Olthoff et al., 2008; Terumitsu et al., 2006). These studies confirmed the localization of the dLMC near the representation of the articulatory muscles, and also observed a second larynx-controlling region at the ventral extent of the orofacial motor cortex, near the representation of the throat and swallowing (Breshears et al., 2015; Penfield and Boldrey, 1937).
The mechanisms by which these two brain areas share control over the voice, and the separate contributions they may make to voice motor control, are critical to our understanding of how humans evolved to speak. The dLMC is known to integrate laryngeal and respiratory motor control (Simonyan et al., 2007), suggesting that it may have a role in coordinating these muscular systems, both of which are required for vocal sound production. The dual cortical representation of the laryngeal muscles had not yet been described at the time of these studies, and it is not known whether this laryngeal-respiratory linkage is shared by the vLMC.
We hypothesize that either both larynx motor areas integrate laryngeal and respiratory function, or laryngeal-motor integration is restricted to the dLMC. Joint control of the laryngeal and respiratory musculature is likely to reduce conduction latencies as well as the metabolic cost of coordination between the muscle groups that drive the primary sound source for communication ( Hallermann et al., 2012 ;Ju et al., 2016 ), which may have driven both regions to develop this feature. Alternatively, the restriction of laryngeal-respiratory integration to the dLMC would suggest a simpler adaptation in support of speech motor control.
We conducted an ultra-high field functional Magnetic Resonance Imaging (fMRI) study of voice motor control as compared to whistling (Fig. 1). Vocal imitation of wordless melodies is an effective localizer of the dorsal and ventral larynx motor areas. Whistling imitation matches the experimental demands of vocal imitation (Belyk et al., 2018a) but engages a different subset of the speech-relevant musculature: whereas singing engages the laryngeal and respiratory muscles, whistling engages the articulatory and respiratory muscles (Belyk et al., 2019). Comparing the neural correlates of this combination of behaviours is a novel probe of a neural system that provides the vocal dexterity required for speech and furthers our understanding of the unique features that make the human brain speech-capable.

Participants
Thirteen participants (9 female), with a mean age of 26.4 (SD 5.2) years participated in the study after giving their informed consent. All participants were without neurological or psychiatric illness. Twelve participants were right-handed, and one was ambidextrous. Participants had diverse linguistic backgrounds and included native speakers of English (4), German (4), Dutch (3), Spanish (1), or Catalan (1) though all were fluent in English. Participants were recruited at Maastricht University. The study was approved by the Ethical Review Committee at the Faculty of Psychology and Neuroscience at Maastricht University (ECP-161_01_02_2016).
A second sample of twenty-four native speakers of British English (21 female) with mean age 21.0 (SD 3.3) was analysed to replicate resting state analyses. Volunteers were recruited from the participant pool at the Department of Psychology at Royal Holloway, University of London. This study was approved by the research ethics committee of Royal Holloway, University of London (587-2017-10-24-14-50-UXJT010).

Procedure
Participants performed experimental tasks in two runs of functional MRI lasting 720 s each. In the first run participants imitated simple melodies by whistling or singing. The second run was an experiment on simple speech movements not reported here. Runs followed a sparse, event-related paradigm that allowed participants to perform the experimental task without interference from auditory noise produced by the MRI scanner during data acquisition. Imitations began 5-7 s prior to data acquisition and were followed by a 6-8 s gap to allow the BOLD (blood oxygen level dependent) response to return to baseline. Compliance with task instructions was verified by audio recordings taken during the experimental session using an MRI-compatible microphone.
Participants listened to 48 simple melodies with the instruction to imitate them as accurately as possible (Fig. 1). Half of the stimuli were presented in a vocal timbre matched to the gender of the participant, to be imitated by singing; the remaining half were presented in a whistled timbre, to be imitated with a bilabial whistle. Singing was performed without words, as a hum with the lips gently parted, to isolate vocalisation. Participants practiced singing and whistling without head movement outside the scanner on a separate set of auditory stimuli. The auditory stimuli lasted 4 s and were followed by a 5-7 s silent period during which participants imitated the stimulus.
Half of the stimuli had an isochronous temporal pattern but varied in pitch (i.e., melodies). The other half had a fixed pitch but varied in temporal pattern (i.e., rhythms). Each combination of movement task (singing or whistling) and stimulus type (melodies or rhythms) was presented in two blocks of six trials, for a total of 12 trials per condition. A visual cue preceded each trial indicating whether the stimulus should be sung or whistled, and was replaced by a fixation cross for the remainder of the trial. Blocks were presented in counterbalanced order and separated by one or two silent rest trials, for a total of 12 rest trials. Separate melodic and rhythmic stimuli were presented to address a separate set of hypotheses not addressed here.
Each stimulus consisted of 5 notes lasting 750 ms each, separated by intervals of 50 ms. Twelve melodies were composed by randomly sampling notes from a chromatic scale in the ranges A2-G#3 (110-207.65 Hz) for male voices, A3-G#4 (220-415.3 Hz) for female voices, and A5-G#6 (880-1661.22 Hz) for whistling. Notes were sampled from a uniform distribution such that every degree of the chromatic scale appeared in equal number, and interval sizes followed a normal distribution centred on zero. Rhythms were composed by splitting one standard-length note into two notes of half duration (375 ms) and combining two standard-length notes into one note of double duration (1500 ms). This procedure produced temporally complex stimuli with a duration and number of notes matched to the isochronous melodies. Split and combined notes occurred with equal probability at every note position, for a total of 12 rhythms. Rhythmic stimuli had fixed pitches at C3, C4, or C6 (130.81, 261.62, or 1046.5 Hz) for male voices, female voices, and whistling respectively.
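The melody and rhythm construction described above can be sketched in a few lines. This is an illustrative reconstruction under the stated constraints, not the authors' actual stimulus-generation code; the function names and the exact sampling scheme are our own assumptions (equal-temperament tuning, one split and one merge per rhythm):

```python
import random

def make_melody(n_notes=5, scale_low_hz=110.0, n_semitones=11):
    """Sample note frequencies from one octave of a chromatic scale.

    scale_low_hz is the lowest scale degree (e.g. A2 = 110 Hz for male
    voices). Equal temperament is assumed: f = low * 2**(k / 12).
    """
    degrees = [random.randint(0, n_semitones) for _ in range(n_notes)]
    return [scale_low_hz * 2 ** (k / 12) for k in degrees]

def make_rhythm(n_notes=5, note_ms=750):
    """Build a rhythm by splitting one note and merging two others.

    One standard-length note becomes two half-duration notes (375 ms),
    and two adjacent standard-length notes merge into one double-duration
    note (1500 ms), keeping total duration and note count equal to the
    isochronous melodies.
    """
    durations = [note_ms] * n_notes
    # Split one randomly chosen note into two half-duration notes.
    i = random.randrange(len(durations))
    durations[i:i + 1] = [note_ms // 2, note_ms // 2]
    # Merge two adjacent standard-length notes into one double-length note.
    candidates = [j for j in range(len(durations) - 1)
                  if durations[j] == note_ms and durations[j + 1] == note_ms]
    j = random.choice(candidates)
    durations[j:j + 2] = [note_ms * 2]
    return durations
```

Under this scheme each rhythm retains 5 notes summing to 3750 ms, matching the isochronous stimuli as the text requires.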
Stimuli for singing trials were synthesized in a vocal timbre on a neutral vowel (Leon, Zero-G Limited, Okehampton, UK). Whistled stimuli were synthesized from a sine wave multiplied by an onset envelope that was empirically estimated from bilabial whistles recorded from 10 individuals ( Belyk et al., 2018a ).

Magnetic resonance imaging
MR images were acquired with a Siemens 7T MAGNETOM ultra-high field MRI scanner with a 32-channel head coil (Nova Medical, Wilmington, USA) at the Maastricht Brain Imaging Center in Maastricht, Netherlands. Participants' heads were firmly secured with foam pillows. Noise-cancelling headphones were provided to protect against auditory scanner noise. A T1-weighted image was collected using the MP2RAGE pulse sequence (Marques et al., 2010) with 0.65 mm isotropic voxels. A run of 5 volumes with the same parameters as the BOLD-sensitive runs was collected with the phase encoding direction inverted to support image unwarping of susceptibility-induced distortions.

Fig. 1. A) The source-filter theory of speech outlines three biomechanical systems (left) that contribute to speech acoustics (right). The respiratory apparatus generates mechanical force. Air passing through the larynx causes it to vibrate, setting up a harmonic structure, the most notable feature of which is the pitch of the voice (f0). The articulatory muscles shape a series of resonant chambers in the oral cavity that selectively amplify certain frequency bands (F1, F2, F3), which in turn encode the vowels of speech. B) Speech engages all three muscular systems; singing wordless melodies predominantly engages the respiratory and laryngeal systems, while whistling predominantly engages the respiratory and articulatory systems. C) Participants heard novel melodies and then imitated them by singing without words or by whistling. Stimuli were either melodies or rhythms, and the figure shows an example of each. A sparse-sampling paradigm ensured that auditory scanner noise did not interfere with participant behaviour.
Functional images sensitive to the BOLD signal were collected with the Center for Magnetic Resonance Research (CMRR) multi-band echo-planar imaging sequences (Moeller et al., 2010) according to a sparse event-related sampling design (Hall et al., 1999). Samples were collected 5 or 7 s after imitation onset to eliminate auditory scanner noise during stimulus presentation and task performance, as well as to minimize movement-related artefacts during image acquisition. These jittered acquisition times were selected to ensure that data were collected near the expected maxima of the BOLD response after accounting for hemodynamic lag. Images were collected with TR = 15 s, TA = 2 s, spatial resolution = 1.25 mm isotropic, slice gap = 0 mm, FOV = 160 × 160 × 120 mm, number of slices = 92, echo time = 18.6 ms, flip angle = 70°, multi-band acceleration factor = 2. In each run 60 volumes were collected over 15 min and 15 s. Two volumes were collected and discarded prior to each run.
An additional run of BOLD-sensitive images was collected while participants remained at rest. The resting state runs had the same parameters as task-based runs, with the exception that there were no silent gaps between volume acquisitions ( TR = 2 s) and 360 volumes were collected per run. Resting state runs for two participants were not collected and a third participant's resting state run was ended after 174 volumes due to specific absorption rate limitations.
A second resting-state BOLD-sensitive dataset from a separate set of participants was analysed as a replication sample (N = 24). These data were collected using a Siemens Trio 3T scanner located at Royal Holloway with TR = 1 s, spatial resolution = 3 mm isotropic, slice gap = 0.75 mm, FOV = 192 × 192 × 127 mm, number of slices = 34, echo time = 30 ms, flip angle = 78°, multi-band acceleration factor = 2. In each run 400 volumes were collected over 6 min and 40 s.

Data availability
The raw 7T imaging data, image processing pipeline, experiment code, and stimulus files are accessible through the Open Science Framework ( https://osf.io/zhb5q/ ).

Task-based runs
Susceptibility-induced distortions due to magnetic field inhomogeneity were corrected using the TOPUP algorithm in FSL v5.0.10 (Jenkinson et al., 2012; Smith et al., 2004). The remainder of the MRI data processing was performed with SPM12 and MATLAB version R2017a (Mathworks, 2017) running on an iMac (OSX 10.11.6). All images were realigned to the first echo-planar image in each functional run. Runs were co-registered to the T1-weighted images for each individual participant and spatially normalized to the Montreal Neurological Institute standard stereotaxic space (Fonov et al., 2009) using a transformation matrix generated during tissue class segmentation (Ashburner and Friston, 2005). Images were spatially smoothed using a Gaussian kernel with Full Width at Half Maximum (FWHM) equal to three voxels. Head movement was regressed from the raw data by constructing a General Linear Model (GLM) that predicted the BOLD signal from six head-motion parameters, and images were masked to grey matter.

Resting state runs
In addition to the steps described above, resting-state functional images were slice-time corrected and normalized for global signal intensity. The mean BOLD signal within white matter and within cerebrospinal fluid was measured separately for each volume. The mean BOLD signal within grey matter was not included in the regression model, as the mean grey matter signal is colinear with the signal of interest in resting-state functional connectivity analyses (Bright et al., 2017; Murphy and Fox, 2017). The tissue-specific measures of global signal and six degrees of head motion were modelled in a fixed-effects analysis that included data pre-whitening. The residuals from this analysis were retrieved for further analysis.
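The nuisance regression described here (and the motion regression in the task-based pipeline) amounts to ordinary least squares on a design of intercept, motion, and tissue regressors, with the residuals carried forward. A minimal sketch of that step, as our own illustration rather than the SPM implementation, and omitting the pre-whitening:

```python
import numpy as np

def regress_nuisance(bold, motion, wm, csf):
    """Remove nuisance signals from voxel time series by OLS (sketch).

    bold   : (T, V) array, one column per voxel
    motion : (T, 6) head-motion parameters
    wm, csf: (T,) mean white-matter and CSF signals
    Returns residuals after regressing out an intercept, the motion
    parameters, and the two tissue regressors from every voxel at once.
    """
    T = bold.shape[0]
    X = np.column_stack([np.ones(T), motion, wm, csf])  # (T, 9) design
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)     # fit all voxels jointly
    return bold - X @ beta                              # residual time series
```

By construction the residuals are orthogonal to every column of the design, so motion- and tissue-related variance cannot re-enter the later connectivity analyses.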

Partial least squares analysis
In light of the high dimensionality of ultra-high field MRI data, we utilized non-rotated task-based Partial Least Squares (PLS) analysis (McIntosh et al., 1996; McIntosh and Lobaugh, 2004) in lieu of the standard GLM approach (Friston et al., 1994; Worsley and Friston, 1995). Unlike statistical parametric mapping, in which each voxel is evaluated independently, PLS detects latent variables that capture a network of correlated voxels. This approach mitigates the "curse of dimensionality" by taking advantage of the spatial smoothness of fMRI data: since adjacent voxels are highly correlated, and distributed brain regions that participate in the same network may exhibit similar responses to the experimental design, fMRI data are amenable to dimensionality reduction.
PLS is similar to principal components analysis (PCA), with the important exception that solutions may be constrained to covariance structures that are of theoretical interest (e.g., that covary with experimental conditions). We performed non-rotated task-based PLS with the orthogonal contrasts 1) task versus rest and 2) whistling versus singing. Mean centring was applied at the group level. Significance was assessed via permutation tests with 500 simulations at a confidence level of 95%. Bootstrapping was performed with 500 simulations and 100 split-halves to calculate bootstrap ratios (BSRs) that assess the reliability of latent variables. We report grey matter voxels with correlations that were both unlikely under the null hypothesis according to the permutation tests (p < 0.025 at either tail) and stable across bootstrap iterations (BSR > 2). This amounts to requiring that voxels are both statistically significant (as assessed by permutation tests) and reliable (as assessed by bootstrapping; Krishnan et al., 2011).
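The core logic of a non-rotated task PLS with a single a priori contrast can be sketched as follows. This is a simplified illustration (condition means projected onto contrast weights, with label permutation for significance), not the McIntosh PLS toolbox implementation; bootstrap ratios would be computed analogously by resampling participants and dividing each voxel salience by its bootstrap standard error:

```python
import numpy as np

def nonrotated_pls(X, cond, contrast, n_perm=500, rng=None):
    """Non-rotated task PLS for one a priori contrast (illustrative).

    X        : (n_obs, n_voxels), one row per subject-condition observation
    cond     : (n_obs,) integer condition label for each row
    contrast : (n_conditions,) contrast weights, e.g. [-1, 1]
    Returns voxel saliences, the latent variable strength (singular
    value), and a permutation p-value for that strength.
    """
    if rng is None:
        rng = np.random.default_rng()

    def salience(labels):
        # Average rows within each condition, then project onto the contrast.
        means = np.stack([X[labels == c].mean(axis=0)
                          for c in range(len(contrast))])
        return contrast @ means

    s_obs = salience(cond)
    sv_obs = np.linalg.norm(s_obs)          # strength of the latent variable
    # Null distribution: shuffle condition labels across observations.
    null = np.array([np.linalg.norm(salience(rng.permutation(cond)))
                     for _ in range(n_perm)])
    p = (null >= sv_obs).mean()
    return s_obs, sv_obs, p
```

Because the whole voxel pattern is tested through a single singular value, no per-voxel multiple-comparison correction is needed at this stage; voxel-level reliability is instead handled by the bootstrap ratios.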

Resting state functional connectivity
Whole-brain correlation maps were computed from seeds in Regions of Interest (ROIs) in the dLMC (+/-41, -16, 39), vLMC (+/-65, -4, 14), and tongue motor cortex (+/-52, -6, 28). Regions of interest were defined as 5 mm spheres around peak coordinates localized by task-based PLS, and labels are based on established patterns of somatotopy (e.g., Takai et al., 2010). Group-level significance was assessed using the Statistical non-Parametric Mapping (SnPM) toolbox to perform permutation tests with 2^N iterations and variance smoothing with a FWHM spanning three voxels, matching the level of smoothing applied during pre-processing. Statistical maps were thresholded at a cluster-wise error rate of p < 0.05 calculated from a cluster-forming threshold of p < 0.001. Post-hoc tests were conducted to directly compare correlations between regions of interest, testing the hypothesis that the larynx motor cortices are more strongly functionally connected with each other than with a more proximal brain region that controls a different set of muscles. Pearson correlations were calculated between each pair of ROIs within each participant to measure the degree of association between brain regions. Group-level inferences were determined by Welch's paired t-tests to determine whether correlation coefficients differed from zero. The same analysis was repeated for both resting-state datasets.
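The post-hoc ROI comparison reduces to per-participant Pearson correlations, Fisher z-transformed and compared with a paired test. A minimal sketch of that step, as our own illustration; it uses SciPy's standard paired t-test rather than the Welch variant reported above:

```python
import numpy as np
from scipy import stats

def roi_connectivity(ts_a, ts_b):
    """Pearson correlation between two ROI-mean time series."""
    return np.corrcoef(ts_a, ts_b)[0, 1]

def compare_connectivity(r_pair1, r_pair2):
    """Paired comparison of two sets of per-subject ROI correlations.

    Correlations are Fisher z-transformed (arctanh) before testing so
    that the values are approximately normally distributed. Returns the
    t statistic and two-tailed p-value.
    """
    z1, z2 = np.arctanh(r_pair1), np.arctanh(r_pair2)
    result = stats.ttest_rel(z1, z2)
    return result.statistic, result.pvalue
```

For example, `compare_connectivity(r_dlmc_vlmc, r_dlmc_tongue)` over the 13 participants tests whether dLMC-vLMC coupling exceeds dLMC-tongue coupling, the comparison that came out null in the results below.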

PLS component 1: imitation versus rest
The first latent variable was constrained to correlate with a contrast between imitative sound production (by either singing or whistling) versus rest. These findings reflect regions of common activation for imitative singing and whistling (Fig. 2 and Table 1). This component loaded strongly onto voxels in the primary motor cortex of both hemispheres, including both the dLMC and vLMC. Additional regions within the motor network were also activated, including the supplementary motor area (SMA), anterior cingulate cortex (ACC), and cerebellum (Supplementary Material 1). Brain regions within the auditory network were also evident, presumably reflecting either perception of the target stimulus or processing of auditory feedback. These regions included the primary auditory cortex and a large extent of the Superior Temporal Gyrus (STG), particularly in the right hemisphere, as well as the medial geniculate nucleus of the thalamus.

Fig. 2. Results of PLS analysis. A) The first latent variable (red) reflects the contrast imitation > rest, regardless of whether imitation was performed by singing or by whistling, and loads predominantly onto the vLMC and dLMC. The second latent variable reflects the contrast whistling > singing (blue) and loads predominantly onto tongue primary motor cortex. Lateral surface views of the brain show the spatial relationship between the vLMC and dLMC as localized by LV 1 and tongue primary motor cortex as localized by LV 2. Insets are axial slices at the level of the dLMC, tongue motor cortex, and vLMC respectively. dLMC: dorsal larynx motor cortex; LV: latent variable; vLMC: ventral larynx motor cortex.

PLS component 2: whistling versus singing
The second latent variable was constrained to correlate with a contrast between imitative whistling and singing. Surprisingly, whistling activated the dLMC and vLMC more strongly than singing, in addition to the primary motor lip and tongue areas that were expected given the movements required to produce a whistled sound (Fig. 2 and Table 2). Singing, in contrast, more strongly activated regions elsewhere in the motor network, including the cerebellum and basal ganglia, as well as temporal lobe auditory regions including the STG, Superior Temporal Sulcus (STS), Middle Temporal Gyrus (MTG), and Temporal Pole. Regions outside the audio-motor network were also activated more strongly during singing, including the Inferior Frontal Gyrus pars orbitalis (IFGorb), anterior insula, claustrum, and amygdala (Supplementary Material 2).

Resting state connectivity
Seeds in the right and left dLMC and vLMC showed functional connectivity throughout sensorimotor cortex, including mutual functional connectivity between the two larynx areas. However, functional connectivity between the dLMC and vLMC was no greater than between either larynx area and tongue primary motor cortex, suggesting that functional connectivity between the two larynx motor areas may be part of the broader pattern of connectivity within primary motor cortex rather than a privileged functional link between them.

Discussion
We report the first ultra-high field fMRI comparison of singing alongside whistling to contrast sound production with and without involvement of the laryngeal sound source. Unexpectedly, given the reduced laryngeal involvement in whistling, we observed that both the dLMC and vLMC were not only engaged by whistling, but engaged more strongly than by singing. The strong expiratory drive of whistling may account for the common activation in these brain regions across modes of sound production. One previous study of whistling observed activation that may have been consistent with the vLMC (Dresel et al., 2005), but without explicit localisation of the larynx areas this correspondence could not be confirmed. We suggest that in addition to their established roles in laryngeal motor control, the dLMC and vLMC also contribute to respiratory motor control and may serve to integrate two muscular systems that require mutual coordination to support important behaviours such as speaking, singing, and airway protection.
That neither of the larynx motor areas in the human brain is specific to laryngeal motor control may be due to the placement of the larynx within the airway. Voiced sounds are produced by the vibration of the vocal folds, the tension of which determines vocal pitch (Hollien and Moore, 1960; Titze, 2008; Titze et al., 1989; Titze and Story, 2002). However, the laryngeal muscles do not actively vibrate the vocal folds; instead, they determine the configuration of the larynx, while vocal fold vibration is produced passively by the passage of air (Story and Titze, 1995; Titze, 1989). In addition to its role in communication, the larynx also serves as a mechanism for airway protection (Dua et al., 1997) and participates in reflexive movements during swallowing and retching (Ardran and Kemp, 1952; Lang et al., 2002) to form a secondary closure below the epiglottis (Vilkman et al., 1996). An informal laryngoscopic investigation revealed that whistled notes are often interspersed with a light closure of the glottis (see Supplementary Materials 5 and 6), which is presumed to be part of the mechanism for the cessation of airflow. Glottal closure has previously been shown to activate the dLMC, though it is unknown whether it contributes to activation of the vLMC (Brown et al., 2008). However, it is sensible to suppose that part of the LMCs' joint mechanism of laryngeal and airway control should include the closure of the glottis. Regardless, the reduced involvement of the larynx during whistling is unlikely to account for the greater activation of the LMCs.

Table 2. Latent variable two (Whistling versus Singing). Columns indicate the name of the brain region and its anatomical division for each activation, along with coordinates in MNI space. Latent Variable (LV) scores indicate the magnitude of each activation, and size indicates its extent in mm.
To date there have been few demonstrations that the dLMC and vLMC have dissociable functions. While the dLMC is often reported in the absence of the vLMC, it seems likely that this merely reflects differences in the ease with which these areas are detected. This may be due to the greater abundance of large descending motor neurons in the dLMC (Brodmann, 1909; Judaš and Cepanec, 2010; Vogt, 1910), which may generate a larger BOLD signal. For instance, simple phonation (singing without words) activates only the dLMC in some studies (Belyk et al., 2018b; Brown et al., 2008) but both the dLMC and vLMC in others. Furthermore, the sulci near the vLMC are more variable, which is expected to reduce rates of detection in group-level analyses (Eichert et al., 2020b). Hence, while we observed stronger evidence for the involvement of the dLMC than the vLMC in integrating laryngeal and respiratory control, this may reflect differences in detectability rather than function. One interesting exception is that the application of an external puff of air to the surface of the larynx may activate the vLMC, suggesting either that the vLMC has some sensory functionality or that this manipulation triggers airway-protective reflexes that involve the vLMC (Miyaji et al., 2014).

A note on nomenclature
Early brain imaging studies disagreed on an appropriate label for what has come to be referred to as the dLMC (Brown et al., 2008; Loucks et al., 2007; Simonyan et al., 2007). The present findings contribute to growing evidence that the specificity of function implied by the label LMC is not borne out by the data (Brown et al., 2008, 2021; Loucks et al., 2007). While we do not undertake to reform the existing nomenclature here, we note that there is a growing need to do so.

Respiratory motor control
The human brain has a direct projection from cortex to the nucleus ambiguus, the brainstem nucleus that exerts direct control over the laryngeal muscles ( Iwatsubo et al., 1990 ;Kuypers, 1958a ). This direct pathway is absent in the brains of monkeys ( Jürgens and Ehrenreich, 2007 ;Simonyan and Jürgens, 2003 ), and less robust in the brains of non-human apes ( Kuypers, 1958b ). However, this novel arrangement in human laryngeal motor control is not sufficient to explain the joint cortical control of laryngeal and respiratory functions observed in the present study.
Respiratory motor control is coordinated by the nucleus retroambiguus (Figure 3), a brainstem nucleus of the medulla located adjacent to the nucleus ambiguus (Subramanian and Holstege, 2009). The nucleus retroambiguus contains upper motor neurons that project to the lower motor neuron nuclei innervating respiratory muscles of the abdomen, pelvic floor, and ribcage (the intercostal muscles) (Vanderhorst et al., 2000). The nucleus retroambiguus also projects to laryngeal motor neurons in the nucleus ambiguus (VanderHorst et al., 2001).
However, this retroambiguus-ambiguus projection appears to be unidirectional, such that cortical control over the nucleus ambiguus alone is unlikely to yield voluntary respiratory motor control. We therefore hypothesize that both human LMCs have direct projections to the nucleus retroambiguus in addition to the direct projections to the nucleus ambiguus that have already been observed. Such a parallel projection to both nucleus ambiguus and retroambiguus has not yet been identified in humans but is observed in the analogous structures of the avian song system, which is a strong model of the human vocal motor system ( Gahr, 2000 ;Petkov and Jarvis, 2012 ;Wild, 1993 ;Wild et al., 2000 ).
Observations of the direct projection to nucleus ambiguus have come from natural experiments due to cerebrovascular events ( Iwatsubo et al., 1990 ;Kuypers, 1958a ). In these studies, large cortical lesions caused the axons of upper motor neurons to degenerate, and degenerating axons were traced among more intact white matter. However, the lesions all resulted from cerebrovascular accidents of the middle cerebral artery (MCA). This artery supplies much of the speech-motor related cortex including both the vLMC and dLMC, hence the prevalence of speechmotor and swallowing disorders following MCA infarcts ( Heinsius et al., 1998 ;Theys et al., 2011 ). Consequently, it is not known whether the direct connection to the nucleus ambiguus originates from one or both larynx motor areas.
We hypothesize that the direct connection to the nucleus ambiguus stems from both the vLMC and dLMC, in light of the lack of strong functional connectivity between these cortical larynx areas. We have previously hypothesized that the dLMC may have an as-yet undetected direct projection to the nucleus retroambiguus (Belyk and Brown, 2017), consistent with the analogous pathway in songbirds (Gahr, 2000; Petkov and Jarvis, 2012; Wild, 1993; Wild et al., 2000). We now further hypothesize that the human vLMC may also project directly to the nucleus retroambiguus, given the evidence from the current experiment that both larynx-controlling regions integrate respiratory motor control.

Respiratory motor control of species-typical vocalisations
While non-human primates have relatively poor abilities to learn novel vocal behaviours (Hage and Nieder, 2016; Nieder and Mooney, 2019), they can strategically deploy calls from their innate repertoire of vocalisations (Pierce, 1985). This behaviour requires some degree of voluntary control over the timing of laryngeal-respiratory action in animals that lack the mechanisms provided by the human vLMC/dLMC (Jürgens, 1974).
In human and non-human primates alike, innate vocalisations are controlled by a separate pathway that circumvents motor cortex (Jürgens, 2002). The periaqueductal gray (PAG) of the brainstem organises responses to affective stimuli; as such, it receives extensive inputs from the limbic system (Dujardin and Jürgens, 2005) and sends outputs to a plurality of lower motor nuclei relevant to vocalisation, providing integrated control over multiple muscle groups (Thoms and Jürgens, 1987). In monkeys (Saimiri sciureus), electrical stimulation of the PAG elicits fully formed species-typical vocalisations, including both the laryngeal and respiratory components (Jürgens and Pratt, 1979a, 1979b).
A region of cingulate cortex provides cortical control over the PAG. Lesions to the cingulate cortex prevent the initiation of operantly conditioned vocalisations, but spare responses to stimuli that would normally elicit an innate vocalisation (Aitken, 1981; Sutton et al., 1974, 1981). The homologous pathway in humans is activated during verbal expressions of emotion (Barrett et al., 2004; Wattendorf et al., 2013). This cingulate-PAG axis provides a mechanism for coordinated laryngeal-respiratory control of innate species-typical vocalisations that is conserved across primates. In humans, this pathway exists alongside the novel adaptations in motor cortex that support the flexibility of human vocal behaviour.

Selective pressures on respiratory motor control
Strong vocal production learning abilities are uncommon in mammals, though they have been documented in a handful of clades. The list of strong mammalian vocal learners is notably skewed towards species whose evolutionary path has placed particular constraints on respiratory motor control. VPL is most abundantly observed among aquatic mammals, including cetaceans (Janik, 2014; King and Sayigh, 2013; Noad et al., 2000) and pinnipeds (Ralls et al., 1985; Ravignani et al., 2016; Stansbury and Janik, 2019), which must coordinate breathing with bouts of diving to manage the supply of oxygen, buoyancy, and ambient ocean pressure (Kooyman, 1973; Lillie et al., 2017; Roos et al., 2016). Some species of elephants have demonstrated vocal production learning (Poole et al., 2005; Stoeger et al., 2012), which may be related to the unique demands of respiratory snorkelling (West, 2001) as well as the possibly aquatic ancestry of these species (Gaeth et al., 1999). Bats exhibit a range of socially communicative vocalisations in addition to echolocation (Knörschild, 2014; Vernes and Wilkinson, 2019; Vernes, 2016). Notably, bats not only integrate respiratory control with the echolocation calls used for navigation, but their muscles of respiration may also interact directly with the muscular control of winged flight (Lancaster et al., 1995; Suthers et al., 1972). In this company, humans appear to be the odd mammal out, lacking a clear selective pressure for enhanced respiratory motor control beyond its use in communication (Verhaegen et al., 2002).

Fig. 3. Schematic of the descending motor pathway. Cortico-bulbar pathways (dotted line) have been observed projecting to the nucleus ambiguus, though it is not presently known whether these axons originate from the vLMC, the dLMC, or both. We have hypothesized that both areas also project to the nucleus retroambiguus.

Limitations
The present study was based on a relatively small sample of singers due to the constraints of ultra-high field fMRI. We mitigated the low power associated with this small sample size by utilising Partial Least Squares (PLS) analysis, which considerably improves the sensitivity and reliability of statistical maps (Grady et al., 2020). PLS analyses with sample sizes in the range of the present study have reliability equivalent to a standard univariate analysis of a moderately large sample. This advantage derives in part from analysing networks of correlated brain regions rather than a mass of independent voxels. However, Grady et al. caution that this approach should not substitute for thoughtful experimental design and well-motivated sample sizes. The present study tested a priori, spatially specific hypotheses, such that the trade-off between sample size and spatial resolution was consistent with the aims of the experiment.

Conclusion
The dLMC and vLMC are two larynx motor areas in the human brain that are important cortical structures for the voluntary control of the voice. We observed that both areas are also active during whistling, despite the reduced laryngeal involvement in that mode of sound production. We suggest that neither the dLMC nor the vLMC is strictly laryngeal, and that both may integrate laryngeal and respiratory motor control. Some clue to the separate functions of these brain regions may be found in the complex cytoarchitecture of the vLMC, which appears to be intermediate between primary motor and primary somatosensory cortex. Regardless, coordination with respiratory motor control appears to be a ubiquitous partner to laryngeal motor control; indeed, the larynx sits in the airway, and any action of the larynx is likely to affect respiratory effort. This has implications for our understanding of human brain evolution to the extent that it alters our understanding of a well-documented specialisation for speech.

Author contributions
… manuscript, and organised laryngoscopic investigations. AR consulted on experimental design and data analysis. CM provided critical comments on the manuscript. RG contributed the 3T dataset. SAK consulted on experimental design, edited, and provided critical comments on the manuscript.

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.neuroimage.2021.118326 .