NeuroImage

Volume 178, September 2018, Pages 574-582

Neural representation of vowel formants in tonotopic auditory cortex

https://doi.org/10.1016/j.neuroimage.2018.05.072

Highlights

  • Functional MRI was used to investigate the cortical encoding of vowels.

  • Vowels are distinguished by peaks in their frequency spectra called formants.

  • Vowels differentially activated tonotopic regions corresponding to their formants.

  • Neural encoding of vowels is scaffolded on tonotopy.

Abstract

Speech sounds are encoded by distributed patterns of activity in bilateral superior temporal cortex. However, it is unclear whether speech sounds are topographically represented in cortex, or which acoustic or phonetic dimensions might be spatially mapped. Here, using functional MRI, we investigated the potential spatial representation of vowels, which are largely distinguished from one another by the frequencies of their first and second formants, i.e. peaks in their frequency spectra. This allowed us to generate clear hypotheses about the representation of specific vowels in tonotopic regions of auditory cortex. We scanned participants as they listened to multiple natural tokens of the vowels [ɑ] and [i], which we selected because their first and second formants overlap minimally. Formant-based regions of interest were defined for each vowel based on spectral analysis of the vowel stimuli and independently acquired tonotopic maps for each participant. We found that perception of [ɑ] and [i] yielded differential activation of tonotopic regions corresponding to formants of [ɑ] and [i], such that each vowel was associated with increased signal in tonotopic regions corresponding to its own formants. This pattern was observed in Heschl's gyrus and the superior temporal gyrus, in both hemispheres, and for both the first and second formants. Using linear discriminant analysis of mean signal change in formant-based regions of interest, the identity of untrained vowels was predicted with ∼73% accuracy. Our findings show that cortical encoding of vowels is scaffolded on tonotopy, a fundamental organizing principle of auditory cortex that is not language-specific.
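As an illustration of the classification step mentioned above, the sketch below runs a leave-one-out linear discriminant analysis on mean signal change values from formant-based regions of interest. The feature matrix, labels, and ROI layout are placeholders for illustration only; the study's actual data and cross-validation details are not reproduced here.

```python
# Hedged sketch: leave-one-out LDA on mean signal change in formant-based ROIs.
# The features and labels below are random placeholders, not the study's data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# X: one row per vowel block; columns = mean signal change in each
# formant-based ROI (e.g., [ɑ]-F1, [ɑ]-F2, [i]-F1, [i]-F2).
rng = np.random.default_rng(0)
X = rng.normal(size=(36, 4))              # placeholder features
y = np.array(["a", "i"] * 18)             # placeholder vowel labels

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut()).mean()
print(f"Leave-one-out classification accuracy: {acc:.2f}")
```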

Introduction

Cortical encoding of speech sounds has been shown to depend on distributed representations in auditory regions on Heschl's gyrus (HG) and the superior temporal gyrus (STG). Studies using functional MRI (Formisano et al., 2008; Obleser et al., 2010; Kilian-Hütten et al., 2011; Bonte et al., 2014; Arsenault and Buchsbaum, 2015; Evans and Davis, 2015; Zhang et al., 2016) and intracranial electrocorticography (Chang et al., 2010; Pasley et al., 2012; Chan et al., 2014; Mesgarani et al., 2014; Leonard et al., 2016; Moses et al., 2016) have shown that phonemes can be reconstructed and discriminated by machine learning algorithms based on the activity of multiple voxels or electrodes in these regions. Neural data can distinguish between vowels (Formisano et al., 2008; Obleser et al., 2010; Bonte et al., 2014; Mesgarani et al., 2014) and between consonants (Chang et al., 2010; Mesgarani et al., 2014; Arsenault and Buchsbaum, 2015; Evans and Davis, 2015), and there is evidence that phonemic representations in these regions are categorical and reflect the contribution of top-down information (Chang et al., 2010; Kilian-Hütten et al., 2011; Bidelman et al., 2013; Mesgarani et al., 2014; Leonard et al., 2016).

However, little is known regarding the spatial organization of cortical responses that underlie this distributed encoding, even in cases where hypotheses can readily be made based on known principles of auditory cortical organization. The most prominent organizing principle of core auditory regions is tonotopy, whereby there are several continuous gradients between regions in which neurons preferentially respond to lower or higher frequencies (Talavage et al., 2004; Woods et al., 2009; Humphries et al., 2010; Da Costa et al., 2011; Dick et al., 2012; Saenz and Langers, 2013; De Martino et al., 2015). Tonotopic organization also extends to auditory regions beyond the core on the lateral surface of the STG and beyond (Striem-Amit et al., 2011; Moerel et al., 2012, 2013; Dick et al., 2017).
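As a schematic of how such maps are commonly summarized, the sketch below assigns each voxel a "best frequency", i.e. the stimulus center frequency that evokes its largest response. The frequency bands and response matrix are assumed placeholders; actual tonotopic mapping procedures (e.g., phase-encoded or block designs with model fitting) are more elaborate.

```python
# Minimal sketch of deriving a voxel-wise best-frequency map: for each voxel,
# take the stimulus center frequency with the largest response. Values below
# are random placeholders, not real tonotopy data.
import numpy as np

center_freqs_hz = np.array([200, 400, 800, 1600, 3200, 6400])  # assumed bands
rng = np.random.default_rng(1)
responses = rng.normal(size=(1000, center_freqs_hz.size))       # voxels x frequencies

best_freq = center_freqs_hz[np.argmax(responses, axis=1)]       # best frequency (Hz) per voxel
```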

Vowels are pulse-resonance sounds in which the vocal tract acts as a filter, imposing resonances on the glottal pulses, which appear as peaks on the frequency spectrum. These peaks are referred to as formants, and vowels are distinguished from one another largely in terms of the locations of their first and second formants (Peterson and Barney, 1952), which are quite consistent across speakers despite variation in the pitches of their voices, and across pitches within each individual speaker. Because formants are defined in terms of peak frequencies, we hypothesized that vowels may be discriminable based on neural activity in tonotopic regions corresponding to the formants that characterize them.
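For illustration, a simple LPC-based formant estimate can be computed as in the sketch below; this librosa-based routine is only an assumed stand-in, not the analysis used in the study. As a concrete example of the formant distinction, the classic measurements of Peterson and Barney (1952) place F1/F2 for adult male speakers at roughly 730/1090 Hz for [ɑ] and roughly 270/2290 Hz for [i].

```python
# Hedged sketch of LPC-based formant estimation for a single vowel token
# (an illustrative substitute, not the study's measurement procedure).
import numpy as np
import librosa

def estimate_formants(signal, sr, order=12):
    """Return candidate formant frequencies (Hz) from the angles of LPC roots."""
    a = librosa.lpc(np.asarray(signal, dtype=float), order=order)  # LPC coefficients
    roots = [r for r in np.roots(a) if np.imag(r) > 0]             # keep upper half-plane
    freqs = np.angle(roots) * sr / (2 * np.pi)                     # radians -> Hz
    return sorted(f for f in freqs if f > 90)                      # first two ≈ F1, F2
```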

In animal studies, perception of vowels is associated with increased firing rates of frequency-selective neurons in primary auditory cortex (Versnel and Shamma, 1998; Mesgarani et al., 2008). In humans, natural sounds are encoded by multiple spectrotemporal representations that differ in spatial and temporal resolution (Moerel et al., 2012, 2013; Santoro et al., 2014) such that spectral and temporal modulations relevant for speech processing can be reconstructed from functional MRI data acquired during presentation of natural sounds (Santoro et al., 2017). It can therefore be predicted that the cortical encoding of vowels, as a special case of natural sounds, would follow the same principles. However, the cortical representation of vowel formants in tonotopic regions has not previously been demonstrated. Magnetoencephalography (MEG) studies have shown differences in source localization between distinct vowels (Obleser et al., 2003, 2004; Scharinger et al., 2011), but findings have been inconsistent across studies (Manca and Grimaldi, 2016), so it is unclear whether any observed differences reflect tonotopic encoding of formants. Neuroimaging studies have rarely reported activation differences between different vowels in univariate subtraction-based analyses (e.g. Formisano et al., 2008; Obleser et al., 2010). As noted above, the imaging and electrocorticography studies that have demonstrated neural discrimination between vowels have done so on the basis of distributed representations (e.g. Formisano et al., 2008; Mesgarani et al., 2014). The patterns of voxels or electrodes contributing to these classifications have been reported to be spatially dispersed (Mesgarani et al., 2014; Zhang et al., 2016).

To determine whether vowel formants are encoded by tonotopic auditory regions, we used functional MRI to map tonotopic auditory cortex in twelve healthy participants, then presented blocks of the vowels [ɑ] (the first vowel in ‘father’) and [i] (as in ‘peak’) in the context of an irrelevant speaker identity change detection task. We examined neural responses to the two vowels in regions of interest where voxels' best frequencies corresponded to their specific formants, to determine whether vowel identity could be reconstructed from formant-related activation.
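A minimal sketch of the ROI logic just described, under assumed parameters: select tonotopic voxels whose best frequency falls within a band around a given formant. The octave-based band width and the formant values in the usage comments are illustrative assumptions, not the study's actual ROI definitions.

```python
# Hedged sketch: a formant-based ROI as the set of voxels whose tonotopic best
# frequency lies within an assumed band around the target formant frequency.
import numpy as np

def formant_roi(best_freq_hz, formant_hz, half_width_octaves=0.25):
    """Boolean mask over voxels: best frequency within +/- half_width_octaves
    of the formant frequency (the band width is an assumption)."""
    lo = formant_hz * 2.0 ** (-half_width_octaves)
    hi = formant_hz * 2.0 ** (half_width_octaves)
    return (np.asarray(best_freq_hz) >= lo) & (np.asarray(best_freq_hz) <= hi)

# Usage with the placeholder best-frequency map from the tonotopy sketch above:
# roi_a_f1 = formant_roi(best_freq, 730)    # voxels near [ɑ]'s F1
# roi_i_f2 = formant_roi(best_freq, 2290)   # voxels near [i]'s F2
```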

Section snippets

Participants

Twelve neurologically normal participants were recruited from the University of Arizona community in Tucson, Arizona (age 32.0 ± 5.9 (sd) years, range 26–44 years; 7 male, 5 female; all right-handed; all native speakers of English; education 17.8 ± 1.6 years, range 16–20 years). All participants passed a standard hearing screening (American Speech-Language-Hearing Association, 1997).

All participants gave written informed consent and were compensated for their time. The study was approved by the

Behavioral data

In the tonotopy task, participants detected 69.2 ± 19.4% of the instances of laughter (range 27.5–92.5%) embedded in the stimuli, while making a median of 25.5 false alarms (range 1–71) in total across the two runs. In the vowel task, participants detected 98.8 ± 1.4% of the oddball vowels (range 95–100%), while making a median of 2 false alarms (range 0–7) in total across the three runs. These results indicate that all participants maintained attention to the stimuli throughout the experiment.

Tonotopic maps

Discussion

The aim of this study was to determine whether vowels are encoded in tonotopic auditory regions in terms of their formants. We found strong evidence that this is the case. In particular, the significant interaction of ROI-defining vowel by presented vowel indicates that [ɑ] and [i] differentially activated tonotopic regions with best frequencies corresponding to their specific formants. This pattern held independently in HG and the STG, in the left and right hemispheres, and in regions

Funding

This research was supported in part by the National Institute on Deafness and Other Communication Disorders at the National Institutes of Health (grant number R01 DC013270) and the National Science Foundation (grant number DGE-1746060).

Acknowledgements

We gratefully acknowledge the assistance of Ed Bedrick, Andrew DeMarco, Shannon Knapp, Andrew Lotto, Marty Sereno, Scott Squire, Brad Story, Griffin Taylor, and Andrew Wedel, and we thank all of the individuals who participated in the study.

References (70)

  • M. Steinschneider et al. Enhanced physiologic discriminability of stop consonants with prolonged formant transitions in awake monkeys based on the tonotopic organization of primary auditory cortex. Hear. Res. (2011)
  • M. Steinschneider et al. Tonotopic organization of responses reflecting stop consonant place of articulation in primary auditory cortex (A1) of the monkey. Brain Res. (1995)
  • K.J. Worsley et al. A general statistical analysis for fMRI data. Neuroimage (2002)
  • E.J. Allen et al. Representations of pitch and timbre variation in human auditory cortex. J. Neurosci. (2017)
  • American Speech-Language-Hearing Association. Guidelines for Audiologic Screening (1997)
  • J.S. Arsenault et al. Distributed neural representations of phonological features during speech perception. J. Neurosci. (2015)
  • D. Bates et al. Fitting linear mixed-effects models using lme4. J. Stat. Software (2015)
  • P. Belin et al. The Montreal Affective Voices: a validated set of nonverbal affect bursts for research on auditory affective processing. Behav. Res. Meth. (2008)
  • P. Boersma. Praat, a system for doing phonetics by computer. Glot Int. (2001)
  • M. Bonte et al. Task-dependent decoding of speaker and vowel identity from auditory cortical response patterns. J. Neurosci. (2014)
  • A.M. Chan et al. Speech-specific tuning of neurons in human superior temporal gyrus. Cereb. Cortex (2014)
  • E.F. Chang et al. Categorical speech representation in human superior temporal gyrus. Nat. Neurosci. (2010)
  • S. Da Costa et al. Human primary auditory cortex follows the shape of Heschl's gyrus. J. Neurosci. (2011)
  • S. Da Costa et al. Tuning in to sound: frequency-selective attentional filter in human primary auditory cortex. J. Neurosci. (2013)
  • F. De Martino et al. High-resolution mapping of myeloarchitecture in vivo: localization of auditory areas in the human brain. Cereb. Cortex (2015)
  • F. Dick et al. In vivo functional and myeloarchitectonic mapping of human primary auditory areas. J. Neurosci. (2012)
  • F.K. Dick et al. Extensive tonotopic mapping across auditory cortex is recapitulated by spectrally directed attention and systematically related to cortical myeloarchitecture. J. Neurosci. (2017)
  • E. Diesch et al. Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology (1997)
  • C.T. Engineer et al. Cortical activity patterns predict speech discrimination ability. Nat. Neurosci. (2008)
  • S. Evans et al. Hierarchical organization of auditory and motor representations in speech perception: evidence from searchlight similarity analysis. Cereb. Cortex (2015)
  • J.J. Faraway. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models (2016)
  • B. Fischl et al. Automatically parcellating the human cerebral cortex. Cereb. Cortex (2004)
  • E. Formisano et al. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science (2008)
  • J. Fritz et al. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat. Neurosci. (2003)
  • C. Honey et al. Neural resolution of formant frequencies in the primary auditory cortex of rats. PLoS One (2015)