NeuroImage, Volume 55, Issue 2, 15 March 2011, Pages 713–723

Multiple brain signatures of integration in the comprehension of degraded speech

https://doi.org/10.1016/j.neuroimage.2010.12.020

Abstract

When listening to speech under adverse conditions, expectancies resulting from semantic context can have a strong impact on comprehension. Here we ask how minimal variations in semantic context (cloze probability) affect the unfolding comprehension of acoustically degraded speech. Three main results are observed in the brain electric response. First, auditory evoked responses to a degraded sentence's onset (N100) correlate with participants' comprehension scores, but are generally more vigorous for more degraded sentences. Second, a pronounced N400 in response to low-cloze sentence-final words, reflecting the effort of integrating words into context, increases linearly with improving speech intelligibility. Conversely, a transient enhancement in Gamma-band power (γ, ~ 40–70 Hz) during high-cloze sentence-final words (~ 600 ms) reflects top-down-facilitated integration. This γ-band effect also varies parametrically with signal quality. Third, a negative correlation of N100 amplitude at sentence onset and the later γ-band response is found in moderately degraded speech. This reflects two partly distinct neural strategies when dealing with moderately degraded speech: a more “bottom-up,” resource-allocating, and effortful strategy versus a more “top-down,” associative, and facilitatory one. The results also emphasize the non-redundant contributions of phase-locked (evoked) and non-phase-locked (induced) oscillatory brain dynamics in auditory EEG.

Research Highlights

► Semantic context affects the unfolding comprehension of acoustically degraded speech.
► Evoked N400 and induced Gamma responses reflect partly complementary processes.
► Early N100 amplitude and late Gamma-band power correlate negatively.
► EEG suggests distinct neural strategies of effortful versus facilitated comprehension.

Introduction

The influence of semantic context in listening to, analysing and comprehending speech signals is well known. However, the mechanisms behind context-aided speech comprehension have remained surprisingly opaque. This holds especially true for speech comprehension in noisy environments where semantic context is thought to be highly relevant (e.g. Miller et al., 1951, Stickney and Assmann, 2001).

Increasing efforts are being made to study the functional neuroanatomy behind this complex sensory–perceptive–cognitive process of speech comprehension using a more integrative approach. Various recent psycholinguistic (Shinn-Cunningham and Wang, 2008, van Linden and Vroomen, 2007) and neuroscientific (Davis et al., 2007, Davis and Johnsrude, 2007, Hannemann et al., 2007, Obleser et al., 2007, Sivonen et al., 2006, van Atteveldt et al., 2007) studies of speech-receptive function have been devoted to those factors in speech communication that lie beyond the bottom-up mechanisms of the auditory pathways in analysing the signal itself. Besides semantic meaning in the narrow sense, a whole family of influences facilitates and disambiguates speech comprehension, especially under acoustically compromised conditions, amongst them talker familiarity (e.g., Eisner and McQueen, 2005); prior experience and knowledge (e.g., Hannemann et al., 2007); and prosodic and emotional cues (e.g., Gandour et al., 2004). The main focus of this investigation is on the potential relationship between early, sensory-driven processes, as reflected in early EEG components (N100), and later cognitive–linguistic integration processes, as reflected in the sentence-final N400 and the Gamma-band response.

In the present study, we aimed at narrowing down the possible top-down influences and constrained the sentence context to a minimal and constant one. We employed the well-studied measure of cloze probability. Originally developed as a measure of text readability (Taylor, 1953), it is an empirical measure for lexical expectancy of a given word in a context. Variations of it have been adopted into important studies of semantic processing, mostly in N400 electroencephalography (EEG) designs (Connolly et al., 1995, Federmeier et al., 2007, Gunter et al., 2000, Kutas and Hillyard, 1984). A stronger N400 in response to the sentence-final key word is elicited by a low-cloze sentence such as “she weighs the flour” compared to an (otherwise identical) high-cloze version such as “she sifts the flour”. This N400 effect is commonly interpreted to reflect higher cognitive effort to integrate the low-cloze (higher lexical competition) elements of the sentence. A previous study by Aydelott and colleagues had utilised degradation of the sentence context (but not the target word) by 1-kHz-low-pass filtering, while presenting the sentence word by word with 200-ms intervals (Aydelott et al., 2006). The study delivered important initial confirmation that acoustic degradation of the context influences the magnitude of the integration-related N400: No proper N400-like response to incongruent target words built up when the preceding sentence material had been strongly low-pass filtered. However, the current study goes a step further by using a parametric (i.e., more than twofold) variation of the signal degradation; by utilizing naturally spoken sentences; and by taking a comprehensive look at various EEG signatures (i.e., the evoked or phase-locked N100 and N400 potentials, and the induced or non-phase-locked Gamma-band response, see next paragraph).
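For concreteness, cloze probability is typically obtained from an offline norming study: participants complete a sentence frame with the first word that comes to mind, and the cloze probability of a given word is simply the proportion of respondents producing it. A minimal Python sketch (the frame and completion counts below are hypothetical, not this study's norming data):

```python
from collections import Counter

def cloze_probability(completions, target):
    """Proportion of norming respondents who completed the
    sentence frame with the target word (case-insensitive)."""
    counts = Counter(w.strip().lower() for w in completions)
    return counts[target.lower()] / len(completions)

# Hypothetical norming responses for the frame "She sifts the ..."
completions = ["flour"] * 18 + ["sugar"] * 2
print(cloze_probability(completions, "flour"))  # 0.9 -> high-cloze
```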

Especially at the point of sentence meaning integration (i.e., the sentence-final keyword), we expect to observe distinct influences of semantic expectancy constraints on degraded speech in the stimulus-correlated EEG: the N400, as described above, is unanimously interpreted as a signature of integration effort; we therefore expect an enhanced N400 at low-expectancy words. As outlined below, we also expect facilitation of integration (an enhancement for high-expectancy words) to be reflected in a modulation of Gamma-band (γ, > 30 Hz) oscillations.

Apart from the N400 literature, a whole line of evidence points to an increased effect (i.e., higher amplitudes in response to highly predictable sentence endings) for Gamma-band (γ, > 30 Hz) oscillations. Many studies on the top-down formation of percepts have focussed on this most prominent electrophysiological signature in the visual domain (e.g., Gruber et al., 2002, Tallon-Baudry and Bertrand, 1999) and in multisensory integration (e.g., Schneider et al., 2008), but also in the auditory (e.g., Lenz et al., 2007) and speech domains (Hannemann et al., 2007, Shahin et al., 2009). All of these indicate that facilitation of percept formation (“gestalt”) through memory representations (“adaptive resonance,” Grossberg, 2009) is accompanied by comparably focal bursts of enhanced Gamma-band synchrony (for review see e.g., Fries, 2009). This synchrony surfaces as power enhancements (compared to selected baseline periods) in the Gamma-band range when analysing the EEG data for non-phase-locked oscillations.
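To make the evoked/induced distinction concrete, the sketch below computes non-phase-locked Gamma power by removing the trial-averaged (evoked) response from every trial before a Morlet wavelet transform, then expressing power relative to a pre-stimulus baseline. This is a generic sketch of the technique, not the authors' analysis pipeline; epoch length, wavelet cycles, frequency steps, and baseline window are illustrative assumptions:

```python
import numpy as np
from scipy.signal import fftconvolve

def morlet(freq, sfreq, n_cycles=7):
    """Unit-energy complex Morlet wavelet at a given frequency (Hz)."""
    sigma_t = n_cycles / (2 * np.pi * freq)
    t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / sfreq)
    wav = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    return wav / np.sqrt(np.sum(np.abs(wav) ** 2))

def induced_power(trials, sfreq, freqs, baseline):
    """Non-phase-locked ('induced') power: the evoked (trial-averaged)
    response is subtracted from each trial before the time-frequency
    transform, and power is expressed in dB relative to baseline.
    trials: (n_trials, n_times); baseline: slice of baseline samples."""
    residual = trials - trials.mean(axis=0, keepdims=True)  # remove evoked part
    power = np.zeros((len(freqs), trials.shape[1]))
    for i, f in enumerate(freqs):
        w = morlet(f, sfreq)
        tfr = np.array([fftconvolve(tr, w, mode="same") for tr in residual])
        power[i] = (np.abs(tfr) ** 2).mean(axis=0)  # average power over trials
    base = power[:, baseline].mean(axis=1, keepdims=True)
    return 10 * np.log10(power / base)  # dB change versus baseline

# Hypothetical usage: 100 trials of 1-s epochs at 500 Hz, gamma 40-70 Hz
rng = np.random.default_rng(0)
trials = rng.standard_normal((100, 500))
gamma = induced_power(trials, sfreq=500, freqs=np.arange(40, 71, 5),
                      baseline=slice(0, 100))
```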

In order to allow for precise spectro-temporal control over the speech degradation levels applied, we chose noise-band vocoding (e.g., Shannon et al., 1995) as a manipulation technique. It allows for exact control over the spectral detail being presented (and hence the speech intelligibility) at arbitrary levels (for review see Shannon et al., 2004). It was originally designed to mimic the spectro-temporal characteristics of the input a cochlear implant carrier would receive (Faulkner et al., 2001). Published evidence on the EEG responses to noise-vocoding in normal-hearing listeners has been lacking so far. Unlike fMRI, EEG can potentially reveal (and decouple the time courses of) partly distinct mechanisms that underlie speech comprehension, such as more “bottom-up,” effortful resource allocation versus more “top-down,” associative facilitation.
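To illustrate the manipulation, here is a minimal noise-band vocoder sketch in the spirit of Shannon et al. (1995): the signal is split into a small number of band-pass channels, each channel's amplitude envelope is extracted and used to modulate noise filtered into the same band, and the modulated bands are summed. Band edges, filter orders, and the envelope cutoff below are illustrative assumptions, not the parameters used in this study; intelligibility scales with the number of bands:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, sfreq, n_bands, f_lo=100.0, f_hi=8000.0):
    """Noise-band vocoder sketch. speech: float array; sfreq must be
    well above 2 * f_hi (e.g., 44.1-kHz recordings). Band edges are
    log-spaced between f_lo and f_hi (illustrative choices)."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    env_lp = butter(4, 30.0, btype="low", fs=sfreq, output="sos")  # envelope smoother
    rng = np.random.default_rng(1)
    noise = rng.standard_normal(len(speech))
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo, hi], btype="band", fs=sfreq, output="sos")
        env = np.abs(hilbert(sosfiltfilt(band, speech)))  # channel envelope
        env = sosfiltfilt(env_lp, env)                    # smooth the envelope
        out += sosfiltfilt(band, noise) * env             # envelope-modulated noise
    # match the RMS level of the original signal
    return out * (np.sqrt(np.mean(speech**2)) / np.sqrt(np.mean(out**2)))
```

Fewer bands preserve the slow temporal envelope while discarding spectral detail, which is what makes the number of bands a convenient parametric handle on intelligibility.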

The current experiment utilises a 2 × 3 factorial design: first, simple sentences were varied in the strength of the semantic expectancy coupling within the sentence (cloze probability). Second, these sentences were subjected to multiple levels of speech degradation, in order to disentangle the (possibly interactive) effects of degraded signal quality (a well-controlled sensory–acoustic parameter) and cloze probability (a well-controlled cognitive–linguistic parameter) on deflections of the auditory event-related potential.

In an early time range (approximately 100 ms after sound onset), effects of the degree of speech degradation are expected on the N100 component, which has been shown to index early abstraction and percept-formation stages (Krumbholz et al., 2003, Naatanen, 2001, Obleser et al., 2006), especially in the context of an active comprehension task (for task effects on the N100/N100m see Bonte et al., 2009, Obleser et al., 2004a, Poeppel et al., 1996). Also based on a recent study finding enhanced N100 amplitudes in response to degraded sound (Miettinen et al., 2010), we expect the following: the more thorough the degradation of the signal, the more neural effort is likely to be allocated to encoding the acoustic signal and mapping it onto known phonological and lexical categories, leading to an enhanced N100 amplitude measured on the scalp.

For the typical locus of semantic integration effort (N400, 300–500 ms post onset of the critical word in the sentence context), we expect an enhanced N400 to the sentence-final word in low-cloze probability sentences; however, the influence of various levels of speech degradation on the build-up of the sentence context is expected to modulate the N400 (cf. Aydelott et al., 2006).
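In the simplest case, the component measures referred to above reduce to mean amplitudes of the trial-averaged (phase-locked) ERP in fixed latency windows. A minimal sketch, assuming baseline-corrected single-channel epochs; the electrode choice and the exact N100 window are illustrative, while the 300–500 ms N400 window follows the text:

```python
import numpy as np

def mean_amplitude(epochs, sfreq, tmin_epoch, window):
    """Mean ERP amplitude in a latency window (seconds, stimulus-locked).
    epochs: (n_trials, n_times) baseline-corrected single-channel data;
    tmin_epoch: time of the first sample relative to stimulus onset."""
    erp = epochs.mean(axis=0)  # phase-locked (evoked) average
    i0 = int((window[0] - tmin_epoch) * sfreq)
    i1 = int((window[1] - tmin_epoch) * sfreq)
    return erp[i0:i1].mean()

# Hypothetical epochs from -0.2 to 1.0 s at 500 Hz, one channel (e.g., Cz)
rng = np.random.default_rng(2)
epochs = rng.standard_normal((120, 600))
n100 = mean_amplitude(epochs, 500, -0.2, (0.08, 0.12))  # N100 window
n400 = mean_amplitude(epochs, 500, -0.2, (0.30, 0.50))  # N400 window
```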

Lastly, we have good reason to expect enhanced synchronisation in the Gamma-band frequency range, reflecting facilitated integration of meaning, more for high-cloze than for low-cloze sentences; its dependency on acoustic signal quality and its possible interaction with other signatures of the speech-evoked EEG will be of particular interest.

Section snippets

Participants

Thirty participants (15 females; age range 19–32 years) took part in this experiment. All were native speakers of German, had normal hearing and no history of neurological or language-related problems. They had no prior experience with noise-vocoded speech and had not taken part in any of the pilot experiments or a recent functional MRI study (Obleser and Kotz, 2010) using the same sentence material. Participants received financial compensation of 8 € per hour.

Stimulus material

Stimuli were recordings of spoken …

Comprehension rating data

The rating data of all 30 participants were analysed and yielded a clear interaction of the factors cloze probability and intelligibility/degradation (F(1.8, 50.8) = 45.9, p < 0.0001, Greenhouse–Geisser corrected). At intermediate degradation (4-band vocoding), high-cloze sentences were rated as more comprehensible than their low-cloze analogues (t(29) = −4.06, p < 0.001). There was also a strong overall effect of degradation on perceived comprehensibility (F(1.6, 45) = 1262.7, p < 0.0001, Greenhouse–Geisser …
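The high- versus low-cloze comparison at intermediate degradation corresponds to a paired-samples t-test across the 30 participants. A minimal sketch with hypothetical ratings (not the study's data):

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-participant comprehension ratings at 4-band vocoding
rng = np.random.default_rng(3)
low_cloze = rng.normal(2.6, 0.5, 30)               # 30 participants
high_cloze = low_cloze + rng.normal(0.4, 0.4, 30)  # high-cloze rated higher

t, p = ttest_rel(low_cloze, high_cloze)  # paired t-test, df = 29
print(f"t(29) = {t:.2f}, p = {p:.4f}")   # negative t: low < high
```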

Discussion

The human brain is able to resolve the semantic relations within sentences, and it will derive meaning even when the acoustic quality is less than ideal. However, the mechanisms and hierarchies of acoustic and semantic processing are unclear. In the current study, we set out to further disentangle the interaction of speech signal processing and semantic expectancy processing by analysing the time course of the concomitant neural events using EEG. By analysing a whole cascade of …

Conclusions

To the best of our knowledge, the current study is the first to take a comprehensive look at human brain electric signatures of speech comprehension while parametrically varying both speech signal quality (“bottom-up”) and semantic constraints (“top-down”). These two parametric variations allowed us to sample the early N100 response, the late N400 response, the non-phase-locked changes in late Gamma-band power, as well as participants' post-trial behaviour, for separate as well as …

Acknowledgments

The Max Planck Society (J.O., S.K.) and the Deutsche Forschungsgemeinschaft (S.K.) supported this research. Thom Gunter kindly provided the pre-tested sentence material that formed the basis of our stimuli, and Stuart Rosen (University College London) provided the original code snippets for noise-vocoding. Mathias Barthel and Beatrice Neumann helped edit the audio material, and Conny Schmidt helped acquire the EEG data. Two anonymous reviewers helped substantially improve this manuscript.

References (73)

  • J. Aydelott et al., Effects of acoustic distortion and semantic context on event-related potentials to spoken words, Psychophysiology (2006)
  • J.R. Binder et al., Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies, Cereb. Cortex (2009)
  • M. Bonte et al., Dynamic and task-dependent encoding of speech and voice by phase reorganization of cortical oscillations, J. Neurosci. (2009)
  • M.H. Davis et al., Hierarchical processing in spoken language comprehension, J. Neurosci. (2003)
  • M.H. Davis et al., Lexical information drives perceptual learning of distorted speech: evidence from the comprehension of noise-vocoded sentences, J. Exp. Psychol. Gen. (2005)
  • M.H. Davis et al., Dissociating speech perception and comprehension at reduced levels of awareness, Proc. Natl Acad. Sci. USA (2007)
  • F. Eisner et al., The specificity of perceptual learning in speech processing, Percept. Psychophys. (2005)
  • F. Eisner et al., Inferior frontal gyrus activation predicts individual differences in perceptual learning of cochlear-implant simulations, J. Neurosci. (2010)
  • A.K. Engel et al., Dynamic predictions: oscillations and synchrony in top-down processing, Nat. Rev. Neurosci. (2001)
  • A. Faulkner et al., Effects of the number of channels and speech-to-noise ratio on rate of connected discourse tracking through a simulated cochlear implant speech processor, Ear Hear. (2001)
  • K.E. Fishman et al., Speech recognition as a function of the number of electrodes used in the SPEAK cochlear implant speech processor, J. Speech Lang. Hear. Res. (1997)
  • P. Fries, Neuronal gamma-band synchronization as a fundamental process in cortical computation, Annu. Rev. Neurosci. (2009)
  • N.M. Gage et al., Temporal integration: reflections in the M100 of the auditory evoked field, NeuroReport (2000)
  • S. Grossberg, Cortical and subcortical predictive dynamics and learning during perception, cognition, emotion and action, Philos. Trans. R. Soc. Lond. B Biol. Sci. (2009)
  • S. Makeig et al., Mining event-related brain dynamics, Trends Cogn. Sci. (2004)
  • E. Maris et al., Nonparametric statistical testing of EEG- and MEG-data, J. Neurosci. Methods (2007)
  • J. Obleser et al., Pre-lexical abstraction of speech in the auditory cortex, Trends Cogn. Sci. (2009)
  • D. Poeppel et al., Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds, Brain Res. Cogn. Brain Res. (1996)
  • T. Raettig et al., Auditory processing of different types of pseudo-words: an event-related fMRI study, Neuroimage (2008)
  • T.R. Schneider et al., Enhanced EEG gamma-band activity reflects multisensory semantic matching in visual-to-auditory object priming, Neuroimage (2008)
  • A.J. Shahin et al., Brain oscillations during semantic evaluation of speech, Brain Cogn. (2009)
  • P. Sivonen et al., Phonemic restoration in a sentence context: evidence from early and late ERP effects, Brain Res. (2006)
  • C. Tallon-Baudry et al., Oscillatory gamma activity in humans and its role in object representation, Trends Cogn. Sci. (1999)
  • N.M. van Atteveldt et al., Top-down task effects overrule automatic multisensory responses to letter–sound pairs in auditory association cortex, Neuroimage (2007)
  • C. Van Petten et al., Neural localization of semantic context effects in electromagnetic and hemodynamic studies, Brain Lang. (2006)
  • J.D. Warren et al., Human brain mechanisms for the early analysis of voices, Neuroimage (2006)