Multiple brain signatures of integration in the comprehension of degraded speech
Research Highlights
- Semantic context affects the unfolding comprehension of acoustically degraded speech.
- Evoked N400 and induced Gamma responses reflect partly complementary processes.
- Early N100 amplitude and late Gamma-band power correlate negatively.
- EEG suggests distinct neural strategies of effortful versus facilitated comprehension.
Introduction
The influence of semantic context in listening to, analysing and comprehending speech signals is well known. However, the mechanisms behind context-aided speech comprehension have remained surprisingly opaque. This holds especially true for speech comprehension in noisy environments where semantic context is thought to be highly relevant (e.g. Miller et al., 1951, Stickney and Assmann, 2001).
Increasing efforts are being made to study the functional neuroanatomy of this complex sensory–perceptive–cognitive process of speech comprehension in a more integrative fashion. Various recent psycholinguistic (Shinn-Cunningham and Wang, 2008, van Linden and Vroomen, 2007) and neuroscientific (Davis et al., 2007, Davis and Johnsrude, 2007, Hannemann et al., 2007, Obleser et al., 2007, Sivonen et al., 2006, van Atteveldt et al., 2007) studies of speech-receptive function have been devoted to those factors in speech communication that lie beyond the bottom-up analysis of the signal itself by the auditory pathways. Besides semantic meaning in the narrow sense, a whole family of influences facilitates and disambiguates speech comprehension, especially under acoustically compromised conditions, amongst them talker familiarity (e.g., Eisner and McQueen, 2005); prior experience and knowledge (e.g., Hannemann et al., 2007); and prosodic and emotional cues (e.g., Gandour et al., 2004). The main focus of this investigation is on the potential relationship between early, sensory-driven processes, as reflected in early EEG components (N100), and later cognitive–linguistic integration processes, as reflected in the sentence-final N400 and the Gamma-band response.
In the present study, we aimed to narrow down the possible top-down influences by constraining the sentence context to a minimal and constant one. We employed the well-studied measure of cloze probability. Originally developed as a measure of text readability (Taylor, 1953), it is an empirical measure of the lexical expectancy of a given word in a given context. Variations of it have been adopted in influential studies of semantic processing, mostly in N400 electroencephalography (EEG) designs (Connolly et al., 1995, Federmeier et al., 2007, Gunter et al., 2000, Kutas and Hillyard, 1984). A stronger N400 in response to the sentence-final key word is elicited by a low-cloze sentence such as “she weighs the flour” than by an (otherwise identical) high-cloze version such as “she sifts the flour”. This N400 effect is commonly interpreted as reflecting the higher cognitive effort needed to integrate the low-cloze (higher lexical competition) elements of the sentence. A previous study by Aydelott and colleagues had degraded the sentence context (but not the target word) by 1-kHz low-pass filtering, while presenting the sentence word by word with 200-ms intervals (Aydelott et al., 2006). That study delivered important initial confirmation that acoustic degradation of the context influences the magnitude of the integration-related N400: no proper N400-like response to incongruent target words built up when the preceding sentence material had been strongly low-pass filtered. The current study goes a step further by using a parametric (i.e., more than twofold) variation of signal degradation; by using naturally spoken sentences; and by taking a comprehensive look at various EEG signatures (i.e., the evoked, phase-locked N100 and N400 potentials, and the induced, non-phase-locked Gamma-band response; see next paragraph).
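For illustration of the measure itself (this is a minimal sketch, not the authors' norming procedure, and the function name is ours), cloze probability can be estimated from sentence-completion responses as the proportion of respondents who complete a sentence frame with a given word:

```python
from collections import Counter

def cloze_probability(completions, target):
    """Proportion of respondents who completed the sentence
    frame with `target` (case-insensitive)."""
    counts = Counter(word.lower() for word in completions)
    return counts[target.lower()] / len(completions)

# If 18 of 20 respondents complete "she sifts the ..." with "flour":
# cloze_probability(["flour"] * 18 + ["sugar"] * 2, "flour") -> 0.9
```

A high-cloze target thus approaches 1.0, while a low-cloze target such as "flour" after "she weighs the ..." would score much lower.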
Especially at the point of sentence meaning integration (i.e., the sentence-final keyword), we expect to observe distinct influences of semantic expectancy constraints on degraded speech in the stimulus-correlated EEG: The N400, as described above, is unanimously interpreted as a signature of integration effort; we therefore expect an enhanced N400 at low-expectancy words. As outlined below, we also expect facilitation of integration (an enhancement for high-expectancy words) to be reflected in a modulation of Gamma-band (γ, > 30 Hz) oscillations.
Apart from the N400 literature, a whole line of evidence points to an increased effect (i.e., higher amplitudes in response to highly predictable sentence endings) for Gamma-band (γ, > 30 Hz) oscillations. Many studies on the top-down formation of percepts have focussed on this most prominent electrophysiological signature in the visual domain (e.g., Gruber et al., 2002, Tallon-Baudry and Bertrand, 1999), in multisensory integration (e.g., Schneider et al., 2008), but also in the auditory (e.g., Lenz et al., 2007) and the speech domain (Hannemann et al., 2007, Shahin et al., 2009). All of these indicate that facilitation of percept formation (“gestalt”) through memory representations (“adaptive resonance,” Grossberg, 2009) is accompanied by comparably focal bursts of enhanced Gamma-band synchrony (for review see e.g., Fries, 2009). This synchrony surfaces as power enhancements (compared to selected baseline periods) in the Gamma-band range when analysing the EEG data for non-phase-locked oscillations.
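The evoked/induced distinction can be sketched computationally. One common approach (shown here only as a minimal sketch; the function name and parameters are illustrative, not taken from this study's analysis pipeline) removes the phase-locked, trial-averaged response from each trial before estimating band-limited power, so that only non-phase-locked activity remains, and then expresses that power relative to a baseline window:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def induced_gamma_power(trials, fs, band=(30.0, 80.0), baseline=slice(0, 100)):
    """Induced (non-phase-locked) power in `band` for an array of
    `trials` with shape (n_trials, n_samples), sampled at `fs` Hz.
    Returns trial-averaged power relative to the baseline window."""
    # Subtract the evoked (phase-locked) average from every trial.
    residual = trials - trials.mean(axis=0, keepdims=True)
    # Band-pass filter and take the squared Hilbert envelope as power.
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    power = np.abs(hilbert(sosfiltfilt(sos, residual, axis=1))) ** 2
    mean_power = power.mean(axis=0)          # average across trials
    return mean_power / mean_power[baseline].mean()  # relative to baseline
```

Values above 1 then indicate a Gamma-band power enhancement relative to the baseline period, which is the quantity the studies cited above report.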
In order to allow for precise spectro-temporal control over the speech degradation levels applied, we chose noise-band vocoding (e.g., Shannon et al., 1995) as a manipulation technique. It allows for exact control over the spectral detail being presented (and hence the speech intelligibility) in arbitrary levels (for review see Shannon et al., 2004). It was originally designed to mimic the spectro-temporal characteristics of the input a cochlear implant carrier would receive (Faulkner et al., 2001). Published evidence on the EEG responses to noise-vocoding in normal-hearing listeners has been lacking so far. Unlike fMRI, EEG can potentially reveal (and decouple the time courses of) the partly distinct mechanisms that underlie speech comprehension, such as more “bottom-up,” effortful resource allocation versus more “top-down,” associative facilitation.
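A minimal noise-band vocoder along the lines of Shannon et al. (1995) can be sketched as follows (band edges, filter order, and the function name are illustrative assumptions, not the exact parameters used in this study): the signal is split into a small number of frequency bands, each band's amplitude envelope is extracted and used to modulate band-limited noise in the same band, and the channels are summed. Intelligibility scales with the number of bands.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(signal, fs, n_bands=4, f_lo=70.0, f_hi=5000.0):
    """Noise-band vocode `signal` (1-D array, sampled at `fs` Hz) into
    `n_bands` logarithmically spaced channels between f_lo and f_hi."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # channel cut-offs
    noise = np.random.default_rng(0).standard_normal(len(signal))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, signal)
        envelope = np.abs(hilbert(band))            # amplitude envelope
        carrier = sosfilt(sos, noise)               # band-limited noise
        out += envelope * carrier                   # envelope-modulated noise
    return out
```

Varying `n_bands` (e.g., 1, 4, 16) yields the kind of parametric, more-than-twofold degradation manipulation described above.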
The current experiment utilises a simple 2 × 3 factorial design: first, simple sentences were varied in the strength of the semantic expectancy coupling within that sentence (cloze probability). Second, these sentences were subjected to multiple levels of speech degradation in order to disentangle the (possibly interactive) mechanisms of degraded signal quality (a well-controlled sensory–acoustic parameter) and cloze probability (a well-controlled cognitive–linguistic parameter) on deflections of the auditory event-related potential.
In an early time range (approximately 100 ms after sound onset), effects of the degree of speech degradation are expected on the N100 component, which has been shown to index early abstraction and percept-formation stages (Krumbholz et al., 2003, Näätänen, 2001, Obleser et al., 2006), especially in the context of an active comprehension task (for task effects on the N100/N100m see Bonte et al., 2009, Obleser et al., 2004a, Poeppel et al., 1996). Also based on a recent study finding enhanced N100 amplitudes in response to degraded sound (Miettinen et al., 2010), we expect the following: the more thorough the degradation of the signal, the more neural effort is likely to be allocated to encoding the acoustic signal and mapping it onto known phonological and lexical categories, leading to an enhanced N100 amplitude measured on the scalp.
For the typical locus of semantic integration effort (N400, 300–500 ms post onset of the critical word in the sentence context), we expect an enhanced N400 to the sentence-final word in low-cloze probability sentences; however, the influence of various levels of speech degradation on the build-up of the sentence context is expected to modulate the N400 (cf. Aydelott et al., 2006).
Lastly, we have good reason to expect enhanced synchronisation in the Gamma-band frequency range, reflecting facilitated integration of meaning, for high-cloze more than for low-cloze sentences; its dependency on acoustic signal quality and its possible interaction with other signatures of the speech-evoked EEG will be of particular interest.
Participants
Thirty participants (15 females; age range 19–32 years) took part in this experiment. All were native speakers of German, had normal hearing and no history of neurological or language-related problems. They had no prior experience with noise-vocoded speech and had not taken part in any of the pilot experiments or a recent functional MRI study (Obleser and Kotz, 2010) using the same sentence material. Participants received financial compensation of 8 € per hour.
Stimulus material
Stimuli were recordings of spoken
Comprehension rating data
The rating data of all 30 participants were analysed and yielded a clear interaction of the factors cloze probability and intelligibility/degradation (F(1.8, 50.8) = 45.9, p < 0.0001, Greenhouse–Geisser corrected). At intermediate degradation (4-band vocoding), high-cloze sentences were rated as more comprehensible than their low-cloze analogues (t(29) = −4.06, p < 0.001). There was also a strong overall effect of degradation on perceived comprehensibility (F(1.6,45) = 1262.7, p < 0.0001, Greenhouse–Geisser
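The Greenhouse–Geisser correction applied to these F-tests scales the ANOVA degrees of freedom by an epsilon estimated from the sample covariance of the repeated measures. As a sketch only (the function name is ours, and this is not the authors' analysis code), epsilon can be computed from the double-centered covariance matrix of an (n_subjects × k_conditions) data array:

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse–Geisser epsilon for repeated-measures data of shape
    (n_subjects, k_conditions); epsilon lies in [1/(k-1), 1]."""
    S = np.cov(data, rowvar=False)           # k x k condition covariance
    k = S.shape[0]
    # Double-center: remove row means, column means, add grand mean.
    Sc = S - S.mean(axis=0, keepdims=True) - S.mean(axis=1, keepdims=True) + S.mean()
    return np.trace(Sc) ** 2 / ((k - 1) * np.sum(Sc ** 2))
```

The nominal degrees of freedom are multiplied by this epsilon (yielding fractional values such as the F(1.8, 50.8) reported above), which protects the test against violations of sphericity.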
Discussion
The human brain is able to resolve the semantic relations within sentences, and it will derive meaning even when the acoustic quality is less than ideal. However, the mechanisms and hierarchies of acoustic processing and semantic processing are unclear. In the current study, we set out to further disentangle the interaction of speech signal processing and semantic expectancy processing, by analysing the time course of the concomitant neural events using EEG. By analysing a whole cascade of
Conclusions
To the best of our knowledge, the current study is the first to take a comprehensive look at human brain electric signatures of speech comprehension while parametrically varying both speech signal quality (“bottom-up”) and semantic constraints (“top-down”). These two parametric variations allowed us to sample the early N100 response, the late N400 response, the non-phase-locked changes in late Gamma-band power, as well as participants' post-trial behaviour for separate as well as
Acknowledgments
The Max Planck Society (J.O., S.K.) and the Deutsche Forschungsgemeinschaft (S.K.) supported this research. Thom Gunter kindly provided the pre-tested sentence material that formed the basis of our stimuli, and Stuart Rosen (University College London) provided the original code snippets for noise-vocoding. Mathias Barthel and Beatrice Neumann helped edit the audio material, and Conny Schmidt helped acquire the EEG data. Two anonymous reviewers helped substantially improve this manuscript.
References (73)
- et al. (2006). Oscillatory neuronal dynamics during language comprehension. Prog. Brain Res.
- et al. (1995). The effects of phonological and semantic features of sentence-ending words on visual event-related brain potentials. Electroencephalogr. Clin. Neurophysiol.
- et al. (2007). Hearing speech sounds: top-down influences on the interface between audition and speech perception. Hear. Res.
- et al. (2007). Multiple effects of sentential constraint on word processing. Brain Res.
- et al. (2004). Hemispheric roles in the perception of speech prosody. Neuroimage.
- et al. (2002). The planum temporale as a computational hub. Trends Neurosci.
- et al. (2007). Top-down knowledge supports the retrieval of lexical information from degraded speech. Brain Res.
- et al. (2000). Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn. Sci.
- et al. (2007). What's that sound? Matches with auditory long-term memory induce gamma activity in human EEG. Int. J. Psychophysiol.
- et al. (2006). Localizing the distributed language network responsible for the N400 measured by MEG during auditory sentence processing. Brain Res.
- Mining event-related brain dynamics. Trends Cogn. Sci.
- Nonparametric statistical testing of EEG- and MEG-data. J. Neurosci. Methods.
- Pre-lexical abstraction of speech in the auditory cortex. Trends Cogn. Sci.
- Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Res. Cogn. Brain Res.
- Auditory processing of different types of pseudo-words: an event-related fMRI study. Neuroimage.
- Enhanced EEG gamma-band activity reflects multisensory semantic matching in visual-to-auditory object priming. Neuroimage.
- Brain oscillations during semantic evaluation of speech. Brain Cogn.
- Phonemic restoration in a sentence context: evidence from early and late ERP effects. Brain Res.
- Oscillatory gamma activity in humans and its role in object representation. Trends Cogn. Sci.
- Top-down task effects overrule automatic multisensory responses to letter–sound pairs in auditory association cortex. Neuroimage.
- Neural localization of semantic context effects in electromagnetic and hemodynamic studies. Brain Lang.
- Human brain mechanisms for the early analysis of voices. Neuroimage.
- Effects of acoustic distortion and semantic context on event-related potentials to spoken words. Psychophysiology.
- Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cereb. Cortex.
- Dynamic and task-dependent encoding of speech and voice by phase reorganization of cortical oscillations. J. Neurosci.
- Hierarchical processing in spoken language comprehension. J. Neurosci.
- Lexical information drives perceptual learning of distorted speech: evidence from the comprehension of noise-vocoded sentences. J. Exp. Psychol. Gen.
- Dissociating speech perception and comprehension at reduced levels of awareness. Proc. Natl Acad. Sci. USA.
- The specificity of perceptual learning in speech processing. Percept. Psychophys.
- Inferior frontal gyrus activation predicts individual differences in perceptual learning of cochlear-implant simulations. J. Neurosci.
- Dynamic predictions: oscillations and synchrony in top-down processing. Nat. Rev. Neurosci.
- Effects of the number of channels and speech-to-noise ratio on rate of connected discourse tracking through a simulated cochlear implant speech processor. Ear Hear.
- Speech recognition as a function of the number of electrodes used in the SPEAK cochlear implant speech processor. J. Speech Lang. Hear. Res.
- Neuronal gamma-band synchronization as a fundamental process in cortical computation. Annu. Rev. Neurosci.
- Temporal integration: reflections in the M100 of the auditory evoked field. NeuroReport.
- Cortical and subcortical predictive dynamics and learning during perception, cognition, emotion and action. Philos. Trans. R. Soc. Lond. B Biol. Sci.