Abstract
The visual cues involved in auditory speech processing are not restricted to information from lip movements but also include head and chin gestures and facial expressions such as eyebrow movements. Because these visual gestures precede the corresponding auditory signal, visual information may influence auditory cortical activity. For audiovisual syllables, however, the visual stimulus occurs so close in time to the auditory information that the cortical response to it usually overlaps with the response to the auditory stimulation; the neural dynamics underlying visual facilitation in continuous speech therefore remain unclear. In this study, we used a three-word phrase to study continuous speech processing. We presented video clips of evenly spoken phrases (without emphasis) as the frequent stimuli and of phrases in which the speaker visually emphasized one word as the infrequent stimuli. A negativity in the resulting event-related potentials (ERPs) was detected after the onset of the emphasizing articulatory movements but before the auditory stimulus, a finding confirmed by statistical comparison of the audiovisual and visual stimulation. No such negativity was present in the control visual-only condition. This negativity was observed to propagate from the visual to the fronto-temporal electrodes. Thus, in continuous speech, the visual modality evokes predictive coding for the auditory speech, which the cerebral cortex analyses in the context of the phrase even before the corresponding auditory signal arrives.
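Deviant-versus-standard ERP contrasts of this kind are commonly assessed with nonparametric permutation statistics (in the spirit of Maris and Oostenveld 2007, which the paper cites). The sketch below is not the authors' analysis pipeline; it is a minimal, self-contained illustration on synthetic data of a sign-flip permutation test with a max-statistic correction over time points, applied to hypothetical subject-wise audiovisual-minus-visual difference waveforms in which a pre-auditory negativity has been injected.

```python
import numpy as np

def permutation_test(diff, n_perm=2000, seed=0):
    """Sign-flip permutation test on subject-wise condition differences
    (e.g., audiovisual minus visual-only), with family-wise error control
    via the maximum |t| statistic across time points.

    diff : array of shape (n_subjects, n_times)
    Returns the observed t-curve and a corrected p-value per time point.
    """
    rng = np.random.default_rng(seed)
    n_subj, n_times = diff.shape
    # Observed one-sample t statistic at each time point
    t_obs = diff.mean(0) / (diff.std(0, ddof=1) / np.sqrt(n_subj))
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Under H0 the sign of each subject's difference is exchangeable
        signs = rng.choice([-1.0, 1.0], size=(n_subj, 1))
        flipped = diff * signs
        t_perm = flipped.mean(0) / (flipped.std(0, ddof=1) / np.sqrt(n_subj))
        max_null[i] = np.abs(t_perm).max()
    # Corrected p: fraction of permutation maxima exceeding each observed |t|
    p = (np.abs(t_obs)[None, :] <= max_null[:, None]).mean(0)
    return t_obs, p

# Synthetic difference waveforms: 16 subjects, 100 time samples,
# with a hypothetical pre-auditory negativity over samples 40-60.
rng = np.random.default_rng(1)
diff = rng.normal(0.0, 1.0, size=(16, 100))
diff[:, 40:60] -= 2.0
t_obs, p = permutation_test(diff)
print("significant samples:", np.flatnonzero(p < 0.05))
```

The max-statistic correction is a simplification of the cluster-based approach: it controls the family-wise error rate across time points without forming clusters, which keeps the sketch short at the cost of some sensitivity to broad, weak effects.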
Acknowledgments
We thank E. Barbeau for help with the first pilot study and C. Marlot for the bibliography. This study was supported by the Human Frontier Science Program (to JMF), the DRCI Toulouse (Direction de la Recherche Clinique et de l'Innovation; to KS and MM), the ANR (ANR Plasmody ANR-11-BSHS2-0008; to BP), and recurrent funding from the CNRS (to BP).
Additional information
Kuzma Strelnikov and Jessica Foxton are joint first authors.
This is one of several papers published together in Brain Topography in the "Special Issue: Auditory Cortex 2012".
Cite this article
Strelnikov, K., Foxton, J., Marx, M. et al. Brain Prediction of Auditory Emphasis by Facial Expressions During Audiovisual Continuous Speech. Brain Topogr 28, 494–505 (2015). https://doi.org/10.1007/s10548-013-0338-2