Selective cortical representation of attended speaker in multi-talker speech perception

Abstract

Humans possess a remarkable ability to attend to a single speaker’s voice in a multi-talker background [1,2,3]. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented [4,5]. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener’s intended goal.
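To make the reconstruction idea in the abstract concrete, the following is a minimal sketch on synthetic data, not the authors' pipeline. It assumes a purely linear decoder fitted by ridge regression on single-speaker examples, no time lags, and a toy forward model in which the attended speaker drives the electrodes more strongly than the ignored one. The function names, array sizes, noise level, and the 1.0 / 0.3 gains are all illustrative assumptions.

```python
import numpy as np


def fit_ridge_decoder(responses, spectrogram, lam=1.0):
    """Closed-form ridge regression mapping responses (time x electrodes)
    to a spectrogram (time x frequency). Hypothetical helper."""
    gram = responses.T @ responses + lam * np.eye(responses.shape[1])
    return np.linalg.solve(gram, responses.T @ spectrogram)


def correlation(a, b):
    """Pearson correlation between two equally shaped arrays."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


rng = np.random.default_rng(0)
n_freq, n_elec = 16, 40

# ASSUMPTION: each electrode is a noisy linear mix of frequency channels.
forward = rng.normal(size=(n_freq, n_elec))


def simulate_responses(spec, noise=0.5):
    return spec @ forward + noise * rng.normal(size=(spec.shape[0], n_elec))


# Train the decoder on single-speaker data only.
train_spec = rng.normal(size=(2000, n_freq))
decoder = fit_ridge_decoder(simulate_responses(train_spec), train_spec)

# Two-speaker mixture. ASSUMPTION: attention acts as a gain that favours
# speaker A (1.0) over speaker B (0.3) in the neural response.
speaker_a = rng.normal(size=(500, n_freq))
speaker_b = rng.normal(size=(500, n_freq))
mixture_responses = simulate_responses(1.0 * speaker_a + 0.3 * speaker_b)

# Reconstruct a spectrogram from the mixture responses and compare it
# with each speaker's clean spectrogram.
reconstruction = mixture_responses @ decoder
corr_attended = correlation(reconstruction, speaker_a)
corr_ignored = correlation(reconstruction, speaker_b)
decoded_speaker = "A" if corr_attended > corr_ignored else "B"
```

In this toy setting the reconstruction correlates more strongly with the speaker given the larger gain, which is the kind of comparison that quantifying attentional modulation relies on. Real recordings would call for time-lagged responses and cross-validated regularization.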

Figure 1: Acoustic and neural reconstructed spectrograms for speech from a single speaker or a mixture of speakers.
Figure 2: Quantifying the attentional modulation of neural responses.
Figure 3: Decoding spoken words and the identity of the attended speaker.
Figure 4: Attentional modulation of individual electrode sites.

References

  1. Cherry, E. C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25, 975–979 (1953)

  2. Shinn-Cunningham, B. G. Object-based auditory and visual attention. Trends Cogn. Sci. 12, 182–186 (2008)

  3. Bregman, A. S. Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, 1994)

  4. Kerlin, J., Shahin, A. & Miller, L. Attentional gain control of ongoing cortical speech representations in a “cocktail party”. J. Neurosci. 30, 620–628 (2010)

  5. Besle, J. et al. Tuning of the human neocortex to the temporal dynamics of attended events. J. Neurosci. 31, 3176–3185 (2011)

  6. Bee, M. & Micheyl, C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J. Comparative Psychol. 122, 235–252 (2008)

  7. Shinn-Cunningham, B. G. & Best, V. Selective attention in normal and impaired hearing. Trends Amplif. 12, 283–299 (2008)

  8. Scott, S. K., Rosen, S., Beaman, C. P., Davis, J. P. & Wise, R. J. S. The neural processing of masked speech: evidence for different mechanisms in the left and right temporal lobes. J. Acoust. Soc. Am. 125, 1737–1743 (2009)

  9. Elhilali, M., Xiang, J., Shamma, S. A. & Simon, J. Z. Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene. PLoS Biol. 7, e1000129 (2009)

  10. Chang, E. F. et al. Categorical speech representation in human superior temporal gyrus. Nature Neurosci. 13, 1428–1432 (2010)

  11. Crone, N. E., Boatman, D., Gordon, B. & Hao, L. Induced electrocorticographic gamma activity during auditory perception. Clin. Neurophysiol. 112, 565–582 (2001)

  12. Steinschneider, M., Fishman, Y. I. & Arezzo, J. C. Spectrotemporal analysis of evoked and induced electroencephalographic responses in primary auditory cortex (A1) of the awake monkey. Cereb. Cortex 18, 610–625 (2008)

  13. Scott, S. K. & Johnsrude, I. S. The neuroanatomical and functional organization of speech perception. Trends Neurosci. 26, 100–107 (2003)

  14. Hackett, T. A. Information flow in the auditory cortical network. Hear. Res. 271, 133–146 (2011)

  15. Bolia, R. S., Nelson, W. T., Ericson, M. A. & Simpson, B. D. A speech corpus for multitalker communications research. J. Acoust. Soc. Am. 107, 1065–1066 (2000)

  16. Brungart, D. S. Informational and energetic masking effects in the perception of two simultaneous talkers. J. Acoust. Soc. Am. 109, 1101–1109 (2001)

  17. Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Influence of context and behavior on stimulus reconstruction from neural activity in primary auditory cortex. J. Neurophysiol. 102, 3329–3339 (2009)

  18. Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R. & Warland, D. Reading a neural code. Science 252, 1854–1857 (1991)

  19. Pasley, B. N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012)

  20. Garofolo, J. S. et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus (Linguistic Data Consortium, 1993)

  21. Rifkin, R., Yeo, G. & Poggio, T. Regularized least-squares classification. Nato Science Series Sub Series III Computer and Systems Sciences 190, 131–154 (2003)

  22. Formisano, E., De Martino, F., Bonte, M. & Goebel, R. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science 322, 970–973 (2008)

  23. Staeren, N., Renvall, H., De Martino, F., Goebel, R. & Formisano, E. Sound categories are represented as distributed patterns in the human auditory cortex. Curr. Biol. 19, 498–502 (2009)

  24. Shamma, S. A., Elhilali, M. & Micheyl, C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 34, 114–123 (2010)

  25. Darwin, C. J. Auditory grouping. Trends Cogn. Sci. 1, 327–333 (1997)

  26. Warren, R. M. Perceptual restoration of missing speech sounds. Science 167, 392–393 (1970)

  27. Kidd, G., Jr, Arbogast, T. L., Mason, C. R. & Gallun, F. J. The advantage of knowing where to listen. J. Acoust. Soc. Am. 118, 3804–3815 (2005)

  28. Shen, W., Olive, J. & Jones, D. Two protocols comparing human and machine phonetic discrimination performance in conversational speech. INTERSPEECH 1630–1633. (2008)

  29. Cooke, M., Hershey, J. R. & Rennie, S. J. Monaural speech separation and recognition challenge. Comput. Speech Lang. 24, 1–15 (2010)


Acknowledgements

The authors would like to thank A. Ren for technical help, and C. Micheyl, S. Shamma and C. Schreiner for critical discussion and reading of the manuscript. E.F.C. was funded by National Institutes of Health grants R00-NS065120, DP2-OD00862, R01-DC012379, and the Ester A. and Joseph Klingenstein Foundation.

Author information

Contributions

N.M. and E.F.C. designed the experiment, collected the data, evaluated results and wrote the manuscript.

Corresponding author

Correspondence to Edward F. Chang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

This file contains Supplementary Figures 1-3. (PDF 1436 kb)

About this article

Cite this article

Mesgarani, N., Chang, E. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236 (2012). https://doi.org/10.1038/nature11020

  • DOI: https://doi.org/10.1038/nature11020
