
The visual element in phonological perception and learning

Chapter in Phonology in Context

Part of the book series: Palgrave Advances in Linguistics

Abstract

Research in the field of phonology has long been dominated by a focus on a single source, or modality, of input: the auditory (i.e., what we hear). In face-to-face communication, however, a significant source of information about the sounds a speaker is producing comes from visual cues, such as the lip movements associated with those sounds. Studies of the contribution of these cues to the perception of individual speech sounds by native listeners, including the hearing impaired, date back several decades. Only recently has this source of input been explored for its value to second-language (L2) learners.



Copyright information

© 2007 Debra M. Hardison

Cite this chapter

Hardison, D.M. (2007). The visual element in phonological perception and learning. In: Pennington, M.C. (eds) Phonology in Context. Palgrave Advances in Linguistics. Palgrave Macmillan, London. https://doi.org/10.1057/9780230625396_6
