Abstract
Research in phonology has long focused on a single source or modality of input: the auditory signal (what we hear). In face-to-face communication, however, visual cues such as the lip movements associated with speech sounds provide a significant additional source of information about what a speaker is producing. Studies of the contribution of these cues to the perception of individual speech sounds by native listeners, including the hearing impaired, date back several decades. Only recently has this source of input been explored for its value to second-language (L2) learners.
Copyright information
© 2007 Debra M. Hardison
Cite this chapter
Hardison, D. M. (2007). The visual element in phonological perception and learning. In M. C. Pennington (Ed.), Phonology in Context. Palgrave Advances in Linguistics. Palgrave Macmillan, London. https://doi.org/10.1057/9780230625396_6
DOI: https://doi.org/10.1057/9780230625396_6
Publisher Name: Palgrave Macmillan, London
Print ISBN: 978-1-4039-3537-3
Online ISBN: 978-0-230-62539-6
eBook Packages: Palgrave Language & Linguistics Collection, Education (R0)