
The visual element in phonological perception and learning

Chapter in Phonology in Context

Part of the book series: Palgrave Advances in Linguistics

Abstract

Research in the field of phonology has long been dominated by a focus on a single source, or modality, of input: the auditory (i.e., what we hear). In face-to-face communication, however, a significant source of information about the sounds a speaker is producing comes from visual cues, such as the lip movements associated with those sounds. Studies of the contribution of these cues to the perception of individual speech sounds by native listeners, including the hearing impaired, date back several decades. Only recently has this source of input been explored for its value to second-language (L2) learners.



Copyright information

© 2007 Debra M. Hardison

Cite this chapter

Hardison, D.M. (2007). The visual element in phonological perception and learning. In: Pennington, M.C. (eds) Phonology in Context. Palgrave Advances in Linguistics. Palgrave Macmillan, London. https://doi.org/10.1057/9780230625396_6
