Elsevier

Journal of Phonetics

Volume 54, January 2016, Pages 151-168
Research Article
Imitation of second language sounds in relation to L2 perception and production

https://doi.org/10.1016/j.wocn.2015.10.003

Highlights

  • L2 imitation most likely bypasses some effect of phonological categorization.

  • The difficulty of L2 imitation may depend on the types of stimuli.

  • English speakers imitate L2 tones more accurately than they read them aloud.

  • Korean speakers are less accurate in imitating L2 consonants than in reading them.

  • A model cascading perception and production closely predicts imitation errors.

Abstract

This study reports findings from two experiments on second language learners, comparing their performance in an Imitation task to that in Identification and Read-Aloud tasks. Experiment 1 targeted English speakers’ learning of Mandarin tones, while Experiment 2 investigated Korean speakers’ learning of English consonants. The results of Experiment 1 showed that the Imitation task was generally easier for English speakers than the Identification and Read-Aloud tasks, suggesting that imitation was performed without some of the skills required by the other two tasks. As for Experiment 2, the Koreans were consistently less accurate in Imitation than in Read-Aloud, while their Imitation was more accurate than their Identification when the L2 sounds have a close counterpart in Korean. The results from both experiments revealed that the accuracy in Imitation was not always constrained by that in the Identification and Read-Aloud tasks. Hence it can be inferred that L2 imitation may not involve all the skills required by the other two tasks and probably bypasses some aspect of phonological encoding. More detailed predictions of the error patterns in the Imitation task based on Perception, Production, and Cascade models were compared. It was found that English speakers’ confusion patterns in imitating Mandarin tones correlated to the same degree with the predictions of all three models, which may be because the learners’ difficulty in perception and production was largely similar. On the other hand, Korean learners’ errors in imitating English consonants were overall more accurately predicted by a Cascade model, suggesting that both perceptual and production imprecision aggravated imitation performance. The comparison of these two experiments corroborates that various factors may affect the relationships between L2 imitation, perception, and production.

Introduction

One distinctive aspect of speech and language is that perceivers are also normally producers. In regular communication, speakers and listeners constantly change roles and act cooperatively, with the speakers trying to be understood by the listeners and the listeners trying to understand the speakers. Even though speaking and listening obviously require very different skills, speech perception and production must be connected in a manner that permits congruency between these two modalities. To explore the link between these two domains, many studies have employed a speech imitation task, in which the performers listen to an audio stimulus and reproduce it (Flege & Eefting, 1988; Fowler, Brown, Sabadini, & Weihing, 2003; Mitterer & Ernestus, 2008; Mitterer & Müsseler, 2013; Shockley, Sabadini, & Fowler, 2004). By comparing the target stimulus and the performers’ reproduction, these studies have attempted to trace a processing route from speech perception to production and establish the relationships between them.

One main question concerning the processing mechanisms involved in speech imitation is whether audio perception and motor production are mediated by higher-level phonological encoding. If so, the reproduction should conform to the participants’ linguistic categories, and sub-phonemic differences in the stimuli should be filtered out in the phonological module and should not be imitated. If, on the other hand, speech imitation bypasses phonological categorization and operates on a more direct link from perception to production, participants should be able to faithfully imitate the sub-phonemic details of the audio stimuli.

A number of researchers have supported the latter proposition by showing that their participants could imitate fine phonetic differences that are not contrastive in their native language. For example, Fowler et al. (2003) and Shockley et al. (2004) found that English speakers lengthened the VOT of their stop consonants when imitating stimuli with artificially lengthened VOT values. Lehiste and Shockey (1980) showed that English speakers’ repetition of a synthesized bead-beat continuum closely mimicked the vowel duration increment in the stimuli, in contrast to the strong categorical trend in their labeling of the continuum. Other studies such as Chistovich, Fant, de Serpa-Leitão, and Tjernlund (1966) and Repp and Williams (1985) also provided partial evidence that imitation is not mediated by phonological categorization, finding categories in participants’ imitations that did not correspond to any of their linguistic categories. These findings, together, suggest that the perceived signal is probably not categorized into a phonological prototype before it is reproduced, because sub-phonemic differences in the stimuli are often preserved in the imitation.

In contrast, another group of researchers has shown that their participants imitated the properties that are phonologically relevant in their native language, but not necessarily the fine phonetic details. For instance, Mitterer and Ernestus (2008) examined the imitation of stop consonants with 0, −38, and −64 ms of VOT by native speakers of Dutch, in which only the presence or absence of pre-voicing is phonologically contrastive but not the amount of pre-voicing. They found that the Dutch participants imitated the VOT difference in the 0 vs. −38 ms contrast but not in the −38 vs. −64 ms contrast, and the authors regarded this as evidence for the mediation of phonological categorization in imitation. In addition, studies on the imitation of English intonation have also lent support to the involvement of phonological encoding in the process. For instance, Pierrehumbert and Steele (1989) examined English speakers’ imitation of a short sentence varying in the F0 peak delay between the L*+H and L+H* intonation patterns. The participants’ reproductions were mostly bimodal rather than continuous in terms of the peak timings, suggesting that their production patterns were strongly affected by a categorization process associated with the existence of phonologically specific intonational categories. Similar results for vowel categories have been found by Viechnicki (2002) as well.

The discrepancy in the findings of these imitation studies may not be surprising, since they differed considerably in the types of contrast, stimuli complexity, and task procedures. What is clear, though, is that researchers have been investigating the involvement of phonological categorization in speech imitation with various methods, yet have not reached a definite conclusion.

Another question often pursued in the literature is how speech imitation relates to auditory perception and motor production. From one perspective, imitation can be seen as a production task adopting auditory instead of orthographic prompts; however, one might equally view imitation as a perception task utilizing a verbal rather than a written response. Hence the comparison of an imitation task with a perception and a production task would offer insight into the relative contribution of perceptual and production factors in imitation. Some studies have suggested that the participants’ imitation patterns largely resemble those in a perception task. For example, Schouten (1977) found that his participants’ formant frequency clusters in a vowel imitation task closely corresponded to their perceptual categories in an identification task. In another study, Jia, Strange, Wu, Collado, and Guan (2006) showed that their participants’ accuracy in a vowel imitation task was significantly correlated with their accuracy in an identification task. These findings demonstrated that the participants’ imitation performance was very similar to their perceptual identification. Other studies, on the other hand, showed a high degree of resemblance between an imitation and a production task. For instance, Repp and Williams (1985, 1987) found that their participants’ imitation of an /i/-/æ/ vowel continuum displayed a few clusters of formant frequencies, which corresponded closely to the same participants’ production of the vowel continuum without an audio model. A common limitation of these earlier studies, however, is that they compared imitation to either a perception or a production task, but not to both. Hence they may not offer a complete view of the relationships between imitation, perception, and production.

In summary, at least two major questions remain unresolved in the literature on speech imitation: (1) whether phonological encoding is employed in the process of imitation, and (2) whether perception or production exerts a stronger influence on imitation. The present study aims to further investigate these two issues on a different population from the studies reviewed above: second language (L2) learners.

L2 imitation merits thorough investigation because it not only permits an examination of both L2 perception and production but also illustrates how these two modalities are coordinated in a single act. Additionally, since L2 perception and production have often been found to diverge from each other (e.g. Baker & Trofimovich, 2006; Bohn & Flege, 1997; Bradlow, Pisoni, Akahane-Yamada, & Tohkura, 1997; Flege, 1993; Flege, Bohn, & Jang, 1997; Ingram & Park, 1997; Sheldon & Strange, 1982), a comparison of second language learners’ imitation with their perception and production patterns would reveal more clearly the relative contribution of perception, production, and other processing mechanisms in speech imitation. The studies by Flege and Eefting (1987, 1988) are among the few that closely examined L2 imitation in relation to L2 perception and production. The authors recruited Spanish speakers who started learning English before the age of 6 to imitate and label a /da/-/ta/ continuum ranging from −60 to 90 ms VOT. In addition, these learners were asked to read a list of English words starting with /t/ and /d/. The results showed that these learners did not faithfully imitate the VOT increment in the stimuli but instead produced three response concentrations indicative of the lead, short-lag, and long-lag stop categories. Hence the authors interpreted these results as support for the involvement of phonological encoding in L2 imitation, and concluded that these Spanish speakers had established a new phonetic category for the English long-lag stop.

Whereas Flege and Eefting (1988) used the imitation task as a means to tap into L2 learners’ phonological representation, the present study focused primarily on the nature of the imitation task and its relationships with perception and production. Thus this study differed from Flege and Eefting (1987, 1988) in the following aspects. First, Flege and Eefting used different stimuli in their labeling and imitation tasks (a stop continuum) and in their reading task (an English word list), making it hard to compare across the three tasks. In contrast, the current study used the same nonsense words containing the target L2 phonemes in all three tasks, facilitating direct comparison between tasks. Second, Flege and Eefting (1988) presumed that linguistic categorization was employed in imitation. The current study, on the other hand, treated the involvement of phonological encoding in imitation as a research question and took care not to bias the results in one way or another. Third, even though the results in Flege and Eefting (1988) supported the employment of phonological categorization in L2 imitation of stops, these authors acknowledged that other imitation studies on fricatives and vowels yielded diverging outcomes. In view of this, the present study looked into the imitation of L2 tones as well as consonants in order to arrive at more generalizable conclusions.

Specifically, the current study reports the results from two experiments. Experiment 1 examined English speakers’ learning of Mandarin Chinese tones, and Experiment 2 investigated Korean speakers’ acquisition of English consonants. What these two experiments had in common is that they both comprised the same three tasks with nearly identical procedures: (1) an Identification task that requires the ability to perceive the stimulus and to associate the signal with a phonological category as represented by an L2 label; (2) a Read-Aloud task that involves the skills to associate a phonological category represented by an L2 label with the corresponding motor commands and implement these commands to articulate the sound; and (3) an Imitation task that employs auditory perception and motor production. Whether the Imitation task involves phonological categorization, as the Identification and Read-Aloud tasks do, was investigated by comparing the participants’ performance in Imitation with that in the other two tasks. We hypothesized that if imitation employs phonological encoding in addition to perception and production, it would require the acquisition of more skills than those used in either Identification or Read-Aloud. As a result, the participants’ performance in the Imitation task should be constrained by their accuracy in the other two tasks. On the other hand, if imitation bypasses some aspects of phonological encoding, the learners’ accuracy in Imitation would not necessarily fall behind that in the other two tasks, but might actually be better. In addition to examining the learners’ relative accuracy, their error distributions in the three tasks were compared to identify the sources of difficulty in their imitation of L2 sounds. To the extent that Identification and Read-Aloud tasks create different error patterns, the similarity between the learners’ error patterns in the Imitation task and those in Identification and Read-Aloud tasks would reveal whether their difficulty in imitation has a predominantly perceptual or articulatory basis, or whether it is a joint effect of perceptual errors compounded by production inaccuracy.
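
The idea of perceptual errors compounded by production inaccuracy can be made concrete with a small sketch. The text above does not specify how the Perception, Production, and Cascade predictions were computed, so the following is only one plausible reading. It assumes that each task is summarized as a row-stochastic confusion matrix and that cascading means chaining the perception matrix with the production matrix. The function names (`cascade`, `pearson`, `flatten`) and all numbers are hypothetical and are not taken from the study.

```python
# Hypothetical illustration only: every number below is invented,
# and the matrix-product reading of "cascade" is an assumption.


def cascade(perception, production):
    """Chain two row-stochastic confusion matrices.

    perception[i][k]: P(stimulus i is perceived as category k)
    production[k][j]: P(intended category k is produced as category j)
    Returns imitation[i][j] = sum over k of perception[i][k] * production[k][j].
    """
    n = len(perception)
    return [
        [sum(perception[i][k] * production[k][j] for k in range(n))
         for j in range(n)]
        for i in range(n)
    ]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def flatten(matrix):
    """Turn a matrix into a flat list of cells, row by row."""
    return [cell for row in matrix for cell in row]


# Invented two-category confusion matrices (rows: target, columns: response).
perception = [[0.9, 0.1], [0.2, 0.8]]   # stands in for an Identification task
production = [[0.7, 0.3], [0.1, 0.9]]   # stands in for a Read-Aloud task
observed = [[0.6, 0.4], [0.25, 0.75]]   # stands in for an Imitation task

# Cascade prediction: approximately [[0.64, 0.36], [0.22, 0.78]].
predicted = cascade(perception, production)

# Compare how well each candidate model tracks the observed imitation cells.
for name, model in [("perception", perception),
                    ("production", production),
                    ("cascade", predicted)]:
    r = pearson(flatten(model), flatten(observed))
    print(f"{name:10s} r = {r:.3f}")
```

In this toy example, the cascaded accuracy for the first category (about 0.64) is lower than the accuracy of either component (0.9 and 0.7), which is one way to picture errors from both sources adding up. When the perception and production matrices are similar, all three models would correlate with the observed pattern to a similar degree, which matches the kind of outcome the abstract describes for the tone experiment.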

Section snippets

Experiment 1: English speakers’ Identification, Read-Aloud, and Imitation of Mandarin tones

In this experiment we examined English speakers’ performance in Identification, Read-Aloud, and Imitation of the four Mandarin tones: T1 (high-level), T2 (rising), T3 (low-dipping), and T4 (falling). Mandarin is a tonal language, in which words with identical segments but different pitch contours differ in meanings. In English, on the other hand, pitch is recruited to carry out discourse functions at the post-lexical level and only indirectly differentiates lexical meanings by indicating the

Experiment 2: Korean speakers’ Identification, Read-Aloud, and Imitation of English consonants

In Experiment 2 we investigated Korean speakers’ identification, reading, and imitation of four English stops /p b t d/ and four fricatives /f v θ ð/. These consonants were selected because they bear different degrees of resemblance to Korean categories. Korean has both bilabial and alveolar stops, although it maintains a three-way laryngeal distinction, i.e. tense, lax, aspirated (Kim, 1965), as opposed to the two-way contrast in English. As for fricatives, Korean is described as having the

General discussion

The current study examined two groups of second language learners’ performance in Identification, Read-Aloud, and Imitation tasks. These three tasks were selected because they involve some overlapping skills. Identification is perception-oriented while Read-Aloud is production-oriented, and both of these tasks require higher-level linguistic categorization. The Imitation task shares auditory perception with Identification and shares motor production with Read-Aloud. Its involvement of

Conclusion

This study reports findings from two experiments that adopted an L2 Imitation task as well as matched Identification and Read-Aloud tasks. While the raw accuracy and error rates of these two experiments are not directly comparable due to potential confounds related to the participants’ language backgrounds and learning experiences, consistent patterns across experiments have shed some light on the invariant mechanisms in L2 imitation. The results from both experiments lead to the inference that

Acknowledgements

This work was supported by the NSF (Grant #BCS-04406540).

References (73)

  • A.S. Abramson. The noncategorical perception of tone categories in Thai
  • A.S. Abramson et al. Voice-timing perception in Spanish word-initial stops. Haskins Laboratories Status Report on Speech Research (1972)
  • W. Baker et al. Perceptual paths to accurate production of L2 vowels: The role of individual differences. IRAL – International Review of Applied Linguistics in Language Teaching (2006)
  • J.R. Benkí. Analysis of English nonsense syllable recognition in noise. Phonetica (2003)
  • T. Bent. Perception and production of non-native prosodic categories (Ph.D. dissertation), Northwestern... (2005)
  • C.T. Best. A direct realist view of cross-language speech perception
  • C.T. Best et al. Nonnative and second-language speech perception: Commonalities and complementarities
  • O.S. Bohn et al. Perception and production of a new vowel category by adult second language learners
  • G. Borden et al. Producing relatively unfamiliar speech gestures: A synthesis of perceptual targets and production rules. Haskins Laboratories Status Report on Speech Research SR-66 (1981)
  • A.R. Bradlow et al. Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. The Journal of the Acoustical Society of America (1997)
  • D. Burnham et al. The perception of tones and phones
  • L. Chistovich et al. Mimicking and perception of synthetic vowels. Royal Institute of Technology Speech Transmission Lab (Sweden). Quarterly Progress and Status Report (1966)
  • A. Christophe et al. Phonological phrase boundaries constrain lexical access I. Adult data. Journal of Memory and Language (2004)
  • E. Dupoux et al. A destressing “deafness” in French? Journal of Memory and Language (1997)
  • E. Dupoux et al. A robust method to study stress “deafness”. The Journal of the Acoustical Society of America (2001)
  • E. Dupoux et al. Persistent stress ‘deafness’: The case of French learners of Spanish. Cognition (2008)
  • J.E. Flege. Age of learning affects the authenticity of voice-onset time (VOT) in stop consonants produced in a second language. The Journal of the Acoustical Society of America (1991)
  • J.E. Flege. Production and perception of a novel, second-language phonetic contrast. The Journal of the Acoustical Society of America (1993)
  • J.E. Flege. Second-language speech learning: Theory, findings, and problems
  • J.E. Flege et al. Effects of experience on non-native speakers’ production and perception of English vowels. Journal of Phonetics (1997)
  • J.E. Flege et al. Production and perception of English stops by native Spanish speakers. Journal of Phonetics (1987)
  • J.E. Flege et al. Imitation of a VOT continuum by native speakers of English and Spanish: Evidence for phonetic category formation. The Journal of the Acoustical Society of America (1988)
  • J.E. Flege et al. The effect of experience on adults’ acquisition of a second language. Studies in Second Language Acquisition (2001)
  • J.E. Flege et al. Native Italian speakers’ perception and production of English vowels. The Journal of the Acoustical Society of America (1999)
  • J.E. Flege et al. Interaction between the native and second language phonetic subsystems. Speech Communication (2003)
  • C.A. Fowler et al. Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory and Language (2003)
  • O. Fujimura et al. Perception of stop consonants with conflicting transitional cues: A cross-linguistic study. Language and Speech (1978)
  • S. Gass. Development of speech perception and speech production abilities in adult second language learners. Applied Psycholinguistics (1984)
  • H. Goto. Auditory perception by normal Japanese adults of the sounds “L” and “R”. Neuropsychologia (1971)
  • P.A. Hallé et al. Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics (2004)
  • Y.C. Hao. The application of the Speech Learning Model to the L2 acquisition of Mandarin tones. Proceedings of Tonal Aspects of Languages (2014)
  • K.H. Kang et al. Phonological systems in bilinguals: Age of learning effects on the stop consonant systems of Korean–English bilinguals. The Journal of the Acoustical Society of America (2006)
  • P.K. Kuhl. A new view of language acquisition. Proceedings of the National Academy of Sciences (2000)
  • J.C. Ingram et al. Cross-language vowel perception and production by Japanese and Korean learners of English. Journal of Phonetics (1997)
  • G. Jia et al. Perception and production of English vowels by Mandarin speakers: Age-related differences vary with amount of L2 exposure. The Journal of the Acoustical Society of America (2006)
  • C.W. Kim. On the autonomy of the tensity feature in stop classification (with special reference to Korean stops). Word (1965)

Cited by (30)

    • A non-contrastive cue in spontaneous imitation: Comparing mono- and bilingual imitators

      2021, Journal of Phonetics
      Citation Excerpt :

      If this is the case, the bilingual speakers will lengthen their VOT in the long VOT condition and raise their post-voiceless-stop f0 in the high f0 condition. This outcome, however, is not very likely in this study because the task in the current study involves English lexical items, unlike Hao and de Jong (2016) whose stimuli were nonsense words composed of simple syllables. The task and the stimuli of the current study are likely to put the participants in the phonological mode of perception as the stimulus complexity and the task demands are known to influence the processing of speech sounds (e.g., Strange, 2011).

    • Asymmetric memory for birth language perception versus production in young international adoptees

      2021, Cognition
      Citation Excerpt :

      In identification, it is necessary to know where the boundaries between the segmental or tonal categories fall, in order to be able to assign the input token correctly. In imitation, explicit categorization may help if it is available (Flege & Eefting, 1988), but imitation can also occur without it (Hao & de Jong, 2016; Llisterri, 1995; Llompart & Reinisch, 2018; Sheldon & Strange, 1982). Our findings are consistent with a scenario in which category boundary knowledge, and other category-defining information such as the relative weighting of phonetic cues, is available to these adoptees only once explicit training has awakened the traces of their stored earlier linguistic experience.

    • Development of Mandarin tones and segments by Korean learners: From naïve listeners to novice learners

      2021, Journal of Phonetics
      Citation Excerpt :

      However, this is not to say that learners will never benefit from auditory input during imitation. As shown in Hao and de Jong (2016), more advanced learners performed better in tone imitation than in a read-aloud task, indicating the beneficial role of auditory input. It is plausible that learners in advanced stages have established separate representations of T2 and T3 and additional auditory cues may have made it easier to flesh out the articulatory details of the tones in production.

    • The effect of input prompts on the relationship between perception and production of non-native sounds

      2020, Journal of Phonetics
      Citation Excerpt :

      In the present study, we compare a learner’s perception to their performance on different types of production tasks, one involving perception but not orthography and the other involving orthography but not perception, to identify the influence of input prompts cuing production on the perception-production relationship of non-native sounds. The previous results demonstrate that the effect of input prompts (auditory vs. orthographic) on non-native sound production is not uniform (e.g., Erdener & Burnham, 2005; Hao & de Jong, 2016). This suggests that the effect of input prompts on the perception-production relationship may also vary for different non-native sounds.
