Exposure to a second language in infancy alters speech production

Megha Sundara; Nancy Ward; Barbara Conboy; Patricia K. Kuhl

doi:10.1017/S1366728919000853

Published online by Cambridge University Press: 29 January 2020

Megha Sundara*: Department of Linguistics, University of California, Los Angeles
Nancy Ward: Department of Linguistics, University of California, Los Angeles
Barbara Conboy: Department of Communication Sciences and Disorders, University of Redlands
Patricia K. Kuhl: Institute for Learning & Brain Sciences, University of Washington
*Address for correspondence: Megha Sundara, E-mail: megha.sundara@humnet.ucla.edu

Article contents

Abstract
Introduction
Experiment 1
Experiment 2
Experiment 3
Comparing bilingual, monolingual and monolingual infants with short term exposure to Spanish
General discussion
Footnotes
References

Abstract

We evaluated the impact of exposure to a second language on infants’ emerging speech production skills. We compared speech produced by three groups of 12-month-old infants while they interacted with interlocutors who spoke to them in Spanish and English: monolingual English-learning infants who had previously received 5 hours of exposure to a second language (Spanish), English- and Spanish-learning simultaneous bilinguals, and monolingual English-learning infants without any exposure to Spanish. Our results showed that the monolingual English-learning infants with short-term exposure to Spanish and the bilingual infants, but not the monolingual English-learning infants without exposure to Spanish, flexibly matched the prosody of their babbling to that of a Spanish- or English-speaking interlocutor. Our findings demonstrate the nature and extent of benefits for language learning from early exposure to two languages. We discuss the implications of these findings for language organization in infants learning two languages.

Keywords

speech production; bilingual; English; Spanish; short term bilingual experience; infants

Type: Research Article
Information: Bilingualism: Language and Cognition, Volume 23, Issue 5, November 2020, pp. 978-991

DOI: https://doi.org/10.1017/S1366728919000853
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © The Author(s) 2020

Introduction

Infants demonstrate sophisticated speech perception abilities soon after birth. Like their monolingual peers, bilingual infants in the first year of life discriminate their two languages (Byers-Heinlein, Burns & Werker, 2010; Bosch & Sebastián-Gallés, 2001), the vowels of those languages (Albareda-Castellot, Pons & Sebastián-Gallés, 2011; Sundara & Scutellaro, 2011) and the consonants of both languages (Burns, Yoshida, Hill & Werker, 2007; García-Sierra, Rivera-Gaxiola, Percaccio, Conboy, Romo, Klarman, Ortiz & Kuhl, 2011; Sundara, Polka & Molnar, 2008; Ferjan Ramírez, Ramírez, Clarke, Taulu & Kuhl, 2017), and they segment speech (Polka, Orena, Sundara & Worrall, 2017). These abilities attest to a high degree of behavioral and neural plasticity in early acquisition.

Acquiring a language involves becoming not only a native listener but also a native speaker. How and when does infants’ speech production become language-specific? It is uncontroversial that native language experience shapes speech production at or after the one-word stage (Locke, 1983). For example, the earliest words of children learning English or German (e.g., Kehoe, 2015) are shorter and contain more coda consonants than those of children learning Spanish (e.g., Roark & Demuth, 2000; Lleó, 2006), Italian (Ingram, 1981) or Farsi (Keshavarz & Ingram, 2002), reflecting the tendencies of these languages.

In contrast, prior to the one-word stage, infants’ babbling has traditionally been thought to exhibit features that are independent of specific language experience (Oller, 2000; see Buder, Warlaumont & Oller, 2013 for a review). On this view, infants exposed to many different languages produce quasi-vowels and glottal stops right after birth. Between 1 and 4 months of age, infants’ cooing typically involves back consonants combined with partially resonant vowels. Infants then begin to produce fully resonant vowels between 3 and 8 months. At this point, they begin to combine vowels with consonants, first with slow transitions, producing marginal syllables, and then with faster transitions, producing canonical syllables (often in repetitive strings, as in “bababa”) between 5 and 10 months.

Empirical evidence over the past several decades has only partially supported the idea that babbling is independent of language experience. On the one hand, consistent with this account, the earliest productions of both monolingual and bilingual infants exhibit these universal tendencies (Oller & Eilers, 1982; Oller, Wieman, Doyle & Ross, 1976; Oller, Eilers, Urbano & Cobo-Lewis, 1997; Whalen, Levitt, Hsiao & Smorodinsky, 1995). These similarities across infants learning different languages have been attributed to anatomical constraints and to infants’ immature speech motor control. On the other hand, at least some research on monolingual infants learning different languages has shown that the distribution of segments, whether consonants or vowels, is shaped by the ambient language even within the first year of life (Boysson-Bardies, Halle, Sagart & Durand, 1989; Boysson-Bardies & Vihman, 1991; Boysson-Bardies, Vihman, Roug-Hellichius, Durand, Landberg & Arao, 1992; Rvachew, Alhaidary, Mattock & Polka, 2008; but see also Levitt & Utman, 1992; Rvachew, Mattock, Polka & Ménard, 2006; and Lee, Davis & MacNeilage, 2010).

More compelling evidence that babbling in the first year of life is affected by language experience comes from research on the supra-segmental characteristics of infant babbling. Several researchers have suggested that infants reproduce the prosodic characteristics of their ambient language before its segmental patterns (e.g., Crystal, 1979; Levitt & Wang, 1991). Cross-linguistic research shows that monolingual infants between 8 and 12 months of age begin to produce the characteristic intonation (Whalen, Levitt & Wang, 1991) and the syllable and word shapes (Levitt & Utman, 1992; Levitt & Wang, 1991; Lleó, Prinz, El Mogharbel & Maldonado, 1996) of the specific language to which they are exposed. Perhaps this is not surprising, given that the human fetus responds to the supra-segmental properties of speech input by about 30 weeks of gestation (Kisilevsky, Hains, Lee, Xie, Huang, Ye, Zhang & Wang, 2003), and even newborns’ cries reflect the prosodic characteristics of their mother's language (Mampe, Friederici, Christophe & Wermke, 2009). Even in the absence of such prenatal experience, the rhythm of manual babbling by 7-month-olds learning a sign language is shaped by their ambient language experience (Petitto, Holowka, Sergio, Levy & Ostry, 2004; Petitto, Holowka, Sergio & Ostry, 2001).

In sum, it remains controversial how early infants exposed to a specific language begin to produce speech patterns characteristic of that language. Researchers comparing the babbling of monolingual infants learning different languages must contend with considerable variation across individual infants. By comparing the babbling of individual infants exposed to two languages, researchers can treat each infant as her own control, allowing more nuanced comparisons. Ultimately, therefore, the most compelling evidence on whether language experience affects infant babbling is likely to come from bilingual infants.

In this paper we present infant babbling data from bilingual English- and Spanish-learning infants (Experiment 1) and from monolingual English-learning infants without and with short-term exposure to Spanish (Experiments 2 and 3, respectively). With data from these three groups, we show that by 12 months infants can alter their babbling to match the prosody of English- and Spanish-speaking interlocutors, but only if they have had at least some prior exposure to both languages.

Experiment 1

Only a few studies have investigated whether babbling by bilingual infants has language-specific characteristics. In one longitudinal single-subject investigation, a French–English bilingual 10-month-old was reported to produce more multisyllabic utterances and fewer sounds per syllable when interacting with a French-speaking interlocutor than with an English-speaking one (Maneva & Genesee, 2002). Similarly, a Spanish–English bilingual 12- to 13-month-old was reported to produce fewer coda consonants in Spanish contexts than in English contexts (Andruski, Casielles & Nathan, 2014). However, a study with a larger group of English- and French-learning bilingual 13.5-month-olds failed to find differences in the production of consonants between the two language contexts (Poulin-Dubois & Goodz, 2001; see also Zlatić, MacNeilage, Matyear & Davis, 1997, for a twin sibling study in which the two language contexts were not separated). One possible reason for these discrepancies is that, given the constraints of their developing anatomy and motor control, young infants have finer control over the prosody of the ambient language than over its segmental characteristics. This could explain why Maneva and Genesee (2002) and Andruski et al. (2014) found differences in the prosodic features of babbling, whereas Poulin-Dubois and Goodz (2001) did not find differences in its segmental features. However, because Maneva and Genesee (2002) and Andruski et al. (2014) each tested only one infant, their findings may not generalize to a larger population.

Besides being the strongest test case for language-specific effects on speech production, an investigation of babbling by infants exposed to more than one language can also shed light on a second controversy: how are the two languages of bilinguals represented? There is now a consensus that infants growing up bilingual do not start out with a fused representation of their two languages (e.g., Genesee, 1989; De Houwer, 1990), although the extent to which the two languages develop autonomously or interdependently continues to be debated (e.g., Hammer, Hoff, Uchikoshi, Gillanders, Castro & Sandilos, 2014). Early differentiation of the two languages is supported by research showing that newborns, whether monolingual (Byers-Heinlein et al., 2010; Mehler, Jusczyk, Lambertz, Halsted, Bertoncini & Amiel-Tison, 1988; Moon, Panneton-Cooper & Fifer, 1993; Nazzi, Bertoncini & Mehler, 1998) or bilingual (Byers-Heinlein et al., 2010), are able to distinguish between prosodically dissimilar languages. By 4 to 5 months, monolingual and bilingual infants are also able to differentiate their native language from a prosodically similar language (Bahrick & Pickens, 1988; Bosch & Sebastián-Gallés, 1997, 2001; Nazzi, Jusczyk & Johnson, 2000). This early ability to discriminate languages is likely to support the separation of the two native languages in bilingual infants.

Empirical evidence from older bilingual children is also consistent with a differentiated representation of the two languages early in verbal development. Bilingual children's productions show evidence of differentiated systems at the emerging socio-pragmatic (e.g., Genesee, Nicoladis & Paradis, 1995), syntactic (e.g., Meisel, 1990; Paradis & Genesee, 1996), semantic (e.g., Quay, 1995) and word levels (e.g., Ingram, 1981; Lleó, 2002, 2006; Lleó, Kuchenbrandt, Kehoe & Trujillo, 2003). Moreover, at least some bilingual infants differentiate the shapes of their earliest words across the two languages by their first birthday (Vihman, 2016). These abilities continue to develop such that, by their second year, bilingual infants differentiate the prosodic shapes of their utterances (e.g., Lleó, 2002) and vary the number of closed syllables across their two languages (Ingram, 1981; Kehoe, 2015; Lleó et al., 2003).

In contrast, research on the developing sound system of bilinguals, even beyond the first-word stage, has produced mixed results. Some studies have reported that bilingual children older than 2 years do not produce language-specific differences in vowels and consonants (e.g., Kehoe, Lleó & Rakow, 2004). Others have found that bilingual children produce language-specific differences in some, but not all, segments (e.g., Johnson & Wilson, 2002; Kehoe et al., 2004). Still others have found that children produce differences in the segments of their two languages at all ages (e.g., Ingram, 1981; Johnson & Lancaster, 1998; Khattab, 2003; Paradis, 2001). In sum, language-specific influences appear more likely to emerge in the prosodic than in the segmental characteristics of the early speech of both monolingual and bilingual infants.

In Experiment 1, we compared the prosodic properties of babbling produced by bilingual Spanish- and English-learning 12-month-olds while they interacted with a Spanish- or an English-speaking interlocutor, in order to establish the early precursors of bilingual speech production. Babbling in the two sessions was characterized using two measures, the proportion of multisyllabic utterances and the proportion of utterances with closed syllables, because English and Spanish differ in these prosodic properties. Conversational English is predominantly monosyllabic: about 80% of words in English have one syllable (Cutler & Carter, 1987). In contrast, 80% of words in Spanish have more than one syllable (Roark & Demuth, 2000). English and Spanish also differ in how often syllables end in a consonant: roughly 60% of syllables end in a consonant in English, compared to only 25% in Spanish (Roark & Demuth, 2000). We therefore expected that, if babbling at 12 months is language-specific, bilingual infants would produce longer (i.e., more often multisyllabic) utterances with more open syllables when interacting with a Spanish-speaking interlocutor than with an English-speaking interlocutor. Such a result would also provide evidence that bilingual infants differentiate between their two languages in speech production.

Methods

Subjects

Ten bilingual 12-month-old infants (mean age: 368 days; range: 353-394 days; 5 girls) participated in the study. According to parental report, all infants were full-term (38-42 weeks gestation) and healthy on the date of testing, with no history of ear infections or speech or hearing difficulties. Infants were included in the bilingual group only if they were learning both languages at home and, based on a detailed language questionnaire administered to parents (Bosch & Sebastián-Gallés, 2001; Sundara & Scutellaro, 2011), between 20% and 80% of their daily language input was in Spanish (mean: 41%; range: 20-80%). English and Spanish short-form versions of the MacArthur-Bates Communicative Development Inventory (CDI) (Fenson, Pethick, Renda, Cox, Dale & Reznick, 2000; Jackson-Maldonado, Marchman & Fernald, 2013) were administered to the infants’ parents to measure early language and gestural communication skills. These results are summarized in Appendix A.

Design and Procedure

All infants came to the lab for one visit and participated in two consecutive 30-minute recording sessions; one of the sessions was conducted exclusively in Spanish and the other in English. If necessary, infants were given a break between the two sessions. Recordings were done in a sound-attenuated booth. The first recording session was always with the infant's parent (four in Spanish; six in English), and the second session with a bilingual research assistant (in the other language). The language of the session with the parent was the one habitually used by that parent with the infant.

The parent and the research assistant were in the room for both sessions. This allowed the infant to become comfortable with the lab set-up as well as the research assistant. During the parent's session, the research assistant was instructed not to talk to the infant and to only interact non-verbally when approached by the infant. Similarly, in the research assistant's session, the parent was instructed not to talk to the infant and to interact non-verbally only when approached by the infant. Both adults were asked to cease talking when the infant was vocalizing. The adult interlocutors were provided with a set of quiet toys.

Using the Audio-Technica ATW-T701 wireless microphone system, a stereo recording was made for each babbling session (sampling rate = 44.1 kHz; 16-bit resolution) with the adult interlocutor on one track and the infant on the other. The microphone was attached to the side of a vest that the infant wore. The adult interlocutor wore a similar microphone attached to his/her clothing. All recordings were made using Pro Tools software.

Coding

All acoustic analyses were done in Praat, using a combination of waveforms and spectrograms (Boersma & Weenink, 2010). Each babbling session was first segmented into “utterances.” Utterances were defined as strings of infant vocalizations separated from other vocalizations by at least 700 ms of silence, with no more than 450 ms of silence within the utterance (Levitt & Wang, 1991). Next, based on the criteria described in Oller (1986) and Rvachew, Creighton, Feldman and Sauve (2002), each syllable in every utterance was classified into one of four categories: fully resonant vowel, canonical syllable, marginal syllable, and other (i.e., non-speech sounds).
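
The utterance definition above amounts to a rule for grouping vocalizations by the silences between them. The Python sketch below applies it to a list of vocalization intervals (start and end times in seconds). The `Interval` type and `segment_utterances` function are hypothetical illustrations, not part of the authors' Praat workflow. The text also does not say how gaps between 450 and 700 ms were handled, so this sketch assumes such gaps split utterances and flags them for manual review.

```python
from dataclasses import dataclass

# Thresholds from the text, in seconds.
MAX_INTERNAL_PAUSE_S = 0.450  # longest silence allowed inside one utterance
MIN_BOUNDARY_PAUSE_S = 0.700  # shortest silence that separates two utterances


@dataclass(frozen=True)
class Interval:
    start: float  # seconds
    end: float    # seconds


def segment_utterances(vocalizations):
    """Group vocalization intervals into utterances by the silence between them.

    Returns (utterances, ambiguous_gaps). Gaps longer than 450 ms but shorter
    than 700 ms fit neither rule; this sketch splits at them (an assumption)
    and reports them so that a human coder can decide.
    """
    if not vocalizations:
        return [], []
    ordered = sorted(vocalizations, key=lambda v: v.start)
    groups = [[ordered[0]]]
    ambiguous_gaps = []
    current_end = ordered[0].end
    for cur in ordered[1:]:
        gap = cur.start - current_end
        if gap <= MAX_INTERNAL_PAUSE_S:
            groups[-1].append(cur)  # short pause: same utterance
            current_end = max(current_end, cur.end)
        else:
            if gap < MIN_BOUNDARY_PAUSE_S:
                ambiguous_gaps.append((current_end, cur.start))
            groups.append([cur])  # long pause: start a new utterance
            current_end = cur.end
    utterances = [Interval(g[0].start, max(v.end for v in g)) for g in groups]
    return utterances, ambiguous_gaps
```

For example, vocalizations at 0.0-0.5 s and 0.8-1.2 s are merged into one utterance because they are separated by only 300 ms of silence. A 600 ms gap would be returned in `ambiguous_gaps`.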

Fully resonant vowels were defined as “vowel-like utterances with at least two measurable formants and resonances above 1200 Hz, in addition to resonances in the lower frequency range.” Canonical syllables contained a non-glottal consonant and a fully resonant vowel, with transitions lasting 25-120 ms. They also had to be between 100 and 500 ms in duration. Marginal syllables contained a consonant and a fully resonant vowel but failed to meet at least one of the criteria for canonical syllables. All coding was consistent with the criteria described by Rvachew et al. (2002).
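
The canonical/marginal criteria can be read as a decision rule over a few per-syllable measurements. The sketch below encodes that reading. The `SyllableMeasures` fields and `classify_syllable` are hypothetical names. The measurements are assumed to have been taken by hand from waveforms and spectrograms, and "transition" is assumed to mean the consonant-vowel transition. Lumping every syllable without a fully resonant vowel into "other" is a simplification of the non-speech category.

```python
from dataclasses import dataclass


@dataclass
class SyllableMeasures:
    """Hypothetical per-syllable measurements taken by a coder."""
    has_fully_resonant_vowel: bool
    has_consonant: bool
    consonant_is_glottal: bool  # ignored when has_consonant is False
    transition_ms: float        # assumed: duration of the consonant-vowel transition
    duration_ms: float          # duration of the whole syllable


def classify_syllable(s: SyllableMeasures) -> str:
    """Return "canonical", "marginal", "fully_resonant_vowel" or "other"."""
    if not s.has_fully_resonant_vowel:
        # Simplification: no fully resonant vowel -> non-speech ("other").
        return "other"
    if not s.has_consonant:
        return "fully_resonant_vowel"
    meets_all_canonical_criteria = (
        not s.consonant_is_glottal
        and 25 <= s.transition_ms <= 120
        and 100 <= s.duration_ms <= 500
    )
    # A consonant plus a fully resonant vowel that fails any criterion is marginal.
    return "canonical" if meets_all_canonical_criteria else "marginal"
```

Under this reading, a syllable with a non-glottal consonant, a 60 ms transition and a 250 ms duration is canonical. The same syllable with a 150 ms transition is marginal.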

The “non-speech” utterances included quasi-resonant vowels, squeals, cries, whispers, raspberries, and utterances with abnormal phonation; they were excluded from the analysis. Because we were assessing only the prosodic, not the segmental, characteristics of babbling, we also excluded utterances consisting only of fully resonant vowels without consonants. The final dataset included all utterances with canonical and marginal syllables (mean duration: 1.12 s). The results are similar for marginal and canonical syllables; hence we do not report them separately.

Utterances were extracted from the recordings so that transcribers were blind to the language spoken by the interlocutor, and each utterance was coded independently by two transcribers. A third transcriber, also blind to the language spoken by the interlocutor, adjudicated disagreements. For an utterance to be included in the analysis, two out of three transcribers needed to agree that it was speech-like. All speech-like utterances were then coded as mono- or multisyllabic. Monosyllabic utterances contained only one fully resonant vowel; multisyllabic utterances contained two or more fully resonant vowels separated by consonants. Each syllable was also coded as open (V, CV) or closed (VC, CVC).
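
The inclusion and coding steps reduce each utterance to two yes/no outcomes. The hypothetical Python sketch below shows this reduction. It assumes that each included utterance has already been transcribed as a list of syllable shapes (V, CV, VC, CVC), and that every listed syllable contains a fully resonant vowel. The helper names are illustrative only.

```python
OPEN_SHAPES = {"V", "CV"}
CLOSED_SHAPES = {"VC", "CVC"}


def is_speech_like(judgments):
    """Include an utterance if at least two transcribers judged it speech-like.

    judgments holds one True/False per transcriber (two coders, plus a third
    adjudicator when the first two disagree).
    """
    return sum(bool(j) for j in judgments) >= 2


def code_utterance(syllable_shapes):
    """Return the two binary outcomes for one speech-like utterance."""
    unknown = set(syllable_shapes) - OPEN_SHAPES - CLOSED_SHAPES
    if unknown:
        raise ValueError(f"unexpected syllable shapes: {sorted(unknown)}")
    return {
        "multisyllabic": len(syllable_shapes) >= 2,
        "has_closed_syllable": any(s in CLOSED_SHAPES for s in syllable_shapes),
    }


def session_proportions(utterances):
    """Proportion of utterances in one session showing each outcome."""
    if not utterances:
        raise ValueError("session has no utterances")
    coded = [code_utterance(u) for u in utterances]
    return {
        key: sum(c[key] for c in coded) / len(coded)
        for key in ("multisyllabic", "has_closed_syllable")
    }
```

For instance, `code_utterance(["CV", "CVC"])` marks an utterance as both multisyllabic and containing a closed syllable. `code_utterance(["CV"])` marks it as neither.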

Statistical analysis

As recommended by Jaeger (2008), we used mixed-effects logistic regression models to analyze two binary outcomes: whether or not an utterance (a) was multisyllabic or (b) contained a closed syllable. In principle, bilingual infants could have produced utterances with multiple closed syllables, in which case our binary coding would underestimate their ability to produce closed syllables. In fact, fewer than 1% of the utterances (11 of 1417, 0.8%) had more than one closed syllable separated by silences shorter than 450 ms. These were too few to analyze the number of closed syllables as a continuous dependent variable. The binary coding allowed us to analyze both dependent variables using mixed-effects logistic regression models.

We modeled the log odds (logit) of each of the two outcomes (e.g., a bilingual infant producing a multisyllabic utterance) as a function of the language of the interlocutor (Spanish vs. English), weighted by the total number of utterances produced by that infant in that session. The weighting was included to represent the substantial variation in speech output across infants (e.g., the number of utterances produced in one session ranged from 9 to 212). With this weighting, individual infants influenced the results in proportion to the quantity of their output.¹
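
The text does not specify how the weighting was implemented. This sketch assumes one common approach: each session's proportion of "successes" is modeled as a binomial outcome, with the session's utterance count as its weight, as in R's binomial `glm`. The Python code checks numerically that this weighted form gives the same log-likelihood as treating each utterance as a separate 0/1 observation. The functions and the example session (7 of 9 utterances multisyllabic) are hypothetical.

```python
import math


def bernoulli_loglik(outcomes, p):
    """Log-likelihood of individual 0/1 utterance outcomes under probability p."""
    return sum(math.log(p) if y else math.log(1 - p) for y in outcomes)


def weighted_proportion_loglik(proportion, n_utterances, p):
    """The same quantity, written as a session proportion weighted by its utterance count."""
    return n_utterances * (proportion * math.log(p) + (1 - proportion) * math.log(1 - p))


# Hypothetical session: 7 of 9 utterances multisyllabic, evaluated at p = 0.75.
outcomes = [1] * 7 + [0] * 2
per_utterance = bernoulli_loglik(outcomes, 0.75)
weighted = weighted_proportion_loglik(7 / 9, 9, 0.75)
```

Both expressions give the same value. Because the weight multiplies a session's contribution linearly, a session with 212 utterances carries about 23.6 times the weight of a session with 9.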

We also included a random intercept for each infant and a random slope for the language of the interlocutor. The random intercept modeled variability across infants in anatomical development and speech motor control that might influence their overall ability to produce developmentally later-acquired monosyllabic utterances and closed syllables. The random slope allowed the degree to which each bilingual infant altered her speech production across the two languages to vary across infants. Such differences could arise from differences in the absolute amount of Spanish and English input each infant received in daily life, or from differences in infants’ uptake of that input.

The final model was determined by backward stepwise comparison. Effects were removed from the model one at a time, and the log likelihoods of each pair of nested models were compared using a likelihood ratio test to determine whether including the factor significantly improved model fit. All analyses were implemented in R (R Core Development Team, 2013) using the lme4 package (Bates, Maechler, Bolker & Walker, 2015).
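
For nested models, the likelihood ratio statistic is twice the difference in log-likelihoods. It is referred to a χ² distribution whose degrees of freedom equal the number of parameters removed. A dependency-free Python sketch is shown below. In practice one would call `anova()` on two fitted lme4 models in R, or use `scipy.stats.chi2.sf` in Python. The log-likelihood values in the usage example are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) plus a finite series in half-integer powers.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Compare two nested models; returns (chi-square statistic, p-value)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Made-up log-likelihoods: adding a random slope (variance + covariance = 2 parameters).
stat, p = likelihood_ratio_test(-100.0, -98.02, 2)
```

With these numbers, `stat` is 3.96 and `p` is about 0.14. Similarly, `chi2_sf(22.0, 1)` is far below 0.001, in line with the statistics reported in the Results.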

Results

The final sample included 1417 utterances consisting of 3835 syllables (see Table 1 for a breakdown by the language of the interlocutor). Across the two languages, half of the multisyllabic utterances had just two syllables. There was no significant difference in the number of utterances bilingual infants produced in the Spanish session (mean: 77; range: 20-212) and the English session (mean: 65; range: 9-187), t(9) = 1.6, p = 0.14, d = 0.20. There was also no significant difference in the number of utterances bilingual infants produced with their parents (mean: 70; range: 9-187) and with the research assistant (mean: 72; range: 20-212), t(9) = −0.25, p = 0.8, d = −0.03, indicating that the infants were comfortable in the lab set-up. The results for length of utterance and syllable shape are presented in Figures 1 and 2, respectively.

Table 1. Distribution of utterances produced by infants with Spanish- and English-speaking interlocutors in Experiments 1-3.

Fig. 1. Distribution of multisyllabic utterances produced by 12-month-old infants in the bilingual (n = 10), monolingual English (n = 10), and monolingual English with short term exposure to Spanish (n = 10) groups.

Fig. 2. Distribution of utterances with closed syllables produced by 12-month-old infants in the bilingual (n = 10), monolingual English (n = 10), and monolingual English with short term exposure to Spanish (n = 10) groups.

Length of utterance

Overall, 79% of the utterances in the Spanish session (Range = 64-100) and 70% of the utterances in the English session (Range = 52-89) were multisyllabic. Further, 9 out of 10 bilingual infants produced more multisyllabic utterances with the Spanish-speaking interlocutor than with the English-speaking interlocutor.

To evaluate these differences, a mixed-effects logistic regression model was fitted to predict whether an utterance was multisyllabic. The final model included the fixed effects of the language of the interlocutor and the number of utterances produced in that session (Table 2). The random slope for the language of the interlocutor was not retained in the final model because it did not significantly improve model fit, χ²(2) = 3.96, p = 0.14.²

Table 2. Summary of fixed effects for bilingual and monolingual infants.

The significant positive intercept indicates that, overall, infants were more likely to produce multisyllabic than monosyllabic utterances. This is unsurprising, given previous research showing that, across languages, infants produce multisyllabic utterances earlier in development than monosyllabic ones (Davis & MacNeilage, 1995; Kern & Davis, 2010; Locke, 1983). Crucially, the language of the interlocutor significantly predicted the log odds of producing a multisyllabic utterance [χ²(1) = 22.0, p < 0.001]. The positive estimate for the language of the interlocutor indicates that infants were more likely to produce a multisyllabic utterance with the Spanish-speaking interlocutor.
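
The models are fitted on the log-odds (logit) scale, so a positive intercept corresponds to a probability above 0.5. The short Python sketch below converts the raw pooled proportions reported above (79% and 70% multisyllabic) to log odds. These raw values only illustrate the scale. They are not the model estimates in Table 2, which also account for the random effects and the number of utterances per session.

```python
import math


def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Probability corresponding to log odds x."""
    return 1 / (1 + math.exp(-x))


# Raw pooled proportions of multisyllabic utterances reported in the text.
spanish, english = 0.79, 0.70
print(round(logit(spanish), 2), round(logit(english), 2))  # prints 1.32 0.85
print(round(logit(spanish) - logit(english), 2))           # prints 0.48
```

Both log odds are positive because both proportions exceed 0.5. The positive difference mirrors the direction, though not the size, of the reported effect of the interlocutor's language.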

Syllable shape

Overall, 11% of the utterances produced by the bilingual infants in each of the two language sessions contained closed syllables (Spanish range: 3-31%; English range: 4-16%). Six of the 10 infants produced more closed syllables with the English-speaking interlocutor than with the Spanish-speaking interlocutor.

To evaluate these differences, another mixed-effects logistic regression model was fitted to predict whether an utterance had a closed syllable. The final model included the fixed effects of the language of the interlocutor and the number of utterances produced in that session (Table 2). The random slope for the language of the interlocutor was included because it significantly improved model fit [χ²(2) = 6.84, p = 0.03].

The significant negative intercept indicates that, overall, infants were more likely to produce utterances with open rather than closed syllables. Again, this is unsurprising, given previous research showing that infants produce open syllables earlier in development than closed syllables (Davis & MacNeilage, 1995; Kern & Davis, 2010; Locke, 1983). The language of the interlocutor, however, did not significantly predict the log odds of producing an utterance with a closed syllable [χ²(1) = 0.42, p = 0.51]. The random slope for the language of the interlocutor significantly improved model fit, but its fixed effect did not. This pattern indicates substantial variability across bilingual infants in how their production of closed syllables differed between the Spanish and English sessions. It also suggests that only some infants were able to manipulate syllable shape across languages.

To summarize, bilingual 12-month-olds produced more multisyllabic utterances with a Spanish- compared to an English-speaking interlocutor, a difference that is consistent with the prosody of the target language. Thus, the babbling of pre-lexical infants shows language-specific characteristics. Such language-specific differences in the production of the two languages are also consistent with the argument that there are separate representations of the two languages in bilingual infants as young as 12 months of age.

What bilingual 12-month-olds did not do, as a group, was alter the shape of their syllables (i.e., open or closed) to match the language of their interlocutors. We can rule out the possibility that developmental immaturity severely limited the bilingual 12-month-olds’ ability to alter the proportion of closed syllables as a function of their ambient language. In previous research, both monolingual and bilingual infants have been reported to produce closed syllables among their earliest words at the same age (Kehoe, 2015; Lleó et al., 2003). It is, however, possible that specific exposure to Spanish, a language with few closed syllables, limited the bilingual infants’ production of closed syllables in English. We return to this possibility later in the paper, after comparing the proportions of closed syllables produced by bilingual Spanish–English infants and monolingual English infants.

Experiment 2

In Experiment 1, we showed that bilingual 12-month-olds altered the length of their utterances, but not syllable shape, as a function of the language of the interlocutor. In Experiment 2, we tested monolingual English-learning 12-month-olds using the same set-up as in Experiment 1, to determine whether any previous exposure to Spanish is necessary for infants to be able to systematically alter their speech production when interacting with a Spanish-speaking interlocutor.

In recent years, several lines of research have shown that phonetic imitation, also called phonetic convergence, is a powerful learning mechanism by which children might become native speakers. Newborn infants have been shown to imitate tongue and lip gestures that form the precursors of early speech (Meltzoff & Moore, 1983, 1997; Kugiumutzakis, 1999). By 4.5 months of age, infants are also able to imitate vowel sounds (Kuhl & Meltzoff, 1996). Even adults alter their speech production to sound like their interlocutors, in an immediate and automatic process that does not require conscious control (e.g., Babel, 2012; Goldinger, 1998; Nielsen, 2011). Experiment 2 allowed us to test whether 12-month-old monolingual English-learning infants can rapidly converge on the speech of their interlocutors, even without previous exposure to the language the interlocutors speak (in this case, Spanish).