High or low? Comparing high- and low-variability phonetic training in adult and child second language learners

Background
High talker variability (i.e., multiple voices in the input) has been found effective in training nonnative phonetic contrasts in adults. A small number of studies suggest that children also benefit from high-variability phonetic training, with some evidence that they show greater learning (more plasticity) than adults given matched input, although results are mixed. However, no study has directly compared the effectiveness of high versus low talker variability in children.

Methods
Native Greek-speaking eight-year-olds (N = 52) and adults (N = 41) were exposed to the English /i/-/ɪ/ contrast in 10 training sessions through a computerized word-learning game. Pre- and post-training tests examined discrimination of the contrast as well as lexical learning. Participants were randomly assigned to high-variability (four talkers) or low-variability (one talker) training conditions.

Results
Both age groups improved during training, and both improved more when trained with a single talker. Results of a three-interval oddity discrimination test did not show the predicted benefit of high-variability training in either age group. Instead, children showed an effect in the reverse direction: reliably greater improvements in discrimination following single-talker training, even for untrained generalization items, although this result is qualified by (accidental) differences between participant groups at pre-test. Adults showed a numeric advantage for high-variability training, but this advantage was inconsistent with respect to voice and word novelty. In addition, no effect of variability was found for lexical learning. There was no evidence of greater plasticity for phonetic learning in child learners.

Discussion
This paper adds to the handful of studies demonstrating that, like adults, child learners can improve their discrimination of a phonetic contrast via computerized training. There was no evidence of a benefit of training with multiple talkers, either for discrimination or for word learning.
The results also do not support the finding of greater plasticity in child learners reported in a previous paper (Giannakopoulou, Uther & Ylinen, 2013a). We discuss these results in terms of various differences between the training and test tasks used in the current work and those used in previous literature.


Introduction

Phonetic training studies in adults

One of the most challenging aspects of learning a second language (L2) is learning to accurately perceive novel phonetic categories. This is particularly difficult when the mapping between phonetic properties and phonological categories differs between the first language (L1) and the L2 (Best, 1995; Flege, 1995; Giannakopoulou et al., 2011). A substantial body of literature has demonstrated that adult learners can improve their discrimination and identification of non-native speech sounds through phonetic training, but that effective generalization may depend on encountering sufficiently varied stimuli during training. For example, in an early attempt to train a non-native contrast, Strange and Dittmann (1984) trained native Japanese speakers on the English /r/-/l/ contrast using a discrimination task in which participants made same-different judgements about stimuli from a synthetic rock-lock continuum, receiving immediate trial-by-trial feedback. Variability was present in the form of the ambiguous intermediate stimuli along the continuum; however, there was a single phonetic context and a single (synthesized) talker. Participants were given a variety of […]

[…] trained adults and 12-year-old native Dutch speakers, with no previous Finnish experience, to discriminate the Finnish length contrast /t/-/t:/. This was a five-day study with a pre-test and post-test on the first and last days and three days of training. Training consisted of an identification paradigm (participants identified stimuli as "short t" or "long t" and received feedback). Stimuli were 7-step continua created from recordings of five talkers. Pre- and post-tests included identification and discrimination within and across category boundaries.
Although adults again outperformed children overall, both age groups showed reliable increases in sensitivity at the newly trained category boundaries and, critically, there were again similar levels of improvement at each age. This result might appear to corroborate that of Wang and Kuhl (2003); however, the lack of age effects may be due to a different reason, […]

[…] However, adolescents and older children improved more than either 6-8-year-olds or adults. Shinohara and Iverson (2013, 2015) interpret the increased learning in older children and adolescents compared with adults as due to their less fossilized brain plasticity and lesser interference from developed L1 phonetic units. The lesser learning in the 6-8-year-olds, which was not predicted by a plasticity account, was explained as a result of difficulties with the tasks due to immature phonemic awareness.

One difference between the paradigms used by Heeren and Schouten (2008, 2010) and Shinohara and Iverson (2013, 2015) is the length of training: Heeren and Schouten used three training sessions whilst Shinohara and Iverson used 10. If children's early learning is slower than that of adults (Snow & Hoefnagel-Hohle, 1978), this could potentially account for why Shinohara and Iverson saw a plasticity benefit (at least for older children compared to adults), whilst Heeren and Schouten did not. Some evidence for this comes from a final study by Giannakopoulou et al. (2013a), who also found a plasticity benefit (greater learning in 7-8-year-olds than in adults), but also found evidence that this maturational difference only emerged after several sessions of exposure. This study used high-variability phonetic training to train the English tense-lax vowel contrast /i/ versus /ɪ/ (e.g. bean-bin) with child (7-8 years) and adult (20-30 years) native Greek learners of L2 English.
The study explored both age effects and cue weightings through the use of natural stimuli, in which duration was a reliable cue, and modified stimuli, in which duration cues were equalised and thus uninformative. A pre-test, training, post-test paradigm was used, with training consisting of 10 sessions of an identification task (identifying the correct member of a minimal pair given written stimuli) with high-variability stimuli (45 minimal pairs produced by two male and two female speakers). Half of the participants in each age group were trained with modified stimuli (no duration cues) and half with natural stimuli. Participants were given the option to replay any given stimulus, and feedback was provided in the form of a video-game-style animation. Although Greek adults, who started with more years of L2 education, generally performed better than Greek children at pre-test, high-variability perceptual training improved performance for both groups (children and adults) and across all tasks (perceptual identification and discrimination for both natural and modified stimuli conditions). However, critically, children improved more than adults. Importantly, the results from the training task of Giannakopoulou et al. (2013a), which were recorded each day, suggested that children's identification performance only overtook that of adults by session 7 (see Giannakopoulou et al., 2013a, Figures 11-14). This suggests that plasticity benefits, which are seen in this study and in that of Shinohara […] greater difficulty with the learning task due to less well developed phonemic awareness, one possibility is that the /i/ versus /ɪ/ contrast is somehow more salient for Greek speakers than the /l/ versus /r/ contrast is for Japanese speakers, even for younger children.
This may be partially due to the length cue present in these stimuli: the difference between children and adults was more marked for the natural stimuli than for the modified stimuli, suggesting that the children may have been relying particularly on durational cues during training. This is in line with research showing that non-native listeners from many different language backgrounds tend to rely heavily on the duration cue when discriminating English tense-lax vowel pairs (unlike native listeners, who rely more on formant frequency; e.g. Flege, Bohn & Jang, 1997). The reliance on durational cues may also have been exacerbated by the use of written stimuli during training, since English spelling provides an additional analogue length cue (two letters such as ee and ea are often used to represent the longer vowel /i/, while a single letter such as i is more often used to represent the shorter vowel /ɪ/), which may aid learning (see also Giannakopoulou et al., 2013b […] incorporate a wider variety of cues into their representations, allowing them to form more "associative hooks" and robust representations for the target words. Note that this explanation is subtly different from the standard explanation of why high-variability input benefits phonetic learning, which is assumed to stem from learning to ignore phonemically irrelevant information and is thus specifically important for generalization (whereas the benefit in word learning tasks should hold for both trained and novel talkers at test).

Turning to children, no study has specifically explored whether high-variability input is more effective than low-variability input for L2 phonetic training in children. However, there is some research into L1 word learning which suggests a role for high variability in infants. This research has been conducted with infants in the early stages of word learning (around 14 months). A surprising finding with this age group is that even if they have mastered a particular L1 phoneme contrast (i.e. they discriminate between relevant phoneme contrasts and fail to discriminate non-native contrasts), they may have difficulty learning new words that differ by this contrast. For example, Stager and Werker (1997) found that when 14-month-olds were exposed to two novel words which formed a minimal pair (/bɪ/ and /dɪ/) paired with two novel objects, they did not later differentiate between trials in which the word-object pairing was identical versus opposite to that previously seen in habituation (the so-called "switch task"). This is despite the fact that children of this age were shown to be able to discriminate /b/ and /d/ outside the context of a word learning task. This effect has been demonstrated many times (see Werker & Curtin, 2005, for a review). Critically, however, Rost and McMurray (2009) demonstrated that it is affected by the variability of the exposure set. Using a similar switch task to Stager and Werker (1997), they replicated the null effect when the novel words (/buk/ and /puk/ in their study) were produced by a single talker, but showed that infants of the same age did differentiate between the minimal-pair novel words when the words were spoken by multiple talkers.
Rost and […] not able to differentiate the minimal pair after exposure to these stimuli, nor were they able to do so when exposed to the items spoken by a single talker but with variation in multiple acoustic cues to the contrast (VOT, F0 transition, and burst amplitude). In contrast, infants were able to discriminate between the minimal pairs when exposed to the items spoken by […] talker) cues. The intuition behind the model is that associative learning will pick up on any consistent relationships across instances, meaning that both relevant and irrelevant acoustic cues will become associated with an object if they are highly consistent, as is the case when tokens all come from the same talker and are thus highly similar. The association of these irrelevant talker cues reduces the contrast established by the phonetic cues, since the talker cues are shared across the words and provide evidence that the words are the same. Note that a similarity between this explanation and that of Sommers (2005, 2014 […]

In sum, there is a long-standing assumption that more variable input is more beneficial in L2 phonetic training, although very few published studies have actually tested this directly in adults, and none have done so in children. There is also evidence that variable input is beneficial in adult L2 and infant L1 vocabulary learning, which has been interpreted in terms of the formation of robust, speaker-independent representations.
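The associative intuition described above can be made concrete with a toy calculation. The sketch below is our own illustration, not the model discussed in the text: it assumes each word token is a vector of one-hot phonetic cues plus one-hot talker cues, treats the learned representation of a word as the average of its tokens, and measures how similar the two minimal-pair words end up. With a single talker the shared talker cue is fully consistent and pulls the two word representations together; spreading the tokens over four talkers dilutes that shared component.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length cue vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def word_prototype(phonetic, n_talkers):
    """Average cue vector for one word heard equally often from n_talkers.

    phonetic: hypothetical one-hot phonetic cues, e.g. [1, 0] for /buk/
    and [0, 1] for /puk/. Talker cues are one-hot over talkers, so their
    average across balanced tokens is 1 / n_talkers on each talker dimension.
    """
    return list(phonetic) + [1.0 / n_talkers] * n_talkers


for n in (1, 4):
    a = word_prototype([1, 0], n)
    b = word_prototype([0, 1], n)
    print(n, round(cosine(a, b), 2))  # higher similarity = weaker contrast
```

Under these assumptions the similarity between the two words is 0.5 with one talker and 0.2 with four, i.e. the phonetic contrast is relatively more prominent in the multi-talker case. The choice of averaging and cosine similarity is purely illustrative.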

The current study

The current experiment adds to the small literature exploring phonetic training in both […] of variability on learning at both the phonetic and lexical levels. In addition, using pictures allowed us to avoid using orthography in training, addressing the concern that learners in […] For example, the number and duration of the training sessions were approximately the same, training used an animation to provide feedback, and a "replay" button allowed participants to hear stimuli again if necessary.

However, our test of phonetic learning was a three-interval oddity discrimination task rather than an identification task, so that we could avoid orthography while still testing both trained and untrained items. The inclusion of untrained items was important since high variability is specifically predicted to benefit generalization, i.e., exposure to multiple talkers should aid the ability to ignore phonetically irrelevant information. Voice novelty and word novelty were manipulated separately, since it is possible that exposure to multiple speakers might specifically benefit generalization across talkers rather than generalization more broadly.
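The structure of a three-interval oddity trial, and the crossing of voice novelty with word novelty, can be sketched as follows. This is a minimal illustration under our own assumptions: the function and cell names are hypothetical, and this section does not say how intervals were assigned to talkers or how trial orders were randomized in the actual test.

```python
import itertools
import random


def oddity_trial(word_a, word_b, rng):
    """Build one three-interval oddity trial from a minimal pair.

    One member of the pair is repeated twice and the other appears once;
    the returned index (0-2) is the position of the odd item.
    """
    standard, odd = rng.sample([word_a, word_b], 2)
    odd_pos = rng.randrange(3)
    intervals = [standard] * 3
    intervals[odd_pos] = odd
    return intervals, odd_pos


def score(responses, answers):
    """Proportion correct; chance is 1/3 in a three-interval oddity task."""
    return sum(r == a for r, a in zip(responses, answers)) / len(answers)


# Crossing voice novelty with word novelty gives four test cells
# (hypothetical labels, not the study's own condition names).
CELLS = list(itertools.product(["trained voice", "novel voice"],
                               ["trained word", "novel word"]))

rng = random.Random(1)
intervals, answer = oddity_trial("sheep", "ship", rng)
print(intervals, answer)
print(CELLS)
```

Because the odd item can occupy any of three positions, a listener who cannot hear the contrast should be correct on about a third of trials, which is the baseline against which pre- to post-test gains are interpreted.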

Our primary measure was the extent to which training strengthened lexical representations, since we assumed that our participants would begin the study with some knowledge of the words. We chose to focus on the links between the forms and their meanings and used a primed auditory lexical decision task to tap semantic representations in the L2 via cross-language priming (i.e. semantic priming from L2 to L1). This was adapted […]

[…] (Additional participants were tested, but we were unable to use their datasets due to their dropping out of the experiment, recording errors, other errors arising from the difficulty of testing on multiple computers in a busy school environment, and that some of our testing […]

[…] Table 1. The experimental stimuli consisted of 20 real-word minimal pairs (e.g., ship-sheep) and 20 non-word minimal pairs (e.g., stin-steen) containing the English tense-lax vowel distinction (non-word minimal pairs were created so that they matched the real-word minimal pairs as closely as possible in their final syllables; see S1 Table for a list of all stimuli).

Participants learned the real-word minimal pairs in the training task, but were tested both on these real-word items from training and on non-word minimal pairs not included in training.

This allowed us to test both trained and novel items. English words were selected from free online databases.

In addition to the main experimental stimuli, a second set of stimuli was developed for a primed auditory lexical decision task. In this task, primes could be either English words or Greek words, and targets were semantically related to the prime, semantically unrelated to the prime, or non-words (see Table 2 for examples, and S2 […] prime words did not differ significantly in frequency, t(38) = -1.12, p = .27. In addition, the number of nouns, verbs, and adjectives was identical in both lists.

The semantically related target word for the English primes was the Greek translation. […]

[…] Greek introduction: In this task, a picture of one of the minimal-pair words was presented centrally on the screen, and participants heard the corresponding Greek word (see Figure 2). Each minimal-pair word was presented once, in random order. This task was included to ensure that participants accessed the correct meaning for each picture, since not all items were concrete nouns (e.g., leap, slip). No data were recorded from this task.

[…] another button in the middle of the screen to hear four possible English words, each "spoken" by one of four frogs which appeared at the bottom of the screen (see Figure 2). If the participant selected the correct English word, they received positive feedback (the Greek word for "correct", "σωστό") and the English word was replayed. If the wrong word was […] (Table 3). […]

Examples of each trial type are provided in Table 2 and a screenshot of the task is provided in […]

The task began with eight practice trials with feedback, followed by 160 experimental trials without feedback. Participants were instructed to make a word/non-word judgement for the second word as quickly as possible. Responses were made using the left (non-word) and right (word) arrow keys on the computer keyboard. Response times were measured from the onset of the second word.

[…] Since adults and children generally had very different starting points at pre-test, the data from each age group were analysed separately for each task. However, since we were specifically interested in age differences for phonetic discrimination, we also included additional analyses comparing the age groups for the training and 3-interval oddity discrimination tasks.
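The primed lexical decision task described above yields response times measured from the onset of the second word. A common descriptive summary of such data is the priming effect: mean RT to unrelated targets minus mean RT to related targets, computed separately for each prime language. The sketch below assumes a hypothetical trial format (dicts with `prime_lang`, `relation` and `rt` keys) and uses made-up RTs purely to exercise the function; the study itself analysed trial-level RTs with mixed models rather than difference scores.

```python
from statistics import mean


def priming_effect(trials, prime_lang):
    """Mean RT (ms) for unrelated minus related targets for one prime language.

    Each trial is a dict with hypothetical keys 'prime_lang', 'relation'
    ('related', 'unrelated' or 'nonword') and 'rt'. Non-word targets are
    ignored. A positive value means related primes speeded word decisions.
    """
    rts = {"related": [], "unrelated": []}
    for t in trials:
        if t["prime_lang"] == prime_lang and t["relation"] in rts:
            rts[t["relation"]].append(t["rt"])
    return mean(rts["unrelated"]) - mean(rts["related"])


# Made-up RTs (ms), purely to exercise the function; not data from the study.
demo = [
    {"prime_lang": "English", "relation": "related", "rt": 700},
    {"prime_lang": "English", "relation": "related", "rt": 720},
    {"prime_lang": "English", "relation": "unrelated", "rt": 760},
    {"prime_lang": "English", "relation": "unrelated", "rt": 780},
    {"prime_lang": "English", "relation": "nonword", "rt": 900},
]
print(priming_effect(demo, "English"))  # 60
```

Under the cross-language logic of the task, a larger priming effect from English primes after training would suggest a stronger link between the L2 form and its meaning.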

Mixed effects models allow binary data to be analysed with logistic regression rather than as proportions, as recommended by Jaeger (2008). Our approach was to include automatically all the relevant experimentally manipulated variables for each task, and all the interactions between those variables, as fixed factors in a model, regardless of whether they contributed significantly to the model (i.e., we did not use stepwise model comparison).
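One reason for preferring logistic models over analyses of proportions is that logistic regression works on the log-odds scale, where the same change in proportion correct counts for more near ceiling than in the middle of the range. The arithmetic below is a minimal illustration with arbitrary example proportions, not values from this study.

```python
import math


def logit(p):
    """Log-odds of a proportion p, for 0 < p < 1."""
    return math.log(p / (1 - p))


# The same 10-point gain in proportion correct, at two starting points.
mid_gain = logit(0.60) - logit(0.50)
ceiling_gain = logit(0.95) - logit(0.85)
print(round(mid_gain, 2), round(ceiling_gain, 2))  # 0.41 1.21
```

On this scale a gain from .85 to .95 is roughly three times larger than a gain from .50 to .60, which is why a group starting closer to ceiling is not automatically penalized for having "less room" to improve in proportion terms.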

Since preliminary analysis suggested that the extent to which children had used the "replay" button during training was positively correlated with their increase in performance from pre- to post-test in the 3-interval oddity discrimination task (r = .38, df = .91, p < .01), we also included each participant's mean replay usage as a fixed factor in the models for that task (note that although the correlation did not hold for adults (r = .17, df = .39, p = .27), the factor was nevertheless included in both models for consistency). In addition, preliminary analyses revealed that one of the two talkers used in the test stimuli (i.e. as the trained/untrained voice; see Table 3) was more intelligible than the other, affecting discrimination. To ensure that key effects were not carried by a specific talker, we included talker, and all interactions with talker, as fixed factors.¹ Finally, in all models, predictor variables (including discrete factor codings) were centred to reduce collinearity between main effects and interactions, and so that main effects were evaluated as the average effects over all levels of the other predictors (rather than at a specified reference level for each factor). We do not report full statistical models. For the experimental factors, we report […]

The lme4 package provided p-values automatically for logistic mixed effects models but not for linear mixed effects models. For models with a continuous outcome variable (i.e., RTs in the lexical decision task), p-values were calculated using the lmerTest package using […]

[…] 1/Female 2 were not used in training, they were used as the novel voice in testing; see Table 3). This design meant that the high-variability condition included three talkers who were never included in low-variability training.
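The centring step described above can be illustrated with a small, hypothetical balanced design: a two-level condition code crossed with four sessions. The sketch assumes 0/1 dummy coding as the uncentred starting point and simply subtracts each predictor's mean; it shows how centring removes the correlation between a main-effect predictor and its interaction term. It does not reproduce the study's actual coding or model fitting, which were done in lme4.

```python
import math


def centre(xs):
    """Subtract the mean so the predictor averages to zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    xc, yc = centre(xs), centre(ys)
    num = sum(a * b for a, b in zip(xc, yc))
    den = math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))
    return num / den


# Hypothetical balanced design: condition (0 = low, 1 = high variability)
# crossed with sessions 1-4.
condition = [0, 0, 0, 0, 1, 1, 1, 1]
session = [1, 2, 3, 4, 1, 2, 3, 4]

raw_inter = [c * s for c, s in zip(condition, session)]
cond_c, sess_c = centre(condition), centre(session)
cent_inter = [c * s for c, s in zip(cond_c, sess_c)]

print(round(pearson(condition, raw_inter), 2))  # 0.85: uncentred codes
print(round(pearson(cond_c, cent_inter), 2))    # 0.0: after centring
```

With centred codes (here ±0.5 for condition), the condition coefficient estimates the condition difference averaged over sessions rather than the difference at a reference session, which matches the interpretation of main effects given in the text.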
It is possible that stimuli produced by these talkers could be easier or harder to identify than stimuli produced by the two talkers used in both training conditions. To ensure a fair comparison across conditions, we only consider trials in the high-variability condition where the stimuli were produced by one of the two talkers who were also used in the low-variability condition (i.e., trials with female3, male1, and male2 were excluded; see Table 3). The proportion of correct responses in each session is shown in Figure 3. For our primary analyses, the data were analysed in two logistic mixed effects models predicting whether a correct response was given (1/0) on each trial. Experimental factors in each model were training session (1-10) and condition (high-variability, low-variability), and the interaction between them. We were also interested in the contrast between age groups; however, as can be seen in Figure 3, by the final session adult participants were at ceiling in the low-variability condition. We therefore restricted our analyses comparing age groups to data from the adults and children in the high-variability condition only. Here we used a mixed model predicting response accuracy with fixed effects of training session, age group and talker, and all of the interactions between them, though we only report the effect of age and interactions with age. […]

[Table: …-test (standard error in parentheses). Columns: Pre-test, Post-test.]
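The trial exclusion used for the training-task comparison can be sketched as a simple filter followed by a per-session summary of the kind plotted as proportions correct. The talker labels female3, male1 and male2 come from the text; the trial format (dicts with `condition`, `talker`, `session` and `correct` keys) and the label `shared_a` for a talker used in both conditions are hypothetical. The study's inferential analysis used trial-level logistic mixed models, not these proportions.

```python
from collections import defaultdict

# Talkers heard only in high-variability training (named in the text).
HIGH_ONLY = {"female3", "male1", "male2"}


def comparable_trials(trials):
    """Keep all low-variability trials, plus high-variability trials whose
    talker also appeared in low-variability training."""
    return [t for t in trials
            if t["condition"] == "low" or t["talker"] not in HIGH_ONLY]


def accuracy_by_session(trials):
    """Proportion correct for each (condition, session) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [n correct, n trials]
    for t in trials:
        key = (t["condition"], t["session"])
        totals[key][0] += t["correct"]
        totals[key][1] += 1
    return {k: c / n for k, (c, n) in totals.items()}


# Hypothetical trials; 'shared_a' stands in for a talker used in both conditions.
demo = [
    {"condition": "high", "talker": "male1", "session": 1, "correct": 1},
    {"condition": "high", "talker": "shared_a", "session": 1, "correct": 0},
    {"condition": "high", "talker": "shared_a", "session": 1, "correct": 1},
    {"condition": "low", "talker": "shared_a", "session": 1, "correct": 1},
]
kept = comparable_trials(demo)
print(len(kept), accuracy_by_session(kept))
```

Filtering before summarizing means the high-variability proportions are computed over the same talkers as the low-variability ones, so any difference between conditions cannot simply reflect the intelligibility of the extra talkers.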
In summary, both adults and children showed a pattern of improved knowledge of the word meanings from pre- to post-test, with no differences between the high-variability and low-variability conditions, although it was only possible to verify these patterns statistically for children, due to ceiling effects in adults. […]

Three-Interval Oddity Discrimination Test

We first ran separate logistic mixed effects models for each age group, predicting […]

[…] differences between conditions at pre-test, rather than differences at post-test, and should thus be treated with some caution. In contrast to the previous study by Giannakopoulou et al. (2013a), children did not show reliably greater improvement from pre- to post-test than adults, i.e. we did not replicate the "plasticity" effect seen in that study. […]

[…] Barcroft, 2011). Data did not support this prediction. We also expected that children would show greater increases in performance than adults, at least in the training and discrimination tasks, given the findings of Giannakopoulou et al. (2013a). Again, this was not seen in the data. In this discussion, we first consider the findings from each task, focusing on the contrast between high- and low-variability input. We then turn to age-related differences, considering why we do not see the same benefit for child learners seen in previous studies, and the implications for theories of plasticity and maturation.

All groups showed improvement across training sessions. Both adults and children showed consistently stronger performance with low- rather than high-variability input.

However, for children a benefit for low-variability training only emerged in the second half of training, and only with the more intelligible speaker.

From the perspective of phonetic discrimination, greater performance following low-variability training is perhaps unsurprising. First, repeated exposure to the same items produced by the same talker potentially allows participants to attune to idiosyncratic cues associated with that talker (Clopper & Pisoni, 2004). In addition, because talkers varied from trial to trial, the high-variability condition required trial-by-trial adaptation to the talker, possibly imposing a burden on learners in that condition (see Martin et al., 1989; Nusbaum & Morin, 1992, for evidence that multi-talker stimuli are difficult even for L1 processing). Given this, it is perhaps surprising that children did not show a reliable benefit of low variability until the second half of training, since we might expect their lower working memory capacity to increase the benefit of low-variability input (Nusbaum & Morin, 1992; see below for further discussion of this in relation to the discrimination data). However, one confounding factor here is the evidence from the pre-training discrimination test, which indicates that the low-variability children started out, by chance, somewhat lower in their ability to discriminate these contrasts, making it hard to evaluate differences in the first half of training.

Given that our task can also be viewed as a word learning task, it is worth considering how this result fits with that of Rost and McMurray (2009), who found that 14-month-olds, who are still developing their knowledge of L1 phonetic contrasts, only learned two minimal-pair object labels when those words were spoken by multiple talkers, not when they were spoken by a single talker. This was despite the fact that the test items did not probe generalization, since testing used a voice familiar from exposure. Similarly, Barcroft and Sommers (2005) found benefits of multiple-talker training for adults learning novel words from a foreign language, and their tests included L2-to-L1 translation in which the test items used talkers familiar from training. One possibility is that in our training task, any potential benefit of variability may […]

In the 3-interval oddity discrimination test, participants identified the odd one out from a choice of three words (e.g., sheep, sheep, ship). We were interested in the extent of improvement from pre- to post-test, and whether this was affected by training condition and novelty (of either words or talkers). If high variability is specifically useful in supporting generalization (as argued in the phonetic training literature), high-variability training should benefit generalization items. Results from adult participants were, to some extent, in line with this prediction, with numerically greater improvement in the high-variability condition; however, this difference was not statistically reliable. The lack of a reliable difference between conditions may be due to the overall high performance of adult participants in this test. There was some evidence of an interaction between novelty and the benefit of variability. However, although a greater benefit of high variability for more novel items is predicted (i.e. because it allows the formation of generalized representations that include only phonetically relevant cues and exclude irrelevant talker identity cues), the interaction relied in part on a benefit for the low-variability group for familiar items with the novel talker, which was not predicted. This makes the result difficult to interpret. It is notable that the strongest evidence for the benefit of high-variability training has come from studies using identification tests (Lively et al., 1993; Sadakata & McQueen, 2013). This type of test was not possible in the current context, where we did not use orthography, but if high variability is specifically useful in the formation of category-level representations, an identification test may be better suited to detecting this type of learning.

As for children, surprisingly, there was reliably greater improvement following low- rather than high-variability training. This held regardless of the novelty of test items. One concern in interpreting this result is that our low-variability group (by chance) began with lower scores at pre-test. Our analyses focus on changes from pre- to post-test (i.e. we examine interactions with test session); however, it is possible that the pre-test difference biased the results, since the high-variability group had less room for improvement (although it is worth noting that our statistical analyses were not done over proportions, but using logistic regression via mixed models, which should be less susceptible to this problem). One interpretation of this result is that, for children, the four-speaker input may contain too many