The effect of intermittent noise on lexically-guided perceptual learning in native and non-native listening

There is ample evidence that both native and non-native listeners deal with speech variation by quickly tuning into a speaker and adjusting their phonetic categories according to the speaker’s ambiguous pronunciation. This process is called lexically-guided perceptual learning. Moreover, the presence of noise in the speech signal has previously been shown to change the word competition process by increasing the number of candidate words competing for recognition and slowing down the recognition process. Given that reliable lexical information should be available quickly to induce lexically-guided perceptual learning and that word recognition is slowed down in the presence of noise, and especially so for non-native listeners, the present study investigated whether noise interferes with lexically-guided perceptual learning in native and non-native listening. Native English and Dutch listeners were exposed to a story in English in clean speech or with stretches of noise. All the /l/ and /ô/ sounds in the story were replaced with an ambiguous sound half-way between /l/ and /ô/. Although noise altered the pattern of responses for the non-native listeners in a subsequent phonetic categorization task, both native and non-native listeners demonstrated lexically-guided perceptual learning in both clean and noisy listening conditions. We argue that the robustness of perceptual learning in the presence of intermittent noise for both native and non-native listeners is additional evidence for the remarkable flexibility of native and non-native perceptual systems even in adverse listening conditions.


Introduction
There are large variations among speakers in how they produce sounds and words. This is due to differences in the speakers' accent, dialect, speaking style, and idiosyncrasies of their vocal tract or, for instance, because the speaker has a speech impediment. There is ample evidence that listeners deal with this variation by quickly tuning into a speaker, even when pronunciations are ambiguous (e.g., Norris et al., 2003;Reinisch and Holt, 2014). In order to do so, listeners use lexical or phonotactic knowledge (for a review, see Samuel and Kraljic, 2009). The mechanism through which adaptation occurs is called lexically-guided perceptual learning or lexical retuning (Norris et al., 2003).
Lexically-guided perceptual learning was first demonstrated by Norris and colleagues (2003). In their study, Dutch listeners were exposed to words in Dutch with an ambiguous sound half-way between /f/ and /s/, denoted as [f/s], in a lexical decision task. One group of listeners heard /f/-final words where the final /f/ sound was replaced or los (loose) depending on their previous exposure to witlo [f/s] or baa[f/s], respectively (Scharenborg et al., 2015). This generalization of learning to words not present in the exposure phase strongly suggests that the phonetic category adjustment occurs at the prelexical level of processing. Norris et al. (2003) showed that lexically-guided perceptual learning only occurs when the ambiguous sound is included in an existing word, and not when the ambiguous sound is embedded in a nonword. They concluded that listeners adjust their phonetic category boundaries only when their lexical knowledge can be exploited to interpret ambiguous stimuli. Cutler et al. (2008) extended this proposition showing that ambiguous sounds in non-words can also induce phonetic category retuning, but only when they are part of a legal sequence of phonemes in the listener's native language. Jesse and McQueen (2011) showed that no learning occurs in native listening when ambiguous sounds are located at the start of words, and argued that in order for lexically-guided perceptual learning to occur, lexical knowledge should be available quickly and should be reliable enough to guide retuning. Although words containing the ambiguous sounds were recognized as words (80% acceptance rate on the ambiguous items in the lexical decision task which was used in the exposure phase), the disambiguating information was available too late relative to the position of the ambiguous sound at the start of the word for lexically-guided perceptual learning to occur.
This again shows the importance of lexical information for lexically-guided perceptual learning, and suggests that lexical competition should be resolved early enough to trigger retuning.
Because of the essential role of lexical information in lexically-guided perceptual learning, non-native listeners, who have arguably less stable and detailed lexical knowledge than native listeners (Garcia Lecumberri et al., 2010), might be hampered in adapting to ambiguous sounds in a non-native language. Moreover, phonetic categories and contrasts present in the non-native language might be absent or realized differently from those in the native language of the listener (Flege, 1995), which could result in failure to recognize the ambiguous sound or not treating it as ambiguous enough to induce retuning. Proficient non-native listeners do however show lexically-guided perceptual learning, and are able to retune their native and non-native phonetic categories at least when their native and non-native languages are phonologically similar (Bruggeman and Cutler, 2020; Drozdova et al., 2016; Reinisch et al., 2013; Schuhmann, 2015) or when the phonological system of the non-native language is simpler than the listeners' native phonological system (Cutler et al., 2018). Both native and non-native phonetic category representations are thus rather flexible.
There are, however, limits to such flexibility. Samuel and Kraljic (2009) showed that retuning is blocked when variation in the signal can be attributed to speaker-external factors. Kraljic et al. (2008a) demonstrated that acoustic deviations due to context-dependent variability, e.g., caused by a certain dialect (e.g., the pronunciation of /s/ as /S/ when followed by /tr/ in Philadelphia English), prohibited adaptation in native listening. Similarly, no retuning emerges when the ambiguity in the signal is caused by a pen in the mouth of the speaker (Kraljic et al., 2008b). Another speaker-external factor blocking lexically-guided perceptual learning was found to be the presence of background noise. Zhang and Samuel (2014) added signal-correlated noise to their stimuli in the exposure phase, masking both the carrier sentences and the critical lexical items, but not the ambiguous sound (a sound between /f/ and /s/). In contrast to listeners who performed the same task in clean speech, no lexically-guided perceptual learning was observed for listeners exposed to the stimuli masked by noise. Zhang and Samuel (2014) hypothesized that when the speech signal is noisy and hence more variable, native listeners do not treat the ambiguous sound as a reliable cue to trigger retuning.
The presence of noise in the speech signal has also been found to change the dynamics of phonological competition in native listeners (Ben-David et al., 2011;Bradlow, 2011, 2016;Hintz and Scharenborg, 2016;McQueen and Huettig, 2012;Scharenborg et al., 2018). In the study by McQueen and Huettig (2012) participants listened to sentences which were occasionally disrupted by bursts of noise. They hypothesized that the presence of intermittent noise increased listeners' expectation of a distortion occurring, which leads to a change in the lexical competition process. This change manifested itself as an increase in the number of looks to the rhyme competitors and a decrease in the number of looks to the onset competitors in comparison to the clean listening condition. Moreover, the presence of noise has been shown to increase the time listeners need to resolve competition in spoken word recognition (e.g., Ben-David et al., 2011;Bradlow, 2011, 2016). This slowing down is due to an increase in the number of candidate words competing for recognition when noise is present (Scharenborg et al., 2018), a longer activation of the candidate words in the memory of the listeners (Brouwer and Bradlow, 2011), and a reduced activation of the candidate words (Hintz and Scharenborg, 2016). Relatedly, an eye-tracking study with cochlear implant (CI) users (Farris-Trimble et al., 2014) showed differences in the degree of peak and late competitor activation between CI users and a CI simulation group of normal hearing participants. The authors hypothesized that, similar to the participants in McQueen and Huettig (2012), CI users keep competitors active in memory longer as they are expecting degraded input, and consequently delay commitment to lexical items. Listeners are thus able to flexibly adjust their interpretation of acoustic information and consequently their spoken-word recognition processes as listening conditions change (see also Brouwer et al., 2012).
Listening in noise is typically found to be more challenging for non-native than for native listeners (e.g., Mayo et al., 1997; Rogers et al., 2006; Scharenborg et al., 2018; see for a review Garcia Lecumberri et al., 2010; Scharenborg and van Os, 2019). Non-native listeners, therefore, may provide an ideal testing ground for establishing the interaction of two potentially crucial factors in lexically-guided perceptual learning: the characteristics of the speech signal and the lexical knowledge available to the listeners. When the speech signal contains background noise, the phonological match between the target word and the activated words decreases (see Garcia Lecumberri et al., 2010). This leads to an increase of the number of candidate words compared to clean listening conditions, and this increase is even larger for non-native listeners compared to native listeners (Scharenborg et al., 2018).
The present study investigates the effect of intermittent noise on lexically-guided perceptual learning in native and non-native listening. Given the effect of noise on interpreting lexical information in the speech signal, lexically-guided perceptual learning might be impeded in noise. Indeed, Zhang and Samuel (2014) showed that noise throughout the stimulus (with the exception of the critical sound) interferes with lexically-guided perceptual learning in native listening. But what happens when the speech signal is only occasionally disrupted with noise? Based on the findings by McQueen and Huettig (2012) that intermittent noise changes the dynamics of the competition process for native listeners, we hypothesize that such a change potentially delays recognition of the word with the ambiguous sound (Ben-David et al., 2011; Bradlow, 2011, 2016), and subsequently disrupts lexically-guided perceptual learning, especially for non-native listeners, for whom the number of competing candidate words is larger than for native listeners (Scharenborg et al., 2018). Therefore, we predict a negative noise effect for native listeners on lexically-guided perceptual learning and an even stronger negative noise effect for non-native listeners, which could possibly be so strong that no lexically-guided perceptual learning will take place. Our study is a test of the robustness of perceptual learning and will help us to understand the flexibility of native and non-native perceptual systems in adverse listening conditions.
To investigate this hypothesis, lexically-guided perceptual learning was examined in four listening conditions. In the first condition, native listeners of English were auditorily presented with a story (no background noise present) in English in which all /l/ and /ô/ sounds were replaced with an ambiguous [l/ô] sound. In the second condition, another group of native English listeners were presented with the same story, but this time parts of the story were masked with background noise, while, crucially, words containing the target ambiguous sound were left intact. In the third experimental condition, Dutch non-native listeners of English were exposed to the clean version of the same story as the native listeners. Finally, in the fourth condition, Dutch non-native listeners were exposed to the story with intermittent background noise.
Articulation of /l/ is similar in Dutch and English (Collins and Mees, 1999), while the British English prevelar bunched approximant /ô/ only occurs in Dutch in coda position (Scobbie et al., 2009; Van de Velde and van Hout, 1999), where it never occurs in English. Dutch listeners would thus have to create a language-specific phonetic category for British English /ô/ (Drozdova et al., 2016). After listening to the story, all participants performed a phonetic categorization task.

Method
Following the standard procedure for lexically-guided perceptual learning studies (e.g., Norris et al., 2003;Scharenborg et al., 2015;Zhang and Samuel, 2014), all experiments included an exposure phase and a test phase. The exposure phase consisted of a story (Drozdova et al., 2016;Eisner and McQueen, 2006) with a between-participant manipulation (see Appendices A and B). Half of the participants listened to the story where all /l/ sounds were replaced by an ambiguous [l/ô] sound, while the other half of the participants listened to the same story where all /ô/ sounds were replaced by the ambiguous [l/ô] sound. Participants were randomly assigned to one of the two versions of the story. During the test phase, all participants had to perform a phonetic categorization task. To obtain a measure of the lexical proficiency in English of the non-native listeners, LexTALE (Lexical Test for Advanced Learners of English: Lemhöfer and Broersma, 2012) was administered to the non-native listeners. LexTALE is an unspeeded lexical decision task in which participants are exposed to 60 items one-by-one shown on a computer screen and have to decide upon the presentation of each item whether it is an existing word in English or not.

Participants
One hundred and seventy-six native English speakers (33 males, M age = 20.9, SD age = 2.6), recruited from the Psychology Electronic Experiment Booking system of the Department of Psychology of the University of York, participated in the native versions of the experiment. Two hundred and one native Dutch speakers (35 males, M age = 21.6, SD age = 2.1) were recruited from the Radboud University Nijmegen subject pool and participated in the non-native versions of the experiment. The Dutch participants had an average score of 68.5 (SD = 13.7) on the LexTALE test, which corresponds to an upper-intermediate level of proficiency (B2 level according to the Common European Framework of Reference for Languages: Lemhöfer and Broersma, 2012). The groups of native and non-native listeners who were exposed to the story in clean speech are supersets of those reported in Drozdova et al. (2016).
An overview of the number of participants for each condition is shown in Table 1. The sample size of the native + noise condition is smaller than that for the other three conditions due to recruitment constraints during the testing period. Prior to the experiment, all native and non-native participants filled in a questionnaire with questions regarding any hearing or learning disorders and possible difficulties in listening in the presence of background noise. Only participants without self-reported learning or hearing disorders were included in the experiments. Participants were assigned to only one condition.

Table 1. Number of participants in each listening condition assigned to the /l/-ambiguous and the /ô/-ambiguous version of the story in clean speech and in the presence of intermittent noise.

Listeners     Clean listening condition         Noisy listening condition
              /ô/-ambiguous   /l/-ambiguous     /ô/-ambiguous   /l/-ambiguous
Native        52              48                39              37
Non-native    47              53                50              51

Additionally, 15 native Dutch participants (3 males, M age = 23.1, SD age = 4.7) took part in a pretest of the stimuli, and another eight native Dutch participants (M age = 22, SD age = 2.8) took part in a pilot study to determine the appropriate length of the noise fragments in the noisy condition. None of these participants were included in the main experiments. All participants received a monetary reward for their participation, and signed a consent form prior to the experiment.

Exposure phase: clean
The story used in the exposure phase was taken from a previous experiment (Drozdova et al., 2016). It included 19 words containing one /l/ sound and no /ô/ sounds, and 19 words containing one /ô/ sound and no /l/ sounds. The words were chosen from the CELEX database (Baayen et al., 1995) and had frequencies of at least 100 per million. Since lexically-guided perceptual learning is impeded when listeners hear standard pronunciations of the target sound from the same speaker (Kraljic and Samuel, 2011), no words in the story other than the target words contained /l/ or /ô/. As retuning does not transfer to other allophones of the same sound, all /l/ and /ô/ occurred in the same position for all target words, i.e., at the onset of the third or fourth syllable (except for one word: Internet). The story consisted of 333 words, of which 38 were critical items (see Appendix A for the story). The total duration of the story was 2 min 21 s. The story was recorded by a male native speaker of British English from South West England in a sound-attenuated booth with a Sennheiser ME 64 microphone at a sampling frequency of 44100 Hz. In order to obtain the ambiguous sound between /l/ and /ô/, the story was recorded in three versions. In the first version, all words were pronounced in a natural way. In the second version, all words containing an /l/ sound were pronounced with an /ô/ sound (e.g., accumurated). In the third version, all /ô/ sounds were substituted with /l/ sounds (e.g., wondeling). The words were then excised at the positive-going zero crossings from each version of the story and zero-padded with 25 ms silence at the onset and the offset using Praat (Boersma and Weenink, 2009). The pitch contours of the two items from each pair (e.g., memory-memoly) were equalized and, following the procedure described by Scharenborg and Janse (2013), morphed with the STRAIGHT algorithm (Kawahara et al., 1999).
STRAIGHT first decomposes the input files into source parameters and spectral parameters, and subsequently removes pitch information, while keeping frequency information. In order to keep coarticulatory information of upcoming /l/ and /ô/ in the syllable preceding the critical sound available to the listener, whole words were morphed rather than separate sounds. As a result of morphing the item-pairs, an 11-step continuum was created where step 0 was the most /l/-like sound and step 10 the most /ô/-like.
To determine the most ambiguous step between /l/ and /ô/, a pre-test with 15 Dutch listeners was conducted. We chose the most ambiguous steps on the basis of a pre-test with non-native listeners rather than native listeners to ensure that the chosen steps were indeed ambiguous for the non-native listeners, the group we were primarily interested in.
The pre-test consisted of a phonetic categorization task, where listeners had to decide whether they heard a stimulus with an /l/ or an /ô/ sound by pressing the corresponding button on the button box. Participants listened to five different steps of the continuum, i.e., steps 1, 3, 5, 7, 9. The left button of the button box corresponded to the item containing an /l/, whereas the right button corresponded to the item with an /ô/. The two possible answers were also presented on a computer screen with the /l/-reading of the stimulus (e.g., wondeling) on the left side of the screen and the /ô/-reading (e.g., wondering) on the right side of the screen. In half of the trials, the /l/ answer was an existing word and the /ô/ answer a non-word and in the other half of the trials the /ô/ answer was a word and the /l/ answer was a non-word. Participants categorized five steps of each critical word (38 words) and of each test continuum (4 test words forming two continua: see subsection Test phase). Each step of the continuum was presented twice to the participants. Participants thus categorized 400 items in total (40 continua × 5 steps × 2 presentations).
The proportions of /l/ and /ô/ responses for the test items were calculated. The step on the continuum that received approximately 50% of both responses was chosen as the most ambiguous one. The most ambiguous step was determined individually for each word and then spliced into the corresponding version of the story. Two versions of the story were created: in one version, all words containing an /l/ sound were replaced by their most ambiguous [l/ô] step while all /ô/ sounds remained natural; in the second version, all words containing an /ô/ sound were replaced by their most ambiguous [l/ô] step while all /l/ sounds remained natural.
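The selection rule described above can be illustrated with a minimal Python sketch. It assumes that "approximately 50%" is operationalized as the step whose proportion of /l/ responses lies closest to 0.5; the function name and the response tallies are hypothetical and are not taken from the pre-test data.

```python
def most_ambiguous_step(l_counts, total_counts):
    """Return the continuum step whose proportion of /l/ responses is closest to 0.5.

    l_counts:     dict mapping step -> number of /l/ responses
    total_counts: dict mapping step -> total number of responses
    Ties are resolved in favour of the lowest step (an arbitrary choice).
    """
    best_step, best_dist = None, None
    for step in sorted(total_counts):
        prop_l = l_counts.get(step, 0) / total_counts[step]
        dist = abs(prop_l - 0.5)
        if best_dist is None or dist < best_dist:
            best_step, best_dist = step, dist
    return best_step


# Hypothetical tallies for one word: 15 listeners x 2 presentations = 30 responses per step.
example_l = {1: 29, 3: 24, 5: 16, 7: 6, 9: 1}
example_total = {step: 30 for step in example_l}
chosen = most_ambiguous_step(example_l, example_total)
```

With these invented tallies, the step with 16 of 30 /l/ responses is selected, because its proportion (0.53) is nearest to 0.5.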

Exposure phase: noise
For the experiments in the noisy condition, speech-shaped noise was added to the story. For lexically-guided perceptual learning to occur, listeners needed to be able to comprehend the story, hence a signal-to-noise ratio (SNR) was chosen that challenged listening but did not severely impair recognition accuracy. The SNR was chosen on the basis of a study by Scharenborg and colleagues (2018). In this study, Dutch non-native listeners of English had an average recognition accuracy of 83.8% for English words partially embedded in speech-shaped noise at an SNR of 0 dB. This was deemed an appropriate SNR for our criteria. Following McQueen and Huettig (2012), noise was placed on several fragments of the story such that at least one word, but typically two words (range 1-4 words), preceding the critical word and typically at least one word following it were left unmasked.
The noise was automatically added to fragments of the story using a Praat (Boersma and Weenink, 2009) script. First, boundaries were manually placed in the signal on the positive zero crossings in Praat. The fragments that were to be masked were marked with an X on the tier. The Praat script then placed a random chunk of the noise signal on the marked part of the audio file. Before adding noise, the audio file was down-sampled to 16000 Hz to match the sampling frequency of the noise file.
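The masking step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Praat script used in the study: the SNR is assumed to be defined on the root-mean-square (RMS) level of the whole speech file relative to the whole noise file, fragments are given as hypothetical (start, end) sample indices, and white noise stands in for speech-shaped noise.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def add_noise_to_fragments(speech, noise, fragments, snr_db=0.0, rng=None):
    """Add a randomly chosen chunk of `noise` to each marked fragment of `speech`.

    fragments: list of (start, end) sample indices, end exclusive (hypothetical format).
    The noise is scaled so that speech RMS / scaled noise RMS corresponds to `snr_db`.
    Samples outside the marked fragments are returned unchanged.
    """
    rng = rng or random.Random()
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    mixed = list(speech)
    for start, end in fragments:
        length = end - start
        offset = rng.randrange(0, len(noise) - length + 1)  # random chunk of the noise file
        for i in range(length):
            mixed[start + i] += gain * noise[offset + i]
    return mixed


# Toy signals at 16000 Hz: a 220 Hz tone as "speech" and Gaussian noise as masker.
rng = random.Random(1)
speech = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
noise = [rng.gauss(0, 1) for _ in range(32000)]
mixed = add_noise_to_fragments(speech, noise, [(2000, 6000)], snr_db=0.0, rng=rng)
```

At an SNR of 0 dB the scaled noise has the same RMS level as the speech, so only the marked stretch is masked while the surrounding words stay intact.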
The length of the noise fragments was determined on the basis of a pre-test with eight native Dutch listeners. During the pre-test, participants listened to the partially-masked story, and afterwards had to answer five short questions to check their comprehension of the story. All eight participants answered two to four comprehension questions correctly (M = 3.25), which confirmed that the presence of noise made listening challenging but did not severely harm comprehension. For the noise-added version of the story, see Appendix B.

Test phase
The test phase consisted of a phonetic categorization task. Two minimal pairs, not present in the target story, were used: collect-correct and alive-arrive. To avoid a bias towards either the /l/ or the /ô/ interpretation of the ambiguous stimuli, the two pairs had an opposite pattern of word frequency, with the /l/ word being more frequent for the alive-arrive pair (1135 per million for alive and 157 per million for arrive) and the /ô/ word being more frequent for collect-correct (117 per million for collect and 804 per million for correct). The words were recorded by the same speaker who recorded the story. The two members of each word pair were subsequently morphed together using the procedure described in the previous subsection. The test phase in the experiment included five steps from each of the two continua: the most ambiguous step between /l/ and /ô/ as determined on the basis of the pre-test, and the two steps preceding and following it. For the alive-arrive pair these were steps 3-7, and for collect-correct these were steps 2-6.
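The choice of test steps amounts to a five-step window centred on the most ambiguous step. The short Python sketch below makes this explicit; the function name is hypothetical, and the centre steps (5 for alive-arrive, 4 for collect-correct) are inferred from the reported ranges rather than stated directly in the text.

```python
def select_test_steps(most_ambiguous, n_each_side=2):
    """Return the most ambiguous step plus `n_each_side` steps before and after it."""
    return list(range(most_ambiguous - n_each_side, most_ambiguous + n_each_side + 1))


# Centre steps inferred from the reported ranges (3-7 and 2-6).
alive_arrive_steps = select_test_steps(5)
collect_correct_steps = select_test_steps(4)
```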

Procedure
All participants were tested individually in a quiet cubicle or in a sound-attenuated booth. Prior to the experiment, they filled in a consent form and a short questionnaire containing questions about their age, education, and language background. Subsequently, participants were given verbal instructions about the upcoming tasks. Additionally, they saw instructions on the computer screen informing them that they would be hearing a story in English. The story was played to the listeners binaurally through headphones. Once participants finished listening to the story, a message appeared on the screen indicating that they had to press a button on the button box to proceed to the next task. When participants pressed the button, instructions for the test phase of the experiment appeared on the screen.
The test phase was in the form of a phonetic categorization task where participants had to press a button on the button box to indicate which item (alive or arrive; collect or correct) they had just heard. The left button on the button box corresponded to the item with the /l/ sound and the right button corresponded to the item with the /ô/ sound.
Since the testing setup used with the native group was not equipped with a button box, participants used the "z" key on the keyboard as the left button and the "m" key as the right button. The two response options were also visually presented on a computer screen. Test stimuli were divided over four blocks, with a self-paced pause after each block. Each block consisted of the five steps of each pair presented three times in a random order. Participants thus listened to 120 test items. Exposure and test phases were followed by the LexTALE task for the non-native listeners.
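The block structure can be checked with a small Python sketch: four blocks, each containing every step of both continua three times in random order, yield 4 × (2 × 5 × 3) = 120 trials. The function name, the data layout, and the seed are illustrative assumptions, not the presentation software used in the experiment.

```python
import random


def build_test_blocks(steps_by_pair, n_blocks=4, repetitions=3, seed=None):
    """Build randomized test blocks.

    steps_by_pair: dict mapping a word-pair label to its list of continuum steps.
    Each block contains every (pair, step) combination `repetitions` times,
    shuffled independently per block.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = [
            (pair, step)
            for pair, steps in steps_by_pair.items()
            for step in steps
            for _rep in range(repetitions)
        ]
        rng.shuffle(block)
        blocks.append(block)
    return blocks


blocks = build_test_blocks(
    {"alive-arrive": [3, 4, 5, 6, 7], "collect-correct": [2, 3, 4, 5, 6]},
    seed=0,
)
n_trials = sum(len(block) for block in blocks)
```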

Results
To investigate the effect of the presence of background noise on lexically-guided perceptual learning in native and non-native listeners, the responses in the phonetic categorization task were analyzed. We excluded one participant from the analysis (from the group of native listeners, /ô/-exposure group, noise listening condition), because due to a technical error her responses from the final block were missing. Table 2 shows the percentage of /ô/ responses in each experimental condition.
All analyses were performed in R (version 3.0.2) using mixed effects logistic regression with glmer (package lme4) with the optimizer set to BOBYQA (Powell, 2009) and the number of iterations set to 100000. The dependent variable was the response to the ambiguous sound, where /l/ responses were coded as 0 and /ô/ responses as 1. We started the analysis from an overall model including the native and the non-native listener groups in both listening conditions (clean and noise) containing all predictors: Exposure Condition (/ô/-ambiguous or /l/-ambiguous version of the story), Noise (whether the story was presented in clean or in noise), Step on the continuum, Language (whether the participant was a native or a non-native listener), Word pair (collect-correct or alive-arrive) and all possible five-, four-, three- and two-way interactions between them.
Step on the continuum was included as a categorical variable; all other variables were recoded using deviation coding. A backward selection procedure was applied in which interactions and predictors that were not significant at the 5% level were one-by-one removed from the model, starting with the interactions with the highest, non-significant p values. Each change in the fixed effect structure was evaluated by inspecting the likelihood ratio changes with the anova function.
Note that all tables that display the statistical results contain labels referring to experimental variables. Their meaning is given in Table 3. As we applied deviation coding (the standard coding in analyses of variance), the value of the category not given can be inferred from the category presented. The values of the categories add up to zero. The estimates of the parameters from the best fitting model are shown in Table 4.
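The property that deviation-coded categories add up to zero can be shown with a short Python sketch for a two-level factor. The values -0.5 and +0.5 are an assumption made for illustration (the text does not state whether ±0.5 or ±1 was used), and the level labels are hypothetical stand-ins for those listed in Table 3.

```python
def deviation_code(levels):
    """Map the two levels of a factor to -0.5 and +0.5 (assumed contrast values)."""
    if len(levels) != 2:
        raise ValueError("this sketch only handles two-level factors")
    first, second = levels
    return {first: -0.5, second: 0.5}


# Hypothetical level labels for the Noise predictor.
noise_codes = deviation_code(["clean", "noise"])
codes_sum = sum(noise_codes.values())
```

Because the two codes sum to zero, the coefficient reported for one category implies the value for the other, and each main effect is estimated at the average of the remaining predictors rather than at a reference level.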
We expected to find a difference between the two exposure groups, which would manifest itself as a significant effect of Exposure Condition. As shown in Table 4, the main effect of Exposure Condition was not significant. However, Exposure Condition contributed to two significant two-way interactions with Word and Step. Moreover, a three-way interaction with Word, Language, and Exposure Condition remained in the final model, as its removal significantly decreased the model fit. Given a significant effect of the word pair in the emergence of lexically-guided perceptual learning (Word and its interactions; also found in Drozdova et al. (2016)), we ran separate analyses for the collect-correct and alive-arrive word pairs.

Collect-correct
Responses for the collect-correct test continuum were analyzed with the same backward selection procedure explained in the previous section (but excluding the factor Word). Fig. 1 shows proportions of listeners' responses for the collect-correct test continuum separately for native and non-native listeners and different listening conditions. Table 5 shows the estimates of the parameters that were included in the final model for this analysis.
In contrast to the main analysis, the analysis for the collect-correct word pair revealed a significant main effect of Exposure Condition (see also the factor Exposure Condition in Table 5), which means that the /ô/-exposure group gave significantly more /ô/ responses in the phonetic categorization task than the /l/-exposure group for the collect-correct test continuum. This effect of Exposure Condition was significantly different for the first two steps on the continuum compared to Step 3 (significant interaction between Step and Exposure Condition; first two continuum steps on Fig. 1) and did not depend on the language background of the listeners nor the presence of background noise during exposure (no significant interactions between Exposure Condition and Noise or Exposure Condition and Language in the final best fitting model). Native and non-native listeners differed in the overall number of /ô/-responses in the phonetic categorization task in the noise listening condition: native listeners gave more /ô/ responses in the noisy listening condition than non-native listeners (see significant interaction between Noise and Language and panels on the right in Fig. 1). Additionally, the difference at the first step of the continuum was larger for the native listeners than for the non-native listeners, especially in the noise condition (significant interaction between Language, Step and Noise).
P. Drozdova et al. Fig. 1. Proportion of native listeners' (lower panels) and non-native listeners' (upper panels) /ô/-responses for the collect-correct test continuum in the clean and the noisy listening condition. Responses of participants who were exposed to the /ô/-ambiguous version of the story are represented with the black line with triangles. Responses of the participants exposed to the /l/-ambiguous version of the story are shown with the gray line with squares.

Alive-arrive
For the alive-arrive test continuum, the estimates of the parameters included in the final model for the native and non-native listeners for both listening conditions together are presented in Table 6.
As can be seen in Table 6, no main effect of Exposure Condition was observed in the analysis for the alive-arrive test continuum, although Exposure Condition contributed to a significant interaction with the continuum step. As can be seen in Fig. 2, this significant interaction was not caused by the difference between /ô/ and /l/-exposure conditions, but rather by the differences between continuum steps. Although there were no differences between the /ô/ and /l/-exposure conditions on the third step of the test continuum, there were slightly more /ô/-responses on the first steps of the continuum for the /ô/-exposure group than for the /l/-exposure group, while on the last steps of the continuum this difference reversed. In general, irrespective of the native language of the listeners or the listening condition (noise or clean), there was no learning effect for the alive-arrive test continuum. However, similar to the collect-correct test continuum, native listeners gave more /ô/ responses than non-native listeners. This difference, however, was modified by listening condition and continuum step.

Discussion and conclusions
The present study investigated the effect of intermittent noise on lexically-guided perceptual learning in native and non-native listening. We hypothesized that intermittent noise has a detrimental effect on lexically-guided perceptual learning, especially for non-native listeners, due to the detrimental effect of background noise on the competition process. However, contrary to our hypothesis, lexically-guided perceptual learning was observed for both native and non-native listeners irrespective of the presence of intermittent noise. Note, however, this effect was only observed for the collect-correct word pair while no effect was found for the alive-arrive word pair for either listener group. In our discussion, we first focus on the different pattern of responses for the collect-correct and alive-arrive word pairs, and then discuss the results for the native and non-native listeners in the clean versus the noisy listening condition.
The ambiguous sounds used in the exposure and test phases were chosen on the basis of a pre-test with non-native listeners, as they were our main group of interest. In order to be able to compare the native and non-native listeners' ability to show lexically-guided perceptual learning, the same stimuli were used for both listener groups. Nevertheless, in the present study, lexically-guided perceptual learning was found for both listener groups in both listening conditions for the collect-correct word pair, while no lexically-guided perceptual learning was observed for the alive-arrive word pair. Since neither listener group showed perceptual learning for the alive-arrive continuum in either listening condition, and since perceptual learning has been shown for many different continuums (see for an overview ), including the continuum used in this work (Mitterer et al., 2013), it is likely that the lack of a perceptual learning effect for alive-arrive was due to idiosyncrasies of the steps selected for the alive-arrive continuum. Indeed, acoustic analyses in Drozdova et al. (2016) suggest that the steps for alive-arrive might not be as well positioned on the continuum as they were for the collect-correct pair. In particular, the first step of the alive-arrive continuum was found to be more /ô/-like than the first step of the collect-correct continuum. Moreover, the schwa-initial structure of the alive-arrive pair means that the monosyllabic words life and rife could have been activated and competed with the carrier words alive and arrive. This would be consistent with the stress-based segmentation mechanisms assumed in languages with a statistical bias for stress-initial words (Cutler and Norris, 1988;Norris et al., 1995). If the ambiguous [l/ô] sound was perceived as word-initial, then it would have been less likely to induce retuning, as we know from Jesse and McQueen (2011). If true, then it might be the case that lexical retuning only occurs or is only revealed in words in which the ambiguous sound is not the start of an existing word embedded in the longer word. Therefore, in our subsequent discussion, we will focus on the collect-correct word pair where the [l/ô] sound was ambiguous enough to induce lexically-guided perceptual learning in both listener groups.
Fig. 2. Proportion of native listeners' (lower panels) and non-native listeners' (upper panels) /ô/-responses for the alive-arrive test continuum in the clean and noisy listening conditions. Responses of participants who were exposed to the /ô/-ambiguous version of the story are represented with the black line with triangles. Responses of the participants exposed to the /l/-ambiguous version of the story are shown with the gray line with squares.
The results for the clean listening condition are in line with numerous earlier studies (e.g., for native listeners: Eisner and McQueen, 2006;Norris et al., 2003;Scharenborg et al., 2015; and for non-native listeners: Cutler et al., 2018;Drozdova et al., 2016;Reinisch et al., 2013). Both the study by Drozdova et al. (2016) and the present study demonstrate that despite differences in native and non-native listening, relatively proficient non-native listeners are able to retune their non-native phonetic categories as a result of exposure to an ambiguous sound (see Cutler et al., 2018 for a discussion on the role of phonological similarity between the native and non-native language of the listeners on this adaptation process).
With respect to the noise condition, Zhang and Samuel (2014) found that learning was blocked in the presence of noise during native listening, whereas in the present study learning was preserved. There is, however, an important difference between our study and the study by Zhang and Samuel (2014). During the exposure phase in the Zhang and Samuel study, the entire stimulus was masked by noise with the exception of the critical ambiguous sound. In our study, noise was far less prevalent, since it was never present on the words containing the ambiguous sound and most of the time also not on the words directly preceding and following the critical word. As Zhang and Samuel argued, the widespread presence of noise during exposure increased the inherent variability in the pronunciation of the speech sounds. Consequently, the variability of the ambiguous sound, which normally would trigger lexically-guided perceptual learning, would prevent the ambiguous sound from acting as a reliable cue to trigger retuning. In our study, the presence of noise might have increased the variability of the speech signal locally, but it did not reduce the reliability of the ambiguous sound as a cue to lexically-guided perceptual learning, as evidenced by the fact that both the native and the non-native groups of listeners showed retuning in the intermittent noise listening condition. Therefore, our study shows that the presence of background noise does not necessarily disrupt retuning, even when phonological representations and lexical knowledge are non-native, as in listening in a non-native language.
McQueen and Huettig (2012) previously demonstrated that the presence of intermittent noise in the speech signal alters the competition process during native spoken word recognition and makes listeners less confident about the words they are hearing. Moreover, Scharenborg et al. (2018) showed that the presence of background noise increases the number of activated words in both native and non-native listening. Keeping multiple word candidates in memory can thus slow down recognition of the target word (Norris et al., 1995) containing the ambiguous sound. However, the present results show that even when intermittent background noise is present in the signal, the crucial lexical information to disambiguate the ambiguous sound is available in time for both native listeners and relatively proficient non-native listeners. Given the role of lexical and phonological knowledge in lexically-guided perceptual learning (Norris et al., 2003;Cutler et al., 2008), and the influence of the presence of background noise on the interpretation of this information (Ben-David et al., 2011;Bradlow, 2011, 2016;McQueen and Huettig, 2012;Scharenborg et al., 2018), future studies should increase the length of the noise fragments and/or reduce the SNR to determine the conditions under which lexically-guided perceptual learning is fully disrupted, as was found in the study by Zhang and Samuel (2014).
Step 1 −3.429 0.062 <.001
Step 2 −2.085 0.047 <.001
Step 3 0.431 0.041 <.001
Step 4
Native and non-native listeners in the present study were surprisingly similar in how they dealt with the ambiguous sound in the collect-correct test continuum. The only observed difference between the two groups was the number of /ô/-responses in the noisy listening condition: native listeners gave significantly more /ô/-responses than non-native listeners. This result suggests that the presence of noise changed the perception of the target sound (despite not being masked by noise) for the non-native listeners, but since this change occurred for both the /l/-exposure and the /ô/-exposure group, no difference in lexically-guided perceptual learning between clean and noisy listening conditions was observed. Broersma and Scharenborg (2010) previously demonstrated that the presence of noise affects Dutch listeners' perception of English /ô/ to a greater extent than native listeners' perception. Apparently, this difference is present even when noise occurs intermittently, and can be observed even in the subsequent perception of /l/ and /ô/ when listeners hear the items in clean speech.
Previous studies underlined a number of factors impeding lexically-guided perceptual learning in native listening, such as variability attributed to a certain dialect (Kraljic et al., 2008a) or a pen in the mouth of the speaker (Kraljic et al., 2008b), initial position of the ambiguous sound in the word (Jesse and McQueen, 2011), or the presence of background noise, which covered all the stimuli except the target sounds (Zhang and Samuel, 2014). The present study demonstrates that the presence of intermittent noise does not fall into the group of these impeding factors, as lexically-guided perceptual learning remains robust in native and non-native listening irrespective of the listening conditions. This is an important finding showing that the perceptual system of non-native listeners can remain as flexible as that of native listeners even in more challenging listening conditions.
Table 6. Fixed-effect estimates of the performance of the listeners in the phonetic categorization task for the alive-arrive word pair.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
a new coaching position, intimidated him. He expected no equality of chances: no famous team was going to invite him now as a coach. No one. That's enough, he thought. He had to face the situation and this inequality and pay no attention to ignorant fans. (pause). Act independently of what they might say. The exact moment he decided that mind wandering, sitting and thinking about his devastating situation had no utility, somebody knocked at his window.