The processing of subject pronouns in highly proficient L2 speakers of English

Studies on second language (L2) anaphora resolution have mainly focused on learners of null-subject languages, demonstrating that L2 speakers show residual indeterminacy in the L2 referential choice, even at the highest levels of proficiency. On the other hand, studies on anaphora resolution in L2 learners of non-null-subject languages have shown conflicting results, indicating that L2 speakers may process referential expressions in the L2 like native speakers. Using a visual world paradigm task, we test the online processing of pronouns in highly proficient L2 speakers of English whose L1 is Spanish, and compare their performance to that of a group of native English speakers. The native speakers’ data show rapid use of the first-mention bias (i.e., interpreting a pronoun as referring to the first-mentioned referent) and gender information upon encountering a gender ambiguous or unambiguous pronoun. For the L2 participants, we find similar underlying processes of pronoun resolution in comparison to native speakers. The results do not reveal a processing cost for L2 speakers of a non-null-subject language during anaphora resolution. Overall, our study demonstrates that L2 speakers of a non-null-subject language (English) can achieve native-like processing of the default referential form signaling topic continuity (i.e., the overt pronoun; Sorace 2011).


Introduction
Producing and comprehending referring expressions is a basic feature of linguistic communication. Research focusing on the psychological processes underlying how speakers make decisions about referring expressions has shown that native speakers are usually fast and efficient at producing and interpreting referring expressions (e.g., Arnold 2010 for a review). Unlike native speakers, adult learners may show residual optionality in the comprehension and production of referring expressions in the second language (e.g., Sorace & Filiaci 2006; Belletti, Bennati & Sorace 2007), even when they attain near-native competence. Research on second language (L2) acquisition has consistently confirmed this observation in adults who acquire null-subject languages, such as Spanish, Italian and Greek (e.g., Italian: Serratrice, Sorace & Paoli 2004; Sorace & Filiaci 2006; Belletti et al. 2007; Sorace, Serratrice, Filiaci & Baldo 2009; Spanish: Montrul 2004; Montrul & Rodríguez Louro 2006). In the present study, we examine the comprehension of pronouns in L2 speakers of a non-null-subject language (English) whose first language is null-subject (Spanish). We collected eye-tracking measures because eye-tracking has been argued to provide an online measure of processing, tapping into the underlying processes of anaphora resolution in real time. This paper is structured as follows. First, we present evidence from cross-linguistic research on anaphora resolution showing the main differences between English and Spanish. Next, we summarize the literature on L2 acquisition suggesting that L2 learners may show residual indeterminacy in anaphora resolution compared to native speakers. Then, we present an experimental study in which we use eye-tracking measures during listening to examine the time course of pronoun resolution during language processing.

Anaphora resolution and choice of referring expressions in L1 English and L1 Spanish
Anaphora resolution involves resolving references to words or groups of words presented in the discourse. Speakers are known to consider the accessibility of the antecedents presented in the previous discourse to determine the interpretation of an anaphoric expression (e.g., Arnold 2010 for a review). Accessibility can be influenced by several discourse-level features, such as whether an entity is given or new in the discourse, which in turn impacts individuals' interpretation of the anaphoric expression that refers to it (e.g., Ariel 1990). Additionally, syntactic information like the grammatical role of the antecedent is known to be an important factor. For example, anaphoric expressions are more likely to be interpreted towards an antecedent that is in subject position, in comparison to an antecedent that is not in subject position.
Accessible referents in the discourse are usually pronominalized in English. When a pronoun has a potentially ambiguous interpretation, as in (1), English speakers are more likely to interpret the pronoun he as referring to Anthony, the subject of the preceding clause and the most salient referent in the discourse (e.g., Arnold et al. 2000).
(1) Anthony_i saw Simon when he_i was going to the coffee shop.
Psycholinguistic research has described this interpretation bias as the 'first-mention bias,' confirming the preference of English speakers for interpreting a pronoun as referring to the first-mentioned entity and grammatical subject in the preceding context (e.g., Corbett & Chang 1983; Crawley, Stevenson & Kleinman 1990; Arnold, Eisenband, Brown-Schmidt & Trueswell 2000; Järvikivi, van Gompel, Hyona & Bertram 2005). 1 The first-mention bias can be overridden in English if additional information is available in the discourse. For instance, if cues to accessibility are present, as in the case of intonation, the pronoun could be resolved as referring to the non-subject referent (e.g., Cummins & Rohde 2015). In addition, semantic gender can also unambiguously signal the referent of an anaphoric pronoun, as illustrated in (2).
(2) Anthony saw Susan_i when she_i was going to the coffee shop.
To examine the role of accessibility and gender during pronoun resolution, Arnold et al. (2000) recorded the eye movements of a group of English-speaking adults while they viewed a scene on a computer screen with two characters of either same or different gender. At the same time, participants listened to two sentences. The critical manipulation in the design was that the characters had different degrees of accessibility based on their grammatical position. The first mentioned character was in the grammatical subject position (i.e., the most accessible referent in the sentence), and the second character was in the object position (e.g., "Donald is bringing some mail to Mickey/Minnie while a violent storm is beginning"). This was followed by a sentence containing a pronoun that was either gender ambiguous or gender unambiguous (as in "He is/she is carrying an umbrella"). The eye-movement results indicated that participants used gender information and accessibility information rapidly, starting at around 200 ms after the offset of the pronoun. The results by Arnold et al. suggest that semantic gender is rapidly integrated during anaphora resolution, and it is used as a reliable cue to interpretation, even when the accessibility of the target referent is decreased by the position in which it is presented in the preceding context (Arnold et al. 2000). Cross-linguistic research on anaphora resolution has also demonstrated that speakers may have different antecedent biases based on the type of anaphoric forms and grammatical properties of particular languages. In addition, previous studies suggest that competing strategies across languages could pose a conflict for speakers who have different sets of referring expressions and interpretation biases in their first and their second languages (e.g., Sorace & Filiaci 2006; Belletti et al. 2007).
The aim of the present study is to investigate how L2 speakers of English (a non-null-subject language) whose first language is Spanish (a null-subject language) resolve this conflict during online processing of pronouns in the L2.
In null-subject languages like Italian and Spanish, the inventory and function of pronominal forms differ from those of non-null-subject languages. In null-subject languages, besides full NPs, it is possible to use null pronouns and overt pronouns to refer to previously mentioned entities, and the two forms have different interpretation biases, as suggested by the Position of Antecedent Hypothesis (Carminati 2002). According to this hypothesis, speakers interpret null pronouns as referring to the grammatical subject in the preceding discourse, and they interpret overt pronouns to signal a topic shift (i.e., overt pronouns are more often interpreted as referring to a non-subject antecedent). For example, Italian speakers are likely to interpret the null pronoun in (3) to refer to Antonio (i.e., the subject of the preceding clause), and the overt pronoun in (4) to refer to Simone (the object of the clause).
(3) Antonio_i ha visto Simone quando pro_i andava al bar.
    Antonio has seen Simone when pro went+3sg.imperfect to+the coffee+shop
    'Anthony saw Simon when he was going to the coffee shop.'
(4) Antonio ha visto Simone_i quando lui_i andava al bar.
    Antonio has seen Simone when he went+3sg.imperfect to+the coffee+shop
    'Anthony saw Simon when he was going to the coffee shop.'
While the Position of Antecedent Hypothesis has successfully explained the pattern of interpretation of null and overt subjects in Romance languages like Italian, recent studies on Spanish have suggested that the theory may not accurately predict Spanish speakers' interpretation biases across different sentence structures (e.g., Filiaci, Sorace & Carreiras 2014; Chamorro 2018; see also Sorace & Filiaci 2006). For example, Filiaci et al. (2014) tested anaphora resolution in contexts where a subordinate clause introducing two referents was followed by a main clause containing a null subject or an overt pronoun, as shown in (5).
(5) Después de que Bernardo_i criticó a Carlos_j tan injustamente,
    after Bernardo criticized+3sg.preterit Carlos so unjustly,
    pro_i le pidió disculpas.
    pro_i to+him-clitic asked+3sg.preterit forgiveness
    'After Bernardo criticized Carlos so unjustly, he apologized.'
Contemori and Dussias: The processing of subject pronouns in highly proficient L2 speakers of English Art. 38, page 4 of 19
Filiaci et al. (2014) found that speakers of Italian and of Peninsular Spanish interpreted null subjects similarly, i.e., a subject interpretation was preferred in both languages, as predicted by the Position of Antecedent Hypothesis. However, when interpreting overt pronouns, Italian native speakers preferred an object interpretation, whereas Spanish native speakers accepted both a subject and an object interpretation. Similar results were found with Mexican Spanish speakers by , showing no clear preference for a subject or an object antecedent when interpreting overt pronouns. A study by Chamorro (2018) employed an offline comprehension task to test null and overt pronoun interpretations by Peninsular speakers of Spanish reading sentences where a subordinate clause followed a main clause, as in (6):
(6) La madre_i saludó a la chica_j cuando ella_j/pro_i
    the mother greeted+3sg.preterit to the girl when she/pro
    cruzaba una calle con mucho tráfico.
    crossed+3sg.imperfect a street with a lot of traffic
    'The mother greeted the girl when she crossed a street with a lot of traffic.'
The results showed that native speakers of Peninsular Spanish consistently assigned the object as the preferred antecedent for the overt pronoun, while there was no clear antecedent preference for the null pronoun.
In summary, recent findings suggest that the interpretation biases in Spanish can vary based on the structure of the sentence, and cannot be predicted solely based on the Position of Antecedent Hypothesis (e.g., Filiaci et al. 2014; Chamorro 2018). Additionally, in Spanish and English, semantic gender on pronouns can serve as a reliable cue to disambiguate the referent of an overt pronominal form (e.g., Arnold et al. 2000). However, we assume that semantic gender is less reliable in the case of null-subject pronouns in Spanish unless disambiguating gender information is marked on other words that follow the null pronoun.

Anaphora resolution and choice of referring expressions in L2 learners
Research investigating anaphora resolution in L2 acquisition has suggested that L2 learners may experience difficulties mastering the use of referential forms and anaphora interpretation biases in the L2 (e.g., Tsimpli, Sorace, Heycock & Filiaci 2004; Sorace & Filiaci 2006). Previous studies on L2 anaphora resolution and on the production of referring expressions have mainly focused on learners of null-subject languages. The findings suggest that L2 learners of Italian, Spanish, and Greek may be subject to vulnerability during L2 referential choice, even when they have achieved native-like proficiency (e.g., Lozano 2006; Margaza & Bel 2006; Montrul & Rodríguez Louro 2006; Sorace & Filiaci 2006; Belletti et al. 2007; Rothman 2008; Keating et al. 2011). Specifically, it has been shown that learners of a null-subject language whose L1 is a non-null-subject language have a weaker representation of the pragmatic constraints on pronoun distribution when interpreting overt pronouns in null-subject languages. The learners can accurately interpret a null pronoun, which is the default referring expression signaling topic continuity, as shown in (7). However, L2 learners may select the subject of the previous sentence as the antecedent for overt pronouns more often than native speakers, as illustrated in (8). Thus, when the division of labor between null and overt pronouns is present in a language, L2 learners present more optionality when interpreting the non-default form signaling a topic shift (i.e., the overt pronoun in Romance languages; Sorace 2011).
A theoretical approach known as the Interface Hypothesis (Sorace 2011) accounts for the optionality observed in L2 anaphora resolution by proposing that L2 learners and other bilingual populations display a general vulnerability with structures at the interface between syntax and pragmatics, such as choosing/interpreting referring expressions. According to Sorace (2011), interface structures require increased cognitive resources to be processed and are more likely to lead to optionality in use than structures that involve syntactic knowledge only.
While studies looking at the L2 acquisition of null-subject languages confirm a general vulnerability for the comprehension and use of anaphoric expressions, it is an open question what role cross-linguistic interference plays and whether the optionality shown by the learners may be a general effect of bilingualism (see Sorace 2011 for discussion). Additionally, studies on anaphora resolution and choice of referring expressions in learners of non-null-subject languages present mixed findings (Roberts et al. 2008; Wilson 2009; Ellert 2013; Contemori & Dussias 2016; 2019; Schimke & Colonna 2016; Cunnings et al. 2017; Contemori 2019; Contemori & Ivanova submitted). In German, where the division of labor between null and overt pronouns is present in the contrast between overt pronouns and demonstratives, Ellert (2013) and Wilson (2009) found similar interpretations of overt pronouns in L2 learners of German and in German native speakers. In addition, Ellert (2013) and Wilson (2009) have shown that the interpretation of demonstratives in learners of German is more indeterminate than in native speakers. In a study by Schimke and Colonna (2016), the interpretation of subject pronouns was investigated in native speakers of Turkish (a null-subject language) who learned French (a non-null-subject language) as an L2. The results showed that L2 speakers relied more on discourse cues when interpreting pronouns than native speakers.
Three existing studies on learners of non-null-subject languages present important findings for the purpose of the present research: , Cunnings et al. (2017), and Roberts et al. (2008). In an off-line sentence comprehension task and an eye-tracking reading task, Roberts et al. (2008) examined the interpretation of Dutch pronominal subjects in two groups of L2 learners of Dutch while reading sentences similar to "Hans and Peter are in the office. While Peter is working, he is eating a sandwich". The critical question was which referent would be associated with an ambiguous overt pronoun (in italics in the example). One group's L1 was Turkish (a null-subject language) and the other group's L1 was German (a non-null-subject language). The results revealed a difference between the two L2 groups in the off-line interpretation of Dutch subject pronouns, showing that the L1 Turkish-L2 Dutch participants, but not the L1 German-L2 Dutch speakers, optionally treated overt pronouns as signaling a topic shift, as in their L1. The eye tracking results revealed that both L2 groups experienced a processing cost while reading the ambiguous overt pronoun. The results of the eye-tracking task were interpreted by Roberts et al. (2008) as a difficulty integrating discourse pragmatic contextual information during online processing. The authors suggested that processing multiple sources of information, as in the case of pragmatic and syntactic information during anaphora resolution, might be challenging for L2 learners relative to native speakers.
Relevant to the study presented here, Cunnings et al. (2017) explored spoken language processing of anaphoric expressions by testing intermediate learners of English whose L1 was Greek (a null-subject language) in a visual world paradigm task. These researchers showed that the L2 participants did not experience a processing disruption when listening to simple sentences with either a gender ambiguous or unambiguous pronoun in English (e.g., After Peter spoke to Mrs./Mr. Jones by the till in the shop, he paid for the expensive ice cream that looked tasty), and indeed the early processing measures showed similar use of the first-mention bias in the two groups. Unlike Roberts et al. (2008), Cunnings et al. did not find a cost associated with the processing of the pronoun in L2 speakers compared to native speakers.
In , pronoun resolution was examined in Spanish native speakers whose L2 was English. Intermediate learners of English whose L1 was Mexican Spanish demonstrated successful use of the first mention bias in simple intra-sentential anaphora sentences, (e.g., Yolanda met Josefina while she was in high school; see also Contemori 2019 for similar results). However, when the context complexity increased (i.e., when two equally prominent referents are presented in the discourse using a conjoined NP), L2 learners showed less consistent interpretations in comparison to native English speakers. Contemori et al. used a set of off-line comprehension tasks and did not measure the online processing of pronouns in L2 English speakers with Mexican Spanish as the L1. Thus, while L2 participants could have achieved accurate performance on simple intrasentential anaphora, an open question is whether the learners would experience a processing cost associated with the integration of syntactic and discourse information during anaphora resolution, as in Roberts et al. (2008). The present study aims to fill this gap by testing L2 learners of English whose L1 is Spanish. 2 Additionally, while previous research looking at pronoun resolution in L2 English speakers included participants with intermediate levels of proficiency (Cunnings et al. 2017;Contemori 2019;), none of the previous studies on L2 English has tested participants with high proficiency in the L2. This is the goal of the experiment described here. We examine spoken language processing of anaphoric expressions by recruiting a group of highly proficient L2 learners of English and ask whether proficiency in the L2 is associated with a native-like processing of anaphora.
Note that the Interface Hypothesis predicts that in a non-null-subject language, L2 speakers should interpret overt pronouns like native speakers (e.g., Sorace 2011). While in null-subject languages the null pronoun signals topic continuity, in non-null-subject languages the overt pronoun is the default referential form that refers to the current discourse topic. Thus, in a non-null-subject language, it is expected that L2 speakers may achieve native-like competence when interpreting overt pronouns (e.g., Sorace 2011).

Aims and research questions
Past research examining anaphora resolution in learners of non-null-subject languages has used a variety of experimental methods and materials, including the recording of eye movements and off-line sentence comprehension tasks (Roberts et al. 2008; Wilson 2009; Ellert 2013; Contemori & Dussias 2016; 2019; Schimke & Colonna 2016; Cunnings et al. 2017; Contemori 2019). Some of the results have demonstrated that L2 learners can achieve native-like processing and native-like off-line interpretation of pronouns when the preceding context is relatively simple (Cunnings et al. 2017; Contemori 2019). In the present research study, we investigate the processing of pronouns in L2 speakers of English by testing participants whose L1 is Mexican Spanish. We examine L2 speakers' ability to comprehend English pronouns that are ambiguous and unambiguous in contexts in which participants can use (reliably and unreliably) the first-mention bias and semantic gender information on the pronoun (Cunnings et al. 2017). The materials employed in the task are relatively simple in structure, with one of the antecedents (i.e., the subject in the main clause) always serving as a highly accessible referent. We explore whether gender information and the first-mention bias have rapid online effects during pronoun interpretation by monitoring participants' eye movements. We test a group of highly proficient speakers of English to assess the attainment of native-like processing routines during L2 anaphora resolution.
The majority of previous studies investigating anaphora resolution in L2 speakers have focused on intra-sentential anaphora, i.e. the case in which the anaphora is presented in the same sentence where the antecedents are introduced. In the present study, the stimuli include inter-sentential anaphora, i.e., the case in which the pronoun is presented in a separate sentence from the antecedents (see Cunnings et al. 2017 for the use of similar stimuli). We do not exclude that the position of the anaphora may have an impact on the pattern of comprehension in L2 speakers. However, we do not discuss this point here as it is beyond the scope of the present paper (see  for related discussion on the comprehension of inter-sentential and intra-sentential anaphora in L2 speakers of English).
For native speakers of English, we predict that participants' interpretations will be driven by the first-mention bias. When gender on the pronoun is informative, native speakers should rapidly integrate this cue to override the initial subject interpretation, even when gender and discourse cues are pitted against each other (e.g., Arnold et al. 2000). For the L2 speakers, participants may experience a processing cost when interpreting referring expressions in real time, given the purported challenge that they experience integrating multiple sources of information (e.g., lexical, syntactic, discourse), as in Roberts et al. (2008). In this case, we expect to find a processing penalty associated with the integration of different sources of information (i.e., semantic gender, syntactic and contextual information), as reported in Roberts et al. (2008) with learners of Dutch. In the visual world paradigm, the processing penalty could emerge as a delayed increase in looks to the target picture when L2 speakers listen to the subject pronoun. A similar performance could also result from L1 cross-linguistic interference. In Spanish, the L1 of the participants, overt pronouns can be interpreted as referring to the non-topic antecedent (see Filiaci et al. 2014 for a discussion about the strength of this bias). Thus, L2 speakers may consider the second antecedent as the referent for a pronoun in English if they experience a conflict due to cross-linguistic interference. This effect may emerge as fewer looks to the target picture in the L2 speakers compared to the native speakers upon hearing the pronoun. However, if L2 speakers have successfully acquired the first-mention bias and can rapidly use semantic gender information, they may perform similarly to native English speakers when processing pronouns in simple inter-sentential anaphora contexts, as in Cunnings et al. (2017).
In this case, according to the predictions of the Interface Hypothesis, L2 speakers should demonstrate attainment of native-like interpretation of the default referring expression signaling topic continuity in English (e.g., Sorace 2011).

Participants
Twenty-eight monolingual English speakers (15 females; mean age: 20; SD: 2) and twenty-four L2 English speakers (L1 Spanish) (13 females; mean age: 23.5; SD: 4.5) participated in the experiment. One native speaker was discarded due to a coding error in the eye-tracking task, and two participants were excluded for high track loss in the eye-tracking data (more than 50%). The remaining twenty-five monolingual speakers were students at a large North-American university at the time of testing, and received course credits for their participation. Monolingual participants completed a Language History Questionnaire (Marian, Blumenfeld & Kaushanskaya 2007) to ensure that English was the only language that they spoke fluently. When participants reported knowledge of a language other than English, they indicated minimal knowledge of the second language.
English L2 speakers were undergraduate and graduate students at the same institution, and received compensation for their participation. They were born in a Spanish-speaking country (Central/South America) and moved to the US at different times in their lives. They had varying ages of exposure to English and were highly proficient in English, as measured with the Michigan English Language Institute College English Test (MELICET). The MELICET includes 50 multiple-choice grammar questions. Only those participants who scored at least 38 out of 50 were invited to participate in the experiment. As shown in Table 1, the average score on the MELICET was 43 (range: 38-47; SD = 3). The L2 participants also completed a Language History Questionnaire (Marian et al. 2007) measuring their self-assessed proficiency and general linguistic background. The information from the Language History Questionnaire is reported in Table 1.

Materials
A visual world paradigm task was created that manipulated three variables: Gender of the pronoun (ambiguous or unambiguous), Order of Mention of the pronoun's antecedent (the antecedent was either the first or the second mentioned noun in the previous sentence), and Group (L2 speakers vs. native speakers). In the task, participants heard a sentence
(9) Different Gender - First Mention: A builder (male) saw a doctor (female) by the door. He briefly thanked the doctor for her help.
(10) Different Gender - Second Mention: A builder (male) saw a doctor (female) by the door. She briefly thanked the builder for his help.
(11) Same Gender - First Mention: A builder (male) saw a doctor (male) by the door. He briefly thanked the doctor for his help.
(12) Same Gender - Second Mention: A builder (male) saw a doctor (male) by the door. He briefly thanked the builder for his help.
We created a total of eighty experimental item sets; an item set consisted of four sentences, representing four conditions, as shown in (9)-(12). Four counterbalanced lists containing twenty experimental sentences each (each list containing five items per condition) were created in such a manner that each participant saw just one version of the same item. In addition, twenty fillers were included in each list, half containing a third person singular pronoun and half containing a third person plural pronoun. The filler sentences were followed by a comprehension question to ensure that participants were paying attention to the stories. Participants answered the comprehension question verbally, and the answers were recorded and later transcribed by the experimenter. Each participant answered at least 80% of the questions correctly.
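The counterbalancing described above can be illustrated with a Latin-square rotation. The following is only a minimal sketch: the function name, the condition labels, and the rotation scheme are assumptions made for illustration, and the number of rotated items (twenty) is taken from the reported list size rather than from a description of how the lists were actually built.

```python
from collections import Counter

# Condition labels are illustrative; they mirror examples (9)-(12).
CONDITIONS = [
    "different-gender/first-mention",
    "different-gender/second-mention",
    "same-gender/first-mention",
    "same-gender/second-mention",
]


def build_lists(n_items, conditions=CONDITIONS):
    """Rotate items through conditions so each list shows one version per item."""
    n_lists = len(conditions)
    lists = []
    for list_index in range(n_lists):
        # Shifting the condition index by the list index is the Latin-square step.
        assignment = [
            (item, conditions[(item + list_index) % n_lists])
            for item in range(n_items)
        ]
        lists.append(assignment)
    return lists


# Assumed list size: twenty experimental sentences per list.
lists = build_lists(n_items=20)
per_condition = Counter(condition for _, condition in lists[0])
```

Under these assumptions every list contains each item exactly once and five items per condition, and each item appears in all four conditions across the four lists.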

Norming of the stimuli
The sentences created for the experiment were normed for naturalness by twenty-seven monolingual English speakers who were recruited at a large North-American University and through Amazon Mechanical Turk. Participants were presented with a list of sentences and were asked to rate how natural the sentences sounded to them on a scale from 1 to 5, where 1 indicated a completely unnatural or odd sentence, and 5 indicated a normal everyday sentence of English. Experimental sentences contained a pronoun followed by an adverb, a main verb, a noun phrase, and an adjunct (e.g., He/briefly/thanked/the doctor/for her help). Sentences with a minor semantic/syntactic ungrammaticality were also included as fillers (e.g., He made a birthday party for his son last week). The norming task was implemented as a Qualtrics Survey. Participants answered questions about their language background at the beginning of the survey. Participants who reported being fluent in a language other than English were excluded (2 participants). Only sentences that were unanimously rated 4-5 were selected for further norming.

The preparation of visual stimuli
Sentences selected from the first round of norming were paired with a panel that included two pictures that were good representations of the characters mentioned, and a distractor picture that appeared in the background, as illustrated in Figures 1-4. The female and male characters were presented as full noun phrases (NPs) (e.g., the doctor, the builder, in Figures 1-4). The sentence-panel pairs were normed to ensure naming agreement and the absence of any interpretation bias (e.g., one character being more likely to perform an action than the other). Participants in the sentence-panel norming task were presented with a panel and were asked to decide which among four options best described the panel.
The four options always presented two versions of the experimental sentence, with the subject-object roles reversed, and a "both" and "neither" option, as shown in (a)-(d):
(a) The builder briefly thanked the doctor for his help
(b) The doctor briefly thanked the builder for his help
(c) Both
(d) Neither
Fillers were also included in the sentence-panel norming task where the panel fit the description of one of the sentences presented. Experimental and filler sentences were randomized and presented in two lists for the norming. The task was implemented as a Qualtrics Survey. Twenty-one monolingual English speakers were recruited through Amazon Mechanical Turk. Participants answered questions about their language background at the beginning of the survey; they reported being fluent only in English. The sentence-panel pairs were selected on the basis of participants' selection of the options "both" and "neither." Only the pairs for which "both" and "neither" were cumulatively chosen at least 70% of the time were later used in the eye-tracking experiment.
In the experiment, target and competitor pictures chosen from the norming task appeared either on the left or on the right of a computer screen, and the positions were counterbalanced. A distractor picture always appeared in the same position on the screen (center-back), as illustrated in Figures 1-4.
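The 70% selection criterion for sentence-panel pairs can be expressed as a simple filter. This is a sketch under assumptions: the function name and the response coding ("a", "b", "both", "neither") are hypothetical and are not taken from the survey itself.

```python
def keep_pair(responses, threshold=0.70):
    """Return True if 'both' and 'neither' together reach the threshold.

    `responses` holds one answer per norming participant for a single
    sentence-panel pair, coded here (hypothetically) as 'a', 'b', 'both',
    or 'neither'.
    """
    if not responses:
        raise ValueError("no responses recorded for this pair")
    unbiased = sum(1 for answer in responses if answer in ("both", "neither"))
    return unbiased / len(responses) >= threshold
```

A pair is retained only when most raters judged that the panel did not favor one reading of the sentence over the other, which is how the criterion guards against an interpretation bias in the pictures.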

Preparation of auditory stimuli
The experimental stimuli consisted of a context sentence where the two characters were introduced, and an experimental sentence where an ambiguous/unambiguous pronoun was present. A female native speaker of English recorded the sentences that were presented auditorily in a sound-attenuated room. Across conditions, the experimental sentences had similar length (mean duration: 3450 ms; SD: 300 ms). A 300 ms pause consisting of ambient noise was inserted between the context sentence and the sentence containing the pronoun.

Procedure and coding
Stimuli were presented on a monitor using a desktop-mounted Eyelink 1000 that records eye movements at a 1000 Hz sampling rate. The eye tracker was calibrated and validated for each participant. A fixation cross was displayed on the screen prior to the start of each trial. After the fixation, participants saw three pictures on a screen (Target, Competitor and Distractor) and listened to a short story through audio speakers, while the pictures remained on the screen. The short story was followed by the experimental sentence.
To analyze the looking behavior in relation to the verbal and visual stimuli presented, we defined three spatial areas of interest (AOI), corresponding to the size of each of the pictures presented on the monitor (target, competitor, distractor picture). Eye movements were time-locked to the onset of the pronoun presented in the second sentence (he/she). The eye-movement data were analyzed starting from 200 ms after the onset of the pronoun, to account for the time it takes to program a saccadic eye movement (Matin et al. 1993), and ending 1500 ms after the onset of the pronoun, which corresponded approximately to the onset of the second referent (e.g., the model/the tailor, Figures 1-4). The independent variables included in the analysis are Gender with two levels (informative, uninformative), Order of Mention with two levels (first vs. second mentioned referent) and Group with two levels (native speakers vs. L2 speakers). Trials with combined total looking times to the competitor and target of less than 30% of the trial duration were discarded, amounting to 3.5% of the data.
Looks to the Target picture were analyzed using growth curve analysis, which has the advantage of evaluating changes in fixations over time. Previous research has suggested that ANOVA is not adequate for the analysis of visual world paradigm data (e.g., Mirman, Dixon & Magnuson 2008). When ANOVA is used for visual world data, time is treated inappropriately because statistical analyses are applied to looks aggregated over separate windows of time, whereas growth curve analysis allows a fine-grained analysis of the time course by modeling the dependent variable as a function of time (Mirman et al. 2008). Additionally, while between-participant variability in visual world data is not properly accounted for with ANOVA, growth curve analysis includes parameters that characterize individual differences (Mirman et al. 2008). While there is still disagreement on which is the most rigorous statistical method to analyze visual world data, growth curve analysis has increasingly been adopted in the field and is viewed as an adequate technique (e.g., Mirman 2014). Figure 5 shows the looks to the Target picture by Gender and Order of Mention in the native speaker and L2 groups.
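The trial exclusion rule and the analysis window described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the sample format (time relative to pronoun onset, AOI label), the 100 ms bins, the half-open window and the use of all samples in a bin as the denominator are hypothetical choices.

```python
from collections import defaultdict

WINDOW_START_MS = 200   # allows for saccade programming after pronoun onset
WINDOW_END_MS = 1500    # approximate onset of the second referent
MIN_SHARE = 0.30        # minimum share of looks to target plus competitor

def keep_trial(trial):
    """trial: list of (time_ms, aoi) samples, time relative to pronoun onset."""
    if not trial:
        return False
    on_pictures = sum(aoi in ("target", "competitor") for _, aoi in trial)
    # Trials below the 30% threshold are discarded.
    return on_pictures / len(trial) >= MIN_SHARE

def target_proportions(trials, bin_ms=100):
    """Proportion of target samples per time bin, over retained trials only."""
    looks = defaultdict(lambda: [0, 0])  # bin start -> [target, all samples]
    for trial in filter(keep_trial, trials):
        for time_ms, aoi in trial:
            if WINDOW_START_MS <= time_ms < WINDOW_END_MS:
                bin_start = (time_ms // bin_ms) * bin_ms
                looks[bin_start][0] += aoi == "target"
                looks[bin_start][1] += 1
    return {b: t / n for b, (t, n) in sorted(looks.items())}

# Three invented trials sampled every 100 ms from 0 to 1500 ms.
times = range(0, 1600, 100)
trial_a = [(t, "distractor" if t < 500 else "target") for t in times]
trial_b = [(t, "target" if t == 0 else "elsewhere") for t in times]  # excluded
trial_c = [(t, "competitor" if t < 1000 else "target") for t in times]
props = target_proportions([trial_a, trial_b, trial_c])
print(props[200], props[500], props[1000])  # 0.0 0.5 1.0
```

Proportions of this kind, computed per participant and condition, are the sort of time series that a growth curve model takes as its dependent variable.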

Results
Based on visual inspection of the data, Time was coded using a restricted cubic spline (Harrell 2001) with five knots. The three components of the spline were de-correlated using principal components analysis. The fixed effects were Gender, Order of Mention and Group. By-Item random effects were not included because the proportions of looks were aggregated by participant across conditions. The results of the analysis are presented in Table 2.
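The coding of Time can be made concrete with a short sketch. It assumes a restricted cubic spline basis in the sense of Harrell (2001), with five knots placed at the 5th, 27.5th, 50th, 72.5th and 95th percentiles of the time values; the knot placement, the 50 ms time grid and the decision to pass the linear term through the principal components step together with the three nonlinear terms are assumptions made for illustration, not details reported in the study.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: x plus len(knots) - 2 nonlinear terms."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)

    def cube(u):
        return np.clip(u, 0.0, None) ** 3  # truncated power function

    scale = (t[-1] - t[0]) ** 2  # keeps nonlinear terms on the scale of x
    last_gap = t[-1] - t[-2]
    cols = [x]
    for j in range(len(t) - 2):
        term = (cube(x - t[j])
                - cube(x - t[-2]) * (t[-1] - t[j]) / last_gap
                + cube(x - t[-1]) * (t[-2] - t[j]) / last_gap)
        cols.append(term / scale)
    return np.column_stack(cols)

def decorrelate(basis):
    """Principal component scores of the centered basis (uncorrelated columns)."""
    centered = basis - basis.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u * s

# Hypothetical time grid: 200 to 1450 ms after pronoun onset in 50 ms steps.
time_ms = np.arange(200, 1500, 50)
knots = np.quantile(time_ms, [0.05, 0.275, 0.5, 0.725, 0.95])
basis = rcs_basis(time_ms, knots)
scores = decorrelate(basis)
print(basis.shape)  # (26, 4)
```

With five knots the basis has one linear and three nonlinear columns. The de-correlated scores, rather than the raw spline terms, would then enter the model as the Time predictors.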
The analysis reveals main effects of Time and Gender, and interactions between Gender and Group, Time and Gender, and Time and Group. In addition, the interaction between Time, Gender and Order of Mention approached significance. Interactions with Time were assessed on the basis of the model estimates shown in Figure 5. The Gender by Group interaction indicated a higher proportion of looks to the Target in the gender-different conditions (0.50) compared to the gender-same conditions (0.36) in the native speaker group, a difference of 0.14. No comparable difference emerges in the L2 group (gender-different condition: 0.42; gender-same condition: 0.37), where the difference is only 0.05. The Time by Group interaction shows progressively more looks to the Target in the native speaker group compared to the L2 group between 800 and 1200 ms, reaching a proportion of 0.51 at 1000 ms after the onset of the pronoun. For the L2 speakers, the increase in looks to the Target reaches a peak of 0.50 towards the end of the time window analyzed, between 1200 and 1600 ms after the onset of the pronoun. The Time by Gender interaction shows an increase in looks to the Target for both groups in the gender-different condition compared to the gender-same condition. The increase in looks peaks between 800 and 1300 ms after the onset of the pronoun, and the cumulative mean of looks for both native speakers and L2 participants reaches a proportion of 0.53 in the gender-different condition compared to 0.39 in the gender-same condition. The interaction demonstrates a strong Gender effect on the interpretation of the pronouns that emerges early in the time window considered for both native speakers and L2 participants.
Finally, the three-way interaction between Time, Gender and Order of Mention approached significance, indicating that, as a function of gender, a difference between the two first mention conditions emerged earlier (around 700 ms) than the difference between the two second mention conditions (around 900 ms). This marginal interaction therefore suggests that the effect of Gender emerges earlier when the pronoun refers to the first mentioned entity than when it refers to the second mentioned entity; in other words, when gender on the pronoun is informative, the Gender effect is somewhat boosted by the first mention bias in both groups. The lack of an interaction between Time, Order of Mention and Group suggests that the two groups were equally guided by the first mention bias in the early time window following the onset of the pronoun.

General Discussion and Conclusion
The results of the experiment did not show major differences between the native speakers and the L2 speakers. The processing data suggest that L2 speakers process pronouns in English similarly to native speakers when gender on the pronoun and the first-mention bias are manipulated. No delays were observed in looks to the target for the L2 speakers in comparison to the native speakers. Native and L2 speakers had a preference for interpreting a gender ambiguous pronoun as referring to the first mentioned entity in the preceding discourse. Additionally, both groups used the informative gender cue to quickly resolve a gender unambiguous pronoun, as in Arnold et al. (2000). The only difference found between the two groups in the eye-tracking results is a somewhat attenuated effect of gender in the L2 participants: overall, the increase in looks to the target picture when gender on the pronoun was informative, relative to the conditions in which gender was not informative, was smaller in the L2 participants than in the native speakers. In terms of the first-mention bias, the results showed that the processing of ambiguous and unambiguous pronouns was equally driven by the interpretative bias, and that no substantial delays were observed in the L2 data relative to the native speaker group. While semantic gender is expressed in overt pronouns in Spanish (the L1 of the participants), the first-mention bias is an interpretation preference unique to English. Thus, L2 speakers have to learn through experience that third person pronouns frequently co-refer with the subject of a previous sentence. Our results demonstrate that highly proficient L2 speakers are sensitive to this pattern of occurrence and can also use this bias during processing in a native-like fashion.
Our results are in line with the study by Cunnings et al. (2017), in which similar materials were used. Cunnings et al. showed that intermediate learners of English whose L1 was Greek did not experience a cost when processing pronouns in the L2. By replicating Cunnings et al.'s results with a group of highly proficient L2 speakers who speak a different L1 (Spanish), our results strongly support the proposal that L2 speakers and native speakers show similar underlying processes of pronoun resolution in inter-sentential anaphora, when the first mention bias guides the interpretation and the use of gender can override the use of the bias.
In contrast to Roberts et al. (2008), the results of our eye-tracking task did not show any delayed processing or processing cost in the L2 speakers when the pronoun was encountered, but only an attenuated use of the gender information during pronoun resolution (see also Cunnings et al. 2017). In the eye-tracking study by Roberts et al. (2008), a processing penalty on the pronoun was found for a group of proficient learners of Dutch whose L1 was Turkish (a null-subject language) and a group whose L1 was German (a non-null-subject language). However, Roberts et al. used sentences in which two referents introduced in the discourse had similar degrees of salience (e.g., Peter and Hans are in the office), and followed them by sentences that contained a potentially ambiguous pronoun that native speakers resolved locally (While Peter is working, he is eating a sandwich). Our experimental manipulation was aimed at testing different aspects of the processing of pronouns in English (i.e., semantic gender and use of the first-mention bias) in sentences with inter-sentential anaphora that are not globally ambiguous. However, we do not exclude that the processing of pronouns can be more costly in the L2 compared to the L1 when the complexity of the context is increased, as shown by Roberts et al. (2008) and Contemori et al. (2019). The accessibility of the referents is an important factor in pronoun resolution that has to be carefully evaluated to interpret anaphora successfully. When two potential antecedents have a similar degree of accessibility, as in Roberts et al.'s stimuli, calculating the discourse status of the referents can be more difficult. The complexity of the discourse is an aspect of anaphora resolution that requires further attention in future research, as it may explain the difficulty with the interpretation of anaphoric expressions found in previous L2 studies.
As suggested by Sorace (2011), the syntactic dependencies between a null pronoun and its antecedent (i.e., an instance of topic continuity) are less challenging to acquire for L2 speakers of Romance languages than discourse dependencies between an overt pronoun and its antecedent (i.e., an instance of topic shift). Analogously, L2 speakers of non-null-subject languages should be able to master the interpretation of dependencies between an overt pronoun and its antecedent (i.e., an instance of topic continuity) like native speakers (see Wilson 2009; Ellert 2013 for additional discussion). This prediction is met in our study, which shows that highly proficient L2 learners do not experience a processing cost when interpreting ambiguous or unambiguous overt pronouns in English.
In our study, we did not test a group of L2 speakers whose L1 and L2 have a similar set of referring expressions and interpretation biases. In addition, we explored an anaphoric dependency that seems to be fully acquired by L2 speakers (e.g., Cunnings et al. 2017; Contemori 2019). Thus, we cannot directly address the question of whether the optionality observed with anaphoric structures is due to cross-linguistic interference or whether it is a general effect of bilingualism (e.g., Sorace 2011). Future research should make comparisons relevant to this question.
To conclude, in the present study we found that under certain circumstances (i.e., when the complexity of the discourse is relatively low), pronoun resolution in the L2 is not associated with a processing cost. Furthermore, even when the two languages of an L2 speaker have different sets of referring expressions and interpretation biases, as is the case in Spanish and English, highly proficient L2 speakers can make use of native-like processing strategies during anaphora resolution.