Noun-phrase production as a window to language selection: An ERP study

Characterising the time course of non-native language production is critical in understanding the mechanisms behind successful communication. Yet, little is known about the modulating role of cross-linguistic influence (CLI) on the temporal unfolding of non-native production and the locus of target language selection. In this study, we explored CLI effects on non-native noun phrase production with behavioural and neural methods. We were particularly interested in the modulation of the P300 as an index for inhibitory control, and the N400 as an index for co-activation and CLI. German late learners of Spanish overtly named pictures while their EEG was monitored. Our results indicate traceable CLI effects at the behavioural and neural level in both early and late production stages. This suggests that speakers faced competition between the target and non-target language until advanced production stages. Our findings add important behavioural and neural evidence to the underpinnings of non-native production processes, in particular for late learners.


Introduction
From the speaker's perspective, producing a determiner-noun phrase (NP) e.g., [the flower] seems an effortless operation. However, according to current models, word production is a complex, multi-stage process. For example, the influential LRM model (Levelt et al., 1999) describes three primary stages of single word production. In a picture naming task, first, the depicted object is conceptualised; second, the concept is lexicalized, i.e., it is given a grammatical, phonological and phonetic form; and finally, the name of the object is articulated. Current research has increasingly focused on characterising the time course of word production. For example, Indefrey and Levelt (2004) and Indefrey (2011) combined behavioural, chronometric and electrophysiological evidence to estimate the time course of each stage in native language word production (Abdel Rahman and Sommer, 2008;Camen et al., 2010;Cheng et al., 2010;Hanulová et al., 2011;Laganaro et al., 2009;Rodriguez-Fornells et al., 2002;Schiller, 2006;Schmitt et al., 2001;Van Turennout et al., 1997;Zhang and Damian, 2009), see Fig. 1.
Electroencephalography (EEG) and event-related potentials (ERPs) are particularly valuable tools to explore the neuro-cognitive processes of native language production. More specifically, the EEG signal yields an implicit measure of the neural signature and the time course of each individual production stage (Aristei et al., 2011; Bürki and Laganaro, 2014; Costa et al., 2009; Eulitz et al., 2000; Habets et al., 2008; Hoshino and Thierry, 2011; Valente et al., 2014). For example, Bürki and Laganaro (2014) found that producing French NPs, such as [le chat] "the cat" or NPs including an adjective [le grand chat] "the big cat", in comparison to a bare noun [chat] "cat" was linked to the topographic stability of the EEG signal between 190 ms and 300 ms and following 530 ms post-stimulus onset. These findings were interpreted as a longer duration of lexical retrieval (lemma retrieval in LRM terms) and phonological encoding for NPs compared to bare nouns. They were corroborated by longer naming latencies for bare nouns and NPs compared to NPs including an adjective (Lange et al., 2015; Schriefers et al., 1999). Lexical retrieval has been previously associated with lexical access and grammatical gender processing (Alario and Caramazza, 2002; Badecker et al., 1995; Bürki and Laganaro, 2014; Levelt et al., 1999; Strijkers et al., 2010). In contrast, phonological encoding was described as the processing of the phonological code of the word and its subsequent syllabification (Levelt et al., 1999). Grammatical gender, hereafter gender, is a noun classification system (Corbett, 1991). More specific to gender processing in Romance languages, research suggested that the activation and selection of determiners in NPs occurred both during lexical retrieval and at the early part of the consecutive phonological encoding (Alario and Caramazza, 2002; Bürki et al., 2016; Miozzo and Caramazza, 1999; Sá-Leite et al., 2020). As shown in Bürki et al. (2014), in these languages producing the phonological forms of determiners and adjectives is partially dependent on the phonological form of the noun, e.g., Spanish [la F taza F roja F] vs. English [the red mug] (Miozzo and Caramazza, 1999; Sá-Leite et al., 2020; Schriefers, 1992, 1993). The work by Bürki and Laganaro (2014) and similar studies (Eulitz et al., 2000; Habets et al., 2008; Koester and Schiller, 2008; Lange et al., 2015) characterised the time course of native language word production. However, the time course of non-native production continues to be a complex issue in multilingualism research, especially with respect to the locus of target language selection (Costa et al., 2009; Hanulová et al., 2011; Hoshino and Thierry, 2011; Strijkers et al., 2010). In light of the increasing proportion of non-native speakers and multilingual communities (Berthele, 2021), the need to further characterise the individual production stages in non-native production has become more urgent.
In this study, we build upon the theoretical models of native speaker single word production and empirical findings on native NP production (Indefrey, 2011; Indefrey and Levelt, 2004; Levelt et al., 1999). We specifically concentrated on the time course of those production stages preceding the articulation stage, namely lexical retrieval and phonological encoding. Collecting behavioural and EEG measures, we examined the overt production of determiner + noun NPs in the non-native language Spanish, e.g., [la flor] "the flower", by native speakers of German.
Producing utterances can demonstrably be more challenging in the non-native than in the native language (Pivneva et al., 2012;Runnqvist et al., 2011). Studies have found longer and more variable naming latencies in the non-native compared to the native language (Gollan et al., 2005;Hanulová et al., 2011;Ivanova and Costa, 2008;Kroll et al., 2006). These quantitative differences between native and non-native speech production were reported for various levels of language proficiency (Christoffels et al., 2007;Ivanova and Costa, 2008;Sholl et al., 1995), and for language pairs with varying phonological and orthographic similarity, e.g., intermediate German learners of Dutch (Christoffels et al., 2007) and highly proficient Greek learners of English (Parker-Jones et al., 2012). The question which arises at this point is: where does this delay in naming latencies originate from? Recent studies explored word frequency and age of acquisition (AoA) as modulating factors of the non-native production processes (see Hanulová et al., 2011;for discussion). In this study, we focus on another factor that could influence the time course of non-native short utterance production, namely cross-linguistic influence (CLI), as described in section 1.1. By extension, we aim to explore the following question crucial to non-native production research: during which production stage does the delay in naming latency occur? In section 1.2, we outline the electrophysiological correlates of CLI in more detail and we discuss how they offer us an insight into these issues.

Cross-linguistic influence
Non-native speakers face cross-linguistic influence (CLI) during language production and language comprehension (Cárdenas-Hagan et al., 2007; Ganushchak et al., 2011b; Lemhöfer et al., 2008; Morales et al., 2016; Müller and Hulk, 2001; Thierry and Wu, 2007; von Grebmer zu Wolfsthurn et al., 2021). Broadly speaking, CLI is the interaction of the languages within a multilingual system and its influence on the underlying cognitive processing mechanisms. CLI supports the notion that the native and non-native language are co-activated during language production (Guo and Peng, 2006; Hermans et al., 1998; Kroll et al., 2008; Lee and Williams, 2001). Co-activation and CLI are rooted in theoretical models. For example, the Revised Hierarchical Model (RHM; Kroll and Stewart, 1994) postulates a conceptual level and separate lexical levels for the native and non-native language with strong lexical connections between the two languages. Critically, the model also suggests that the strength of the connections is modulated by proficiency: as non-native proficiency increases, the connection strength between the non-native lexicon and the conceptual level increases and the involvement of the L1 becomes less prominent (Kroll and Stewart, 1994).
For a speaker to successfully complete a naming task in either the native or non-native language, it is crucial that co-activation and CLI are resolved prior to articulation. A robust finding in the CLI and language selection literature is the presence of an inhibitory control system which mitigates CLI effects and effectively inhibits the non-target language (Abutalebi and Green, 2007;Green, 1998), but see (Verdonschot et al., 2012). The mitigation of CLI effects and the associated increased cognitive effort is evident at the neural level. For example, increased activation of brain areas involved in language production for non-native compared to native language production was linked to increased error monitoring of competing representations during CLI (Parker-Jones et al., 2012;Rodriguez-Fornells et al., 2005;Rossi et al., 2018).
Previous studies on CLI have almost exclusively focused on early acquisition and intermediate to high proficiency levels in the non-native language, thereby leaving a systematic gap in the exploration of the effects of CLI on the time course of NP production in late language learners with lower proficiency levels (Costa et al., 2003; Hoshino and Thierry, 2011; Lemhöfer et al., 2008). Yet, this is a critical issue because studies suggested that proficiency impacted language-related neurocognitive mechanisms in multilinguals, shown in that CLI effects were more pronounced at lower proficiency levels (Bosch and Unsworth, 2020; Heidlmayr et al., 2021; Steinhauer et al., 2009; Van der Meij et al., 2011; White et al., 2017; Yip and Matthews, 2007). For example, Sunderman and  found that compared to lower proficient learners, highly proficient English learners of Spanish were less susceptible to CLI from the native language, and performed better in a picture naming task. Costa and Santesteban (2004) previously proposed that during production, highly proficient speakers activated only the lexical entry from the target language, thereby effectively avoiding CLI during lexical retrieval. Therefore, in this study we directly focused on CLI effects in a group in which CLI was found to be most prevalent and which is frequently understudied in the literature, namely late language learners with intermediate proficiency levels. We defined late language learners as having acquired an additional (non-native) language later in development (AoA >12 years), see Rossi et al. (2006). Moreover, and in contrast to highly proficient late language learners, our group was further characterised by less than three years of exposure to the non-native language and intermediate proficiency levels in the B1/B2 range according to the Common European Framework of Reference for Languages, CEFR (Council of Europe, 2001).
Fig. 1. Estimated time course of single word production in the native language according to Indefrey and Levelt (2004) and Indefrey (2011). 1
Immediately relevant to CLI effects is the question about when the target language is selected in non-native production. For example, is the target language selected (and CLI resolved), prior to lexical retrieval? Or instead, does CLI carry over to later production stages, such as phonological encoding? Current debates remain inconclusive with respect to two accounts of the locus of target language selection (Costa et al., 2006;Hanulová et al., 2011;Hoshino and Thierry, 2011;Sá-Leite et al., 2019). One account suggests that lexical entries from both the target and non-target language are activated, but only the lexical entry corresponding to the target language is selected for subsequent phonological processing (Gollan et al., 2005;Hermans et al., 1998;Lee and Williams, 2001). Under this account, CLI is resolved at lexical retrieval. On the other hand, a second account suggests that the lexical entries from the target and non-target language are both activated and selected for phonological encoding (Christoffels et al., 2007;Colomé, 2001;Costa et al., 2000;Green, 1998;Hoshino and Kroll, 2008;Pulvermüller, 2007;Rodriguez-Fornells et al., 2005). Within this perspective, CLI is not resolved at lexical retrieval, but continues into subsequent phonological processing.
In order to discriminate between these two accounts, in this study we focus on two linguistic phenomena representing CLI, the gender congruency effect and the cognate facilitation effect. These effects provide us with further insight into the underlying production stages and their inner mechanisms in non-native NP production (sections 1.1.1. and 1.1.2., respectively).

The gender congruency effect
The gender congruency effect is reflected in faster processing of congruent versus incongruent nouns, as reported in language production studies (Bordag and Pechmann, 2007; Lemhöfer et al., 2008; Morales et al., 2016; Paolieri et al., 2019, 2020; Schiller and Caramazza, 2003; Schiller, 2006; Schiller and Costa, 2006; Schriefers, 2003). Congruent nouns have similar grammatical gender values across languages, for example the lexical items for the concept "arm", which are masculine in German [der M Arm] and in Spanish [el M brazo]. In contrast, incongruent nouns have dissimilar gender values across languages, for example the lexical items for "key", which are masculine in German [der M Schlüssel] and feminine in Spanish [la F llave]. Gender systems can vary across languages. For example, German has three gender values, i.e. feminine, masculine and neuter, and these are not equally distributed across all lexical items (Schiller and Caramazza, 2003). On the other hand, Spanish is characterized by a feminine-masculine gender value distinction with an approximately balanced distribution (Bull, 1965; Eddington, 2002). 2 As previously discussed, gender processing in Romance languages was linked to lexical retrieval and phonological encoding (Alario and Caramazza, 2002; Badecker et al., 1995; Bürki and Laganaro, 2014; Miozzo and Caramazza, 1999). Consequently, this links the gender congruency effect to these two production stages. Therefore, the gender congruency effect offers a gateway to the following three issues: first, it allows us to observe the mechanisms underlying CLI of the gender systems during lexical retrieval; second, it provides us with a way to study the implications of CLI of the gender systems on the time course of non-native NP production; and third, it allows us to explore the locus of target language selection with respect to the two accounts regarding target language selection (Hoshino and Thierry, 2011; Sá-Leite et al., 2019).
If the target language was selected prior to lexical retrieval in multilingual language production, we would not observe a gender congruency effect during subsequent production stages. Alternatively, if the target language was not selected before lexical retrieval, we would observe a gender congruency effect because the activated lexical entries from both languages would be subject to CLI of the gender systems during gender processing. As a result, CLI would facilitate the processing of congruent nouns compared to incongruent nouns. Current behavioural evidence supports this notion in late language learners with intermediate proficiency compared to early learners with high proficiency (Bordag and Pechmann, 2007;Costa et al., 2003;Sá-Leite et al., 2020).
Yet, few studies have investigated the gender congruency effect from a neural perspective (Heim et al., 2009). One ERP component associated with this effect is the N400 (Paolieri et al., 2020;Wicha et al., 2003). It is reflected in a negative voltage amplitude peak around 400 ms post-stimulus onset and was previously linked to lexical-semantic integration and lexical co-activation (Chen et al., 2017;Hoshino and Thierry, 2011;Kutas and Federmeier, 2011;Lau et al., 2008;Leckey and Federmeier, 2019). In relation to the gender congruency effect, less negative N400 amplitudes were linked to congruent trials compared to incongruent trials in the time window between 300 ms and 500 ms post-stimulus onset in a translation-recognition task (Paolieri et al., 2020; see also Wicha et al., 2003). Given the temporal characteristics of these neural correlates of the gender congruency effect, this suggests that both co-activation and CLI between the languages may remain unresolved until around 500 ms post-stimulus onset. In LRM terms, this time window coincides with phonological encoding. Therefore, the behavioural and ERP findings related to the gender congruency effect suggest that the target language is not selected prior to lexical retrieval. However, from these findings it remains unclear whether CLI is resolved upon termination of gender processing or whether it indeed continues into phonological encoding stages.
Regarding the specific question about the locus of target language selection, there are two possible scenarios. In the first scenario, the target language is selected after the completion of gender processing. Subsequently, only the lexical entry from the target language carries over to phonological encoding. In contrast, the second scenario postulates that the target language is selected during or after phonological encoding. Here, the lexical entries from both the target and non-target language are processed for phonological encoding. This implies that both languages remain active after the completion of gender processing and that CLI potentially results in further delays during later phonological processing. Evidence by Hoshino and Thierry (2011) preliminarily supported the latter notion. In a picture-word interference (PWI) task with highly proficient Spanish learners of English, the EEG signal was modulated during lexical retrieval for semantically and phonologically related trials compared to unrelated trials. However, they found no further modulation of the signal after 400 ms post-stimulus onset. The authors interpreted these results as showing that the target language had not been selected at lexical retrieval, but that the selection had taken place by 400 ms post-stimulus onset. In LRM terms, this time window coincides with phonological encoding. Therefore, these results supported the second scenario whereby first, lexical entries from both the target and non-target language were selected for phonological processing, and second, CLI continued beyond lexical retrieval (Christoffels et al., 2007;Colomé, 2001;Costa et al., 2000;Hoshino and Kroll, 2008;Hoshino and Thierry, 2011;Pulvermüller et al., 2009;Rodriguez-Fornells et al., 2005). Building on the work by Hoshino and Thierry (2011) to add further evidence for discriminating between the two scenarios outlined above, we also explored the cognate facilitation effect. 
Via the exploration of this effect, we probed whether or not CLI continued after gender processing in late language learners.
1 Note that Fig. 1 serves for visualisation purposes and does not claim that the production processing stages are discrete stages or follow a sequential pattern, in line with Levelt et al. (1999) and Indefrey (2011). This notion is subject to an open debate (Camen et al., 2010), which is beyond the scope of our study.
2 Note that similar labels for gender values ("masculine", "feminine") can be found across different languages. However, we do not assume that these labels are conceptually identical (see also Lemhöfer et al., 2010) but merely utilise them for descriptive purposes.

The cognate facilitation effect
In a broad sense, cognates are words with a large degree of phonological and orthographic overlap (Li and Gollan, 2018). For example, [Melone] and [melón] "melon" are examples of cognates in German and Spanish, respectively, whereas [Arm] and [brazo] "arm" are non-cognates. It has previously been shown that cognates are processed faster compared to non-cognates (Bosma et al., 2019; Casaponsa et al., 2015; Li and Gollan, 2018). This is a critical finding as it suggests that this cognate facilitation effect can be linked to CLI during phonological encoding in production (Christoffels et al., 2007; Costa et al., 2000, 2005; Hoshino and Kroll, 2008). In order for this effect to occur, the lexical entries from both the target and non-target language need to be subject to subsequent phonological encoding. Here, we used the cognate facilitation effect first, to study CLI during this particular production stage with respect to the overall time course of non-native production; and second, to add to the discussion of whether or not the target language is selected after gender processing.
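The notion of "a large degree of orthographic overlap" can be made concrete with a small sketch. The snippet below is an illustration only: the source does not state how cognate status was operationalised, so the normalised Levenshtein measure and the names `levenshtein` and `orthographic_overlap` are assumptions introduced here. The measure also captures orthography only, not phonological overlap. The word pairs are the examples given in the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def orthographic_overlap(word_a: str, word_b: str) -> float:
    """Hypothetical overlap score in [0, 1]; 1 means identical spelling."""
    a, b = word_a.lower(), word_b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# German-Spanish pairs taken from the examples in the text.
pairs = {"melon": ("Melone", "melón"), "arm": ("Arm", "brazo")}
overlaps = {concept: orthographic_overlap(*pair) for concept, pair in pairs.items()}

for concept, score in overlaps.items():
    print(f"{concept}: {score:.2f}")
```

Under this assumed measure, the cognate pair [Melone]/[melón] scores clearly higher than the non-cognate pair [Arm]/[brazo]. Any cut-off separating cognates from non-cognates would be a further assumption.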
While neural correlates of the cognate facilitation effect have been scarcely researched in non-native production, evidence from non-native comprehension links the N400 to this specific effect (Midgley et al., 2011; Peeters et al., 2013; Xiong et al., 2020). For example, Peeters et al. (2013) found faster latencies and smaller N400 amplitudes for cognates compared to non-cognates between 400 ms and 500 ms in French late learners of English in a lexical decision task. In contrast, Christoffels et al. (2007) found faster naming latencies and more negative amplitudes for cognates compared to non-cognates in fronto-central regions between 275 ms and 475 ms post-stimulus onset in unbalanced German-Dutch speakers in an overt picture naming task, without linking it to the N400. Further, studies showed that the size of the cognate facilitation effect decreased as non-native proficiency increased (Bultena et al., 2014; Casaponsa et al., 2015). Therefore, ERP evidence from the cognate facilitation effect suggests first, that the target and the non-target language are co-activated, which in turn leads to CLI of the phonological systems; second, that lexical entries from both languages are subject to phonological encoding; and finally, that the non-target language is not inhibited during lexical retrieval, particularly at lower proficiency levels. Instead, the cognate facilitation effect suggests that CLI continues beyond lexical retrieval into phonological encoding stages, as supported by the literature (Christoffels et al., 2007; Colomé, 2001; Costa et al., 2000; Hoshino and Kroll, 2008; Hoshino and Thierry, 2011).
Combining evidence from the gender congruency effect and the cognate facilitation effect, the findings presented above suggest a modulating influence of CLI on the time course of non-native production that continues beyond gender processing until phonological processing stages. However, these interpretations are debatable given the scarcity of research on CLI, in particular in terms of the neural correlates of the gender congruency effect and the cognate facilitation effect. Therefore, aside from exploring CLI effects in late language learners, we also focused on the neural underpinnings of CLI to characterise the time course of non-native production and the locus of target language selection.

Electrophysiological correlates of CLI
As discussed above, the N400 was linked to CLI effects such as the gender congruency effect and the cognate facilitation effect, but also the co-activation of languages (Chen et al., 2017;Paolieri et al., 2020;Peeters et al., 2013). In this study, we used the N400 to capture the co-activation and the linguistic aspects of CLI. Importantly, studies suggested that the N400 onset may be delayed in late language learners and that overall N400 amplitudes in these learners decrease compared to those of native speakers (Midgley et al., 2009;Weber-Fox and Neville, 1996). Some studies further suggested that N400 amplitudes were modulated by non-native proficiency, i.e., the N400 became more native-like with increasing proficiency (Midgley et al., 2009;Newman et al., 2012;White et al., 2017; but see Wood Bowden et al., 2013). For example, in a phoneme discrimination task with French low and high proficient late language learners of English, Heidlmayr et al. (2021) found a smaller N400 effect for the low compared to the high proficient group of late language learners. Therefore, with respect to our group of late language learners, the N400 is likely to be delayed, or smaller in size, compared to that in Paolieri et al. (2020) and Peeters et al. (2013).
In this study, we also focused on the P300 component to capture the cognitive mechanisms underlying the successful mitigation of CLI and the selection of the target language. The P300 is a positive-going deflection of the EEG signal with a peak around 300 ms post-stimulus onset. Early studies found that the P300 was elicited at different topographical sites (Ritter and Vaughan, 1969), which led researchers to propose separate P300 subcomponents. These subcomponents include the P3a and the P3b (Barry et al., 2020; Polich, 2007). The P3a component is a positive-going wave with a fronto-central distribution which occurs around 200 ms-300 ms post-stimulus onset. In contrast, the P3b component was found in later 300 ms-400 ms time windows at centro-parietal electrodes (Hruby and Marsalek, 2003; Squires et al., 1975). Relevant to this study, the P300 has been previously linked to cognitive processes such as cognitive interference, cognitive control, working memory load and inhibition (Barker and Bialystok, 2019; Luck, 1998; Neuhaus et al., 2010; Polich, 2007). More recently, the P300 was linked to the allocation of attentional resources (Barker and Bialystok, 2019; González Alonso et al., 2020). It is typically found with inhibitory paradigms such as the Flanker task or the Oddball paradigm (Eriksen and Eriksen, 1974; Soares Pereira et al., 2019). More relevant to our study, it was also reported in paradigms which included a Flanker task preceded by a linguistic task such as code-switching or picture naming, e.g., in Bosma and Pablos (2020) and in Jiao et al. (2020). These studies highlighted the critical role of the P300 in inhibition in regulating native and non-native language use. Our experimental paradigm relies on the successful mitigation of CLI effects and the inhibition of the non-target language; therefore, the P300 is a critical component to consider in this study alongside the N400.
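ERP components such as those described above are often quantified as the mean voltage within a time window. The sketch below illustrates that idea under stated assumptions and is not a description of the analysis pipeline of this study. The P3a and P3b windows follow the ranges given in the text. The 300 ms-500 ms N400 window is borrowed from the translation-recognition findings discussed earlier. The waveform is synthetic, and the function name `mean_amplitude` is hypothetical.

```python
import math
from statistics import fmean

# Windows in ms post-stimulus onset (see the assumptions in the lead-in).
WINDOWS_MS = {"P3a": (200, 300), "P3b": (300, 400), "N400": (300, 500)}


def mean_amplitude(times_ms, voltages_uv, window_ms):
    """Average voltage of all samples with start <= time < end."""
    start_ms, end_ms = window_ms
    samples = [v for t, v in zip(times_ms, voltages_uv) if start_ms <= t < end_ms]
    if not samples:
        raise ValueError(f"no samples fall inside window {window_ms}")
    return fmean(samples)


# Synthetic single-channel waveform (hypothetical data): one sample every
# 2 ms from -200 ms to 798 ms, with a positive Gaussian bump peaking at 250 ms.
times_ms = [-200 + 2 * i for i in range(500)]
voltages_uv = [5.0 * math.exp(-0.5 * ((t - 250) / 40) ** 2) for t in times_ms]

window_means = {name: mean_amplitude(times_ms, voltages_uv, window)
                for name, window in WINDOWS_MS.items()}

for name, value in window_means.items():
    print(f"{name}: {value:.2f} microvolts")
```

Because the synthetic bump peaks at 250 ms, the P3a window yields the largest mean here. With real data, the windows, electrode sites and baseline would follow the recording and preprocessing choices of the study in question.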

The current study: non-native NP production
In the current study, we explore the effect of CLI on the time course of non-native NP production from a behavioural and neural perspective. The goals of the present study are twofold. First, we investigate how CLI affects behavioural measures and the EEG signal in non-native NP production. On the basis of the LRM model, we use two CLI effects to characterise the unfolding and neural signatures of individual production stages: lexical retrieval via the gender congruency effect, and phonological encoding via the cognate facilitation effect. Our second goal is to gain further insight into the process of target language selection. More specifically, we probe the locus of target language selection by investigating the two CLI effects with respect to the individual production stages.
Therefore, our research questions are: first, are there traceable effects of gender congruency (congruent vs. incongruent) and cognate status (cognate vs. non-cognate) at the behavioural and neural processing level in non-native NP production? Second, what can the temporal unfolding of processing gender congruency and cognate status tell us about the time course of non-native NP production? Finally, during which processing stage is the target language selected during non-native NP production?
We study non-native NP production in German late intermediate learners of Spanish with a B1/B2 proficiency level by employing an overt picture naming task. We combine behavioural measures of naming accuracy and naming latencies with EEG recordings. We are particularly interested in the modulation of the P300 as an index for inhibitory control, and the N400 as an index for co-activation and CLI. To obtain information about the linguistic background of our late language learners, we combine the Language Experience and Proficiency Questionnaire (LEAP-Q, Marian et al., 2007) with a Spanish vocabulary size test, the LexTALE-Esp (Izura et al., 2014). To formulate the hypotheses for this study, we rely both on the time estimates proposed by Indefrey and Levelt (2004), Indefrey (2011), Bürki et al. (2014) and Hoshino and Thierry (2011) and on the theoretical framework and discussion of ERP components outlined in the previous sections.

Hypotheses
Behavioural hypotheses. We predict effects of gender congruency and cognate status on behavioural measures of naming accuracy and naming latencies. For congruent and cognate nouns, we predict a facilitatory CLI effect, reflected in higher naming accuracy and shorter naming latencies compared to incongruent and non-cognate nouns. In turn, this has direct implications for the time course of non-native NP production.
For congruent non-cognates and incongruent cognates, we predict more subtle CLI effects. In concrete behavioural terms, we expect lower naming accuracy and longer naming latencies for congruent non-cognates compared to congruent cognates, and higher naming accuracy and shorter naming latencies compared to incongruent non-cognates. On the other hand, we anticipate the reverse pattern for incongruent cognates: CLI would hinder gender processing, but act as a facilitator during phonological processing and influence the time course of non-native NP production.
EEG hypotheses. We first probe the presence of a P300 or an N400 effect, as existing research remains inconclusive about whether or not we can expect both components to be elicited in our experimental paradigm. Further, we predict a modulation of P300 and N400 as a function of condition. We expect smaller P300 and N400 amplitudes for producing congruent cognates compared to incongruent non-cognates. This would reflect higher processing costs and more involvement of the inhibitory control system for the latter.
For congruent non-cognates and incongruent cognates, we predict a similar degree of processing costs. These trials are subject to both CLI facilitation and hindrance. Therefore, we do not expect significant differences between congruent non-cognates and incongruent cognates in terms of P300 and N400 amplitudes. However, we do expect the P300 and N400 amplitudes for these particular trials to be significantly larger than for congruent cognates, and significantly smaller than for incongruent non-cognates. In sum, we expect the smallest P300 and N400 amplitudes for congruent cognates, followed by larger amplitudes for congruent non-cognates and incongruent cognates, and finally, the largest amplitudes for incongruent non-cognates.
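The predicted ordering across the four cells of the gender congruency by cognate status design can be written down compactly. The sketch below is illustrative only: the amplitude values are invented, they are treated as unsigned component sizes, and the function name `follows_predicted_order` is hypothetical. It checks the ordering alone and does not test whether the two intermediate conditions differ, which would require inferential statistics.

```python
# Predicted rank of P300 and N400 amplitude size per condition
# (1 = smallest, 3 = largest), as stated in the hypotheses.
PREDICTED_RANK = {
    ("congruent", "cognate"): 1,
    ("congruent", "non-cognate"): 2,
    ("incongruent", "cognate"): 2,
    ("incongruent", "non-cognate"): 3,
}


def follows_predicted_order(amplitudes):
    """True if a lower predicted rank always goes with a strictly smaller size."""
    conditions = list(PREDICTED_RANK)
    for first in conditions:
        for second in conditions:
            if PREDICTED_RANK[first] < PREDICTED_RANK[second]:
                if not amplitudes[first] < amplitudes[second]:
                    return False
    return True


# Invented component sizes in microvolts, for illustration only.
example = {
    ("congruent", "cognate"): 2.1,
    ("congruent", "non-cognate"): 3.0,
    ("incongruent", "cognate"): 3.2,
    ("incongruent", "non-cognate"): 4.4,
}

print(follows_predicted_order(example))
```

Conditions that share a rank are deliberately not compared with each other, mirroring the prediction of no reliable difference between congruent non-cognates and incongruent cognates.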

Participants
Thirty-three healthy, right-handed native German speakers (twenty-seven females) with a B1/B2 level of Spanish were recruited from the campus of the University of Konstanz (age: M = 23.06 years, SD = 2.47 years). At the time of testing, participants did not report any psychological or language disorders, or any visual or hearing impairments. Prior to the experiment, we provided all participants with an information sheet, after which they signed an informed consent form in compliance with the Ethics Code for linguistic research in the Faculty of Humanities at Leiden University. Upon termination of all tasks, participants received a debrief form, signed the final consent form and received monetary compensation.

LEAP-Q: linguistic profile of participants
Prior to the experimental session, the linguistic profile of participants and their experience with Spanish was assessed using LEAP-Q (Marian et al., 2007); see Appendix A for details. We opted for a home-based administration of the questionnaire to minimise any self-report biases often induced by a laboratory environment (Rosenman et al., 2011). The majority of participants (n = 31) reported English as their first non-native language (M AOA = 8.90, SD AOA = 1.90), while two participants learned French as their first foreign language (M AOA = 8.5, SD AOA = 2.5). A total of sixteen participants learned Spanish as their second non-native language. Further, Spanish was reported as the third non-native language by fifteen participants, and as the fourth non-native language by two participants.
The mean AoA of Spanish was M AOA = 16.29 years (SD AOA = 2.39). Participants reported having become fluent in Spanish on average at M = 18.53 years of age (SD = 2.29). They started to read in Spanish at M = 17.27 years of age (SD = 3.03). Before the time of testing, almost all participants (n = 31) had spent some time in a Spanish-speaking country (M = 0.96 years, SD = 0.69). On a scale from one to ten (ten being maximally proficient), participants reported a current speaking proficiency of M = 6.76 (SD = 1.00) in Spanish. Further, they rated their comprehension proficiency at M = 7.34 (SD = 0.92) and their reading proficiency at M = 7.18 (SD = 1.07). On a scale from zero to ten (ten being maximally exposed), participants quantified their exposure to Spanish at the time of testing with M = 5.20 (SD = 2.48). This compares to an exposure of M = 3.12 (SD = 2.31) to their first foreign language. On a daily basis, this corresponded to an exposure to Spanish of M = 10.03% (SD = 9.48) compared to the other languages. Exposure to Spanish occurred via the following contexts: interaction with Spanish native speakers, listening to Spanish radio shows, watching Spanish television, reading or self-instruction in Spanish. At the time of testing, six participants reported a self-perceived proficiency of Spanish as first non-native language, twenty-six participants as second non-native language, and one participant as third non-native language. We used this metric as a proxy for moderate confidence levels with Spanish.
It is important to note that most participants had acquired one or more foreign languages prior to acquiring Spanish. Research on CLI effects in L3 (L4, …Ln) language processing has demonstrated that all languages within a multilingual system might affect processing in the target language (Lago et al., 2021;Lemhöfer et al., 2004;Rothman, 2015). Here, language dominance was found to be a driving factor of CLI, in that more dominant languages are linked to stronger interference, compared to less dominant languages (Francis and Gallard, 2005;Lago et al., 2021). In our study, a total of eighteen participants reported that they had acquired French, of which fourteen acquired it prior to Spanish. AoA of French was M = 11.38 years of age (SD = 1.98). Accordingly, speakers of French reported a speaking proficiency of M = 2.85 (SD = 0.87), a comprehension proficiency of M = 4.15 (SD = 1.46) and finally, a reading proficiency of M = 4.85 (SD = 1.72) on a scale from one to ten. At the time of testing, exposure to French was reported as M = 0.56 (SD = 2.72) on a scale from zero to ten. Compared to the other languages, participants were exposed to French M = 1.11% (SD = 1.41) on a daily basis. All participants who had acquired French claimed a higher self-reported proficiency for Spanish compared to French, which was reported as third non-native language following Spanish. Therefore, on the basis of previous research (e.g., Lago et al., 2021), we predict only a limited influence of French on CLI effects due to the low dominance and proficiency of speakers in this language at the time of testing. Nevertheless, we included the acquisition of French as a co-variate in our analysis to see whether this had an effect on our results. As discussed in section 3, we did not find an effect on our outcome variables.

Materials and design
Prior to the experimental session, participants were asked to complete the LEAP-Q. In the laboratory, they completed the Lextale-Esp vocabulary size test and an overt picture naming task. We measured EEG during the picture naming task.

Tasks and stimuli
The Lextale-Esp and the picture naming task were programmed in E-Prime 2 (Schneider et al., 2002) and administered on a Windows 10 computer.
Lextale-Esp. We generated an E-Prime version of the original Lextale-Esp task with identical instructions and stimuli. This task was used to complement self-reported measures from the LEAP-Q, and the vocabulary size score was added as a co-variate to subsequent statistical analyses.
Picture naming task. Our picture stimuli were obtained from the MultiPic picture database (Duñabeitia et al., 2018). We selected the picture stimuli according to two criteria: those with the highest percentage of valid responses given by participants and those with the highest percentage of participants giving the object's exact name. Then, each picture was assigned a gender congruency type (congruent versus incongruent across German and Spanish) and a cognate status (cognate versus non-cognate across German and Spanish). The latter was based on the degree of semantic, phonological and orthographic overlap.
We  Clegg (2011). Further, we included terminal phoneme of the target noun as a co-variate and item (i.e., the individual picture) as a random effect in the statistical analyses (see sections 3.1.2. and 3.2.2. for more details).

EEG recordings
EEG data were collected via the BrainVision Recorder software (Version 1.23.0001) by Brain Products GmbH. We used an EasyCap electrode cap following a standard 10/20 montage (Appendix B). Data were measured at thirty-two channel locations via passive electrodes. We recorded the horizontal electrooculogram (HEOG) from two electrodes at the outer canthus of the left and right eye. We recorded the vertical electrooculogram (VEOG) from an electrode placed below the left eye. All electrodes were initially referenced to channel Cz, which we later reused as a data channel during re-referencing. The ground electrode was placed on the right cheek of the participant. Impedances of the electrodes were checked and configured using actiCAP Control Software (Version 1.2.5.3) by Brain Products GmbH. We kept impedances below 5 kΩ for the reference and ground electrode. For the remaining channels, impedances were below 10 kΩ. The sampling rate was 500 Hz.

Lextale-Esp
Participants first completed the Lextale-Esp task. For this task, we presented them with a fixation cross at the centre of the screen for 1000 ms. This was followed by the visual presentation of a letter string on the horizontal midline of the screen which corresponded to either a Spanish word, or a pronounceable pseudo-word. Participants were then asked to indicate via a button-press whether or not the letter string corresponded to a Spanish word. The letter string remained on the screen until the participant responded. Each letter string was only shown once. The total number of trials was 87, because we excluded three trials due to an overlap with the experimental stimuli from the picture naming task prior to the experiment. Offline, we calculated the vocabulary size score by subtracting the percentage of incorrectly identified pseudo-words from the percentage of correctly identified words for each participant (Izura et al., 2014). The maximum score was 100, whereas the minimum score varied as a function of false positives.
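The scoring rule described above can be illustrated with a short sketch. This is a minimal Python example and not the authors' analysis script; the function name and the input format (one pair of booleans per letter string) are assumptions made for illustration.

```python
def lextale_esp_score(trials):
    """Percentage of words correctly accepted minus percentage of pseudo-words wrongly accepted.

    `trials` is a list of (is_word, said_yes) boolean pairs, one per letter string
    (hypothetical input format, not taken from the original task files).
    """
    words = [said_yes for is_word, said_yes in trials if is_word]
    pseudo = [said_yes for is_word, said_yes in trials if not is_word]
    if not words or not pseudo:
        raise ValueError("need at least one word and one pseudo-word trial")
    pct_hits = 100.0 * sum(words) / len(words)
    pct_false_alarms = 100.0 * sum(pseudo) / len(pseudo)
    return pct_hits - pct_false_alarms


# Toy data: 4 words (3 accepted) and 4 pseudo-words (1 accepted).
toy = [(True, True), (True, True), (True, True), (True, False),
       (False, True), (False, False), (False, False), (False, False)]
print(lextale_esp_score(toy))  # 75.0 - 25.0 = 50.0
```

Under this rule, a participant who accepts proportionally more pseudo-words than words receives a negative score, which is why the minimum depends on the number of false positives.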

Picture naming task
For the picture naming task, we followed a 2 × 2 fully factorial within-subjects design with two main manipulations: gender congruency and cognate status. Half of the trials were congruent in that the gender was similar across German and Spanish. The other half of trials were incongruent, characterised by a dissimilarity in gender across German and Spanish. Further, half of congruent and incongruent pictures were cognate words, and the other half were non-cognate words (Table 1). There were 24 stimuli per condition, resulting in a total of 96 stimuli.
Following the standard procedure in the field of speech production, the task was divided into a familiarisation phase and an experimental phase, with a total duration of 30-40 min. The familiarisation phase consisted of three rounds. In each round, participants were exposed to all 96 stimulus pictures and were instructed to overtly name each picture in Spanish using an NP construction with the correct determiner and noun (e.g., [el brazo] "the arm"). During this phase, the experimenter provided oral feedback on the accuracy of the NP production by the participant whenever necessary. Specifically, the correct determiner or noun was provided in Spanish for cases where either the determiner or the noun, or both, were incorrectly produced by the participant. In the experimental phase, participants named the objects as fast and accurately as possible using a Spanish NP. Participants' EEG and voice were recorded exclusively during the experimental phase. A typical trial was initiated with the display of a fixation cross for 1000 ms, followed by the display of the picture for 2700 ms in the centre of the screen. Each picture was shown only once during the experimental phase, resulting in a total of 96 trials. Trial order was randomized. Participants were reminded throughout the experimental phase to name the object as fast and accurately as possible and to reduce all unnecessary movement. There was a short break after 50 trials to minimise participants' fatigue.
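The structure of the experimental phase can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions, not the E-Prime script used in the study: the item labels are placeholders, the seed argument is hypothetical, and the timing sum covers only fixation and picture display (no break, no response-dependent delays).

```python
import random

CONGRUENCY = ("congruent", "incongruent")
COGNATE = ("cognate", "non-cognate")
ITEMS_PER_CONDITION = 24
FIXATION_MS = 1000
PICTURE_MS = 2700
BREAK_AFTER = 50


def build_trials(seed=None):
    """Return a randomized list of trials, one per picture (placeholder item labels)."""
    trials = [
        {"congruency": g, "cognate_status": c, "item": f"{g}_{c}_{i:02d}"}
        for g in CONGRUENCY for c in COGNATE for i in range(ITEMS_PER_CONDITION)
    ]
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials(seed=1)
n_trials = len(trials)  # 2 x 2 conditions x 24 items
# Pure presentation time in seconds (fixation + picture for every trial).
presentation_time_s = n_trials * (FIXATION_MS + PICTURE_MS) / 1000
# The short break splits the session into two blocks.
first_block, second_block = trials[:BREAK_AFTER], trials[BREAK_AFTER:]
print(n_trials, presentation_time_s, len(first_block), len(second_block))
```

With these values, the 96 trials amount to roughly six minutes of pure presentation time, so most of the 30-40 min session is taken up by the three familiarisation rounds and set-up.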

Behavioural data exclusion
Naming latencies and EEG data for one participant were lost due to a malfunctioning microphone and a subsequent failure during the EEG recording. Two further participants were excluded to match the datasets included in the EEG analysis (see section 3.4. for details). In total, we included 30 datasets in this analysis.

Behavioural data analysis
We used Praat (Boersma and Weenink, 2019) to calculate naming accuracy and naming latencies for each trial for the picture naming task. Next, we analysed our behavioural data in RStudio Version 1.3.959 (R Core Team, 2019). We employed a single-trial modelling approach using the lme4 package (Bates et al., 2020) to model our two behavioural outcome variables, naming accuracy and naming latencies. We modelled naming accuracy using generalized linear mixed models (GLMM) and the glmer() function with a binomial distribution. Next, we modelled positively skewed naming latencies using the glmer() function in combination with a gamma distribution and the identity link function. Only correct trials were included in our analysis of the naming latencies. For both outcome variables, we generated the most theoretically plausible maximal model on the basis of our hypotheses and our two main manipulations, gender congruency and cognate status. To preserve statistical power and to control for potential confounds, we added familiarisation phase performance as a co-variate to our statistical analysis, rather than excluding trials where errors were made before the experimental phase. Similarly, we added Lextale-Esp score, target noun gender, word length, order of acquisition of Spanish, acquisition of French and terminal phoneme as co-variates. Further, we included subject and item as random effects. To establish the model of best fit for the picture naming task, we followed a top-down model selection procedure by testing for the significance of each factor (Barr, 2013;Bates et al., 2018). In order to balance Type I error and power, the random-effects structure was kept as maximal as possible while avoiding overfitting (Matuschek et al., 2017). In the case of non-convergence or singular fit, we simplified our model structure by removing first interactions for random slopes; second, correlations between random slopes; and finally, interactions between fixed effects. We used treatment coding as our contrast, which defaulted to congruent trials and cognates as the reference levels. Absolute t-values greater than 1.96 were interpreted as statistically significant at α = 0.05 (Alday et al., 2017). Next, we performed model comparisons using the anova() function based on Akaike's Information Criterion, AIC (Akaike, 1974), the Bayesian Information Criterion, BIC (Neath and Cavanaugh, 2012) and the log-likelihood ratio. To perform model diagnostics, we checked the model fit by plotting the model residuals against the predicted values.

Table 1 Sample set of stimuli for the picture naming task.
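The treatment coding and the significance criterion described in this section can be illustrated with a short sketch. This is a minimal Python example, not the lme4 model itself; the coefficient values are hypothetical and chosen only to show how the four cells of the 2 × 2 design are built up from the reference level (congruent cognates).

```python
def treatment_code(congruency, cognate_status):
    """Dummy-code one trial with 'congruent' and 'cognate' as reference levels."""
    incongruent = 1 if congruency == "incongruent" else 0
    non_cognate = 1 if cognate_status == "non-cognate" else 0
    # Columns: intercept, gender congruency, cognate status, interaction
    return [1, incongruent, non_cognate, incongruent * non_cognate]


def predict(coefs, congruency, cognate_status):
    """Linear predictor for one cell given [b0, b_congruency, b_cognate, b_interaction]."""
    row = treatment_code(congruency, cognate_status)
    return sum(b * x for b, x in zip(coefs, row))


def is_significant(t_value, threshold=1.96):
    """Criterion used in the text: |t| > 1.96 at alpha = 0.05."""
    return abs(t_value) > threshold


# Hypothetical coefficients for illustration only.
coefs = [5.0, -0.2, -0.3, -0.1]
for g in ("congruent", "incongruent"):
    for c in ("cognate", "non-cognate"):
        print(g, c, treatment_code(g, c), round(predict(coefs, g, c), 2))
```

Under this coding, the intercept is the estimate for congruent cognates, and each main-effect coefficient is a simple effect evaluated at the reference level of the other factor.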

Lextale-Esp
The mean Lextale-Esp score was M = 18.45 (SD = 20.52). Scores were highly variable and ranged between −23 and 60, with 100 being the maximum score. Vocabulary scores of 60-80 on this task were previously associated with C1-C2 proficiency levels (Lemhöfer and Broersma, 2012); therefore, all of our speakers fell below the B2 proficiency range.

Picture naming task
We first calculated descriptive statistics for naming accuracy and naming latencies for each condition (Table 2).

Naming accuracy
For naming accuracy, our model of best fit included main effects for gender congruency and cognate status, with subject and item as random effects. Moreover, the co-variates Lextale-Esp score and familiarisation phase performance were included in the final model (Appendix C). The remaining co-variates (target noun gender, word length, order of acquisition, acquisition of French and terminal phoneme) resulted in non-convergence or singular fit and were therefore excluded from the model fitting procedure. As predicted, participants were marginally more accurate for congruent trials compared to incongruent trials with β = −0.452, SE = 0.232, z = −1.95, p = 0.052. Despite being included in the model of best fit, cognates were not significantly different from non-cognates with β = −0.279, SE = 0.233, z = −1.19, p = 0.232 (Fig. 2).
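To make the accuracy coefficients easier to interpret, the sketch below converts the reported estimate for gender congruency into an odds ratio and recomputes a two-sided p-value from the reported z-value. It is a minimal Python illustration that assumes the binomial model used a logit link (the lme4 default, not stated explicitly in the text); because it starts from rounded values, the recomputed p-value is close to, but not identical with, the reported one.

```python
import math


def odds_ratio(beta):
    """Convert a log-odds coefficient (logit link assumed) to an odds ratio."""
    return math.exp(beta)


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported above for gender congruency (incongruent relative to congruent).
beta, z = -0.452, -1.95
print(round(odds_ratio(beta), 3))  # about 0.636
print(round(two_sided_p(z), 3))    # about 0.051
```

Read this way, the odds of a correct response on incongruent trials were roughly two thirds of the odds on congruent trials, although the effect fell just short of the α = 0.05 threshold.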

Naming latencies
For naming latencies, our model of best fit included a main effect for gender congruency as well as a random effect for subject and item and a by-subject random slope for gender congruency. Cognate status did not significantly improve the model fit and was dropped from the model fitting procedure. Further, familiarisation phase performance was included as a co-variate (Appendix D). Lextale-Esp score, target noun gender, word length, order of acquisition, acquisition of French and terminal phoneme resulted in non-convergence or singular fit and were not included in the subsequent model fitting procedure. Participants were significantly faster in naming congruent items compared to incongruent items with β = 0.059, SE = 0.028, z = 2.04, p = 0.041 (Fig. 2).

EEG data exclusion
The EEG data of the participant whose voice recordings were lost were also lost during the EEG data acquisition process. Further, we determined a set of criteria to include data in the EEG analyses. First, we only included trials where the correct NP was produced. Second, only correct trials not contaminated by artefacts (valid trials) were analysed. Finally, we set the inclusion threshold for correct and valid trials at 60%. As a result, two additional data sets were excluded due to excess artefact contamination. See Appendix E for rejection rates by condition.

EEG data analysis
Articulatory artefacts pose a challenge when examining EEG data from word production tasks since they may contaminate the signal (Ganushchak et al., 2011a;Grözinger et al., 1975;Porcaro et al., 2015). Therefore, we applied a rigorous pre-processing procedure to separate the signal from artefacts using BrainVision Analyser 2.2. The pre-processing procedure included the following steps: visual inspection of the raw data, re-referencing from Cz to the average mastoid electrodes (TP9 and TP10) and reusing Cz as a data channel, filtering between 0.1 Hz and 30 Hz, linear derivation of the two HEOG electrodes to form a combined channel for horizontal eye movements, interpolation of noisy channels, ocular correction ICA using VEOG and HEOG parameters, and finally, artefact rejection. After pre-processing our EEG data, we added a unique voice onset (VO) marker to every correct trial to mark the articulation onset for each participant. We then generated segments around the picture onset markers and the VO markers for each participant from 200 ms before picture onset to 1200 ms after picture onset. Following segmentation, we applied baseline correction using the 200 ms interval preceding picture onset. A novelty of our statistical analysis was the implementation of single-trial linear mixed effects models (LMM) for our EEG data (Frömer et al., 2018). For this, we exported all available voltage samples from valid segments for statistical analysis in RStudio (R Core Team, 2019). In contrast to more traditional EEG analyses involving ANOVAs, the assumptions for single-trial LMM do not include an equal number of observations for each participant or uniform effects for each participant. Instead, single-trial LMM capture by-subject and by-item variance and therefore have superior explanatory power over more traditional ANOVAs when modelling EEG data (Baayen et al., 2008;Fröber et al., 2017).
After exporting our EEG data, we performed a permutation test to tentatively explore the locus of the effect of gender congruency and cognate status (collapsed into the variable condition) on voltage amplitudes. We used the permutes package (Voeten, 2019) to calculate F-values across all electrodes and the entire available time window between −200 ms and 1200 ms with respect to stimulus onset. Visual inspection of the outcome of the permutation test revealed potential modulatory effects of condition in centro-parietal areas between 350 ms and 600 ms post stimulus onset (Fig. 3). Previous literature on the distribution and time correlates of both the P300 and the N400 supports this outcome (Barry et al., 2020;Koester and Schiller, 2008;Paolieri et al., 2020;Peeters et al., 2013;Polich, 2007;Roelofs et al., 2016). Due to increased articulatory artefacts in EEG data closer to the participant's articulatory onset, we only explored the EEG data up to a maximum of 600 ms post-stimulus onset (Porcaro et al., 2015).
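The segmentation and baseline-correction steps described above can be sketched for a single channel as follows. This is an illustrative Python example, not the BrainVision Analyser procedure; it assumes the 500 Hz sampling rate reported for the recording, treats the segment end point as exclusive, and uses toy voltage values.

```python
SAMPLING_RATE_HZ = 500
EPOCH_START_MS = -200
EPOCH_END_MS = 1200


def ms_to_index(t_ms):
    """Index of the sample at time t_ms relative to picture onset."""
    return int(round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000))


def baseline_correct(epoch):
    """Subtract the mean of the pre-stimulus interval (-200 ms to 0 ms) from every sample."""
    n_baseline = ms_to_index(0)  # 100 samples at 500 Hz
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [v - baseline for v in epoch]


def mean_amplitude(epoch, start_ms, end_ms):
    """Mean voltage in a time window, e.g. 350-600 ms after picture onset."""
    window = epoch[ms_to_index(start_ms):ms_to_index(end_ms)]
    return sum(window) / len(window)


n_samples = ms_to_index(EPOCH_END_MS)  # 700 samples per segment
# Toy epoch: constant 2 µV offset before picture onset, 5 µV afterwards.
toy_epoch = [2.0] * ms_to_index(0) + [5.0] * (n_samples - ms_to_index(0))
corrected = baseline_correct(toy_epoch)
print(n_samples, corrected[0], mean_amplitude(corrected, 350, 600))  # 700 0.0 3.0
```

Note that the single-trial models in the study were fitted to the exported voltage samples; the window average here only illustrates how a time window maps onto sample indices.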
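The logic of a permutation test on F-values can be sketched for a single electrode and time point. This is a generic label-shuffling illustration in Python with toy data; it is not the implementation of the permutes package, whose internal procedure is not reproduced here, and the function names, seed and number of permutations are assumptions.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F-value for a list of groups of voltage values."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p(groups, n_permutations=999, seed=0):
    """Share of label shufflings whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Toy voltages (µV) for two conditions at one electrode and time point.
condition_a = [4.1, 5.0, 5.3, 4.7, 5.9]
condition_b = [2.9, 3.5, 4.0, 3.1, 3.8]
print(f_statistic([condition_a, condition_b]), permutation_p([condition_a, condition_b]))
```

Repeating such a test for every electrode and sample yields a map of F-values like the one inspected visually in the study; the map was used for exploration, with inference left to the mixed models.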
On the basis of the outcomes of the permutation test and previous literature, we defined nine topographic areas for our data channels. Along the anterior-posterior axis, we defined anterior, central and

EEG results
Visual inspection of the voltage amplitudes for the selected channels revealed the characteristic P1/N2 complex for early visual processing (Cheng et al., 2010;Eulitz et al., 2000;Misra et al., 2012;Schendan and Kutas, 2003). It also revealed a positive-going wave between 350 ms and 600 ms, consistent with the topographic distribution of a P300 (Barry et al., 2020). Further, Fig. 4 shows a by-condition modulation of the EEG signal between 350 and 600 ms, as tentatively suggested in the permutation test (Fig. 3). Descriptively speaking, we saw the largest amplitudes for congruent cognates with M = 5.14 (SD = 9.29), followed by incongruent cognates with M = 5.05 (SD = 9.24), congruent non-cognates with M = 4.88 (SD = 8.94), and finally, incongruent non-cognates with M = 4.47 (SD = 9.04) in the 350 ms-600 ms time window. We found no indication of an N400 effect prior to 600 ms, after which the signal becomes increasingly noisy due to the proximity to the articulatory onset. See Fig. 5 for a visualisation of the individual channels included in this analysis.
The model of best fit for voltage amplitudes included an interaction effect of gender congruency and cognate status. Further, hemisphere and familiarisation phase performance were included as co-variates (Appendix F). Lextale-Esp score, target noun gender, word length, order of acquisition, acquisition of French and terminal phoneme did not significantly improve the model fit or led to over-fitting. More specific to the co-variate of acquisition of French, the model comparison between the model with and without this particular co-variate yielded χ²(1, 30) = 0.018, p = 0.893.
We therefore dropped acquisition of French from the model selection procedure. In the model of best fit, item and subject emerged as random effects, with a by-subject random slope for the interaction effect of gender congruency and cognate status. Voltage amplitudes were more positive for congruent cognate nouns compared to incongruent non-cognates with β = −0.684, SE = 0.342, t = −2.002, p = 0.045. The difference in amplitude between the remaining conditions was not significant. 3
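The p-values of the likelihood-ratio model comparisons can be recovered from the reported χ² statistics. The sketch below is a minimal Python illustration using closed-form tail probabilities for 1 and 3 degrees of freedom; the function names are assumptions, and the sketch does not reproduce the anova() call itself.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic (closed forms for df = 1 and df = 3)."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    tail_df1 = math.erfc(math.sqrt(x / 2))
    if df == 1:
        return tail_df1
    if df == 3:
        return tail_df1 + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise NotImplementedError("only df = 1 and df = 3 are covered in this sketch")


def likelihood_ratio_stat(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic comparing two nested models."""
    return 2 * (loglik_full - loglik_reduced)


# Model with vs. without acquisition of French: chi-square = 0.018 on 1 df.
print(round(chi2_sf(0.018, 1), 3))  # 0.893
# LAN comparison reported in footnote 3: chi-square = 0.072 on 3 df.
print(round(chi2_sf(0.072, 3), 3))  # 0.995
```

Both recomputed values agree with the reported p-values, showing how little the additional terms improved the fit.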

Discussion
The aim of this study was twofold: first, we examined CLI of the gender systems and phonological systems to obtain a better characterization of the time course of non-native NP production. Secondly, we explored which production stage is associated with the selection of the target language in a multilingual language production configuration.
We studied the gender congruency effect to highlight CLI of the gender systems during lexical retrieval. We predicted higher naming accuracy, shorter naming latencies, less positive P300 amplitudes and less negative N400 amplitudes for congruent compared to incongruent nouns. Critically, this would indicate that the target language was not selected prior to lexical retrieval. We also explored the cognate facilitation effect to illustrate CLI during phonological encoding and expected higher naming accuracy, shorter naming latencies, less positive P300 amplitudes and less negative N400 voltage amplitudes for cognates compared to non-cognates. The presence of a cognate facilitation effect would imply that the lexical entries from both the target and non-target language actively competed during phonological encoding, placing the locus of target language selection beyond lexical retrieval (Christoffels et al., 2007;Colomé, 2001;Hoshino and Kroll, 2008;Hoshino and Thierry, 2011;Peterson and Savoy, 1998;Pulvermüller et al., 2009;Rodriguez-Fornells et al., 2005).
In line with our predictions, we found that participants were marginally more accurate and significantly faster at naming congruent nouns compared to incongruent nouns. These behavioural findings are important for two reasons: First, the presence of the gender congruency effect suggests CLI during gender processing. Second, the gender congruency effect also implies that the target language was not selected before lexical retrieval. Yet, results from the gender congruency effect alone cannot clarify whether CLI continued beyond gender processing. Therefore, we complemented these findings with results from the cognate facilitation effect. Despite a clear descriptive trend, we found no evidence for a cognate facilitation effect at the behavioural level, in contrast to previous research on the cognate status in non-native production (Acheson et al., 2012;Christoffels et al., 2007;Peeters et al., 2013). There are two possible interpretations of this outcome: first, our late language learners did not face CLI during phonological encoding. As a result, there were no detectable processing differences between cognates and non-cognates. Critically, this would imply that CLI may be resolved prior to phonological encoding and that only the lexical entry from the target language is phonologically encoded. The second interpretation is that the behavioural measures lacked the power to pick up on a fine-grained modulation based on cognate status. Our EEG data are able to discriminate between these two possible interpretations.
Despite clear evidence for a P300 effect, we did not find evidence for an N400 effect in the time window of interest. This is somewhat surprising given that previous research linked the N400 to language co-activation, and to the neural correlates of the gender congruency effect and the cognate facilitation effect (Chen et al., 2017;Paolieri et al., 2020). However, studies also showed a reduced or delayed N400 in speakers with lower proficiency levels (Heidlmayr et al., 2021;Midgley et al., 2009;Weber-Fox and Neville, 1996). Therefore, the N400 effect may have been absent, or delayed and masked by articulatory artefacts.
The topographic characteristics of the positive-going wave in the time window between 350 ms and 600 ms are in line with a P300 component, more specifically a P3b component (Barry et al., 2020;Hruby and Marsalek, 2003;Polich, 2007;Squires et al., 1975). The P300, in particular the P3b, has been linked to classical inhibitory tasks as well as inhibitory tasks combined with a linguistic task (Bosma and Pablos, 2020;Eriksen and Eriksen, 1974;Jiao et al., 2020;Soares Pereira et al., 2019). Critically, it was proposed to reflect general cognitive mechanisms such as inhibition, conflict resolution and cognitive interference, and more recently the recruitment and allocation of attentional resources and working memory load (Barker and Bialystok, 2019;González Alonso et al., 2020;Neuhaus et al., 2010a,b;Polich, 2007, 2012;Wu and Thierry, 2013). In order to successfully produce the correct NP in the target language, speakers not only had to go through the multi-stage process of language production, but had to simultaneously mitigate CLI effects between the target and non-target language. Here, we argue that the P300 directly taps into this latter notion and that it provides an index for this ongoing conflict between the target and the non-target language. Our EEG data revealed a small, but robust by-condition modulation of P300 voltage amplitudes, reflected in the interaction effect of gender congruency and cognate status. As predicted, P300 amplitudes were significantly different for trials with high processing costs and a larger involvement of the inhibitory control system, i.e., incongruent non-cognate trials compared to congruent cognate trials. Therefore, our results suggest quantitatively different neural patterns for producing NPs subject to differential processing costs and inhibitory demands. More importantly, this modulation of P300 amplitudes appeared to last until 600 ms post-stimulus onset (and possibly beyond).
This notion has direct implications for the time course of non-native production because it provides a clear time frame for the cognitive mechanisms underlying the mitigation of CLI.
Our EEG results were indicative of the following: first, our speakers faced CLI during the processing of both gender and cognate status. Secondly, in line with the findings by Hoshino and Thierry (2011) on the locus of target language selection, CLI appears to continue beyond gender processing until at least phonological encoding in late language learners. This finding favours the interpretation that target language selection takes place after gender processing. To the best of our knowledge, this is the first study to report a P300 effect during overt non-native NP production in a paradigm that was not explicitly about inhibitory control, but instead included an implicit inhibitory control component. Similar tentative EEG results are reported by González-Alonso et al. (2020) within the framework of third language acquisition of artificial mini-grammars.

3 As suggested by a reviewer, we also explored left anterior negativity (LAN) effects as a function of condition. Based on previous literature (Barber and Carreiras, 2005;Friederici et al., 1999;Hahne and Friederici, 1999;Steinhauer et al., 2009;Valente et al., 2016;Weber-Fox and Neville, 1996), we determined channels Fp1, F3, F7, FC3 and FT7 in left anterior regions as our ROI, and the time-window of interest between 300 ms and 500 ms post stimulus onset. The data show a negative-going wave peaking at around 450 ms post-stimulus onset, consistent with the topography of a delayed LAN. However, there seemed to be little difference between the conditions in terms of LAN voltage amplitudes. This was confirmed in our statistical analysis: The model that contained condition as fixed effect was not significantly better than the model that did not contain condition (χ²(3, 30) = 0.072, p = 0.995). We therefore found no evidence for a by-condition modulation of the LAN in this particular study.
An interesting feature of the P300 effect was the elicitation of more positive amplitudes for congruent and cognate nouns compared to incongruent and non-cognate nouns. This is in contrast to our original hypothesis, where we predicted less positive amplitudes for congruent and cognate nouns compared to incongruent and non-cognate nouns. Notably, this particular pattern of behavioural and EEG results has been previously reported in the literature in connection to the cognate facilitation effect (Acheson et al., 2012;Christoffels et al., 2007). For example, Acheson et al. (2012) used an overt picture naming task with unbalanced German-Dutch speakers to study conflict monitoring during bilingual language production with respect to the Error-Related Negativity (ERN). They found faster naming latencies for cognates compared to non-cognates. However, this was linked to more negative amplitudes for cognates from about 150 ms post-stimulus onset at the FCz electrode. Furthermore, Jiao et al. (2020) measured EEG during a picture naming task combined with a flanker task in unbalanced Chinese-English bilinguals. Their ERP results showed more positive P300 amplitudes for congruent compared to incongruent flankers in centro-parietal regions, while response times for congruent flanker trials were shorter compared to incongruent flanker trials (see also Bosma and Pablos, 2020). These results mirror those from our study. On the other hand, studies have also supported the more traditional notion of faster response times or shorter naming latencies in combination with smaller ERP amplitudes for cognates compared to non-cognates (Comesaña et al., 2012;Peeters et al., 2013;Strijkers et al., 2010;Xiong et al., 2020) and smaller P300 amplitudes for congruent trials in the flanker task (Wu and Thierry, 2013). Therefore, our behavioural and EEG patterns are by no means unusual.
Instead, they suggest a clear involvement of inhibitory control and acutely reflect the critical processes linked to successful non-native NP production. We propose that this study highlights the significance of the P300 as an index for the cognitive processes underlying the mitigation of CLI and the selection of the target language. Nevertheless, given the scarcity in terms of research, the directionality of the P300 effect elicited during non-native NP production warrants closer inspection in the future.
Taken together, we found traceable effects of CLI both at the behavioural and at the neural level, establishing CLI as a significant modulator of the time course of non-native NP production. This has implications for the time course of the production processes in non-native NP production, as reflected in naming accuracy, naming latencies and ERP patterns with respect to gender processing and phonological processing. CLI acts both as a facilitator and a hindrance during the production process: on one hand, there is a processing advantage for congruent and cognate nouns. On the other hand, this appears to be less the case for incongruent and non-cognate nouns. Moreover, our findings suggest that late language learners not only face CLI during early production stages and lexical retrieval, but possibly also during the later stage of phonological encoding.
In turn, this implies that lexical entries from both the target and non-target language are selected for phonological processing, thereby shifting the locus of target language selection to phonological encoding or beyond. Arguably, this highlights the complexity of non-native production processes compared to the production process in native-like speakers. Given the design of our study, we cannot exclude the possibility that our speakers resolve CLI between target and non-target language at even later production stages, e.g., the phonetic encoding stage. Yet, our findings have important implications for characterising the theoretical and neural underpinnings of the time course of non-native production processes, in particular for speakers with intermediate proficiency levels. Further, our findings add novel evidence to the debate about the locus of target language selection in late language learners.

Conclusion
In this study, we found traceable CLI effects at the behavioural and the neural level. More specifically, speakers faced CLI during gender processing and during phonological processing, which in turn impacted the time course of non-native production. In terms of the locus of target language selection, our findings suggested that the target and non-target language remained active at least until phonological encoding. Our findings have important theoretical implications for the conceptualisation of non-native production mechanisms, and warrant further exploration with regard to subsequent production stages and the exact involvement of inhibitory control in non-native NP production. Finally, we argue that there should be an increased focus on both the P300 component as an index of CLI and lower proficiency levels in studies on non-native NP production.

Citation diversity statement
To shed light on the systematic underrepresentation of work by female scientists and scientists identifying as members of a minority, relative to the papers published in the field, we included a Citation diversity statement (Dworkin et al., 2020; Rust and Mehrpour, 2020; Torres et al., 2020; Zurn et al., 2020). For this, we classified the first and last author of each paper in our reference list based on their preferred gender (wherever information was available).
Our reference list consisted of 26% woman/woman authors, 38% man/man, 21% woman/man and finally, 13% man/woman authors. This compares to 6.7% for woman/woman, 58.4% for man/man, 25.5% woman/man, and lastly, 9.4% for man/woman authored references for the field of neuroscience (Dworkin et al., 2020). A clear limitation of this classification is the rather broad binary woman/man distinction. However, we have full confidence that the classification system will vastly improve in the future with the routine addition of the preferred gender to personal and academic websites.

Funding statement
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 765556 - The Multilingual Mind.

Appendix A
Overview of the native and non-native languages acquired by the participants of the current study (N = 33) according to the LEAP-Q (Marian et al., 2007).

Appendix B
Electrode positions following a 10/20 montage. Electrodes included in the analysis are highlighted in purple.
Model of best fit for Naming Accuracy, including estimated means, confidence intervals, errors and z-values.