Prediction during simultaneous interpreting: Evidence from the visual-world paradigm

We report the results of an eye-tracking study which used the Visual World Paradigm (VWP) to investigate the time-course of prediction during a simultaneous interpreting task. Twenty-four L1 French professional conference interpreters and twenty-four L1 French professional translators untrained in simultaneous interpreting listened to sentences in English and interpreted them simultaneously into French while looking at a visual scene. Sentences contained a highly predictable word (e.g., The dentist asked the man to open his mouth a little wider). The visual scene comprised four objects, one of which depicted the target (mouth; bouche), an English phonological competitor (mouse; souris), a French phonological competitor (cork; bouchon), or an unrelated object (bone; os). We considered 1) whether interpreters and translators predict upcoming nouns during a simultaneous interpreting task, 2) whether interpreters and translators predict the form of these nouns in English and in French, and 3) whether interpreters and translators manifest different predictive behaviour. Our results suggest that both interpreters and translators predict upcoming nouns, but neither group predicts the word-form of these nouns. In addition, we did not find significant differences between patterns of prediction in interpreters and translators. Thus, evidence from the visual-world paradigm shows that prediction takes place in simultaneous interpreting, regardless of training and experience. However, we were unable to establish whether word-form was predicted.


Introduction
There is strong evidence that comprehenders often predict what they are about to hear (Huettig, 2015; Kuperberg & Jaeger, 2016; Pickering & Gambi, 2018). Similarly, most theoretical accounts assume that simultaneous interpreters regularly predict what they are about to hear as they work (Gerver, Longley, Long, & Lambert, 1984; Moser-Mercer, Frauenfelder, Casado, & Künzli, 2000; Setton, 2005). Although several studies have identified instances when interpreters produce a translation (in the target language) before they hear the complete source utterance (Seeber, 2001; Van Besien, 1999; Wilss, 1978), no study has measured predictive processing in trained interpreters and untrained bilinguals online during a simultaneous interpreting task.
In this study, we investigated the time course of prediction in simultaneous interpreting by tracking the eye movements of professional conference interpreters and professional translators untrained in simultaneous interpreting as they looked at a visual scene and simultaneously interpreted English sentences into French. We hypothesized that, despite the challenging conditions created by the concurrent production task, both interpreters and translators would predict upcoming meaning. However, it was also plausible that neither group would engage in prediction because of the challenging listening conditions, or that simultaneous interpreters might engage in prediction whereas translators might not.

Prediction in a native language
Much evidence indicates that comprehenders predict semantic, syntactic and phonological aspects of upcoming words in a native language (Pickering & Gambi, 2018). Evidence for semantic prediction comes from Altmann and Kamide (1999), who presented participants with visual scenes showing an agent and four objects, such as a boy sitting on the floor of his room with a cake, a train set, a toy car and a balloon. Participants heard sentences containing a verb which was compatible with either one or all four of the objects serving as its patient, such as "The boy will eat the…" or "The boy will move the…". When participants heard the verb "eat" they began looking at the cake before noun onset, but when they heard the verb "move" they did not, indicating that they used information from the verb to predict the meaning of the upcoming noun. Semantic prediction is supported by findings from similar eye-tracking studies using the visual world paradigm (Mani & Huettig, 2012; Rommers, Meyer, Praamstra, & Huettig, 2013) and other studies that do not involve a visual context (Grisoni, McCormick Miller, & Pulvermüller, 2017).
There is evidence that comprehenders predict syntactic and phonological aspects of upcoming words, although this evidence is not fully consistent. In an ERP study, Otten, Nieuwland, and Van Berkum (2007) had participants listen to short texts, half of which ended with a highly predictable noun such as "cross", rather than a less predictable noun, such as "crucifix", in (the Dutch translation of) the sentence "My grandfather and grandmother are very religious. Above the head of their bed hangs a big…". They found a negative deflection on the adjective that agreed in gender with the less predictable noun (starting 300 ms after adjective onset), indicating that participants predicted the syntax of the predictable noun. In another study, in which participants read short texts similar to those used in Otten et al. (2007), Otten and Van Berkum (2008) found a negative ERP effect, but this appeared later relative to adjective onset. In contrast, Wicha, Moreno, and Kutas (2004) found a positive ERP effect on articles that did not agree in gender with a predictable noun in Spanish, and Van Berkum, Brown, Zwitserlood, Kooijman, and Hagoort (2005) also found an early positivity on adjectives that did not agree in gender with a predictable noun. However, in a larger-scale study which was a close replication of Van Berkum et al. (2005), Nieuwland, Arkhipova, and Rodríguez-Gómez (2020) failed to replicate these adjective effects, and in fact found weak evidence of a negative effect. It is possible that syntactic predictions include only the gender of the upcoming noun, or that people also predict a specific article (see Fleur, Flecken, Rommers, & Nieuwland, 2020).
There is also evidence that comprehenders predict the form of an upcoming word (DeLong, Urbach, & Kutas, 2005; Ito, Corley, Pickering, Martin, & Nieuwland, 2016; Ito, Gambi, Pickering, Fuellenbach, & Husband, 2020; Ito, Pickering, & Corley, 2018; Laszlo & Federmeier, 2009), although not all studies have found such effects (Nieuwland et al., 2018).

Prediction in adverse conditions
The above studies provide evidence, most convincingly from the visual world paradigm, for prediction in a native language in quiet laboratory conditions. But simultaneous interpreting entails listening under adverse conditions. Firstly, during an interpreting task, most interpreters listen in a non-native language, in which they are less proficient than in their mother tongue. Secondly, the speech signal that they receive is imperfect, because they produce utterances while listening, which means they must comprehend noisy speech. Thirdly, engaging the production mechanism during comprehension may, in itself, limit prediction. Despite its name, simultaneous interpreting is not fully simultaneous, and production almost always lags behind comprehension. This means that the interpreter often produces the same utterance at a lag (something which may be akin to producing an unrelated utterance), and so the production mechanism may not assist in making an appropriate prediction. (On the other hand, when the interpreter produces an utterance whose timing is closely synchronized with the source utterance, the production mechanism might facilitate prediction.) Finally, interpreters comprehend incoming speech under increased cognitive load, because they must remember, reformulate, and produce the incoming message while listening. Therefore, prediction during simultaneous interpreting may be limited or impeded.
Indeed, there is evidence that prediction in L2, prediction in noise, prediction with concurrent (unrelated) production and prediction under cognitive load are all more limited than prediction in L1 in ideal conditions. First, Martin et al. (2013) compared prediction of semantic and phonological content among native English speakers and late Spanish-English bilinguals as they read highly predictable sentences in English. Following DeLong et al. (2005), sentences contained a more or less predictable noun which was either vowel or consonant initial (e.g., "He was very tired so he sat on a chair/an armchair.") Both groups showed a reduced N400 effect on the more predictable noun, but only the native English speakers also showed a reduced N400 effect on the article that preceded the more predictable noun. (Ito, Martin, & Nieuwland, 2017b point out that Martin et al., 2013 use an atypical reference channel, so these results may not generalise well.) More recently, Ito et al. (2018) compared the time-course of prediction in L1 and L2 speakers of English, and found that L2 speakers' predictive eye movements were delayed in comparison to those of L1 speakers, and L2 speakers did not make predictive fixations on phonological competitors. These studies suggest that L2 speakers may not predict word-form as L1 speakers do (see also Ito, Martin, & Nieuwland, 2017a).
Syntactic predictions also may not take place in L2 as they do in L1. For instance, Mitsugi and MacWhinney (2015) showed that intermediate L2 speakers of Japanese did not use case-marking information predictively (although this could have been because of the particularly complex Japanese case-marking system). This in turn suggests that some aspects of predictive processing are not automatic, and so the occurrence of prediction is linked to the time and resources available (Ito & Pickering, 2021).
Semantic prediction, on the other hand, may be less affected by nonnativeness. In a visual-world experiment based on Altmann and Kamide (1999), Dijkgraaf, Hartsuiker, and Duyck (2017) had participants listen to sentences which were either constraining (e.g., "Mary knits a scarf") or not (e.g., "Mary loses a scarf") and look at scenes depicting four objects, of which only one could be knitted, but all could be lost. Participants were one group of Dutch-English bilinguals, who listened in both Dutch and English, and one group of English monolinguals, who listened in English. They found that both groups made predictive eye movements, and that the effect of condition (constraining vs. non-constraining) was similar for L1 listening in English and in Dutch, as well as for Dutch participants listening in their L2. This might be because the semantic level is shared between languages, and so participants predict the same meaning.

Now let us consider prediction in noisy conditions. In a Bayesian account, word predictability may have a greater influence on word recognition than bottom-up input when the speech signal is less clear (Norris & McQueen, 2008). There is also evidence that increased attention when listening to noisy speech leads to greater reliance on top-down processing, which may include prediction (Wild et al., 2012). However, listening to noisy speech in an L2 may not affect prediction in the same way. Mayo, Florentine, and Buus (1997) found that even when L2 speakers showed native-like comprehension in quiet conditions, their comprehension was more degraded in noise than that of L1 speakers. Meanwhile, Mattys, Carroll, Li, and Chan (2010) found that L2 speakers relied more on acoustic cues than top-down strategies when listening to noisy speech.
Producing speech that is unrelated to what is being comprehended may also limit prediction. Martin, Branzi, and Bar (2018) had participants read sentences while either engaging in a concurrent motor task (tongue tapping), a listening task (listening to /ta/), or a production task (producing /ta/). Participants in the concurrent production condition predicted less than those in the other two groups, suggesting that occupying the production mechanism limits prediction. On the other hand, comprehending and producing closely related utterances at the same time, that is, utterances in different languages but with (ideally) similar meaning, may support, rather than impede, predictive processing. By some accounts, prediction takes place using the production mechanism (see Pickering & Gambi, 2018, for a review), and there is evidence suggesting a link between prediction and production (Adank, 2012; Drake & Corley, 2015; Hintz, Meyer, & Huettig, 2017; Mani & Huettig, 2012). Where comprehenders engage their production mechanism by concurrently producing the same content, this may trigger predictive processing. The production mechanism might be engaged in this way when comprehension and production are very synchronized.
There is evidence that cognitive load limits prediction. In a study similar to Dijkgraaf et al. (2017), Ito, Corley, and Pickering (2017) had L1 and L2 participants listen to constraining or non-constraining sentences while they either did or did not have to remember five unrelated words. For both groups, the additional load generated by memorizing unrelated words led to delayed predictive eye movements, suggesting that prediction requires cognitive resources in both native and non-native speakers. In addition, Huettig and Janse (2016) found that individual differences in working memory affected predictive processing, with participants with better working memory making more predictive eye movements (although see Otten & Van Berkum, 2009, who found that predictive processing was similar in participants with high and low working memory capacity).
Some factors may mitigate the adverse effects of comprehending in L2 and cognitive load, and thus support prediction during simultaneous interpreting. For instance, highly proficient users of an L2 may predict more similarly to L1 speakers (Kaan, 2014). Chambers and Cooke (2009) had native English speakers listen to sentences in their L2, French, as they looked at four objects, one of which was the referent of a target word (e.g., poule [chicken]), and one of which was the referent of an English interlingual homophone (e.g., pool) in the sentence "Marie va nourrir/décrire la poule" [Marie will feed/describe the chicken]. When participants heard the constraining verb (feed), they looked at the target object (chicken), and rarely considered the interlingual homophone (pool), showing that they had limited their expectations to the predictable noun before word onset. Proficiency in French and predictive fixations were positively correlated.
In addition, certain non-language aspects of the simultaneous interpreting setting may also support prediction. Specifically, the referents of utterances are often visually present in a simultaneous interpreting context. For example, during conference interpreting, interpreters almost always see the speaker, and may also view PowerPoint presentations or other documents or images while simultaneously interpreting.

Prediction in simultaneous interpreting
We have reviewed evidence showing that prediction may, but need not always, occur at all linguistic levels (see Pickering & Gambi, 2018), that L2 speakers may predict more slowly than L1 speakers (Ito et al., 2018), that cognitive resources are needed for prediction (Ito, Corley, et al., 2017), and that listening in L2 in noisy conditions may increase reliance on bottom-up processing strategies (Mattys et al., 2010). In addition, there is evidence that concurrent production (of irrelevant speech) may impede prediction (Martin et al., 2018). Although these adverse comprehension conditions may be somewhat mitigated by high L2 proficiency, similarities between L1 and L2, the presence of visual referents, and nearly synchronized concurrent production, the evidence provides reasons to expect that prediction might be impaired in simultaneous interpreting.
In spite of this, most accounts of simultaneous interpreting assume a key role for prediction (Gerver et al., 1984; Moser, 1978; Moser-Mercer et al., 2000; Seeber, 2001; Seleskovitch, 1984; Setton, 2005). It is included as a processing stage in one of the earliest process models of simultaneous interpreting (Moser, 1978). Setton (2005) suggested that an ability to predict is a prerequisite for success in simultaneous interpreting, and Chernov (2004) even proposed that being able to anticipate how a message will develop is what makes simultaneous interpreting possible. Indeed, prediction may allow interpreters to ignore parts of the input and focus entirely on production or memorizing (De Groot, 2011). Prediction has been described as both a skill (Moser-Mercer, 2000) and a strategy (Seeber, 2001; Setton, 2001; Van Besien, 1999) used by simultaneous interpreters. This implies that interpreters either have or develop (implicitly or explicitly) a special ability to predict during the task of simultaneous interpreting that other groups may not have. In other words, theories from the Interpreting Studies literature posit that trained interpreters alone may use predictive cues during interpretation (Frauenfelder & Schriefers, 1997), and that both training and experience may be necessary to engage (strategically) in prediction during simultaneous interpreting (e.g., Moser, 1978).
Evidence that interpreters predict during interpreting comes from the observation that they sometimes produce an utterance in the target language before hearing it in the source language (Hodzik & Williams, 2017; Seeber, 2001; Van Besien, 1999; Wilss, 1978). However, these studies tend to be based on theories according to which interpreters alone (are able to) predict during a simultaneous interpreting task (Seeber, 2001; Van Besien, 1999; Wilss, 1978), and the question of whether untrained bilinguals predict during a simultaneous interpreting task has not been extensively investigated. Only Hodzik and Williams (2017) considered whether interpreters have a shorter lag compared to untrained bilinguals when interpreting predictable content. They found no significant difference between groups, although note that their interpreters were mainly students of interpreting, rather than trained professionals.
To our knowledge, no study has used the visual-world paradigm to track the time course of prediction in interpreters and bilinguals untrained in interpreting during an interpreting task. Use of the visual world paradigm makes it possible for us to study the time-course of prediction while participants are engaged in a concurrent production task, and thus investigate whether, during an interpreting task, prediction is specific to interpreters or generalises to other bilingual populations. In the visual-world paradigm, the experimenter infers prediction from looks to pictures (or objects) before they are mentioned (e.g., Altmann & Kamide, 1999;see Pickering & Gambi, 2018, for discussion). It is true that the visual referents in an interpreting context are unlike such pictures (interpreters may see speakers, as well as their gestures and facial expressions, a presentation, and objects such as name plates and the rostrum rather than four images on a screen). However, a visual context is regularly present in a conference interpreting context (Seeber, 2017) and may aid comprehension, just as the visual scenes presented in the visual-world paradigm may.
A related question is whether interpreters and bilinguals untrained in simultaneous interpreting are able to make word-form predictions during an interpreting task. For example, is it possible to predict the phonological form of an upcoming word as well as its meaning during a simultaneous interpreting task? Such predictions may be particularly advantageous in simultaneous interpreting, as they would allow interpreters to plan their own upcoming utterance with precision (Amos & Pickering, 2020). Word-form prediction would be compatible with an account by which comprehenders use their production mechanism to predict by working through the same stages, in the same order, as during production (from meaning, to syntax, to sound), but without articulating (Pickering & Gambi, 2018). If so, we would expect people to make predictions primarily in the language in which they are comprehending (the source language) during an interpreting task. However, given that both languages are strongly activated in simultaneous interpreting, predictions may also be formed in the target language. Alternatively, the concurrent activation of both languages, and the regular switching of focus from comprehension to production and back between the two languages, may lead to a weaker activation of each language. This might lead to an apparent lack of word-form prediction.

The current study
We designed a study to test whether simultaneous interpreters and translators untrained in simultaneous interpreting make predictions during simultaneous interpreting, and to shed light on how specific these predictions might be in both groups. We chose to test both simultaneous interpreters and professional translators so that we could compare two groups who are similar in terms of language proficiency and age, and who are used to working with two languages at the same time (meaning that both groups are likely able to carry out the task of simultaneous interpreting). Testing both a group of interpreters and a group of translators allows us to see whether prediction during simultaneous interpreting is possible only for trained interpreters, or whether untrained bilinguals also engage in prediction during a simultaneous interpreting task.
Native French-speaking participants listened to highly constraining sentences in English and simultaneously interpreted these sentences into French. Based on Ito et al. (2018), they viewed a visual scene containing three distractors (pictures of unrelated objects) and a picture of a critical object, whose name was either the target word (e.g., mouth; bouche), a word phonologically related to the English form of the target word (English phonological competitor; e.g., mouse; souris), a word phonologically related to the French form of the target word (French phonological competitor; e.g., cork; bouchon), or an unrelated word (e.g., bone; os).
The timing of fixations in the visual array is linked to underlying comprehension processes (Tanenhaus, Magnuson, Dahan, & Chambers, 2000). We therefore measured the timing of fixations on the critical objects. If participants predict while simultaneously interpreting, they should fixate target objects more than unrelated objects. Such predictive looks would demonstrate that L2 listeners predict upcoming utterances while simultaneously interpreting. If participants fixate English phonological competitor objects more than unrelated objects, this would demonstrate that L2 listeners pre-activate phonological information in their L2. If they fixate French phonological competitor objects more than unrelated objects, this would demonstrate that listeners engage in crosslinguistic prediction. If the simultaneous interpreter group predicts more or earlier than the translator group, this would demonstrate that interpreters engage in more or earlier prediction during simultaneous interpreting than translators.

Participants
Twenty-five conference interpreters working in Geneva, whose A language was French and whose B or C language was English, participated in the experiment.¹ One participant was excluded from the analysis because they almost never (less than 3% of the time) fixated the depicted objects (on experimental items and filler items). All participants had normal or corrected-to-normal vision and reported no language disorders. Most participants (n = 20) were members of the International Association of Conference Interpreters (AIIC), which promotes professional ethics, standards and conditions in conference interpreting, were accredited to international organisations such as the UN, or were both AIIC members and accredited. The four remaining participants were professional interpreters working in Geneva.
Twenty-five translators working in Geneva, who translated into French from languages including English, participated in the experiment. Three of the participants were trained translators who no longer worked as translators but in related fields (e.g., management). One additional participant was tested, but their results were excluded (chosen at random), because only 24 participants were needed to match the interpreter group. All participants had normal or corrected-to-normal vision and reported no language disorders.
We determined sample size from Ito et al. (2018), which used items (sentences and pictures) with the same characteristics and experimental structure as our experiment. There is no comparable work using interpreters (or translators), and so we did not have any data on which to conduct an appropriate a priori power analysis. In addition, the population of professional interpreters working between English and French is extremely limited, especially given that we wished to recruit them from the well-established community in Geneva (a community with which we are highly familiar). For instance, there are currently 87 such interpreters in Geneva who are members of the International Association of Conference Interpreters (AIIC). As recruitment of such busy professionals is challenging, we decided to recruit the same number of participants per group as in Ito et al. (2018).
The groups were matched for demographic and language background factors (see Table 1). All participants completed a language background questionnaire based on the LEAP-Q (Marian, Blumenfeld, & Kaushanskaya, 2007). Unlike in the original questionnaire, participants were asked to provide language background information only on French and English (rather than on all of their languages), and used a five-point instead of a ten-point scale for the self-proficiency ratings. Participants also provided information about their professional background.
The two groups did not differ in age of acquisition, age of fluency, time living in an area where French was spoken, or time living in an area where English was spoken. However, the translator group had a significantly higher overall current exposure to French (see Table 1). Participants also rated their language proficiency for speaking, reading, and listening, in French and English, on a five-point Likert-type scale (from 1 = "very low" to 5 = "very high"). With the exception of one translator, all participants rated their French proficiency as 5 for all areas of ability. There were no between-group differences in self-rated reading or writing proficiency in English. However, interpreters rated their listening ability in English as higher than translators (see Table 1). It is likely that this difference in perceived listening proficiency is linked to the different professional profiles of the two groups, given that listening in English is a key component of the interpreting process.

¹ A language classification system unique to the interpreting profession is used to describe the languages in an interpreter's combination. The A-language is the language in which the interpreter is most proficient, and is a target language into which the interpreter works from any of the languages in his or her combination. The C-language is a source language from which the interpreter works. The B-language is a source language and target language in which the interpreter is perfectly fluent, but nonetheless less proficient than the A-language. (AIIC, 2019)

Stimuli
Experimental stimuli consisted of 32 English sentences, each paired with a visual array containing three distractor objects and one of four critical objects (see Appendix 1). Critical objects appeared in each of the four quadrants equally frequently, following a Latin-square design. The experimental sentences each contained a highly predictable word (e.g., mouth, in "The dentist asked the man to open his mouth a little wider.") at varied positions in the sentence (range = 5th–20th word, M = 10.8, SD = 3.18), but never sentence-finally. The sentences consisted of a mean of 15.4 words (range = 10–21, SD = 2.85). Stimulus sentences were based on Ito et al. (2018) and Block and Baldwin (2010), or else were designed by the authors.
There were an additional 32 filler sentences. These sentences were designed not to be constraining for any particular word. They were paired with the same visual scenes as the experimental sentences, but the quadrants in which the objects appeared were varied. Filler sentences mentioned distractor objects 75% of the time, so together with the experimental sentences, which mentioned the critical object 25% of the time (i.e., in the target condition), the sentences mentioned one of the objects in the visual scene 50% of the time.
The sentences were recorded at a sampling rate of 48 kHz in a soundproof recording studio by a male native Southern British English speaker. The speaker read the experimental sentences at a rate of 2.01 syllables per second (SD = 0.27). The mean sentence duration was 9.87 s (SD = 1.63). The mean onset time of the critical word was 6.28 s (SD = 1.94) (see Table 2).
The predictability of the target words was assessed using a cloze probability test (online, via LimeSurvey). First, 20 native speakers of French who were proficient in English read 40 English sentences that we judged could be completed with a predictable word, and completed each sentence with the first word that came to mind. Then, 14 different native speakers of French were asked to complete the same English sentences with a word in French. This was important in ensuring that there was one clearly predictable word in both languages. We were left with 27 sentences that were high cloze for both the English and the French word. We then constructed five further sentences. Twelve participants completed these sentences with a word in English, and 10 participants with a word in French. The final mean cloze probabilities were 90.7% (SD = 11.9, range = 60-100%) for the English word, and 91.7% (SD = 10.9, range = 57-100%) for the French word (see Table 2).
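The scoring used in cloze tests of this kind can be illustrated with a short sketch (in Python; the function name and the response lists are our own illustration, not the authors' materials):

```python
from collections import Counter

def cloze_probability(responses):
    """Return the modal (most frequent) completion for one sentence frame
    and its cloze probability: the modal word's share of all responses."""
    counts = Counter(r.strip().lower() for r in responses)
    word, n = counts.most_common(1)[0]
    return word, n / len(responses)

# Illustrative data: 18 of 20 raters complete "...open his ___" with "mouth"
word, p = cloze_probability(["mouth"] * 18 + ["eyes", "jaw"])
print(word, p)  # mouth 0.9
```

A sentence is "high cloze" when this proportion is high in both the English and the French norming samples, as required for the 32 final items.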
Each of the visual scenes contained four objects: a critical object and three distractor objects. In the target condition, the critical object corresponded to the predictable word (e.g., mouth [French: bouche]). In the English competitor condition, the English name of the critical object phonologically overlapped at onset with the predictable word (e.g., mouse [souris]). In the French competitor condition, the French name of the critical object phonologically overlapped at onset with the French translation of the predictable word (e.g., cork [bouchon]). The mean number of phonemes shared between the predictable words and English competitor words was 2.2 (SD = 0.55) out of a mean of 3.6 phonemes (61%). The mean number of phonemes shared between French translations of predictable words and French competitor words was 2.1 (SD = 0.53) out of a mean of 4.2 phonemes (50%).² English and French names of the predictable objects were unrelated to each other phonologically. The translation of the English phonological competitor was unrelated to the French translation of the predictable word, and the translation of the French phonological competitor was unrelated to the English translation of the predictable word. In the unrelated condition, the name of the critical object did not have phonological onset overlap with either the English or the French form of the predictable word.
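The onset-overlap measure can be sketched as follows (a minimal illustration in Python; the broad phonemic transcriptions are our own example, not the authors' coding):

```python
def onset_overlap(a, b):
    """Count the phonemes two phoneme sequences share at onset,
    stopping at the first mismatch."""
    n = 0
    for pa, pb in zip(a, b):
        if pa != pb:
            break
        n += 1
    return n

# Illustrative broad transcriptions: mouth /maʊθ/ vs. mouse /maʊs/
target = ["m", "aʊ", "θ"]
competitor = ["m", "aʊ", "s"]
shared = onset_overlap(target, competitor)
print(shared)  # 2, i.e., 2 of the target's 3 phonemes, close to the 61% mean reported above
```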
We conducted an online picture naming test to assess naming agreement for the depicted objects in English and in French. French native speakers who were proficient in English, and who did not participate in either the eye-tracking experiment or the cloze probability test, looked at pictures of objects and gave the first name that came to mind. We did not discriminate between correct and incorrect spellings of the same word (e.g., stappler/stapler) or related words sharing the same meaning and phonological onset (e.g., mike/microphone). In selecting the pictures used for the phonological competitors, we took into account naming agreement ratings only in the relevant language. Some of the items were changed and re-tested, and each of the objects in the final stimuli set was named in French and in English by at least 12 participants. Naming agreement for experimental objects was 88.4% in English (SD = 12.7, range = 50–100%) and 94.4% in French (SD = 9.6, range = 50–100%) (see Table 3 for a breakdown of these results).
The study comprised 32 experimental and 32 filler sentences (a total of 64 items). There were 32 arrays containing four images. Each array was shown twice, once with an experimental item and once with a filler item. Each experimental list contained two half-lists, each made up of the 32 visual arrays paired with 16 experimental and 16 filler sentences. Visual arrays paired with experimental items in one half-list were paired with fillers in the other half-list, and vice versa. Experimental images were counterbalanced in the full lists, resulting in 4 different sets of items, and 8 experimental lists in total. Some of the experimental sentences included a word that was also a critical object in another sentence. In these cases, the sentence in which the word corresponded to a critical object was always played before the sentence simply mentioning the name of the object.

2 For three of the French competitor words, only one phoneme overlapped with the target word, but the words were closely related in orthography (aile/aimant, lit/lion and nuage/nuque). This was also the case for one English competitor word (bee/beard).
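For concreteness, the rotation of the four display conditions across item sets can be sketched as a Latin square. The sketch below is an illustrative reconstruction, not the authors' list-building script, and the item indices are hypothetical.

```python
CONDITIONS = ["target", "english_competitor", "french_competitor", "unrelated"]

def build_sets(n_items=32):
    """Rotate conditions over items so that each of the 4 sets shows
    every item in a different condition (a Latin-square rotation)."""
    sets = []
    for s in range(4):
        sets.append({item: CONDITIONS[(item + s) % 4] for item in range(n_items)})
    return sets

sets = build_sets()
# Each set contains 8 items per condition, and a given item appears
# in a different condition in each of the 4 sets; crossing these sets
# with the 2 half-list pairings yields 8 experimental lists.
```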

Procedure
Before the experiment began, participants read and signed an informed consent form approved by the Ethics Committee of the Faculty of Translation and Interpreting at the University of Geneva. The experiment then started with a picture familiarization task. Participants saw all objects appearing in the experiment in an automatically generated randomized order. The objects were shown on the screen one at a time above a caption showing their English and French name. At the same time, participants heard the English and French name for the object through their headphones. After that, they were asked to name each object using the words that had been provided. The order of language presentation was counterbalanced so that half of the participants heard and saw the English word followed by the French word, and the other half saw the reverse. Participants were instructed to look at and listen to the names given to the objects, so that they could name the objects using the same words later.
Objects were considered correctly named if the participant correctly repeated both the English and the French word. Incorrectly named objects (3.1% Interpreters, 4.4% Translators) were repeated, and the experimenter prompted participants who did not provide the correct name for the object on second viewing.
In the eye-tracking experiment, participants were seated in front of a computer screen, at a distance of approximately 60 cm, in the experimental laboratory (LaborInt) of the Interpreting Department of the University of Geneva. The computer was set up inside a portable ISO4043-compliant interpretation booth. The participant's dominant eye 3 was tracked on an SR Research EyeLink® 1000 remote desktop-mounted eye-tracker. Participants were asked to listen to, and simultaneously interpret, the sentences into French, and subsequently to judge whether the sentence had mentioned any of the objects shown on the display. After the instructions, the eye-tracker was calibrated using the nine-point calibration grid. Pictures were presented on a viewing monitor at a resolution of 1024 × 768 pixels. Each trial started with a drift correction, followed by a 500 ms blank screen. The visual scene was presented just over 1000 ms before onset of the predictable word in experimental trials. On filler trials, the presentation was just over 1000 ms before the onset of a word that referred to a distractor, or else at an arbitrary mid-sentence point if the sentence did not mention anything in the scene. Mean preview time for the experimental items was 1053 ms (SD 21 ms) for the Interpreter group and 1029 ms (SD 18 ms) for the Translator group. For the filler items, mean preview time was 1047 ms (SD 18 ms) for the interpreter group and 1025 ms (SD 11 ms) for the translator group (with the difference being due to a change in display computer across groups). The picture stayed on the screen until offset of the spoken sentence. A blank screen then appeared for 4000 ms, after which audio recording of the interpretation stopped. After this, the following question appeared: "Did the sentence mention any of the pictures?". Participants gave their answer using a keyboard, pressing 1 for "Yes" and 2 for "No", and the next trial started.
The experiment started with four practice trials, after which participants were given a chance to ask questions. The experimenter also checked whether participants were interpreting the trial sentences simultaneously, and, if necessary, reminded participants that they should interpret simultaneously. The eye-tracker was then recalibrated before participants began the experiment. No feedback was given during the experiment. The experimenter monitored the eye-tracking display and recalibrated if necessary. The session lasted about 50 min.

Comprehension question accuracy
The mean accuracy for comprehension questions in the experimental trials was 97.3% (SD 2.9%) for the Interpreter group and 97.5% (SD 3.6%) for the Translator group. Incorrectly answered trials were excluded from the eye-tracking analysis.

Eye-tracking data analyses
We analysed data from the two groups separately using a linear mixed effects model with the lme4 package (Bates, Mächler, Bolker, & Walker, 2015), using the optimx optimiser (Nash, 2014) in RStudio Version 1.4.1717 (RStudio Team, 2021). Following Ito et al. (2018), proportions of fixations on target, English competitor, French competitor, and unrelated objects were calculated separately, using the EyeLink Data Viewer, for 50 ms bins. 4 Blinks and fixations outside the computer screen were included in the calculation of the proportion of fixations. However, bins containing only blinks or fixations outside the computer screen were then excluded from the analysis. We explored the time-course of effects by running the model for each bin from 1000 ms before target word onset to 1000 ms after onset, in order to consider prediction and bottom-up activation of phonology. The model evaluated the arcsine-transformed fixation proportions on critical objects as predicted by condition for each bin. The unrelated condition was used as a reference group, or baseline condition, using the relevel function in R, so that we could test the effects of each critical condition relative to the unrelated baseline condition (target vs. unrelated, English competitor vs. unrelated, and French competitor vs. unrelated). The model included random intercepts and (de-correlated) random slopes for participants and items (Barr, 2008a). We used the bobyqa method within the optimx optimizer, which allowed all models to converge. However, as the models returned singular fit warnings, we also carried out a Bayesian linear mixed model analysis to check our findings (again using the optimx optimiser, this time with the nlminb method). Where the Bayesian model returned different results for any time bin, this is noted in the footnotes. As in Ito et al. (2018), and similarly to Borovsky, Elman, and Fernald (2012), we base our conclusions on periods over which a minimum of three consecutive bins are significantly different between a critical condition and the baseline condition. We consider that any significant divergence begins at the start time of the first of these consecutive bins, and that the difference between a critical and the baseline condition is significant when the absolute t-value exceeds 2 (Baayen, Davidson, & Bates, 2008).
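The bin-by-bin decision rule described above can be illustrated with a short sketch: fixation proportions are arcsine-transformed, and a divergence is accepted only when at least three consecutive 50 ms bins show |t| > 2. The function below is a schematic reconstruction of that rule, not the analysis code used in the study, and the t-values in the example are invented.

```python
import math

def arcsine_transform(p):
    """Arcsine-square-root transform of a fixation proportion."""
    return math.asin(math.sqrt(p))

def divergence_onset(t_values, bin_starts, threshold=2.0, run_length=3):
    """Return the start time of the first run of `run_length` consecutive
    bins whose |t| exceeds `threshold`, or None if no such run exists."""
    run = 0
    for i, t in enumerate(t_values):
        run = run + 1 if abs(t) > threshold else 0
        if run == run_length:
            # Divergence begins at the start of the first bin in the run.
            return bin_starts[i - run_length + 1]
    return None

# Hypothetical per-bin t-values for target vs. unrelated:
bins = [-300, -250, -200, -150, -100]
ts = [1.1, 2.4, 2.6, 2.2, 1.8]
onset = divergence_onset(ts, bins)  # -250: first bin of the 3-bin run
```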
We also carried out a cluster-based permutation analysis. This allowed us to compensate for some of the disadvantages of a bin-by-bin analysis by allowing us to identify time ranges over which an effect was statistically reliable, correct for multiple comparisons and avoid the element of arbitrariness in the choice of length of time bins (Barr, Jackson, & Phillips, 2014). Since the bin-by-bin analyses for the experimental items did not reveal significant differences between English and French competitor and unrelated objects, we considered fixations on Target and Unrelated objects only in the cluster-based permutation analyses. We ran the cluster-based permutation analyses for the filler items for the conditions identified as significant in our bin-by-bin linear mixed model analyses. We used the "clusterperm" package in R to run by-subject and by-item ANOVAs for each time bin for the period from − 1000 ms before until 1000 ms after word onset. We then detected clusters of time bins by applying a clustering threshold using the detect_clusters_by_effect function. We then ran permutation tests by subject and by item to calculate a Monte Carlo p-value which evaluated the probability of such clusters occurring by chance.

3 Dominance was assessed using a sighting ocular dominance test.
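The logic of such a cluster-based permutation test (schematically, not the implementation in the clusterperm package) can be sketched as: compute a per-bin statistic, sum it over runs of adjacent supra-threshold bins, and compare the largest observed cluster mass against a distribution obtained by randomly sign-flipping per-subject condition differences. All data shapes and thresholds below are illustrative assumptions.

```python
import math, random

def t_stat(diffs):
    """One-sample t on per-subject condition differences."""
    n = len(diffs)
    m = sum(diffs) / n
    var = sum((d - m) ** 2 for d in diffs) / (n - 1)
    return m / math.sqrt(var / n) if var > 0 else 0.0

def cluster_mass(ts, threshold=2.0):
    """Largest summed |t| over runs of adjacent supra-threshold bins."""
    best = cur = 0.0
    for t in ts:
        cur = cur + abs(t) if abs(t) > threshold else 0.0
        best = max(best, cur)
    return best

def permutation_p(data, n_perm=1000, threshold=2.0, seed=1):
    """data[subject][bin]: per-subject condition differences per time bin.

    Monte Carlo p-value for the largest observed cluster, via sign-flipping."""
    rng = random.Random(seed)
    n_bins = len(data[0])
    observed = cluster_mass(
        [t_stat([s[b] for s in data]) for b in range(n_bins)], threshold)
    exceed = 0
    for _ in range(n_perm):
        flips = [rng.choice((1, -1)) for _ in data]
        ts = [t_stat([f * s[b] for f, s in zip(flips, data)])
              for b in range(n_bins)]
        if cluster_mass(ts, threshold) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

With synthetic data containing a strong effect in the middle bins, the resulting p-value is small, mirroring the Monte Carlo p-values reported below.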

Linear-mixed model for experimental items by group
As shown in Fig. 1, interpreters looked at the target object consistently more than the unrelated object from − 600 ms before, until 1000 ms after, predictable word onset. This result was supported by the results of a cluster-based permutation analysis, which detected a cluster for the period of − 600 ms to 1000 ms in both the by-item and the by-subject analyses. Using a Monte Carlo estimate, we calculated that the probability of this cluster occurring by chance was minimal for both the by-item and by-subject analysis (p < .001). Thus, interpreters showed predictive looks to the target object, and persisted in these looks. Although they tended to look more at the English competitor object than the unrelated object from 550 ms to 900 ms after word onset, this difference was not statistically significant. Interpreters therefore predicted predictable words, but we did not find evidence that they predictively activated their phonology.
Translators also fixated the target object significantly more than the unrelated object from − 400 ms onwards (except in one time bin at − 150 ms) but they did not predictively activate the English phonological information of the predictable word 5 (see Fig. 2). We carried out the same cluster-based permutation analysis as for the interpreter group. Based on our by-subject analysis we identified a cluster running from − 450 ms before word onset until 1000 ms after word onset with a probability of chance occurrence of p < .001. The by-item analysis returned slightly different results, with a first cluster whose probability of occurring by chance was minimal starting only at − 50 ms before word onset until 600 ms after word onset (Monte Carlo p-value of <.001) and continuing again from 750 ms until 1000 ms after word onset (Monte Carlo p-value of <.001). Thus, our analyses showed that translators engaged in prediction. Translators did not fixate either the English or the French phonological competitor more than the unrelated object at any point in the analysis window. In sum, translators (like interpreters) predicted predictable words while carrying out a simultaneous interpreting task, but there was no evidence of word-form prediction.

Linear mixed model for filler items by group
We then analysed the filler items to examine whether interpreters fixated the target, English competitor, or the French competitor object more than the unrelated object even when the predictable word was not mentioned in the sentence. Filler sentences mentioned one of the distractors from the visual stimuli set 75% of the time. Where filler sentences did not mention any object present on the screen (25%), we chose an arbitrary mid-point in the sentence and considered the 1000 ms before and after this point. We used the same linear mixed effects model as for the experimental analysis. As shown in Fig. 3, in the interpreter group, there were no additional fixations on critical objects at any point before onset of the distractor word. There were significantly more fixations on the French phonological competitor object than the unrelated object from 850 ms to 1000 ms after word onset. 6 We ran a cluster-based permutation analysis to further investigate this result. Both the by-subject and by-item analyses detected a cluster of bins at 850 to 1000 ms (Monte Carlo p-value: by subject p < .05, by item p < .01). The by-subject analysis also detected a cluster during which fixation proportions on the French phonological competitor were less than those on the unrelated object. This cluster ran from 300 to 450 ms after word onset and its probability of occurring by chance was p < .05. These results do not follow the same pattern as the results from the analysis of the experimental items, which show no significant increase or decrease of fixations on the French phonological competitor object compared to the unrelated object.
We carried out the same analysis on the filler items for the translator group. Translators looked at the target object more than at the unrelated object from − 350 ms to − 100 ms, at the English competitor object more than at the unrelated object from − 300 ms to 0 ms and at the French competitor object more than at the unrelated object from − 250 ms to 0 ms in the predictive time window. 7 We ran a cluster-based permutation analysis to consider the fixations on each of the competitor objects compared to the unrelated object. We found a similar pattern. For the French competitor object, we found a cluster in the by-item analysis from − 250 ms to − 50 ms with a Monte Carlo p-value of <.001, and in the by-subject analysis, we found a cluster from − 350 ms to − 50 ms with a Monte Carlo p-value of <.01. For the English competitor object, we found a cluster in the by-item analysis from − 300 to 50 ms with a Monte Carlo p-value of <.001, and in the by-subject analysis, we found a cluster from − 300 ms to 0 ms with a Monte Carlo p-value of <.001. For the Target object, we found a cluster in the by-item analysis from − 350 ms to − 150 ms with a Monte Carlo p-value of <.001, and in the by-subject analysis, we found a cluster from − 400 ms to − 200 ms with a Monte Carlo p-value of <.01. As shown in Fig. 4, these results appear to be due to a decrease in fixations on the unrelated object. They do not follow the same pattern as the results from the experimental sentences.

Between-group analysis
To compare results from the interpreter and translator groups on the experimental items, we ran our linear mixed model on both groups together and specified an interaction by group. We did not find any interaction between group and condition before the onset of the predictable word. After onset, we found a significant interaction between fixations on the target vs. unrelated object and group for three consecutive bins from 750 ms until 900 ms after word onset. 8 There was also an interaction between the fixations on the English competitor vs. unrelated object and group for two bins from 750 ms to 850 ms after word onset (although this did not pass our three-bin threshold, and only one bin was significant when we ran the Bayesian model). Although the by-subject cluster-based permutation analysis detected an interaction between group and fixations on the target compared to the unrelated object at two time points, at − 600 ms and from 800 to 850 ms, these clusters may have occurred by chance (Monte Carlo p-value at − 600 ms: p = .842, Monte Carlo p-value from 800 to 850 ms: p = .295). We detected no clusters when we ran the corresponding by-item analysis.
Finally, we calculated the average arcsine transformed fixation proportions over the three time bins from − 600 ms to − 400 ms, during which time our by-group analyses had indicated interpreters had begun predicting but the translators had not, and ran our linear-mixed model over this time period. We found that the interaction between the fixations on the target condition compared to the unrelated condition, and the professional group (interpreter/translator) approached, but did not reach, significance (t = − 1.725).
We thus find that both interpreters and translators predict during an interpreting task, and that neither group predicts the word-form of a predictable word (see linear mixed models). We did not find significant differences between predictive fixations in the two groups over consecutive time bins. We found a significant difference between fixation proportions on target and unrelated objects that depended on profession after the onset of the predictable word, from 750 ms to 900 ms.

5 A Bayesian linear mixed model with random slopes and intercepts found a significant difference starting only at − 350 ms and a lack of significant difference between target and unrelated object from − 200 to − 50 ms and at 50 ms. The results for the other bins were the same.
6 The Bayesian analysis did not return a significant difference in fixations over three consecutive bins. Only the bins at 850 ms and 950 ms were significant.
7 In the Bayesian analysis, for the target vs. unrelated object, the bin at − 150 ms was not significant; for the English competitor vs. unrelated object, the bin at − 300 ms was not significant; and for the French competitor vs. unrelated object, the bins at − 250 and − 200 ms were not significant.
8 Using the Bayesian model, the difference was significant from 750 ms to 850 ms only.

Relation between eye movements and lag in simultaneous interpretation
We also analysed the eye-tracking data in light of our recordings of participants' interpretations. Based on our audio recordings, we considered when participants began interpreting, whether they completed their interpretation within 4000 ms of the end of the spoken sentence, and whether they used the equivalent noun in French to translate the predictable word in English.
We excluded one of the translator participants from this part of the analysis because the audio recording had failed. We also excluded four trials from the analysis because four participants in the Translator group failed to provide an interpretation for one item. We again excluded items for which the comprehension question was answered incorrectly.
For the items for which participants answered the comprehension question correctly, 85.7% (SD: 14.3%) were interpreted within the allocated time of 4000 ms after the end of the sentence (85.8% of items for interpreters (SD: 16.4%) and 85.6% for translators (SD: 14.7%)). Participants interpreted 97.1% (SD: 5.9%) of items simultaneously (defined as beginning interpretation before the end of the sentence): 97.7% (SD: 5.6%) of items for interpreters and 96.4% (SD: 6.6%) for translators. On average, participants began interpreting 2519 ms before onset of the target word (SD: 2733 ms) (interpreters 2305 ms before (SD: 2847 ms), translators 2745 ms before (SD: 2847 ms)). Mean onset of the interpretation was at 3782 ms (SD: 1791 ms) after the beginning of the sentence (3982 ms (SD: 1671 ms) for interpreters and 3570 ms (SD: 1888 ms) for translators). Although it may appear surprising that translators began interpreting slightly earlier than interpreters, this may be because translators understood the instruction to "interpret simultaneously" as meaning to begin their interpretation as soon as possible, whereas interpreters perhaps first waited for a meaningful unit before beginning their interpretation.
When participants predict more quickly, they may prepare their own utterance more quickly thanks to this prediction. Conversely, preparing and producing their utterance more quickly may support predictive processing. We carried out an exploratory analysis to see if there was indeed a relationship between prediction and lag. We used the same linear mixed model to investigate the time at which participants fixated the critical objects in relation to the time, relative to the target word, at which they began their interpretation (relative onset time) and whether or not they completed their interpretation. We split the data on the mean, with all trials on which participants began interpreting at least 2519 ms before the target word and completed their interpretation before audio recording stopped at 4000 ms after the end of the sentence in one group, and all trials on which participants began interpreting less than 2519 ms before the target word in the second group.
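On one reading of this split, trial classification reduces to a simple rule on relative onset time and completion. The sketch below is a hypothetical illustration (the field and group names are our own, not those used in the analysis scripts):

```python
MEAN_LEAD_MS = 2519  # mean interpretation lead time before target onset

def classify_trial(onset_relative_to_target_ms, completed):
    """Group trials as in the mean split: 'early_onset' if interpretation
    began at least 2519 ms before the target word AND was completed in
    time; 'late_onset' otherwise."""
    early = onset_relative_to_target_ms <= -MEAN_LEAD_MS
    return "early_onset" if early and completed else "late_onset"

# Interpretation began 3000 ms before the target and was completed:
classify_trial(-3000, True)   # early-onset group
# Interpretation began only 1500 ms before the target:
classify_trial(-1500, True)   # late-onset group
```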
As shown in Fig. 5, when interpretation began at an earlier relative onset time, and participants completed their interpretation within 4000 ms of the end of the English sentence, the proportion of fixations on the target vs. the unrelated object was significantly higher (|t| > 2) from − 550 ms until the end of the analysed time window at 1000 ms after word onset. In addition, there were significantly more fixations on the English competitor object compared to the unrelated object (|t| > 2) from − 400 to − 250 ms before word onset. 9 As shown in Fig. 6, when interpretation began at a later relative onset time, the proportion of fixations on the target vs. the unrelated object was significantly higher (|t| > 2) from − 50 ms before word onset until 1000 ms after word onset. 10 There were no significant differences in the proportion of fixations on the English or French competitor object compared to the unrelated object at any point of the analysed time window.
Thus, on trials in which interpretation began earlier relative to the onset of the target word, predictive eye movements also began numerically earlier. In addition, for trials in which the relative onset time of interpretation was earlier, we found evidence suggesting phonological prediction. We further explored these findings by running a linear mixed model that included an interaction term for relative onset time (measured in seconds and centred) and condition. We included random intercepts and decorrelated random slopes for item and participant. We found an interaction of relative onset time and condition (target vs. unrelated) (|t| > 2) from − 200 ms to 0 ms and an interaction of relative onset time and condition (English competitor vs. unrelated) (|t| > 2) from − 200 ms to 50 ms. 11 There was also an interaction between relative onset time and condition (French competitor vs. unrelated) (|t| > 2) from − 150 ms to 0 ms. Thus, in the time period from − 200 ms before word onset until word onset itself, the divergence between the proportion of fixations on the target versus the unrelated object depended on the relative onset time, with greater divergence when relative onset time was earlier. In the time period from − 200 ms before word onset until 50 ms after word onset, the divergence between the proportion of fixations on the English competitor versus unrelated object depended on the relative onset time, with greater divergence when relative onset time was earlier. From − 150 ms before word onset until word onset, the divergence between the proportion of fixations on the French competitor object versus the unrelated object depended on the relative onset time.
Raw data and scripts for these analyses are available on Open Science Framework at: https://osf.io/3sjfd/.

Discussion
We investigated the time-course of prediction during a simultaneous interpreting task in professional simultaneous interpreters and professional translators untrained in interpreting. L1 French interpreters and translators listened to English sentences containing a highly predictable word and simultaneously interpreted these sentences into French. The results showed that both groups predict upcoming information while simultaneously interpreting. There were no significant differences in predictive patterns between interpreters and translators. In an exploratory analysis, we found that on trials where participants had a shorter lag, they predicted to a greater extent than on trials where they lagged further behind the original.

Evidence for prediction during simultaneous interpreting
Our findings show that both professional simultaneous interpreters and professional translators, untrained in simultaneous interpreting, make predictive eye movements during a simultaneous interpreting task. This evidence supports theories of simultaneous interpreting which assume that prediction takes place during simultaneous interpreting (e. g., Chernov, 2004;Seleskovitch, 1984). It also demonstrates that training is not necessary for prediction to take place during simultaneous interpreting.
We found robust prediction in L2 listeners who are completing an additional task. This finding contrasts with some recent findings and extends others. For example, Ito, Corley and Pickering (2017) found that L2 listeners did not make predictive eye movements before target word onset when they had an additional cognitive load, and Dijkgraaf et al. (2017) found semantic prediction in L2 listeners when there was no additional task. Our study shows that highly proficient bilinguals make predictive eye movements even in the combination of challenging conditions present during a simultaneous interpreting task (noise, concurrent production and cognitive load). In other words, challenging listening conditions in L2 do not necessarily prevent prediction. Now let us compare our experiment with Experiments 1 and 2 of Ito et al. (2018). The professional simultaneous interpreters (L2 speakers of English) in our study made predictive eye movements earlier than the L2 speakers in Experiment 2 of Ito et al.'s (2018) study (in which a significant difference over at least three time bins first emerged only at − 350 ms, similar to our translator group, who began predictive fixations at − 400 ms). In fact, the predictive fixations on target objects made by simultaneous interpreters in our study had a remarkably similar timing to the predictive fixations made by L1 speakers in Experiment 1 of Ito et al. (2018), with predictive fixations in both of these experiments beginning at − 600 ms before onset of a predictable word. However, unlike the L1 speakers in Ito et al. (2018), neither interpreters nor translators fixated on English competitor objects significantly more than on unrelated objects during the predictive time window. The results suggest that during a simultaneous interpreting task, people may predict predictable words, but may not pre-activate their phonological form, even when they are trained in interpreting.

9 The Bayesian linear mixed model showed that a significant difference between the target and unrelated conditions emerged at 500 ms (with two bins not returning significance at 50 and 100 ms) and that the significant difference between the English competitor object and the unrelated object first emerged at − 350 ms.
10 The Bayesian linear mixed model showed that a significant difference between the target and unrelated conditions emerged at word onset and lasted until 1000 ms after word onset (with one bin not significant at 250 ms).

Fig. 5. Mean target offset was at 602 ms. Picture onset was just before − 1000 ms. The black dots along the top of the graph show 50 ms bins in which the difference between target and unrelated conditions was significant (|t| > 2). The open dots show 50 ms bins in which the difference between the English competitor and unrelated objects was significant (|t| > 2).

Fig. 6. Graph showing trials in which participants began interpreting at − 2519 ms or less before the onset of the target word, or after onset of the target word. Time 0 ms shows target word onset. Mean target offset was at 602 ms. Picture onset was just before − 1000 ms. The black dots along the top of the graph show 50 ms bins in which the difference between target and unrelated conditions was significant (|t| > 2).

No clear evidence that simultaneous interpreters predict earlier or more than translators
Although both simultaneous interpreters and translators predicted during our simultaneous interpreting task, we found more robust predictive patterns in the interpreter group. However, we found no significant between-group differences in patterns of prediction. This suggests that training and experience do not affect the time course of prediction in challenging conditions, such as those present in simultaneous interpreting. Thus, our results do not lend support to theories of interpreting that posit that prediction in simultaneous interpreting is a skill or strategy.

Prediction of word-form during simultaneous interpreting
We did not find any evidence of prediction of word form in English by either the translator or the interpreter group during the simultaneous interpreting task, nor was there evidence of word-form activation after word onset (see Section 4.4). This contrasts with Ito et al. (2018) who found phonological prediction in the absence of an interpreting task (see also Kukona, 2020). Therefore, it may be that, as in Martin et al. (2018), simultaneous interpreting engaged participants' production mechanisms in producing a translation of what they had already heard, and so they could not use those mechanisms to the same extent to make word-form predictions. The load associated with the simultaneous interpreting task may also have limited eye movements that were linked to linguistic activation (for example, see Prasad & Mishra, 2020). Alternatively, it may be that even highly proficient L2 speakers do not predictively activate word form during comprehension.
Further, despite engaging in a simultaneous interpreting task into French, neither professional translators nor interpreters showed evidence of activating the French form of the target word. This finding is similar to Ito et al. (2018), who did not find that Japanese–English bilingual participants activated the phonology of the Japanese translation of the English target word.

Lack of cross-linguistic activation even after word onset
We did not find any evidence of word-form activation after word onset. Neither the English nor the French competitor object attracted significantly more looks than the unrelated object even once the phonologically related word had been produced. One explanation for this is that, despite the target-absent design, participants narrowed their expectations before the onset of the predictable word and therefore did not look at the competitor at or after word onset (Barr, 2008b;Dahan & Tanenhaus, 2004;Weber & Crocker, 2012). The linguistic input may thus have led to a lack of fixations on what were less relevant objects (see Huettig & Altmann, 2005 for a discussion).
This lack of relevance of the competitor object in the predictable context may have been compounded by the fact that participants were switching between two languages, meaning that at any given point in time, each language may have been activated to a greater or lesser extent. For instance, once participants had comprehended the target word in English, they may have focused more on producing its translation in French. However, they may still have been uttering another, unrelated word in French, in the period immediately following onset of the target word. In addition, although comprehension and image presentation were time-locked and were uniform across participants, participants' production was neither time-locked to image presentation, nor did all participants produce the same sentence constituents at the same time. This may explain the apparent lack of French phonological activation. One or more of these factors may have led to the lack of cross-linguistic activation after the onset of the target word.

Exploratory findings suggest that lag and prediction are linked
In an exploratory analysis, we found that participants began to make predictive eye movements earlier when production was more closely synchronized with comprehension (i.e., when interpretation began earlier relative to the target word) than when production and comprehension were less synchronized. In addition, when production and comprehension were more synchronized, participants appeared to activate the phonological form of the upcoming word predictively.
Following a prediction-by-production account, it may be that greater synchronicity between production and comprehension leads to greater prediction. In other words, if participants' own utterances are more synchronized with the utterances that they comprehend, predictive processing might be supported; whereas if participants' own utterances are less synchronized with what they comprehend, this situation might be more akin to engaging the production mechanism in an unrelated task. Another possibility is that greater prediction leads to greater synchronicity of production and comprehension; that is, participants produce their own utterance more quickly as their comprehension has been more rapid due to greater prediction.

Conclusions
We reported an experiment that investigated whether professional interpreters and professional translators make predictive eye movements during a simultaneous interpreting task. We found that both professional interpreters and translators untrained in interpreting predicted upcoming language, but neither group pre-activated phonological information associated with a target word. There were no significant differences in predictive patterns between groups. Thus, high-proficiency bilinguals routinely engage in prediction during simultaneous interpreting, even though it is a complex and difficult task, and this prediction appears to be independent of training and experience.

Funding
Rhona Amos was funded by an SNSF doc.mobility grant, number P1GEP1_188165.

Declaration of Competing Interest
None.

Appendix 1. Experimental sentences and critical visual objects
List of experimental sentences used in the experiment, with cloze values provided for the English and French word. Object names are provided in French and English, and listed as the target object, the English competitor object, the French competitor object and the Unrelated object. One of the objects named was shown in the display alongside three distractor objects.