Task-specific signatures in the expert brain: Differential correlates of translation and reading in professional interpreters

Insights on the neurocognitive particularities of expert individuals have recently benefited from language studies on professional simultaneous interpreters (PSIs). Accruing research indicates that behavioral advantages in this population are restricted to those skills that are directly taxed during professional practice (e.g., translation as opposed to reading), but little is known about the neural signatures of such selective effects. To illuminate the issue, we recruited 17 PSIs and 15 non-interpreter bilinguals and compared behavioral and electrophysiological markers of word reading and translation from and into their native and non-native languages (L1 and L2, respectively). PSIs exhibited greater delta-theta (1-8 Hz) power across all tasks over varying topographies, but these were accompanied by faster performance only in the case of translation conditions. Moreover, neural differences in PSIs were most marked for L2-L1 translation (the dominant interpreting direction in their market), which exhibited maximally widespread modulations that selectively correlated with behavioral outcomes. Taken together, our results suggest that interpreting experience involves distinct neural signatures across reading and translation mechanisms, but that these are systematically related with processing efficiency only in domains that face elevated demands during everyday practice (i.e., L2-L1 translation). These findings can inform models of simultaneous interpreting, in particular, and expert cognitive processing, in general.


Introduction
Neurocognitive research on expertise has been fueled by insights from professional simultaneous interpreters (PSIs). These multilinguals are trained to accurately reformulate oral messages from and into their native and foreign languages (L1s and L2s) under strict time constraints (Chernov, 2004; Christoffels and de Groot, 2005; García, 2014), typically with no previous rehearsal. Relative to non-interpreter bilinguals (NIBs), PSIs are characterized by diverse neuroplastic effects (Elmer et al., 2014a; Elmer, Meyer, & Jancke, 2010) and enhanced performance in specific cognitive functions (Becker et al., 2016; Strobach et al., 2015).
Within the verbal domain, these behavioral advantages seem restricted to skills that are directly taxed during simultaneous interpreting (SI) -e.g., word translation as opposed to word reading-, suggesting that SI-related changes do not generalize to linguistic processing at large (García et al., 2019). However, there is very little evidence on the neural correlates of these selective linguistic effects, and no study has examined potential associations between such signatures and outward performance. Thus, a major gap exists towards the formulation of multidimensional neurocognitive models in the field. To foster progress in this direction, we conducted the first assessment of oscillatory modulations during translation and reading tasks in PSIs and NIBs.
Bilingual verbal skills face elevated demands during professional SI. In conference settings, incoming discourse typically exceeds the ideal rate of 95-120 words per minute (Chernov, 2004; Gerver, 1975), with ear-voice spans ranging between 2 and 4 s (Anderson, 1994; Gerver, 1976) and periods of overlapping input and output amounting to 70% of individual sessions (Chernov, 1994). As recently proposed in an integrative review (García et al., 2019), these outstanding linguistic demands can lead to behavioral verbal advantages in PSIs, but only for skills that prove critical in their trade. For example, in the lexical domain, PSIs outperform NIBs in word translation (Christoffels et al., 2006) and language detection (Aparicio et al., 2017), two domains that are fundamental for successful professional performance. Conversely, they have no advantages in lexical decision (Hiltunen et al., 2016) or word reading (Santilli et al., 2018), both skills playing no distinct role in SI relative to other multilingual activities (García et al., 2019). This suggests that SI experience might affect specific verbal mechanisms in a differential manner.
Nevertheless, relevant neural effects in PSIs seem widespread across language systems at large. These specialists, as well as SI trainees, exhibit structural differences in various brain areas, including perisylvian, frontostriatal, and parietal regions implicated in bilingual cognitive control and linguistic processing (Becker et al., 2016; Elmer et al., 2014a; Hervais-Adelman et al., 2017; Van de Putte et al., 2018). Also, they present increased neurophysiological modulations underlying diverse word-level operations (Elmer and Kuhnis, 2016; Elmer et al., 2010) -e.g., greater theta-band (4-7 Hz) connectivity between the left auditory-related cortex and Broca's area during semantic decision in single- and dual-language conditions (Elmer and Kuhnis, 2016). Prima facie, this would indicate that the domain-selectivity of behavioral advantages may not be mirrored by highly circumscribed effects at a cerebral level.
Notably, however, at least some neural patterns in PSIs seem differentially related to specifically taxed domains. Indeed, distinct neurofunctional patterns in this population have been detected for L2-L1 processes (namely, the ones corresponding to the dominant direction in professional practice) relative to L1-L2 and single-language processes (Christoffels et al., 2013; Elmer et al., 2010). In fact, this pattern has also been tracked longitudinally over the course of SI training (Hervais-Adelman et al., 2015a). Therefore, it appears that broad-gauge neurobiological changes along language systems may also be accompanied by distinct patterns for the mechanisms that are more markedly recruited in daily professional practice.
Yet, this notion remains empirically underspecified, as few studies have jointly assessed behavioral and neurofunctional signatures of interpreting experience across intra- and cross-linguistic tasks (Elmer and Kuhnis, 2016; Elmer et al., 2010; Hervais-Adelman et al., 2015a,b; Van de Putte et al., 2018), and none has tested for direct associations between such measures. Moreover, no report seems to have capitalized on the high sensitivity of frequency analysis to address the issue. Unlike other neuroscientific measures, frequency analyses allow disentangling distinct but parallel neural processes which occur simultaneously as an overall task unfolds (Roach and Mathalon, 2008). Crucially, this allows decomposing neurophysiological signals into co-occurring but discernible frequency bands, each of which can index cognitive processes that often prove untraceable without this level of granularity (Kielar et al., 2014). Indeed, by targeting specific bands, studies on L1 (Braunstein et al., 2012) and L2 (Vilas et al., 2019) processing have revealed significant differences (between conditions or groups) that escape other analyses of even the same electrophysiological signals, such as those based on event-related potentials (ERPs). Moreover, frequency analyses can capture subtle effects during word reading (Klimesch et al., 1997; Rohm et al., 2001) and translation (Grabner et al., 2007) processes, and they are particularly sensitive to expertise-related effects across various populations (Behroozmand et al., 2015; Doppelmayr et al., 2008; Pallesen et al., 2015). Hence, frequency analyses afford a particularly promising framework to examine the conjecture raised above.
In this sense, the 1-8 Hz frequency range, which encompasses the delta (0.5-4 Hz) and theta (4-8 Hz) bands, emerges as a sensitive candidate for investigation. Indeed, lexico-semantic processing seems to be indexed by oscillatory changes that spread across both bands (Davidson and Indefrey, 2007; Hald et al., 2006; Kielar et al., 2014) or within either of them (Allefeld et al., 2005; Bastiaansen et al., 2005; Grabner et al., 2007; Molinaro and Lizarazu, 2018; Vilas et al., 2019) in a host of native-language, foreign-language, and even translation-specific processes. Moreover, power increases over different portions of this range have been reported among correlates of expertise in other domains (Behroozmand et al., 2015; Doppelmayr et al., 2008; Pallesen et al., 2015). Accordingly, increased event-related power synchronization across the delta-theta (1-8 Hz) band represents a likely candidate to index expertise effects in PSIs relative to NIBs across linguistic tasks.
Against this background, we conducted the first EEG-based comparison of word reading and translation processes in PSIs and NIBs. Based on previous findings, we hypothesized that selective advantages in translation, indexed by faster response times (RTs) for PSIs, would be accompanied by increased event-related power synchronization across tasks, presumably over the delta-theta band. In addition, we predicted that such differences would be more marked for specifically trained skills (i.e., translation abilities) and selectively related to processing speed (RTs) in them. Moreover, we examined other frequency bands -alpha (8-13 Hz), low beta (13-21 Hz), and high beta (21-34 Hz)- to explore whether potential delta-theta effects are specific to that frequency range or manifest in others as well. Finally, to further test the specificity of this effect, we conducted exploratory analyses of relevant ERPs. Briefly, with this approach, we aimed to better understand the scope of neurocognitive effects in a model of expert language processing.

Participants
Thirty-four subjects were recruited for the study, but two of them were removed due to excessive noise in the signals. The final sample thus comprised 32 subjects, who partially overlap with those from a previous report (Santilli et al., 2018), namely: 17 PSIs and 15 NIBs. Importantly, this sample size was similar to or larger than those reported in previous EEG studies assessing translation processes and/or differences between PSIs and NIBs (Christoffels et al., 2013; Elmer and Kuhnis, 2016; Elmer et al., 2010; Grabner et al., 2007; Jost et al., 2018; Klein et al., 2018). Moreover, to determine the statistical power of our sample size, we used G*Power 3.1 (Faul et al., 2007) to calculate the minimum effect size detectable given our experimental design and statistical approach (a mixed effects ANOVA, including 32 subjects, 4 measures, and a resulting total sample size of 128). Having set a power of 0.95 and an alpha level of 0.05, we obtained a partial eta squared (ηp²) of 0.06. This effect size is lower than those reported in a previous EEG study comparing PSIs and NIBs on L1, L2, L1-L2, and L2-L1 tasks (Elmer et al., 2010) -namely, a ηp² of 0.84 for a main effect of group and a ηp² of 0.64 for a group-by-task interaction. Therefore, our study is capable of detecting differences that are even smaller (i.e., harder to trace) than those obtained in similar studies.
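To make the effect sizes above easier to compare, the following minimal sketch converts partial eta squared into Cohen's f, the metric in which G*Power expresses ANOVA effect sizes, via the standard relation f = sqrt(ηp² / (1 − ηp²)). The function name is hypothetical, and the snippet does not reproduce the sensitivity analysis itself.

```python
import math

def cohens_f_from_partial_eta_squared(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f: f = sqrt(eta / (1 - eta))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))

# Effect sizes quoted in the text: the minimum detectable effect and the
# two effects reported by Elmer et al. (2010).
effects = {
    "minimum detectable": 0.06,
    "main effect of group": 0.84,
    "group-by-task interaction": 0.64,
}
for label, eta in effects.items():
    f = cohens_f_from_partial_eta_squared(eta)
    print(f"{label}: partial eta squared = {eta:.2f}, Cohen's f = {f:.2f}")
```

On either scale, the minimum detectable effect remains well below the two reference effects, which is the point made in the paragraph above.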
All participants were native speakers of Spanish (L1) with high proficiency in English (L2). They were right-handed and had normal or corrected-to-normal vision. None of them reported a history of neurological or psychiatric disease. The PSIs had a mean of 14.65 years of experience (SD = 12.09), mainly in the field of conference interpreting. The NIBs were either English teachers or advanced students in an English teacher program, with no experience in interpreting. Data from a previously reported self-report questionnaire showed that the samples were not significantly different in terms of demographic variables (gender and age) and linguistic factors (competence in and weekly exposure to both languages, age of L2 learning, and years of L2 study), with additional tests revealing non-significant differences in short-term memory, cognitive flexibility, and overall executive skills (all p-values = n.s.). Crucially, however, PSIs were significantly more competent in both interpreting directions and they spent significantly more time engaging in such activities each week (all p-values < .001) -although their professional practice mainly required them to interpret from L2 to L1. For details on these variables, including descriptive statistics, p-values, and effect sizes, see Supplementary material 1 (Table S1).
All subjects signed an informed consent and all experimental protocols were performed in accordance with the Declaration of Helsinki. This study was approved by the Bioethics Committee of the National University of Mar del Plata.

Materials
The study consisted of two previously reported reading and translation tasks, developed in the Python programming language (www.python.org) with the Pygame development library (www.pygame.org). Taken together, the tasks involved 384 nouns, half in each language. The stimuli were divided into three blocks of 64 items per language, with each block comprising the same number (n = 16) of concrete cognates (e.g., roca, rock), abstract cognates (e.g., comedia, comedy), concrete noncognates (e.g., mesa, table), and abstract noncognates (e.g., castigo, punishment).1,2 The Spanish and English blocks were matched for frequency ranking (p = .97) and syllabic length (p = .99), and blocks within each language were additionally matched for frequency (Spanish: p = .95; English: p = .98) -data for these variables were extracted from Davies (2008a, 2008b). Stimuli were pseudorandomly distributed within each block, such that items with similar phonological patterns or semantic proximity were separated by at least two intermediate items. Below we describe how these blocks were allocated across the four experimental tasks, namely: L1 reading (L1R), L2 reading (L2R), backward translation (BT, from L2 to L1), and forward translation (FT, from L1 to L2).
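The pseudorandomization constraint described above (similar items separated by at least two intermediate items) can be illustrated with a simple rejection-sampling sketch. This is not the script used in the study: the similarity criterion `too_similar` is a hypothetical stand-in (shared first two letters), and the words rosa, libro, perro, and silla are made-up examples added to the four Spanish items quoted in the text.

```python
import random

def too_similar(a: str, b: str) -> bool:
    """Hypothetical stand-in for phonological/semantic similarity:
    two words count as similar if they share their first two letters."""
    return a[:2] == b[:2]

def is_valid(order, min_gap=2):
    """True if every pair of similar items has at least `min_gap`
    intermediate items between them."""
    for i, word in enumerate(order):
        # Items at positions i+1 .. i+min_gap are too close to position i.
        for other in order[i + 1 : i + 1 + min_gap]:
            if too_similar(word, other):
                return False
    return True

def pseudorandom_order(items, min_gap=2, max_tries=10_000, seed=None):
    """Shuffle until the spacing constraint holds (rejection sampling)."""
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if is_valid(order, min_gap):
            return order
    raise RuntimeError("no valid order found; relax the constraint or raise max_tries")

block = ["roca", "rosa", "mesa", "castigo", "comedia", "libro", "perro", "silla"]
print(pseudorandom_order(block, seed=1))
```

Rejection sampling is adequate for short lists with few similar pairs; a full 64-item block with many constraints might call for a constructive algorithm instead.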

Tasks and procedure
Within-language processing was tested through two previously reported reading tasks (Santilli et al., 2018) based on two of the stimulus blocks described above -one in Spanish, for L1R (64 items), and one in English, for L2R (64 items). None of the items in these blocks was a translation equivalent relative to those of the other language. Note, too, that these two blocks were used exclusively for the reading (as opposed to the translation) tasks. Each trial began with a fixation cross (shown for a random period of 100-300 ms) and continued with the target word (which remained visible for 200 ms). Stimuli were presented in white letters (font: Times New Roman; size: 70 pts), centered on the screen against a black background. In each task, participants were instructed to read each word out loud, as fast and accurately as possible, and to press a key as soon as they began articulating their response. The keystroke served both to record RTs and to cue the following trial. Importantly, note that the use of key-presses allows circumventing phonetic and phonological confounds affecting oral responses (Rastle et al., 2005; Rastle and Davis, 2002) and that both vocal and manual responses prove equally sensitive to particular linguistic effects (Hutson et al., 2013). Accuracy was judged by two separate examiners on a trial-by-trial basis, and the few cases of disagreement were settled by a third examiner. RTs were recorded by a custom-made Python script (www.python.org). Trials were considered invalid if the subject (i) failed to respond, (ii) committed a false start, (iii) uttered a wrong word, or (iv) translated the stimulus instead of reading it. To avoid order-related biases, the reading tasks (L1R, L2R) were counterbalanced across participants. Taken together, both reading tasks lasted approximately 10 min.
Between-language processes were examined via two previously reported translation tasks (Santilli et al., 2018) -one for BT and one for FT. BT performance was assessed with one of the two remaining English blocks (64 items), whereas FT performance was examined via one of the two remaining Spanish blocks (64 items). Importantly, these blocks were used exclusively for the translation (as opposed to the reading) tasks. To avoid priming effects between tasks, the blocks used for BT and FT in each subject were chosen so that the items in one language would not be translation equivalents of those in the other language. Also, half the sample performed the BT task with one English block and the other half did so with the other English block (and the same was true of the use of the Spanish blocks for the FT task). The structure and response modality of each trial were identical to those described for the reading tasks. Trials were rejected if the subject (i) failed to respond, (ii) committed a false start, (iii) read the word instead of translating it, or (iv) provided either a wrong or non-predefined translation.3 As was the case with the reading tasks, the order of BT and FT tasks was counterbalanced across participants. Taken together, the two translation tasks lasted approximately 20 min. The full counterbalancing scheme of all tasks can be found in the Supplementary material 1 (Table S2).

Behavioral data analysis
Differences in the number of rejected trials, as well as accuracy and RT outcomes, were analyzed via mixed effects ANOVAs including a between-subjects factor (group: PSIs, NIBs) and two within-subjects factors (task: reading, translation; source language: L1, L2) -with subjects as a random factor. In each analysis, a Gaussian error distribution with an identity link function was assumed for the dependent variable and the significance threshold was set to p < .05. Post hoc analyses for significant interactions were made via Tukey's HSD test. For each subject, mean accuracy was calculated as the percentage of correct recorded trials per condition, while mean RT was calculated considering only correct responses with latencies below 2000 ms, as in previous works (Fine et al., 2013; Marinus and de Jong, 2011). Effect sizes for main effects and interactions were calculated based on partial eta squared (ηp²). Depending on the value of this index, effect sizes can be considered small (>0.02), medium (>0.13), or large (>0.26) (Cohen, 1988). In the case of pair-wise comparisons, effect sizes were calculated through Cohen's d, an index that also discriminates among small (0-0.20), medium (0.50-0.80), and large (>0.80) effects (Cohen, 1988). All statistical analyses were performed on Statistica 10 (http://www.statsoft.com/). Of note, whereas the behavioral data come from a sample that partially overlaps the one reported in a previous paper (Santilli et al., 2018), their combination with the newly analyzed EEG data allows addressing this study's novel aims, namely: (i) detecting real-time neural differences between groups and (ii) tracking associations between discriminatory neural patterns and performance in specific tasks.
1 Cognates are words from a given language that have major orthographic and/or phonological overlap with a viable translation equivalent in another language (e.g., cámara in Spanish and camera in English). Noncognates are words which lack such sublexical overlap relative to their translation equivalents.
2 Note that our design is not aimed at targeting concreteness or cognate status as factors for analysis. Rather, the inclusion of exemplars representing all possible combinations of these variables sought to ensure that the results could be generalized to lexico-semantic processing at large, especially considering that both concreteness (Barber et al., 2013; Jessen et al., 2000; van Hell and de Groot, 1998a, 1998b) and cognate status (Broersma et al., 2016; Christoffels et al., 2003; Midgley et al., 2011) can modulate behavioral and neural responses during bilingual lexical processing.
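The per-subject summaries and the pair-wise effect size described in this section can be sketched as follows. This is a minimal illustration rather than the Statistica workflow used in the study; the function names and the trial format (a list of correctness/RT pairs) are assumptions, and Cohen's d is computed with the pooled standard deviation for independent samples.

```python
import math
from statistics import mean, stdev

RT_CEILING_MS = 2000  # latency cutoff stated in the text

def summarize(trials):
    """trials: list of (correct, rt_ms) pairs for one subject and condition.
    Returns (accuracy in %, mean RT over correct trials below the ceiling)."""
    accuracy = 100.0 * sum(correct for correct, _ in trials) / len(trials)
    valid_rts = [rt for correct, rt in trials if correct and rt < RT_CEILING_MS]
    mean_rt = mean(valid_rts) if valid_rts else float("nan")
    return accuracy, mean_rt

def cohens_d(a, b):
    """Cohen's d for two independent samples (pooled standard deviation)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# One hypothetical subject: three correct trials (one too slow) and one error.
example = [(True, 800), (True, 1000), (False, 900), (True, 2500)]
accuracy, mean_rt = summarize(example)
print(accuracy, mean_rt)
```

Note that the slow but correct trial still counts toward accuracy; it is only excluded from the RT average.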
2.5. EEG methods
2.5.1. Acquisition and preprocessing
EEG activity was recorded online during all four tasks for each participant. Signals were acquired through a Biosemi Active-two 128-channel system with pre-amplified sensors and a DC coupling amplifier. All signals were originally sampled at 1024 Hz, later resampled to 512 Hz, and referenced to the average of all channels. Similarly to previous works (Christoffels et al., 2013; Kielar et al., 2014; Vilas et al., 2019), EEG data were filtered between 0.5 and 45 Hz, and epochs were selected from continuous data in a window from −0.5 s to 1 s around the time of stimulus onset. In line with reported procedures (Vilas et al., 2019), eye movements and blink artifacts were corrected with independent component analysis, remaining artifacts were rejected through visual inspection, and noisy channels were corrected by interpolation. Epochs corresponding to incorrect, invalid, or excessively long (>2000 ms) RTs were excluded from analysis. Considering these criteria, as stated in the "Participants" section, data from two NIBs were excluded from analysis, leading to the final sample of 17 PSIs and 15 NIBs. All EEG signal processing steps were implemented on MATLAB software (vR2016a) through the EEGLAB (v14.1.2) toolbox.
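Two of the preprocessing steps above, average referencing and epoch extraction, can be sketched in plain Python. This is only an illustration of the arithmetic: the study used EEGLAB in MATLAB, and filtering, resampling, ICA, and interpolation are not covered here. All function names are hypothetical.

```python
FS_HZ = 512                             # sampling rate after resampling
EPOCH_START_S, EPOCH_END_S = -0.5, 1.0  # epoch window around stimulus onset

def epoch_bounds(onset_sample, fs=FS_HZ, start_s=EPOCH_START_S, end_s=EPOCH_END_S):
    """Return the [first, last) sample indices of the epoch around one onset."""
    return onset_sample + round(start_s * fs), onset_sample + round(end_s * fs)

def average_reference(sample_by_channel):
    """Re-reference one time sample by subtracting the mean across channels."""
    m = sum(sample_by_channel) / len(sample_by_channel)
    return [v - m for v in sample_by_channel]

def extract_epochs(data, onsets):
    """data: list of time samples, each a list of channel values.
    Onsets whose window would fall outside the recording are skipped."""
    epochs = []
    for onset in onsets:
        first, last = epoch_bounds(onset)
        if first >= 0 and last <= len(data):
            epochs.append([average_reference(s) for s in data[first:last]])
    return epochs

first, last = epoch_bounds(1000)
print(first, last, last - first)  # epoch length in samples at 512 Hz
```

At 512 Hz, the 1.5 s window corresponds to 768 samples per epoch, 256 of which precede stimulus onset.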

Frequency analysis
Frequency analysis was implemented through EEGLAB software using the Fast Fourier Transform algorithm with a Hanning taper. The mean of event-related power synchronization between trials was calculated for each condition and subject, with a baseline of 300 ms before stimulus onset (−300 ms to 0 ms), as in previous works (Hald et al., 2006; Kielar et al., 2014; Vilas et al., 2019). We calculated the power for each trial relative to its respective baseline and then averaged the ensuing values across trials for each subject.
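The logic of this step (taper, Fourier transform, single-trial baseline normalization, averaging across trials) can be sketched as below. This is not the EEGLAB routine used in the study. In particular, expressing baseline-relative power as a decibel ratio is an assumption, since the text does not specify the normalization, and a direct DFT is used instead of an FFT to keep the example dependency-free. All function names are hypothetical.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def band_power(signal, fs, f_lo, f_hi):
    """Mean power of the Hann-tapered signal over DFT bins in [f_lo, f_hi] Hz."""
    n = len(signal)
    tapered = [x * w for x, w in zip(signal, hann(n))]
    powers = []
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:
            coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                       for t, x in enumerate(tapered))
            powers.append(abs(coef) ** 2 / n)
    if not powers:
        raise ValueError("no DFT bin falls inside the requested band")
    return sum(powers) / len(powers)

def relative_power_db(epoch, fs, onset_idx, window_s, band, baseline_s=0.3):
    """Band power in window_s = (start, end) seconds after onset, relative to
    the pre-stimulus baseline. ASSUMPTION: expressed as a dB ratio."""
    b0 = onset_idx - round(baseline_s * fs)
    w0 = onset_idx + round(window_s[0] * fs)
    w1 = onset_idx + round(window_s[1] * fs)
    base = band_power(epoch[b0:onset_idx], fs, *band)
    post = band_power(epoch[w0:w1], fs, *band)
    return 10 * math.log10(post / base)

def mean_relative_power_db(epochs, fs, onset_idx, window_s, band):
    """Normalize each trial by its own baseline, then average across trials."""
    values = [relative_power_db(e, fs, onset_idx, window_s, band) for e in epochs]
    return sum(values) / len(values)
```

For a single-channel epoch sampled at 512 Hz with onset at sample 256, a call such as `relative_power_db(epoch, 512, 256, (0.3, 0.6), (1, 8))` would return the delta-theta change in the later window. With 300 ms segments the frequency resolution is coarse (about 3.3 Hz), so only a couple of bins fall inside the 1-8 Hz band.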
In line with previous works on word translation (Grabner et al., 2007) and bilingual reading (Kielar et al., 2014), power was averaged in different frequency bands of interest. Here we focused on the delta-theta (1-8 Hz), alpha (8-13 Hz), low beta (13-21 Hz), and high beta (21-34 Hz) bands. Also, to gain statistical power and obtain temporally specific results, power changes in each band were calculated for an initial window (0-300 ms) associated with early language access processes, and a later window (300-600 ms) related to various lexico-semantic processes (Grabner et al., 2007; Hald et al., 2006; Vilas et al., 2019; Willems et al., 2008).

Spatial cluster analysis of power across bands
Power differences between groups were assessed in each task through cluster-based topographic analyses considering each frequency band across both time windows. Following the original description by Maris and Oostenveld (2007) and subsequent neurolinguistic studies (Davidson and Indefrey, 2007; Kielar et al., 2014; Vilas et al., 2019), we implemented this approach through a permutation test. First, for each between-group comparison in each time window and frequency band, we performed a Wilcoxon test (Wilcoxon, 1946) -a univariate non-parametric test that does not assume normal distributions (Sheskin, 2003)- on the power values associated with each electrode, and we obtained the p-values corresponding to each electrode. Second, to increase statistical stringency and topographic precision even beyond pioneering reports of this method (Maris and Oostenveld, 2007), we set a p-value threshold of .01 to define clusters of neighboring electrodes with potential differences between groups. With the purpose of identifying topographically consistent differences, clusters were considered significant only if they encompassed more than five electrodes. This approach is empirically advantageous as it can evaluate between-group differences without relying on a priori topographical hypotheses based on ROIs or particular sets of electrodes (Maris and Oostenveld, 2007). As in previous works (Davidson and Indefrey, 2007; Kielar et al., 2014; Maris and Oostenveld, 2007), 5000 permutations were implemented to generate a reference sample of the cluster-level statistic (in our case, the largest cluster size). This sample was used to determine the significance (Monte Carlo p-values) of the clusters obtained in the original data by evaluating the proportion of permutations whose maximum cluster size was larger than that of our comparison (Maris and Oostenveld, 2007).
These clusters were considered significant at a p < .025 (assuming an alpha level of .05) relative to the calculated sample, as in previous works (Maris and Oostenveld, 2007; Vilas et al., 2019). Between-group comparisons for each task were corrected for multiple comparisons among the four frequency bands and the two time windows via the false discovery rate (FDR) method, considering up to the three largest clusters discriminating between PSIs and NIBs.
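The general logic of this procedure can be sketched as follows, under several simplifying assumptions that are ours rather than the study's: the rank-sum p-value uses a normal approximation without tie correction, only the largest cluster is evaluated (the study considered up to three), the minimum-size criterion and the FDR step are omitted, and permuted cluster sizes count as exceedances when they are at least as large as the observed one (a conservative choice). The electrode adjacency map and all function names are hypothetical.

```python
import math
import random

def ranksum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation,
    average ranks for ties, no tie correction of the variance)."""
    values = sorted(list(x) + list(y))
    rank, i = {}, 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank[values[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    w = sum(rank[v] for v in x)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs((w - mu) / sigma) / math.sqrt(2))

def largest_cluster(sig, neighbors):
    """Size of the largest connected set of supra-threshold electrodes."""
    seen, best = set(), 0
    for start in sig:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            e = stack.pop()
            size += 1
            for nb in neighbors[e]:
                if nb in sig and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def max_cluster_size(a, b, neighbors, alpha_elec):
    """a, b: lists of subjects, each a list of per-electrode power values."""
    sig = {e for e in range(len(a[0]))
           if ranksum_p([s[e] for s in a], [s[e] for s in b]) <= alpha_elec}
    return largest_cluster(sig, neighbors)

def cluster_permutation_test(a, b, neighbors, alpha_elec=0.01, n_perm=5000, seed=0):
    """Return (observed largest cluster size, Monte Carlo p-value)."""
    observed = max_cluster_size(a, b, neighbors, alpha_elec)
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign subjects to groups at random
        perm = max_cluster_size(pooled[:len(a)], pooled[len(a):], neighbors, alpha_elec)
        exceed += perm >= observed
    return observed, exceed / n_perm
```

Because whole subjects (rather than individual electrodes) are shuffled between groups, the spatial correlation among electrodes is preserved in every permutation, which is what allows the cluster-level statistic to control for multiple comparisons across the scalp.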

Correlation between frequency and RT results
In order to establish direct links between significant neural and behavioral patterns, we evaluated whether band-specific power differences were related to outcomes in each task yielding differential performance between groups. To this end, as in previous studies linking EEG modulations with behavioral performance (Melloni et al., 2016), we used Spearman's correlations (a rank-based test recommended for non-normal data distributions) to calculate monotonic, not necessarily linear, associations between the mean power values of the significant clusters and the mean RT of the corresponding condition. This procedure was implemented for all subjects of each group. Correlations were deemed significant if, after an FDR correction (Benjamini and Hochberg, 1995) considering comparisons between each task and group, they yielded a p < .05.
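As a minimal sketch of the two ingredients named above, the snippet below computes Spearman's rho (Pearson's correlation on average ranks) and Benjamini-Hochberg adjusted p-values. It is not the code used in the study, the function names are hypothetical, and the p-value of each correlation is assumed to come from a statistics library (e.g., `scipy.stats.spearmanr`), since it is not derived here.

```python
def average_ranks(values):
    """1-based ranks, with ties receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical values: higher cluster power paired with faster responses.
power = [0.2, 0.5, 0.9, 1.4, 2.0]
rt_ms = [980, 940, 900, 700, 650]
print(spearman_rho(power, rt_ms))
```

A negative rho in such an example would correspond to the pattern of interest here, namely greater power accompanying shorter RTs.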

Complementary ERP analysis
As a complementary exploratory analysis, we examined whether between-group differences in each task were also present in modulations of the N400, an ERP shown to track differences between PSIs and NIBs in other linguistic paradigms (Elmer et al., 2010). To this end, for each task separately (L1R, L2R, BT, FT), we averaged the EEG signal over all trials for each electrode and subject (after removing the mean baseline of 300 ms pre-stimulus). This yielded one activity value for each electrode and subject. Then, we used univariate non-parametric Wilcoxon tests to compare the values for each electrode between groups. Using a threshold of p = .01, we aimed to determine differential clusters comparing their sizes with those obtained through a permutation (n = 1000) distribution, as done for our frequency analyses (for details, see sections 2.5.2 and 2.5.3).

Behavioral results
After rejection of trials with recording errors, the number of remaining trials for accuracy analysis did not differ significantly between groups  (Table S5).

Frequency results
Given that frequency analyses are performed as between-group comparisons for each task separately, the number of trials rejected due to faulty signals was compared in each case via unpaired two-tailed t-tests. The number of trials thus rejected was similar between PSIs and NIBs in each task -for descriptive statistics, see Supplementary material 2.1 (Table S7).
Fig. 1. Response times, power differences, and behavioral-neural associations for PSIs and NIBs during word reading and translation. A. Response times for each task in PSIs and NIBs. The asterisk (*) indicates significant differences. B. Clusters yielding significant between-group differences in each task. Results revealed selective and consistent power increases for PSIs over NIBs in the delta-theta band (1-8 Hz) across tasks. The color bar indicates statistics associated with univariate tests of significant electrode clusters. C. Scatterplot of the associations between the mean power of significant clusters and response time for BT. Significant associations were found for PSIs (first and second insets), there being no significant associations for NIBs in their respective task-specific clusters (third and fourth insets). BT: backward translation; FT: forward translation; L1R: native-language reading; L2R: foreign-language reading; NIBs: non-interpreter bilinguals; PSIs: professional simultaneous interpreters.
Frequency analyses for the reading tasks showed significant clusters discriminating between groups (Figure 1, panel B), characterized by higher delta-theta band (1-8 Hz) power for the PSIs in the later time window (300-600 ms). For L1R, we observed two significant clusters, one comprising principally frontal and posterior electrodes over the left hemisphere (FDR-corrected p = .01), and another one mainly comprised of right posterior electrodes (FDR-corrected p = .01). For L2R, the only significant cluster was found over left frontal and posterior electrodes (FDR-corrected p = .01), mainly overlapping with the first L1R cluster. None of the remaining combinations of time window and frequency band yielded significant differences at the established thresholds.
As regards translation tasks, given that RTs were greater in NIBs than in PSIs, we first ensured that potential neural activity differences between groups would not be driven by the later responses of NIBs -indeed, longer latencies could involve neural differences across frequency bands due to motoric artifacts rather than condition-specific modulations. To this end, for each task separately, we omitted the top 5% of trials yielding the longest RTs in each NIB and the top 5% of trials yielding the shortest RTs in each PSI. This provided an adequate empirical framework to circumvent the potential biases mentioned above, given that the remaining number of trials in each task and condition was similar for both groups (see Supplementary material 2.1), and their mean RTs did not yield significant between-group differences for either BT [t(30)
As observed for reading, frequency results for the translation tasks showed that, relative to NIBs, PSIs exhibited consistently higher power in the delta-theta (1-8 Hz) band across the later time window (300-600 ms). However, differences were markedly more widespread for BT, with three clusters discriminating between groups: one was distributed across left frontal and posterior sites (FDR-corrected p = .002), another one extended principally over bilateral frontal channels (FDR-corrected p = .002), and the third was mainly comprised of right posterior electrodes (FDR-corrected p = .001). By contrast, FT yielded less distributed between-group differences over two clusters: one consisting of right frontal electrodes (FDR-corrected p = .02) and another one spreading over right posterior electrodes (FDR-corrected p = .02). None of the remaining combinations of time window and frequency band yielded significant differences at the established thresholds.
Notably, these results remained nearly identical when the analyses were rerun without omitting the top 5% of trials yielding the longest RTs in NIBs and the shortest RTs in PSIs -for details, see Supplementary section 2.2 and Figure S1.
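The RT-based trimming applied before the translation analyses can be sketched as below. The function name is hypothetical, and rounding the number of dropped trials down is an assumption, since the text does not state how fractional counts were handled. Indices (rather than RTs) are returned so that the matching EEG epochs can be selected.

```python
import math

def kept_trial_indices(rts, fraction=0.05, drop="slowest"):
    """Indices of the trials kept after dropping the given fraction with the
    longest ('slowest') or shortest ('fastest') RTs.
    ASSUMPTION: the number of dropped trials is rounded down."""
    if drop not in ("slowest", "fastest"):
        raise ValueError("drop must be 'slowest' or 'fastest'")
    n_drop = math.floor(len(rts) * fraction)
    # Put the trials to be dropped first, then discard them.
    order = sorted(range(len(rts)), key=lambda i: rts[i], reverse=(drop == "slowest"))
    return sorted(order[n_drop:])

rts = [600 + 10 * i for i in range(20)]                # hypothetical RTs in ms
nib_kept = kept_trial_indices(rts, drop="slowest")     # as done for each NIB
psi_kept = kept_trial_indices(rts, drop="fastest")     # as done for each PSI
print(len(nib_kept), len(psi_kept))
```

Trimming opposite tails in the two groups narrows the RT gap between them, so that residual power differences are less likely to reflect response timing alone.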
Additionally, we explored electrophysiological differences between reading and translation tasks. To this end, we followed the same analysis criteria detailed in sections 2.5.2 and 2.5.3 of the main manuscript (the only difference being that, as required by within-group contrasts, we employed two-tailed unpaired t-tests instead of Wilcoxon tests). We focused on the frequency band and time window yielding significant effects in our main analysis (i.e., the delta-theta band in the 300-600 ms window). As in previous neuroscientific studies comparing translation and single-language conditions (Klein et al., 1995; Rinne et al., 2000), we contrasted L1R with FT and L2R with BT -i.e., the conditions that used the same language for their stimuli. Results revealed widespread power increases for the reading over the translation conditions in both groups -see Supplementary material 2.3, Figure S2.

Exploratory examination of word-category effects
For strictly exploratory purposes, we examined whether between-group differences were driven by specific lexical categories, namely: abstract cognates, abstract noncognates, concrete cognates, concrete noncognates. Analysis of accuracy, RT, and frequency modulations convergently showed that the between-group differences reported above were not driven by any of these four word categories in particular (see Supplementary material 2.4).

Correlations between RTs and significant power patterns
Given that both translation tasks yielded significant behavioral differences, we explored possible associations between RTs in each of them and the mean power of significant clusters via Spearman correlations -including FDR correction (Figure 1, panel C). For PSIs, we found that RTs for BT had significant negative associations with the frontal (rho = -0.51, cluster-corrected p = .05) and the right posterior (rho = -0.57, cluster-corrected p < .05) clusters. No significant association was found between RTs for BT and the remaining cluster (rho = 0.15, cluster-corrected p = .56) -for details, see Supplementary material 2.5 (Table S10). Neither did this group exhibit any significant associations between FT and underlying neural activity. Finally, results for NIBs revealed no significant associations between RTs and neural activity in either BT (cluster 1: rho = -0.01, cluster-corrected p = .96; cluster 2: rho = -0.08, cluster-corrected p = .78; cluster 3: rho = 0.26, cluster-corrected p = .35) or FT (cluster 1: rho = -0.03, cluster-corrected p = .92; cluster 2: rho = 0.06, cluster-corrected p = .83) -for details, see Supplementary material 2.5 (Table S11).
This overall pattern of results remained the same upon rerunning the analyses without omitting the top 5% of trials yielding the longest RTs in NIBs and the shortest RTs in PSIs: following FDR correction, the only significant correlations were those involving BT in PSIs, both in cluster 1 (rho = -0.5319, cluster-corrected p = .0451) and cluster 2 (rho = -0.5613, cluster-corrected p = .0451). Every other correlation in PSIs and NIBs remained non-significant after FDR correction -for details, see Supplementary material 2.6 (Table S12 and Table S13).
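As an illustrative sketch of this type of analysis, the snippet below computes Spearman correlations between RTs and mean cluster power and then adjusts the resulting p-values for multiple comparisons. Several points are assumptions made here and not details taken from the study: the FDR procedure is taken to be Benjamini-Hochberg, and the helper names `fdr_bh` and `correlate_rt_with_clusters`, as well as the input layout (one RT and one mean power value per participant), are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR variant)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def correlate_rt_with_clusters(rts, cluster_power):
    """Correlate per-participant RTs with mean power in each cluster.

    rts           : sequence with one mean RT per participant
    cluster_power : dict mapping a cluster label to a sequence with one
                    mean power value per participant (same order as rts)
    Returns a dict mapping each label to (rho, FDR-adjusted p).
    """
    labels = list(cluster_power)
    rhos, pvals = [], []
    for label in labels:
        rho, p = spearmanr(rts, cluster_power[label])
        rhos.append(rho)
        pvals.append(p)
    adjusted = fdr_bh(pvals)
    return {lab: (r, q) for lab, r, q in zip(labels, rhos, adjusted)}
```

Here the correction is applied across the clusters of a single group and task; whether the original analysis pooled p-values in this way is not stated in the text, so this grouping is also an assumption.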

Correlations between frequency patterns and years of interpreting experience
Finally, we tested for possible associations between frequency patterns and interpreting experience in the PSI group. Power values for clusters differentiating PSIs from NIBs in each task were averaged and tested for correlations with years of interpreting experience. We found no significant associations between these metrics in either of our two analytical approaches, that is: neither when removing the 5% of fastest and slowest trials to avoid potential motor artifacts (rho < 0.38, p > .13), nor when retaining all valid trials (rho < 0.38, p > .13).

Complementary ERP results
Results from the complementary ERP analyses showed no significant clusters in any of the four tasks when focusing on a canonical N400 window of 200-600 ms (Kutas and Federmeier, 2011). Null results were also obtained for each task when considering a shorter (300-500 ms) and a longer (150-650 ms) window. For details about these results, see Supplementary material 2.4 (Table S14).

Discussion
Through direct comparisons of PSIs and NIBs, this study examined how neurocognitive correlates of reading and translation are modulated by experience in SI. While behavioral results revealed interpreter advantages only for translation, frequency analyses showed greater delta-theta power for PSIs across both tasks, irrespective of language of presentation. Notably, however, neural differences in PSIs were most marked for BT, which exhibited maximally widespread modulations that selectively correlated with behavioral performance. Below we discuss these findings and their implications for models of SI and expertise-related effects at large.
In line with other studies (Christoffels et al., 2006), PSIs outperformed non-professional bilinguals in translation but not in single-language production. This pattern supports the recent claim (García et al., 2019) that, behaviorally, linguistic advantages in PSIs are confined to domains taxed during professional practice -e.g., discourse comprehension (Yudes et al., 2013) and semantic error detection (Fabbro et al., 1991;Yudes et al., 2013), as opposed to number counting (Signorelli et al., 2011) and word repetition (Hiltunen et al., 2016). Interestingly, this finding supports the "common demands hypothesis," which suggests that transfer between trained skills and performance in fine-grained tasks depends on similarities between the two of them Patterson, 2014, 2015).
Notwithstanding, reading tasks did involve neural processing differences between groups. In both languages, PSIs showed higher delta-theta band (1-8 Hz) power in the later time-window (300-600 ms), mainly over left frontal and posterior electrodes -with additional recruitment of right posterior sites for L1R. This aligns with previous studies showing greater engagement of similar topographies for PSIs than NIBs in varied linguistic operations, such as semantic decision (Elmer and Kuhnis, 2016). It would thus seem that SI experience can modulate the neural signatures of processes that are behaviorally unaffected -as observed for other language-experience factors, such as age of L2 acquisition (Klein et al., 2018;Vilas et al., 2019). Here, the distributed power increases observed for PSIs during reading could reflect a differential allocation of cognitive resources for diverse language domains, further suggesting that only those subjected to field-specific practice (such as translation) may reach outwardly detectable enhancements.
These insights are further refined by the neural effects observed during translation tasks. Whereas FT (L1-L2 processing) involved power increases for PSIs over right posterior electrodes, BT (L2-L1 processing) was the only condition yielding widespread bilateral power increases across frontal and posterior scalp sites. Suggestively, those topographies correspond to areas implicated in critical processes of SI, including speech comprehension and production (Abutalebi and Green, 2007; Hervais-Adelman and Babcock, 2019), sound-to-articulation mappings (Wise et al., 1999), and domain-general operations mediating language control and attention allocation (Hervais-Adelman and Babcock, 2019;Hervais-Adelman et al., 2015b). Also, given that our results were not driven by any particular lexical subcategory (as detailed in section 3.3), the observed effects seem to reflect general operations cutting across lexico-semantic processing at large.
In line with this distinctive signature of BT, previous studies comparing PSIs with NIBs have reported differential expertise-related patterns for L2-L1 relative to L1-L2 processes (Elmer et al., 2010). Also, neural differences between language directions have been found in studies assessing PSIs only, including fMRI reports on translation (Rinne et al., 2000) and ERP experiments on semantic decision (Proverbio et al., 2004). Considering that, at least in the Western context, professional SI is predominantly performed in backward rather than forward direction (Bros-Brann, 1976;Donovan, 2004;Gile, 2005;Seleskovitch and Lederer, 1989), the distinctive patterns observed for BT may also be driven by field-specific experience. In fact, specific training in backward SI has been claimed to differentially affect the neural signatures of cross-linguistic processes in L1 and L2 (Elmer et al., 2010). Our finding of uniquely widespread power increases for BT relative to all other conditions aligns well with this idea.
Further support for that interpretation comes from correlation results. Although behavioral advantages were detected in both translation tasks, only BT presented a significant association between power increases and RTs -a pattern that was exclusive to PSIs, as opposed to NIBs. In other words, during BT, the greater the synchronization among distributed hubs distinguishing PSIs from NIBs, the faster the experts' performance. To our knowledge, amid the growing neuroscientific literature on SI expertise (García et al., 2019;Hervais-Adelman and Babcock, 2019), this is the first evidence of a link between neural modulations and task-specific performance in PSIs. Suggestively, however, the PSIs' higher power values were not significantly correlated with their years of interpreting experience. Thus, as previously proposed by Elmer et al. (2010: 151-152), this could indicate that neurophysiological modulations related to SI expertise "are more likely related to the specific training during the SI education rather than to the amount of years of experience." Note, in this sense, that selective links between neural enhancements and behavioral boosts confined to specifically trained behavior have also been observed in populations with expertise in other fields, such as music (Margulis et al., 2009) and cars (Herzmann and Curran, 2011). In the case of PSIs, continual favoring of L2-L1 over L1-L2 interpreting (Bros-Brann, 1976;Donovan, 2004;Gile, 2005;Seleskovitch and Lederer, 1989) could fine-tune the co-dependence between increased neural recruitment and overall processing speed for BT in particular. In this sense, as claimed elsewhere (García et al., 2019), the linguistic particularities of PSIs do not equally manifest across language mechanisms at large.
Interestingly, all these effects were captured in the delta-theta band (1-8 Hz), with null modulations in every other band tested. Note that oscillations within this frequency range have repeatedly proven sensitive to task-internal manipulations in various linguistic operations (Braunstein et al., 2012;Davidson and Indefrey, 2007;Rohm et al., 2001), including word translation (Grabner et al., 2007). More particularly, increased theta synchronization for PSIs over NIBs has been observed during early stages of lexico-semantic processing, particularly between hubs supporting sound-to-meaning and sound-to-articulation processes (Elmer and Kuhnis, 2016). Thus, greater delta-theta power during task-relevant processes could represent a signature of sustained training. In fact, increased modulations in this and other frequency bands have been identified as a hallmark of expertise in other populations characterized by elevated musical (Pallesen et al., 2015) and athletic (Chuang et al., 2013;Doppelmayr et al., 2008) skills. Therefore, although other frequency bands may capture SI-expertise effects in different non-active conditions, such as resting state (Klein et al., 2018), delta-theta band dynamics may afford sensitive markers of task-specific differences in this population. Interestingly, too, the absence of significant effects in the exploratory N400 analyses highlights the distinct sensitivity of oscillatory measures to capture neurocognitive effects within and between bilingual samples, as shown by previous studies (Vilas et al., 2019).
More particularly, considering that our contrasts between reading and translation tasks consistently yielded widespread power increases for the former in both groups, greater delta-theta power could constitute a marker of less effortful processing. Indeed, reading tasks proved to be behaviorally easier than their translation counterparts, and the more expert group (PSIs) also showed power increases in the same band. Still, this conjecture would need to be tested in specific experiments.
Taken together, these findings offer new insights and constraints for neurocognitive models of SI expertise. First, in line with recent reports (Hervais-Adelman et al., 2017;Klein et al., 2018;Van de Putte et al., 2018), we observed neurobiological patterns underlying varied linguistic tasks in PSIs, even when behavioral performance was not enhanced. This reminds us that a null effect on one dependent variable (e.g., RT) does not necessarily entail a null effect of the independent variable (expertise) itself, delimiting the scope of conclusions derived exclusively from either behavioral or neural data (García et al., 2019).
Second, L2-L1 processing, which proves dominant for the population under study, was typified by distinctively broad patterns of neural hypersynchronization in proportion to behavioral outcomes. This supports and extends the hypothesis that SI-expertise effects are characterized by demand-based domain specificity (García et al., 2019): although neural changes underlie diverse verbal functions in PSIs, they prove differentially marked for those domains more consistently taxed in the profession (here, BT). Importantly, the proposed linguistic nature of the observed effects is further supported by the absence of between-group differences in relevant non-verbal domains -namely, short-term memory, cognitive flexibility, and overall executive performance (Supplementary material 1, Table S1). This pattern is not entirely surprising. Although executive tests have often revealed advantages for PSIs relative to NIBs (e.g., Becker et al., 2016;Strobach et al., 2015), such differences do not hold for all relevant subdomains (García et al., 2019), and even those domain-general functions reported as enhanced in PSIs, including short-term memory (Babcock and Vallesi, 2017;Christoffels et al., 2006;Stavrakaki et al., 2012) and cognitive flexibility (Yudes et al., 2011), often prove similar between both populations (Henrard and Van Daele, 2017;Köpke and Nespoulous, 2006;Signorelli et al., 2011). Indeed, as shown by graphical modeling outcomes, translation speed and executive processing play independent roles in SI performance (Christoffels et al., 2003). Accordingly, as postulated before, the expertise effects captured herein probably reflect differences in lexico-semantic processing per se, as opposed to discrepancies in executive skills between groups. However, this tentative conclusion should be directly tested via comparisons of oscillatory signatures during executive tasks in PSIs and NIBs (focused, for example, on working memory or cognitive flexibility skills).
Third, note that the differential oscillatory pattern underlying BT spanned vast portions of the scalp. Admittedly, interpreting experience may involve more focal effects for BT in other aspects of brain function, such as reduced hemodynamic activity in the caudate nucleus during SI (Hervais-Adelman et al., 2015a,b). Still, our results suggest that task-specific differences should also be acknowledged to rely on widespread transient plastic effects. This would speak to a differential recruitment of diverse cognitive resources for specifically trained functions, inviting new research on their nature and interaction.
More generally, our results attest to the complex and situationally driven interplay of behavioral and electrophysiological signatures of expertise. Specifically, neural effects cutting across an overarching domain (language processing) may be accompanied by enhanced performance in a distinctively trained function (translation), with direct brain-behavior relations emerging only for the sub-function predominantly taxed in daily practice (BT). Tentatively, it would thus seem that neurocognitive markers of expertise could present a gradient of convergence, such that they are potentially dissociable for marginally relevant functions but intimately interrelated for specifically taxed skills. This conjecture opens new possibilities to test and refine models of expertise within and beyond the field of SI proper.

Limitations and avenues for further research
This work presents some limitations that invite future studies. First, although our sample had sufficient statistical power and proved larger than or similar to that of previous EEG studies assessing translation processes (Christoffels et al., 2013;Grabner et al., 2007;Jost et al., 2018) or comparing PSIs and NIBs (Elmer and Kuhnis, 2016;Elmer et al., 2010;Klein et al., 2018), it would be desirable to replicate this investigation with more subjects. Second, the frequency analysis was performed in predetermined frequency bands and temporal windows. Whereas this reduces data dimensionality and facilitates the interpretation of results, useful refinements could emerge from data-driven approaches. Third, complementary insights could be gained by examining neural correlates of other tasks beyond the scope of the present study, such as picture naming and verbal fluency -for behavioral approximations to these topics, see Santilli et al. (2018) and García et al. (2019). Fourth, note that the present protocol made exclusive use of written stimuli. Although this decision allows circumventing confounds associated with superior sound-processing skills in SIs (Elmer et al., 2014b) while favoring comparability with previous neuroscientific translation studies that also employed visual stimuli (Christoffels et al., 2013;Jost et al., 2018;Price et al., 1999;Quaresima et al., 2002), it would be useful for future studies to employ auditory stimuli in tandem with visual ones so as to better ascertain the scope of linguistic advantages in this population. Fifth, as a complement to word-level assessments, discourse-level tasks could be incorporated to meet the imperative of ecological validity. Sixth, note that the PSIs tested here possessed several years of experience and professional practice. Therefore, it could be interesting to replicate this study in a pre/post design with SI trainees, aiming to see how soon these neurocognitive differences emerge after the onset of systematic practice. In this sense, future research in the field could profit from the inclusion of comprehensive assessments of the participants' translation and interpreting skills, ideally through validated instruments (Schaeffer et al., 2019). Finally, our cross-sectional design does not allow establishing whether present findings were triggered by SI experience. Indeed, as noted elsewhere, it could well be the case that PSIs possessed distinct neurocognitive profiles prior to entering the field (García et al., 2019). Here lies another reason to apply the present protocol in longitudinal or pre/post studies tracking SI trainees before and after training.

Conclusion
This is the first study showing a direct and selective link between behavioral and neural signatures of translation performance in PSIs. Our results support the idea that experience-related effects in this population manifest distinctively for specifically taxed abilities, as suggested by recent models of expertise. Further research in this direction can offer even finer insights on how human neurocognition adapts to recurring demands imposed on linguistic and otherwise cognitive systems.

Data availability statement
All demographic, behavioral, and preprocessed EEG data are publicly available on the Open Science Framework (García et al., 2019).

Declaration of competing interest
None to declare.