Context-Based Facilitation in Visual Word Recognition: Evidence for Visual and Lexical But Not Pre-Lexical Contributions

Abstract
Word familiarity and predictive context facilitate visual word processing, leading to faster recognition times and reduced neuronal responses. Previously, models with and without top-down connections, including lexical-semantic, pre-lexical (e.g., orthographic/phonological), and visual processing levels were successful in accounting for these facilitation effects. Here we systematically assessed context-based facilitation with a repetition priming task and explicitly dissociated pre-lexical and lexical processing levels using a pseudoword (PW) familiarization procedure. Experiment 1 investigated the temporal dynamics of neuronal facilitation effects with magnetoencephalography (MEG; N = 38 human participants), while experiment 2 assessed behavioral facilitation effects (N = 24 human participants). Across all stimulus conditions, MEG demonstrated context-based facilitation across multiple time windows starting at 100 ms, in occipital brain areas. This finding indicates context-based facilitation at an early visual processing level. In both experiments, we furthermore found an interaction of context and lexical familiarity, such that stimuli with associated meaning showed the strongest context-dependent facilitation in brain activation and behavior. Using MEG, this facilitation effect could be localized to the left anterior temporal lobe at around 400 ms, indicating within-level (i.e., exclusively lexical-semantic) facilitation but no top-down effects on earlier processing stages. Increased pre-lexical familiarity (in PWs familiarized through training) did not enhance or reduce context effects significantly. We conclude that context-based facilitation is achieved within visual and lexical processing levels. Finally, by testing alternative hypotheses derived from mechanistic accounts of repetition suppression, we suggest that the facilitatory context effects found here are implemented using a predictive coding mechanism.


Introduction
Efficient reading relies on automatized visual word recognition (Rayner, 1998), which in turn involves visual-perceptual, pre-lexical orthographic and phonological, and subsequent lexical-semantic processing levels (Coltheart et al., 2001; Carreiras et al., 2014). Efficiency in reading depends mainly on our familiarity with the units of language (Zoccolotti et al., 2009; Gagl et al., 2015) and on facilitation that arises from the predictive nature of linguistic contexts during natural reading. Contextual facilitation results in reduced brain activation, most prominently of the N400, a component of the event-related brain potential (ERP) peaking ~400 ms after word onset. The N400 reduction has typically been interpreted as reflecting facilitated processing at the lexical-semantic level of linguistic representation (for review, see Lau et al., 2008; Kutas and Federmeier, 2011). In line with this assumption, computational models like the strictly bottom-up sequential model of Laszlo and Armstrong (2014) successfully account for context-dependent N400 reduction effects by allowing neuronal fatigue within processing levels. Alternatively, however, it has also been proposed that contextual information (e.g., at the lexical-semantic level) can facilitate earlier stages of word recognition in a recurrent, top-down manner (for review, see Carreiras et al., 2014). Figure 1A visualizes these competing accounts of context effects on word recognition. Thus, the current model architectures disagree on the implementation of context-based facilitation as either within a processing level or top-down from higher processing levels.
Computationally, Laszlo and Armstrong (2014) implemented context-based facilitation by a fatigue mechanism, assuming that recently active neurons are less likely to fire again (Grill-Spector et al., 2006). However, findings from semantic priming (Lau et al., 2013b) indicate an alternative mechanism, i.e., predictive coding, which assumes the suppression of perceptual signals that are consistent with context-based internally generated expectations about upcoming input (Friston, 2005). According to this model, one processes only the residual, i.e., unpredicted part of the input, which accounts for increased N400 activation when contextual expectations are violated. A third alternative would be sharpening (Grill-Spector et al., 2006), which assumes a reduction of neuronal firing only when neurons code the input suboptimally, thereby increasing the reliability of dissociating between inputs (Blank and Davis, 2016;Richter et al., 2018). To date, we are not aware of a direct comparison of the possible mechanisms of context-based facilitation in visual word processing.
Separation of processing levels in visual word recognition research remains challenging. Experimental priming paradigms ensure a high degree of control over context factors, but require the matching of large numbers of psycho-linguistic parameters, which is difficult (Sassenhagen and Alday, 2016). Alternative regression-based accounts include these parameters as covariates, which can be realized in more natural contexts (i.e., sentence reading; Dambacher et al., 2006) but often demand large datasets (Dufau et al., 2015). We here propose that some of these problems can be ameliorated by using learning paradigms to increase familiarity and to associate information at different levels of linguistic representation with previously unfamiliar items [e.g., pseudowords (PWs): pronounceable non-words] in a controlled fashion (Taylor et al., 2011).
Using this strategy, we here dissociate between facilitation at pre-lexical and lexical levels of word processing. First, we matched pre-lexical characteristics (phonological length and orthographic familiarity) across words and PWs, ensuring comparable levels of pre-lexical processing difficulty between conditions (cf. Yarkoni et al., 2008). Second, we used a learning paradigm to increase pre-lexical familiarity of a subset of PWs (cf. Glezer et al., 2015) and measured highly time-resolved brain activation using magnetoencephalography (MEG; experiment 1). In a second, behavioral, experiment, we also included PWs to which meaning was associated. To manipulate context-based facilitation, we used repetition priming (for a discussion of priming as context manipulation, see DeLong et al., 2014). Figure 1B shows, in detail, the expected electrophysiological and behavioral responses reflecting context-based facilitation at visual, pre-lexical, and lexical-semantic processing levels. For example, we expect an interaction of lexical-semantic familiarity (presence vs absence of lexical-semantic information) and context (prime/without context vs target/with context) reflected by a stronger activation decrease (repetition suppression) for words in contrast to meaningless PWs (as shown by Almeida and Poeppel, 2013). If restricted to the N400 time window, this pattern would indicate that facilitation is implemented exclusively at the lexical-semantic level, whereas earlier effects would suggest top-down facilitation from lexical-semantic to earlier processing stages. Source localization of these effects will help to specify these conclusions further. Finally, we aim to clarify the mechanistic implementation of context-based facilitation by comparing predictions from predictive coding, sharpening, and fatigue (Fig. 1C).

Figure 1. A, [...] Carreiras et al., 2014) including expectations of "when" (cf. Barber and Kutas, 2007) and "where" in the brain the respective processes are implemented.
Gray lines symbolize potential implementations of context-based facilitation either as within-level mechanism assumed in strictly bottom-up accounts (Laszlo and Armstrong, 2014) or, additionally, as a recursive top-down influence on hierarchically lower processing levels. B, Schematic representation of expected neuronal repetition suppression (left and central panels) and behavioral priming effects (right) with likely modulations by pre-lexical and lexical-semantic familiarity. We expect a reduction of neuronal activation and response times between primes and identical targets. At the lowest, i.e., visual processing level, we expect no familiarity modulation since all letter strings were a priori visually similar (left panel). For PWs that were familiarized through repeated exposure but without learning a new meaning, we expect selectively stronger neural repetition suppression in a "pre-lexical time window" around 150 and 300 ms (central left panel). Finding this interaction additionally at earlier time points would be evidence for a facilitatory top-down influence of pre-lexical familiarity onto earlier visual processing stages. Note that as a manipulation check, we also expect that familiar PWs should elicit increased activation already at the prime at the expected regions (Gagl et al., 2016;Laszlo and Federmeier, 2014). At the level of lexical-semantic processing, we expect stronger repetition suppression for words compared to meaningless PWs (central right panel) at around 400 ms and, in case of top-down modulation, also in previous time windows. During prime processing, activation should be highest for words reflecting lexical-semantic processing (Rabovsky et al., 2012). In behavior, we expect stronger priming effects for words and familiar PWs compared to novel PWs, which would indicate that both pre-lexical and lexical-semantic familiarity increase context-based facilitation. 
C, Schematic visualization of expectations for fatigue, sharpening, and predictive coding mechanisms, shown for the N400 time window (i.e., lexical-semantic processing) and the contrast of words versus novel PWs. The left panel shows an activation pattern reflecting a fatigue mechanism. This account assumes that the more activation is elicited on the prime (i.e., more semantic processing), the more neurons are "exhausted", resulting in a stronger reduction for words versus PWs. Sharpening (second panel from left) expects a reduction of irrelevant (i.e., noisy) representations, thereby amplifying the signal. Consequently, neuronal repetition suppression should be weaker for words, reflecting a focus of processing resources on informative word representations and, thus, should reduce activations selectively for PWs (Kok et al., 2012). Predictive coding (two right panels) assumes a suppression of expected signals from the input. Thus, one would expect a suppression of the predictable signal rather than of the noise (Blank and Davis, 2016). For words, the additional lexical information can be used to predict the future target, resulting in stronger repetition suppression compared to PWs. The similar patterns for fatigue and predictive coding can be differentiated based on the probability with which identical prime-target pairs occur (Grotheer and Kovács, 2014), which we implemented in experiment 2. For predictive coding, one would expect stronger repetition suppression with higher repetition probability, whereas the fatigue account makes no differential predictions depending on repetition probability.

Experiment 1: MEG
In the first experiment, we investigated pre-lexical versus lexical-semantic contributions to context-based facilitation in visual word recognition at a neuronal level, using MEG. Pre-lexical properties (orthographic Levenshtein distance/OLD20; Yarkoni et al., 2008) were matched between words and both PW groups (i.e., familiarized vs novel), so that a priori, comparable levels of pre-lexical familiarity should lead to similar levels of pre-lexical activation across all three stimulus groups (as expected from, e.g., implementation of the MROM model: Grainger and Jacobs, 1996). However, the familiarization training increases the pre-lexical familiarity with the trained PWs. We expected effects of pre-lexical familiarity, i.e., increased activation of event-related fields (ERFs) for familiar in contrast to novel PWs (Laszlo and Federmeier, 2014; Gagl et al., 2016), and lexical familiarity, i.e., increased activation for words in contrast to novel PWs (cf. Rabovsky et al., 2012), irrespective of context. As an effect of context, we expected stronger neuronal repetition suppression for familiarized PWs compared to novel PWs and words on the pre-lexical level (Fig. 1B, central left panel) and stronger repetition suppression for words in contrast to the two PW groups at the lexical level (Fig. 1B, central right panel). If we find this pattern, expected for pre-lexical processing, from 150 to 300 ms at, e.g., left fusiform regions, one can assume within-level context-based facilitation. One could come to a similar conclusion for lexical processing when the interaction pattern described above is found within the N400 time window at, e.g., left anterior temporal regions. Finding a similar pattern at earlier time windows would indicate top-down context-based facilitation.

Participants
A total of 38 healthy native speakers of German (26 females, mean age 23.0 ± 2.8 years, range: 18–29 years) recruited from university campuses participated in familiarization procedures and MEG recordings and were included in the final sensor level analyses. All participants were right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971), had normal or corrected-to-normal vision, and normal reading abilities as assessed with an adult version of the Salzburg Reading Screening (unpublished adult version of Mayringer and Wimmer, 2003). A total of 19 further participants were excluded at different stages of the experimental procedure, due to the following reasons: low reading skills (i.e., reading test score below 16th percentile; N = 5), insufficient performance during PW familiarization (i.e., accuracy for to-be-familiarized PWs <50% in the final learning session; N = 2), self-reported developmental speech disorder (N = 1), technical artifacts during the MEG measurement (N = 4), insufficient number of trials after artifact rejection (i.e., participants with <15 repetition trials in at least one condition, outliers defined as >1.5 times below the lower quartile range of the number of valid trials across all participants and familiarity conditions; N = 2), contraindication to MEG measurement (N = 1, participant with retainer which might cause artifacts in MEG data), or drop out by choice of participants (N = 4, participants did not finalize the experimental procedure). All participants gave written informed consent according to procedures approved by the local ethics committees (University Clinic of Goethe University Frankfurt, application N°107/15; and Department of Psychology, Goethe University Frankfurt, application N°2015-229) and received 10 € per hour or course credit as compensation.

Stimuli and presentation procedure
Words and PWs consisted of five letters, with the first letter in uppercase following convention for German nouns. PWs were generated by the Wuggy software (Keuleers and Brysbaert, 2010), conserving the phonological (i.e., sub-syllabic) structure of the input words; all PWs were pronounceable. Estimates of word frequency and orthographic Levenshtein distance 20 (OLD20; Yarkoni et al., 2008) were based on the SUBTLEX-DE database (Brysbaert et al., 2011). A complete list of stimuli including estimated variables is available at https://osf.io/fc69p/.
Sixty German nouns (logarithmic word frequency: mean ± SE = 2.14 ± 0.12, range: 0.00–4.03) and 120 pronounceable PWs were presented twice during MEG acquisition. In addition, 80 catch trials were presented (see section "Repetition priming" below). PWs were divided into two sets of 60 items, such that both PW lists and the set of words were matched on orthographic similarity (OLD20; group means ± 1 SD: 1.825 ± 0.013; 1.717 ± 0.026; 1.743 ± 0.027) and number of syllables (1.883 ± 0.063; 1.95 ± 0.028; 1.933 ± 0.032; for stimulus characteristics, see Table 1). Despite the high similarity of the word characteristics between groups, these characteristics were included in all post hoc linear mixed models (LMMs) to account for potential confounds from the parameters (for details, see analysis section). Participants were familiarized with 60 PWs before the actual repetition priming task was conducted in the MEG (see section "PW familiarization" below for details). The second group of PWs was never seen by the participants before the MEG experiment. In addition, four further lists of 120 PWs each were generated as fillers for the familiarization procedure (one list per session).
Stimuli were presented using Experiment Builder software (SR-Research Ltd.). Words and PWs were presented in black bold Courier New font (14 pt.) in front of a white background. In the behavioral sessions, stimuli were presented on an LCD monitor with a refresh rate of 60 Hz, while during the MEG session, stimuli were projected with a refresh rate of 60 Hz onto a translucent screen.
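
As an illustration of the OLD20 measure used for stimulus matching, the following minimal Python sketch computes the mean Levenshtein distance of a letter string to its n closest entries in a lexicon (n = 20 for OLD20). The function names and the toy lexicon are hypothetical and chosen for illustration only; the values reported here were derived from the SUBTLEX-DE database and not from this code.

```python
def levenshtein(a, b):
    """Minimum number of single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def old_n(item, lexicon, n=20):
    """Mean Levenshtein distance from item to its n nearest lexicon entries."""
    distances = sorted(levenshtein(item, w) for w in lexicon if w != item)
    if not distances:
        raise ValueError("lexicon contains no entries other than the item")
    nearest = distances[:n]
    return sum(nearest) / len(nearest)


# Toy lexicon (hypothetical); a real estimate requires a full word list.
toy_lexicon = ["Tasse", "Paste", "Tante", "Kiste"]
print(old_n("Taste", toy_lexicon, n=4))
```

Lower values indicate that a letter string has many close orthographic neighbors, i.e., that it is orthographically more word-like.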

PW familiarization
Participants visited the lab on the two days before the MEG experiment, and during each visit completed two familiarization sessions of ~20 min in length. The two previous days were chosen to take advantage of sleep consolidation effects (James et al., 2017). Each familiarization session started with reading aloud the PWs from a printed list. Reading errors were documented (mean across all sessions: 0.7%). Subsequently, participants performed a computer-based old/new recognition task in which the to-be-familiarized PWs were presented two times per session, randomly intermingled with a new set of 120 filler PWs for every session (total of 480 filler PWs across all four sessions). For every PW, participants had to indicate by button press, as fast and accurately as possible, whether it was familiar to them or not. PWs were preceded by two black vertical bars displayed above and below the center of the screen where participants were asked to fixate (500 ms; Fig. 2A), and presentation was terminated with the button press. LMM analyses with session (centered and z-transformed) as fixed effect and participant and item as random effects on the intercept were performed with the lme4 package (Bates et al., 2015) in R, version 3.4.1 (R Development Core Team, 2008). All effects with t > 2, reflecting that the effect differs from zero by more than two SEs, were considered significant (note that p values cannot be computed in a reasonable way in the LMM approach; Kliegl et al., 2011). Note that for one participant, data of sessions 3 and 4 could not be saved due to technical issues. Old/new response sensitivity indices d' (Green and Swets, 1966) significantly increased across familiarization sessions from 1.15 in session 1 to 2.96 in session 4 (estimate = 0.66, SE = 0.039, t = 17.17; pairwise tests between subsequent sessions: all ts > 5; Fig. 3A; for details, see Extended Data Fig. 3-1).
This demonstrated that participants improved across sessions in distinguishing between familiar and novel (i.e., filler) PWs. Participants reached a high performance in the final session, with accuracies ranging between 70.0% and 99.2% for familiar and between 59.2% and 99.2% for filler PWs (Fig. 3B). Based on the strong improvement in sensitivity and the high performance in the final session, we conclude that pre-lexical familiarization of the trained PWs was successful.
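
For readers unfamiliar with the sensitivity index d', the following Python sketch shows one common way to compute it from old/new responses: the z-transformed hit rate minus the z-transformed false alarm rate. The log-linear correction for extreme rates and the example counts are assumptions made for illustration; the text does not state which correction, if any, was applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    # Log-linear correction (an assumption, not taken from the text):
    # avoids infinite z values when a rate equals 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one session: 120 "old" trials (60 PWs shown twice)
# and 120 "new" trials (filler PWs).
print(d_prime(hits=100, misses=20, false_alarms=15, correct_rejections=105))
```

A value of 0 indicates that familiar and filler PWs cannot be told apart; larger values indicate better discrimination.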

Repetition priming
The repetition priming task during MEG recording was conducted on day 3 and included words, familiarized PWs, and novel PWs. At the start of each trial, participants had to fixate between two vertical black bars presented above and below the center of the screen (analogous to the familiarization procedure; cf. Fig. 2B). Stimulus presentation was initiated after an eye-fixation to the cued region was detected by an MEG compatible eye-tracker (Eyelink CL 1000, SR Research Ltd.), and comprised the successive presentation of two letter strings (prime and target) for 800 ms each, separated by an interval of 800 ms during which a string of five hash marks was shown. Both letter strings had to be read silently; the task served only to maintain attention and required a button press on catch trials, i.e., the word "Taste" (English: button) in either the first, second, or both positions. The silent reading task was chosen to avoid contaminating the neuronal response to words with motor responses; catch trials were excluded from analysis. The explicit fixation control before stimulus presentation ensured that eyes were open and directed toward the position where the stimulus was presented. Response hands were counterbalanced across participants and responses were recorded using a fiber optic response pad (LUMItouch; Photon Control Inc.); 100 ms after target offset, gray vertical bars were presented for 100 ms, indicating that participants were allowed to blink for a period of 1000 ms. Stimuli were presented at a viewing distance of 51 cm yielding horizontal visual angles of ~0.3° per letter. The 60 letter strings per condition (words, familiar, and novel PWs) were each presented in two trials, once during each half of the experiment. As a consequence, we presented 120 trials per stimulus condition adding up to 360 trials; 75% (i.e., 270) of these trials were repetition trials, allowing the investigation of familiarity effects in a highly predictive context. The remaining 25% (i.e., 90) trials were non-repetition trials, in which each possible combination of words, familiarized, and novel PWs appeared equally often, i.e., 10 times. Also, we presented 80 catch trials resulting in a total of 440 trials. The repetition priming task lasted ~40 min, divided into three blocks separated by breaks of ~5 min.

Figure 2. Experimental procedures. A, For the PW familiarization procedure of experiment 1, in each learning session, 60 PWs were presented until response, intermingled with novel filler PWs, in an old/new recognition task; 500 ms before stimulus onset, two vertical bars indicated the center of the screen where participants were asked to fixate. The intertrial interval was 2000 ms. B, During MEG recording, participants performed a repetition priming task. Each trial consisted of a sequence of two letter strings (prime and target) presented for 800 ms each, separated by an interval of 800 ms during which a string of five hash marks was presented. Letter strings could be words, familiarized PWs, or novel PWs (120 trials each regarding the prime); 75% of trials were repetition trials, i.e., prime and target were identical (left). The remaining 25% were non-repetition trials in which two different letter strings were presented (middle). In this case, prime and target could be from the same condition or from two different conditions, with all combinations of conditions appearing equally often. Participants were instructed to silently read presented letter strings and respond only to rare catch trials (right). Before onset of the prime, two black vertical bars presented for 800–1000 ms indicated the center of the screen where participants were asked to fixate. After presentation of the target, two gray vertical bars were presented for 1000 ms, indicating a blinking period of 1500 ms starting from onset of the bars. Before the onset of the next trial, a blank screen was presented for the remainder of the blinking period. C, In experiment 2, a paired-association task was used for familiarization of PWs with and without semantics. PWs were presented for 800 ms, followed by the presentation of an object image until button press (maximally 1500 ms). During the intertrial interval of 1000 ms, two vertical bars indicated the center of the screen where participants were asked to fixate. In the semantic condition, there was a reliable association between object and PW. In the familiarization only condition, in contrast, PWs and objects were randomly paired so that each pair occurred only once. D, In the subsequent naming task, each PW from the familiarization conditions with and without semantic associations was presented once. Participants named the object they associated with each PW, or responded "next" in case they did not associate a meaning with a PW. Before each PW presentation, two vertical bars framing the center of the screen were presented until button press by the experimenter. E, The repetition priming task involved in each trial a sequence of two letter strings presented for 800 ms each, separated by an interval of 800 ms during which five hash marks were displayed. The hash mark string was also presented for 800 ms before the onset of the first letter string. Letter strings could be words, familiarized PWs with and familiarized PWs without semantics, or novel PWs (180 trials each regarding the prime). Repetition probability was varied across blocks between 25%, 50%, and 75%. Participants were instructed to silently read presented letter strings and respond to the target whether they had an explicit semantic association with it, or not. During the intertrial interval of 800–1200 ms, two vertical bars indicated the center of the screen where participants were asked to fixate.
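
The trial counts of the MEG repetition priming task described above can be verified with a few lines of arithmetic. The sketch below is purely illustrative; the function and key names are hypothetical.

```python
def trial_counts(items_per_condition=60, n_conditions=3,
                 presentations_per_item=2, catch_trials=80):
    """Reproduce the trial arithmetic of the MEG repetition priming task."""
    experimental = items_per_condition * n_conditions * presentations_per_item
    repetition = experimental * 3 // 4          # 75% repetition trials
    non_repetition = experimental - repetition  # remaining 25%
    # Each ordered prime/target combination of conditions appears equally often.
    per_combination = non_repetition // (n_conditions ** 2)
    return {
        "experimental": experimental,
        "repetition": repetition,
        "non_repetition": non_repetition,
        "per_combination": per_combination,
        "total": experimental + catch_trials,
    }


print(trial_counts())
```

With three stimulus conditions there are nine ordered prime/target combinations, which is why 90 non-repetition trials yield 10 trials per combination.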

MEG data acquisition
MEG datasets were acquired in accordance with guidelines for MEG recordings (Gross et al., 2013), using a 275-sensor whole-head system (Omega 2005; VSM MedTech Ltd.). Six sensors (MLF66, MLP31, MRF22, MRF24, MRO21, and MZC02) were disabled due to technical issues so that 269 sensors remained for data acquisition. Data were recorded at a sampling frequency of 1200 Hz using a synthetic third-order gradiometer configuration. Online filtering was performed with fourth-order Butterworth filters with 300-Hz low pass and 0.1-Hz high pass. Head positions of the participants relative to the gradiometer array were recorded continuously by three localization coils placed at the nasion and above both ear canal entrances using ear-plugs. Additionally, two electrodes placed centrally on each clavicula recorded an electrocardiogram (ECG), while two pairs of electrodes placed distal to the outer canthi of both eyes, and above and below the right eye, respectively, recorded an electrooculogram (EOG). The impedance of each electrode was below 5 kΩ for EOG electrodes and below 20 kΩ for ECG electrodes, measured with an electrode impedance meter (Astro-Med GmbH).

Structural magnetic resonance (MR) image acquisition
Structural MR images were acquired for 34 participants with a 1.5 T Siemens magnetom Allegra scanner (Siemens Medical Systems) using a standard T1 sequence (3D MPRAGE, 176 slices, 1 × 1 × 1 mm voxel size). To enable co-registration of MR images with MEG data, vitamin E capsules were placed at the positions of two of the MEG head localization coils (i.e., above both ear canal entrances using ear-plugs); the nasion could be identified anatomically in structural MR images. Fiducial coordinates were identified in SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/).

MEG sensor level analyses
MEG data were analyzed with FieldTrip (version 2011-11-21 for preprocessing and version 2013-01-06 for all remaining sensor level analyses; http://fieldtrip.fcdonders.nl; Oostenveld et al., 2011) under MATLAB (version 2012b, The MathWorks Inc.), except for Figure 4A,B, which was realized with MNE-Python (https://martinos.org/mne/stable/index.html; Gramfort et al., 2013, 2014). Parallel computations were performed using GNU parallel (Tange, 2011). Catch trials and any other trials during which participants made a button press were excluded from analysis. MEG data were segmented into epochs of 2600 ms in length, lasting from -160 to 2440 ms with respect to the onset of the prime.
Individually for each participant, trials were selected for analysis in which the head position fell within a range of 5 mm (across all blocks) relative to the majority of other trials. Trials contaminated with sensor jump and muscle artifacts were rejected automatically, using the FieldTrip routine for automatic artifact detection. For jump artifact detection, a 9th-order median filter was applied to the data, while for muscle artifact detection, an 8th-order Butterworth IIR filter (110–140 Hz) was applied. The filtered data were z-transformed and averaged across sensors. Trials were rejected if for any time point the z value exceeded a threshold of z = 20 for jump artifacts and z = 6 for muscle artifacts, following standards established for the local measurement characteristics. Trials contaminated with eye blink, eye movement, or heartbeat artifacts were cleaned using independent component analysis (ICA; Makeig et al., 1996). Components whose time courses correlated with EOG and ECG electrodes were rejected, using as threshold a correlation coefficient of r > 0.1, which sufficiently removed artifacts according to visual inspection. After these procedures, an average of 51.2 repetition trials (range: 26–79) per condition could be retained. Non-repetition trials were averaged across conditions for analysis, with on average 52.6 trials per participant available (range: 29–80).
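
The logic of the z-value threshold used for jump and muscle artifact rejection can be sketched as follows. This is a simplified Python illustration and not the FieldTrip routine itself: it assumes that the data have already been filtered, z-transforms each sensor across all trials and samples, averages the z values across sensors as described above, and rejects any trial in which the average exceeds the threshold at any time point. The function name and the data layout (nested lists indexed as trial, sensor, sample) are hypothetical.

```python
from statistics import mean, pstdev


def reject_trials(data, threshold):
    """Return indices of trials whose sensor-averaged z value exceeds threshold.

    data[trial][sensor][sample] holds already filtered signal values.
    """
    n_sensors = len(data[0])
    # Mean and SD per sensor, pooled over all trials and samples.
    stats = []
    for s in range(n_sensors):
        values = [x for trial in data for x in trial[s]]
        stats.append((mean(values), pstdev(values)))

    rejected = []
    for t, trial in enumerate(data):
        for k in range(len(trial[0])):
            # Average z value across sensors at sample k
            # (sensors with zero variance contribute z = 0).
            z_avg = mean(
                (trial[s][k] - stats[s][0]) / stats[s][1] if stats[s][1] > 0 else 0.0
                for s in range(n_sensors)
            )
            if z_avg > threshold:
                rejected.append(t)
                break
    return rejected


# Hypothetical data: 3 trials, 2 sensors, 50 samples; trial 1 contains a spike.
data = [[[float(k % 2) for k in range(50)] for _ in range(2)] for _ in range(3)]
for sensor in data[1]:
    sensor[10] = 100.0
print(reject_trials(data, threshold=6))
```

Because the z values are averaged across sensors, an artifact must affect many sensors at once, or a few sensors very strongly, to exceed the threshold.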
Before computation of ERFs, a 20-Hz low-pass filter was applied to data epochs to increase the signal-to-noise ratio (cf. Lau et al., 2013a). In addition, to ensure that the low filter cutoff did not mask more transient components, we performed the main analyses after filtering at 40-Hz low pass. Original epochs were split into separate epochs for prime and target stimulus, ranging from -110 to 800 ms with respect to each stimulus onset. Epochs were baseline corrected by subtracting the average activation between -110 and -10 ms from each time point. For each sensor, we identified the participants for which the recorded magnetic field averaged across repetition trials and time lay outside the range of the mean across all participants ± 3.29 SDs, i.e., detecting extreme outliers outside the 99.9% confidence interval. The signal of these noisy sensors (33 sensors in total; one to nine sensors in ten participants), per participant, was approximated by trial-wise interpolation from activation in neighboring sensors. ERFs were then calculated for each subject and condition (repeated words, repeated familiar PWs, and repeated novel PWs, as well as non-repetition trials combined across all familiarity conditions), separately for prime and target, by averaging the epochs across all trials. ERFs were compared between conditions using cluster-based permutation tests (Maris and Oostenveld, 2007) for dependent samples, corrected for multiple comparisons across time points (-110 to 800 ms) and sensors at cluster level. To compute interaction statistics, we used the permANOVA functions by Helbling (2015; https://github.com/sashel/permANOVA/). Clusters were defined as spatially and temporally adjacent samples with F values exceeding an uncorrected α-level of 0.001 (cf. Eklund et al., 2016). The cluster-level statistic was calculated using the standard approach, i.e., taking the sum of F values within a cluster (Maris and Oostenveld, 2007).
Empirical cluster-level statistics were compared to the distribution of cluster-level statistics obtained from Monte Carlo simulations with 5000 permutations, in which condition labels were randomly exchanged within each subject. Original cluster-level statistics larger than the 95th percentile of the distribution of cluster-level statistics obtained in the permutation procedure were considered to be significant. Note, we use the terms "significant time window/sensors" for convenience. However, we are aware that the temporal and spatial extents of clusters obtained with the permutation procedure are subject to variations based on the signal-to-noise ratio, number of trials, and the selected cluster threshold. As a consequence, we will not interpret the exact values precisely and rather focus on the condition differences within the obtained clusters.
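
The principle of the cluster-based permutation test can be illustrated with a strongly simplified Python sketch. It makes several assumptions that deviate from the analysis reported here: it covers only the time dimension of a single sensor (no spatial adjacency), it handles only a paired comparison of two conditions (where F equals the squared paired t value) rather than the interaction statistics computed with permANOVA, and it takes the cluster-forming threshold as a plain parameter. Exchanging condition labels within a subject is implemented as flipping the sign of that subject's difference wave. All function names are hypothetical.

```python
import random
from statistics import mean, stdev


def f_values(diffs):
    """diffs[subject][time] -> F value per time point (paired case: F = t**2)."""
    n = len(diffs)
    out = []
    for k in range(len(diffs[0])):
        col = [d[k] for d in diffs]
        sd = stdev(col)
        t = mean(col) / (sd / n ** 0.5) if sd > 0 else 0.0
        out.append(t * t)
    return out


def max_cluster_sum(fvals, f_crit):
    """Largest sum of F values over temporally adjacent supra-threshold samples."""
    best = current = 0.0
    for f in fvals:
        current = current + f if f > f_crit else 0.0
        best = max(best, current)
    return best


def cluster_permutation_p(diffs, f_crit, n_perm=1000, seed=0):
    """Monte Carlo p value of the largest observed cluster."""
    rng = random.Random(seed)
    observed = max_cluster_sum(f_values(diffs), f_crit)
    count = 0
    for _ in range(n_perm):
        # Swapping condition labels within a subject flips the sign
        # of that subject's condition difference.
        flipped = []
        for d in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * x for x in d])
        if max_cluster_sum(f_values(flipped), f_crit) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

A cluster is considered significant when its statistic exceeds the 95th percentile of the permutation distribution, which corresponds to a Monte Carlo p value below 0.05 in this sketch.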
First, as a general check of our experimental manipulation, we assessed context effects by computing a 2 × 2 interaction between the experimental factors repetition congruency (repetition vs non-repetition trials, reflecting whether context-based processing was indeed possible; referred to as R) and prime/target effect (reflecting the absence vs presence of a preceding context; referred to as C). As the low number of non-repetition trials did not allow separate analyses of these effects for the different conditions, data were pooled across familiarity conditions (words, familiar, and novel PWs). Within each familiarity condition, the number of repetition trials was randomly stratified to match the number of non-repetition trials. In the second analysis, we examined in repetition trials how familiarity (words/lexical familiarity vs familiar PWs/pre-lexical familiarity only vs novel PWs) and context (prime vs target) interact. This analysis served to examine the effects of different familiarity types on the neuronal repetition effect.
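The trial stratification step can be sketched as follows, assuming it amounts to random subsampling without replacement within each familiarity condition. The function name, the dictionary layout, and the fixed seed are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def stratify_trials(rep_idx_by_condition, n_nonrep_by_condition, seed=0):
    """Randomly subsample repetition trials per familiarity condition so that
    their number matches the number of non-repetition trials."""
    rng = np.random.default_rng(seed)
    selected = {}
    for cond, rep_idx in rep_idx_by_condition.items():
        # Never request more trials than are available in this condition.
        n_keep = min(len(rep_idx), n_nonrep_by_condition[cond])
        selected[cond] = np.sort(rng.choice(rep_idx, size=n_keep, replace=False))
    return selected
```

Sampling without replacement ensures that no repetition trial enters the pooled average twice.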
To determine the nature of significant interaction and main effects, we performed post hoc LMM analyses for pairwise differences between relevant conditions. All post hoc tests were based on participant-specific and condition-specific ERF values averaged across sensors and time points from the respective significant cluster, and included participant and item as random effects on the intercept. Since not all trials entered the analyses due to exclusion of artefactual trials, which might have affected the matching across letter string conditions, OLD20 and number of syllables, both z-transformed and centered, were entered as additional fixed effects.
To rule out the possibility that our baseline correction approach, i.e., using separate baselines for ERFs elicited by prime and target stimulus in a trial, created artificial effects due to the presentation of hash marks only before the target, we performed the analyses of repetition congruency by prime/target and familiarity by prime/target a second time, using the period before the prime as a common baseline for correction of ERFs to both stimuli. Of the 24 clusters that were significant in the analyses after separate baselining, 17 were also significant in the analysis after common baselining. Therefore, in the results and discussion sections, we will focus on those clusters replicated with the common baseline approach. Clusters with durations <30 ms were not interpreted (cf. Dikker and Pylkkänen, 2013, for a similar approach), which led to the rejection of three clusters of ~20 ms. A comparison of significant clusters from both analyses can be found in Extended Data Figures 4-1, 5-1.
As a further sanity check of the separate baselines approach, we additionally report a peak-to-peak analysis for the repetition by familiarity interactions as well as main effects. For this analysis, the positive (in case of right sensors) and negative (in case of left sensors) peaks of the ERFs were identified per participant, condition, and sensor (restricted to the time window ±150 ms around the peak latency of grand average ERFs and the interval between 0 and 500 ms). In case of central sensors close to the midline (sensors MZC01, MZC03, MZC04, MZF01, MZF02, MZF03, MZO01, MZO02, MZO03, and MZP01), we separately decided whether to select the positive or negative peak, depending on which of the two peaks was larger in absolute terms in the across-participant ERFs. We decided against taking this approach in the majority of sensors because the ERFs typically declined during later time windows, in many cases reaching a value higher than the actual peak in absolute terms. Therefore, selecting the positive peak in the case of right sensors, and the negative peak in the case of left sensors, was the best compromise between automatic peak determination and avoiding the replacement of the actual peak value with a value that falls within the time range of decline of the ERF. We then subtracted the preceding peak value of the respective other polarity (between stimulus onset and detected peak) from the already defined peak value. Statistical analyses were then performed on the absolute peak difference, using the cluster-based permutation procedure as described above, defining clusters solely based on spatial adjacency between sensors due to the lack of the temporal dimension. Given its independence from the pre-stimulus baseline, hash mark strings presented before the target cannot influence this analysis. However, a limitation of this analysis is that it cannot detect significant differences occurring at time ranges before and after the peak.
Therefore, the results of this analysis did not influence whether a specific cluster was interpreted or not.
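The peak-to-peak measure described above can be sketched for a single participant, condition, and sensor as follows. This is an illustrative reading of the procedure rather than the original analysis code: the function name and arguments are hypothetical, times are assumed to be in seconds relative to stimulus onset, and `polarity` is +1 for sensors where the positive peak is selected and -1 where the negative peak is selected.

```python
import numpy as np

def peak_to_peak(erf, times, ga_peak_latency, polarity,
                 half_width=0.150, t_min=0.0, t_max=0.500):
    """Absolute difference between the selected peak and the preceding peak
    of opposite polarity, for one ERF time course."""
    # Search window: +/-150 ms around the grand-average peak, within 0-500 ms.
    lo = max(t_min, ga_peak_latency - half_width)
    hi = min(t_max, ga_peak_latency + half_width)
    window = np.flatnonzero((times >= lo) & (times <= hi))
    signed = polarity * erf  # the selected peak is always a maximum of `signed`
    peak_i = window[np.argmax(signed[window])]
    # Preceding extremum of opposite polarity, between stimulus onset and the peak.
    pre = np.flatnonzero((times >= 0.0) & (times <= times[peak_i]))
    opp_i = pre[np.argmin(signed[pre])]
    return abs(erf[peak_i] - erf[opp_i])
```

Because the measure is a difference between two post-stimulus values, any constant offset introduced by the pre-stimulus baseline cancels out.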

MEG source localization
Source localization was performed for those 34 participants for whom we could obtain anatomic MR images, using FieldTrip, version 2016-10-24. We created individual source grids for each participant by transforming the anatomic MR images to a standard (i.e., MNI space; Collins et al., 1994) T1 template from the SPM8 toolbox (http://www.fil.ion.ucl.ac.uk/spm). A regular 3D dipole grid (10-mm resolution) based on the T1 template was then warped with the inverse of the resulting transformation matrix. This procedure resulted in individual dipole grids for each participant, in which each grid point was located at the same brain area across participants. For each grid point and participant, lead fields were computed with a single shell forward model of the inner surface of the skull (Nolte, 2003). Before source localization, ERFs were down-sampled to 300 Hz to minimize computing costs. Source locations were computed for significant contrasts of interest from the ERF statistics as proposed by Gross et al. (2013). The procedure followed Manahova et al. (2018; the original code is available at https://data.donders.ru.nl/collections/di/dccn/DSC_3018012.15_439?0; see also Dijkstra et al., 2018; Mostert et al., 2018). Source localization was performed on ERFs by estimating two-dimensional dipole moments at each grid and time point using linearly constrained minimum variance (LCMV) spatial filters (van Veen et al., 1997). For the main prime/target effect, source localization was performed on the ERF difference between prime and target stimulus, averaged across all knowledge conditions in repetition trials. For main effects of familiarity, source localization was performed on the respective conditions separately and subtracted afterward (e.g., familiar minus novel PWs, averaged across prime and target).
Interactions between repetition congruency and prime/target effect were resolved by performing source localization on the difference ERFs between stratified repetition and non-repetition trials, separately for prime and target, and then subtracting source activations of the prime from those of the target. Analogously, for interactions between prime/target and familiarity effects, source localization was performed on the difference ERFs between prime and target, separately for each letter string condition, and then subtracted. The data covariance was estimated over the time interval of the respective significant contrast and regularized using shrinkage (Blankertz et al., 2011) with a regularization parameter of 0.01. Two-dimensional dipole moments were reduced to a scalar value by taking the norm of the vector. This value reflects the contribution of a particular source location to sensor level activation not only in magnitude but also in dipole orientation. The latter is crucial for also detecting differences caused by different neuronal populations within the same source location. The norm of the vector is a positive value and subject to a positive bias due to noise. To counter this bias, we employed a permutation procedure (1000 permutations). For analyses on separate conditions, the sign of half of the trials was randomly flipped. For analyses on condition differences, condition labels were randomly exchanged across trials. The square of the dipole's norm averaged across all permutations was taken as noise estimate. This noise estimate was subtracted from the square of the true data, and the data were divided by the noise estimate. Negative values were set to zero, and the square root was taken. Finally, the signal of each source location was normalized by its variance to counter the depth bias. For visualization, source locations thresholded at 50% of the maximum source activation were plotted on cortical surfaces using the nilearn package (Huntenburg et al., 2017) in Python. Brain regions were identified from the MNI coordinates of source maxima using the Harvard-Oxford cortical structural atlas (Desikan et al., 2006).

Figure caption (continued): Topographies represent activation at 400 and 2000 ms, which allows a comparison of activation 400 ms after the onset of prime and target, respectively. C, Topographical map represents F values of significant sensors averaged across the significant time window. Non-significant sensors are set to zero. Surface plot represents source locations of the effect in signal-to-noise ratio (SNR) thresholded at 50%. D, ERF time course averaged across significant sensors (of left hemisphere only, shown in topography in C). The significant time window is marked by a yellow shaded black box. Red lines correspond to prime and blue lines to target; solid lines correspond to repetition (Rep) and dashed lines to non-repetition (Non-Rep) trials (averaged across all familiarity conditions). E, Boxplots represent activation averaged across sensors and time points within the left hemisphere cluster, for repetition trials (left) and non-repetition trials (right). Colored dots and lines represent individual participants. Asterisks indicate significant results (t > 2) from post hoc LMMs.
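The permutation-based correction for the positive noise bias of the dipole norm, as described in the source localization procedure, can be sketched for a single grid point and time point. This is a simplified illustration under stated assumptions, not the code of Manahova et al. (2018): it covers only the single-condition case (sign flipping of half of the trials), assumes that trial-wise two-dimensional dipole moments are available (the spatial filter is linear, so filtering and averaging commute), and omits the final variance normalization. The function name is hypothetical.

```python
import numpy as np

def debiased_snr(trial_moments, n_perm=1000, seed=0):
    """trial_moments: trials x 2 array of dipole moments at one grid/time point.
    Returns the noise-corrected source strength in signal-to-noise units."""
    rng = np.random.default_rng(seed)
    n_trials = trial_moments.shape[0]
    # Squared norm of the trial-averaged dipole moment (the "true data").
    true_power = np.sum(trial_moments.mean(axis=0) ** 2)
    noise_power = np.zeros(n_perm)
    for i in range(n_perm):
        # Flip the sign of a random half of the trials: the evoked signal
        # cancels in the average while the noise level is preserved.
        signs = np.ones(n_trials)
        flip = rng.choice(n_trials, size=n_trials // 2, replace=False)
        signs[flip] = -1.0
        noise_power[i] = np.sum((signs[:, None] * trial_moments).mean(axis=0) ** 2)
    noise = noise_power.mean()
    ratio = (true_power - noise) / noise
    return np.sqrt(max(ratio, 0.0))  # negative values are set to zero
```

For condition differences, the sign flipping would be replaced by randomly exchanging condition labels across trials, as stated in the text.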

Analysis code and data accessibility
The raw data, stimulus lists, and analysis code of both experiments described in the article are freely available online at https://osf.io/fc69p/.

Results
During the MEG measurement, participants correctly identified 94% of catch trials, indicating that they were attending to the presented letter strings.

Repetition suppression phenomenon
As a manipulation check of context effects, we investigated the interaction between prime/target and repetition congruency (repetition vs non-repetition trials) effects, combined over all familiarity conditions. Repetition trials (Fig. 4A) but not non-repetition trials (Fig. 4B) showed repetition suppression, i.e., reduced activity at the target stimulus (around 2000 ms into the trial or 400 ms after onset of the target word). This interaction was significant at bilateral frontal sensors in the N400 time window (280-550 ms post-stimulus onset). Source localization revealed most prominently the left superior temporal gyrus (peak activation, extending into anterior temporal cortex albeit with weaker activation), as well as left occipital pole, left inferior occipital cortex, and the junction of left middle temporal, angular, and supramarginal gyri (Table 2; Fig. 4C,D). The interaction reflected a significant decrease from prime to target in repetition trials (post hoc LMM: estimate = 2.22e-14, SE = 0.19e-14, t = 11.61; Fig. 4E, left) and a significant but descriptively weaker increase in non-repetition trials (post hoc LMM: estimate = -0.58e-14, SE = 0.20e-14, t = 2.94; Fig. 4E, right; see also Table 3). This replication of established repetition suppression effects (Deacon et al., 2004; Summerfield et al., 2011) is an important prerequisite for our main analyses investigating the effect of different familiarity conditions on repetition suppression. Sources of the effect were also compatible with previous localizations of the N400 within superior temporal gyrus (Helenius et al., 1998; Vartiainen et al., 2009), and anterior temporal cortex (Lau et al., 2013a; Lau and Nguyen, 2015).

Prime/target effects
We here refer to context effects reflected in repetition suppression from prime to target in repetition trials only. Such effects were found in multiple time windows, spanning time ranges from 100 to 690 ms after stimulus onset (Fig. 5A-D). All effects were localized to bilateral occipital cortices (see example of the earliest cluster C1 in Fig. 5A and source coordinates in Table 2). See also Figure 5D for the sensor topographies of clusters C2-6. The majority of these clusters were not modulated by familiarity; only the frontal cluster C3 was qualified by a significant interaction with familiarity, indicated by the overlap in time and space with the interaction cluster (see section "Pre-lexical and lexical familiarity effects on the repetition suppression phenomenon" below).

Table note: Activation refers to peak activation in 10^-9 signal-to-noise ratio.

Familiarity effects
To reiterate, due to our stimulus selection procedure, words should have comparable levels of pre-lexical (orthographic/phonological) familiarity but higher lexical familiarity (i.e., lexical-semantic information associated with words) than novel PWs. In contrast, as a result of the familiarization training, familiarized PWs should have higher levels of familiarity than novel PWs specifically at the level of pre-lexical processing. Thus, we had assumed that effects of pre-lexical familiarity should be reflected in ERF differences between familiar PWs and both novel PWs and words, while effects of lexical familiarity should be reflected in ERF differences between words and both novel and familiar PWs (cf. Fig. 1B for visualization of these hypotheses). Differences between the familiarity conditions, averaged across prime and target (i.e., representing main effects of familiarity), occurred at two topographic clusters: At left posterior sensors, localized to the left angular/supramarginal gyrus ( Fig. 5E; Extended Data  Table 2), familiar PWs elicited a more negative ERF amplitude between 290 and 380 ms than both words and novel PWs (Fig. 5F,G; Table 4; Extended Data Table  4-1). From the more negative ERF response specific to familiar PWs, we can conclude that our familiarization procedure was successful in modulating the neurophysiological processing of these PWs. The second cluster, again, occurred at frontal sensors and was qualified by a significant interaction with context (see "Pre-lexical and lexical familiarity effects on the repetition suppression phenomenon" below). The main effect in this cluster is therefore only visualized in Extended Data Figure 5-3 (see also Table 4 and Extended Data Table 4-1) and not discussed further.

Pre-lexical and lexical familiarity effects on the repetition suppression phenomenon
We had expected that context effects, reflected in repetition suppression for the target relative to the identical prime, would be differentially influenced by pre-lexical versus lexical familiarity (cf. Fig. 1B for visualization of these hypotheses). To assess the influence of familiarity on context effects, we examined the familiarity (words vs familiar vs novel PWs) by prime versus target interaction in repetition trials (as only these included a valid predictive context for the target). We found a significant interaction between 300 and 480 ms at bilateral frontal sensors (Fig. 5H,I; see also Table 5 for a post hoc statistic controlling for OLD20 and number of syllables). Post hoc LMMs revealed that lexical but not pre-lexical familiarity reliably modulated the repetition effect: While during prime presentation the negative-going ERF amplitude was largest for words (words vs novel PWs: estimate = 2.18e-14, SE = 0.47e-14, t = 4.67; words vs familiar PWs: estimate = 3.11e-14, SE = 0.41e-14, t = 7.56; no difference between familiar and novel PWs: estimate = -0.71e-14, SE = 0.40e-14, t = 1.76), it was smallest for words during target presentation (words vs novel PWs: estimate = -1.20e-14, SE = 0.37e-14, t = 3.28; words vs familiar PWs: estimate = -0.66e-14, SE = 0.37e-14, t = 1.81; familiar vs novel PWs: estimate = -0.69e-14, SE = 0.33e-14, t = 2.06; see also Fig. 5J and Extended Data Table 5-1). Repetition suppression, thus, was stronger for words than for PWs. This effect was localized to the left anterior temporal cortex (Fig. 5H; Extended Data Fig. 5-2B; Table 2).

Control analyses
To assess potential influences from low-pass filtering, we performed an additional control analysis with a 40-Hz low-pass filter. Results did not differ qualitatively from the results described in the previous sections obtained with a 20-Hz low-pass filter (Extended Data Fig. 5-4).
We also evaluated the robustness of effects against different choices of baselines by interpreting only those clusters that were significant when prime and target activation were both corrected with the baseline before the prime instead of separate baselines. In addition, we performed a peak-to-peak analysis (cf. Materials and Methods for further details). Significant results from the peak-to-peak analysis strongly support the interaction between prime/target and familiarity effects at left frontal sensors, the main effects of familiarity at left posterior and left frontal sensors, as well as main effects of prime versus target at bilateral frontal sensors, resembling the effects of clusters CxF, F1, F2, and C3 in Figure 5 and Extended Data Figure 5-4 (Extended Data Figs. 5-1, 5-5, including also further clusters from the peak-to-peak analysis). Due to the high similarity between standard baseline corrected ERF analysis and peak-to-peak analysis, we conclude that the presented results can be reproduced with a different analysis strategy and therefore are not artificially introduced by the specific choice of baseline correction.

Discussion Experiment 1
The main finding from experiment 1 (MEG study) was that only lexical familiarity interacted with context (here implemented as the contrast of prime vs identical target) to facilitate visual word recognition, and only in the N400 time window. Please note that in the following, for the sake of brevity, we will subsume the processing of words and PWs under the term visual word recognition, as we assume similar pre-lexical processing for words and novel PWs reflecting the orthographic familiarity (OLD20) match. The finding of stronger repetition suppression for words was consistent with previous studies (Fiebach et al., 2005; Almeida and Poeppel, 2013; but see Deacon et al., 2004; Laszlo and Federmeier, 2007; Laszlo et al., 2012), and identified sources of the effect were compatible with previous localizations of the N400 within the anterior temporal cortex (Lau et al., 2013a; Lau and Nguyen, 2015). In contrast, we could not identify a pre-lexical modulation (i.e., an increased reduction of activation for familiarized PWs) at any time window. However, we observed a more negative-going amplitude for familiarized PWs in contrast to novel PWs and words, which is indicative of a pre-lexical familiarity influence irrespective of context. This indicates that our explicit manipulation of pre-lexical familiarity was successful. Also, we found that the pre-lexical familiarity effect was present at the end of the expected time window (Barber and Kutas, 2007) and localized to the left angular/supramarginal gyrus, indicating phonological processing (Carreiras et al., 2014). We also found strong context effects that were not modulated by pre-lexical or lexical familiarity. The earliest of these effects was present at around 100 ms in the occipital cortex: The first presentation elicited a much stronger N100 response than the second presentation in repetition trials. In sum, we found an interaction of lexical familiarity and context in the N400 time window but only a main effect of pre-lexical processing.
The interaction of lexical familiarity and context at the N400 indicated within-level context-based facilitation at the lexical level. We related this interaction to lexical level processing since words, i.e., meaningful stimuli, differed from PWs, i.e., meaningless stimuli. This is in line with previous work (i.e., priming or sentence paradigms: Rugg, 1985; Simos et al., 1997; Helenius et al., 1998; Halgren et al., 2002; Lau et al., 2009, 2013a; Vartiainen et al., 2009) associating this time window and brain locations with lexical-semantic processing. Within-level facilitation was indicated by the finding that the interaction was selective for the N400 time window without indications of interactions at previous time windows. Finally, regarding the mechanistic implementation, the interaction pattern was in line with the expectations of fatigue and predictive coding, but not with those of sharpening (Fig. 1C). This is because sharpening proposes a suppression of the noise in the neuronal signal. Under sharpening, the difference between words and PWs should therefore be easier to detect at the target (i.e., a stronger difference; Kok et al., 2012), which was not the case.
Nevertheless, due to the nature of the task in experiment 1, we could not investigate whether familiarity-based and context-based brain activation differences translate to behavior. Also, we cannot rule out one further potentially confounding influence, i.e., that words and PWs differ not only in the availability of lexical-semantic knowledge but also qualitatively with respect to their word status; i.e., although the familiarity of some PWs was temporarily enhanced by the training procedure, the expertise with actual words may be qualitatively different. Moreover, experiment 1 did not allow us to differentiate between fatigue and predictive coding. A repetition probability manipulation (i.e., varying the likelihood of prime and target being the same letter string) would allow this, since only predictive coding would predict a systematic influence of repetition probability across trials. Finally, in experiment 1, we were surprised that no evidence for an influence of pre-lexical familiarity on context-based facilitation was found. To replicate the MEG pattern in behavior, systematically examine the role of word status, and implement a repetition probability manipulation, we ran a second, behavioral, repetition priming experiment.

Experiment 2
In experiment 2, we implemented the explicit investigation of word status by adding a third group of PWs. With a paired-association task, we associated semantic content with these non-words. As in experiment 1, we also included familiar PWs without meaning. Note that PWs with and without semantic associations were visually/perceptually familiarized to the same degree. Therefore, the two groups of PWs only varied regarding their associated semantic meaning. Including this additional lexical familiarity condition allowed us to examine potentially different roles of word status and the presence of semantic associations. We measured behavioral response times in a repetition priming paradigm. Participants had to indicate whether or not a letter string had a semantic association. A yes response would be valid for words and familiarized PWs with semantic associations, but not for novel and only perceptually familiarized PWs.
As stated in the discussion of experiment 1, we also implemented a repetition probability manipulation (i.e., the likelihood of prime and target being the same letter string). Repetition probabilities varied across three blocks, to investigate whether the priming effect (i.e., faster responses for repeated vs non-repeated targets) increases when the local task context allows predicting that the prime is highly likely to be identical to the target. In previous studies with different visual stimuli, higher repetition probability enhanced priming effects, mainly supporting predictive coding (Summerfield et al., 2008, 2011; Grotheer and Kovács, 2014; Olkkonen et al., 2017).

Participants
A total of 24 healthy native speakers of German recruited from university campuses (16 females, mean age 23.1 ± 3.4 years, range: 19-31 years, 22 right-handers) were included in the final data analysis. All participants had normal or corrected-to-normal vision, and normal reading abilities as assessed with the adult version of the Salzburg Reading Screening (unpublished adult version of Mayringer and Wimmer, 2003). Further participants were excluded at different stages of the experiment due to the following reasons: Low reading skills (i.e., reading test score below 16th percentile; N = 4), insufficient performance during PW familiarization (i.e., accuracy for semantic or familiar PWs <50% in the final learning session; N = 3), or failure to complete the experimental protocol (N = 2). Four participants were excluded after data analysis due to insufficient performance (<25% correct for non-repeated words). All participants gave written informed consent according to procedures approved by the local ethics committee and received 10 € per hour or course credit as compensation.

Stimuli and presentation procedure
A total of 60 German nouns (half natural and half man-made; logarithmic word frequency: mean ± SE = 1.93 ± 0.09, range: 0.00-3.30) and 180 pronounceable PWs with characteristics similar to experiment 1 were presented in a repetition priming task. PWs were divided into three sets, each of which was matched to the word set on orthographic similarity (OLD20, Yarkoni et al., 2008; words: 1.538 ± 0.038; PWs: 1.605 ± 0.032, 1.542 ± 0.045, and 1.596 ± 0.044) and number of syllables (1.833 ± 0.059; 1.95 ± 0.028; 1.967 ± 0.023; 1.9 ± 0.039, respectively; Table 1). Participants were perceptually familiarized with one set analogous to experiment 1, and additionally learned semantic associations for a second set within a paired-association task (see section "PW familiarization" below for details). The third set of PWs was never seen by the participants before the priming task. For the familiarization procedure, two sets of 60 object images each were chosen from the Bank of Standardized Stimuli (BOSS; Brodeur et al., 2010, 2014) such that German object names assigned to the images were matched between the two sets for logarithmic word frequency (set means: 2.093 ± 0.081; 2.070 ± 0.077), OLD20 (set means: 1.639 ± 0.054; 1.630 ± 0.053), and number of syllables (set means: 2.000 ± 0.071; 2.000 ± 0.071). Object names were determined by having four independent participants write down for each object the name they considered most suitable; only objects for which at least three participants provided the same name were selected.
The two sets of object images finally selected were matched on available ratings of familiarity (set means: 4.364 ± 0.040; 4.333 ± 0.043), object agreement (i.e., rated similarity between an object imagined by the participants on perceiving the object's name, and the actual object image; set means: 3.910 ± 0.056; 3.901 ± 0.064), and rated subjective visual complexity (set means: 2.426 ± 0.058; 2.475 ± 0.066; Brodeur et al., 2014), analogous to procedures reported by Breitenstein et al. (2007).
Six variants of the familiarization task were prepared, across which the assignment of the three PW sets to the familiarized, i.e., familiar versus semantic, as well as to the novel condition was varied (Table 6). In addition, the assignment of the two object image sets to the familiarized PWs with and without semantic associations was varied. Note that for 18 of the 24 participants, the six experimental versions, as well as the order of blocks and response hands in the repetition priming task (see "Repetition priming" below), were counterbalanced. In addition, six participants were included from the pilot investigation in which this was not the case (all had the same response hands and the initial block had a repetition probability of 25%). Results did not differ qualitatively when these participants were included or not. Stimulus presentation procedures were identical to those of behavioral sessions of experiment 1 ( Fig. 2A), with the exception that the background was set to gray.

PW familiarization
Participants performed five PW familiarization sessions in the course of three consecutive days, i.e., two sessions each on day 1 and 2, and one session on day 3 (before the repetition priming task). Each session lasted about 1 h, and participants could take a short break after the first half, as well as a mandatory 1-h break before the next session. Each session consisted of reading aloud each PW (mean error rate across sessions: 1.4%), a computer-based paired-association task with congruent versus incongruent pairings of PWs and object images, and a naming task. While one set of PWs was familiarized pre-lexically as in experiment 1, i.e., merely through repeated exposure ("familiar PWs"), one set was additionally associated with semantic information ("semantic PWs"). The paired-association procedure was adapted from previous studies (Breitenstein and Knecht, 2002; Breitenstein et al., 2007; Dobel et al., 2010), however, using visual instead of auditory PWs and naturalistic photographs of objects instead of line drawings (see section "Stimuli and presentation procedure" above). Furthermore, we used an explicit instead of an implicit learning instruction to establish strong associations between PWs and the assigned meanings.

Table note: Novel = PWs first shown in the repetition priming task. Participants refers to the number of participants assigned to each version.

Familiar and semantic PWs were presented in random order for 800 ms, followed by an object image (horizontal and vertical visual angles 15.8°) for 1500 ms or until response (Fig. 2C). During the ITI of 1000 ms, two vertical black bars were presented, indicating the center of the screen where participants were asked to fixate. Each PW was presented four times in the first and four times in the second half of each session (960 trials in total per session). Semantic PWs were arbitrarily but above-chance (i.e., six out of eight presentations) matched with object images so that participants could learn to associate their meaning over the course of the familiarization sessions. This ratio was chosen so that despite successful learning, false alarms could be investigated, which provide important information on participants' sensitivity. In contrast, familiar PWs were followed by a different object image in each trial.
Participants were asked to learn a meaning for the presented PWs based on the frequency with which the PWs were paired with certain object images. They were explicitly informed about the inconsistent pairings for half of the PWs. Participants were instructed to silently read the presented PWs and to respond as accurately and quickly as possible, whether a presented object image matched the preceding PW or not. In addition, they were encouraged to guess if unsure. Participants responded by pressing one of two buttons on a keyboard with either the left or right index finger. To prevent potential response biases, the assignment of response hand and response varied from trial to trial (by presenting a red bar indicating non-match on one side and a green bar indicating match on the other side of the object image). In the first familiarization session, participants completed a short practice block of ten trials before the start of the actual paired-association task.
In the naming task (Fig. 2D), each PW from the paired-association task was presented once. Participants were instructed to name its associated object if an association could be retrieved, or to respond "weiter" (German for "next") whenever this was not possible. The experimenter wrote down the participants' responses and logged the three possible responses (correct, incorrect, next) into the presentation software. Responses were considered correct whenever a name suitable for the corresponding object was provided (e.g., "cabin" instead of "barn"). Participants did not receive feedback.
LMMs (including participant, object image and PW as random effects on the intercept; see experiment 1 Materials and Methods) revealed that d' for the pairedassociation task significantly increased across sessions from 0.41 in session 1 to 2.06 in session 5 (main effect of session: estimate ϭ 0.57, SE ϭ 0.033, t ϭ 17.11; Fig. 3C; see Extended Data Fig. 3-1 including post hoc analyses for pairwise sessions). This indicates that participants improved in identifying matching and non-matching PWobject combinations. In the final familiarization session, participants reached high mean accuracies of 93.17% (range: 74.72-99.72) for the identification of matching objects for semantic PWs, and 89.65% (range: 60.47-99.38) for the identification of non-matching objects for PWs familiarized without semantics (Fig. 3D). Importantly, participants also demonstrated high average accuracies of 95.51% (range: 88.14 -100) for semantic PWs in case they were presented with a non-matching object (Fig. 3D), indicating that their high performance for matching PWobject combinations cannot be attributed to a response bias, i.e., responding "match" whenever a semantic PW was presented.
In the PW naming task, which was administered at the end of each familiarization session, LMMs (including participant and item as random effects on the intercept) revealed that d' significantly increased from 0.23 in session 1 to 2.43 in session 5 (main effect of session: estimate = 0.77, SE = 0.040, t = 19.41; Fig. 3E; see Extended Data Fig. 3-1 including post hoc analyses for pairwise sessions). In the final session, participants named the correct object for 51.67% to 100% of semantic PWs (mean 78.61; Fig. 3F, left) and refrained from a response for 61.67% to 100% of PWs familiarized without semantics (mean 90.28; Fig. 3F, right), indicating that they indeed learned the corresponding meaning for semantic PWs.
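The sensitivity index d' reported above is not defined in this section. As an illustration only, the following sketch uses the standard signal-detection definition, d' = z(hit rate) - z(false-alarm rate). The function name, the trial counts, and the log-linear correction for extreme rates are assumptions of this sketch and are not taken from the published analysis scripts.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption of this sketch): add 0.5 to each
    # count so that rates of exactly 0 or 1 do not yield infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant (not data from the study).
chance_level = d_prime(hits=30, misses=30, false_alarms=30, correct_rejections=30)
after_training = d_prime(hits=55, misses=5, false_alarms=6, correct_rejections=54)
print(round(chance_level, 2), round(after_training, 2))
```

With equal hit and false-alarm rates, d' is zero; it grows as "match" responses become more selective for truly matching PW-object pairs, which is the pattern the session effect describes.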

Repetition priming
Following the fifth familiarization session on day 3, participants completed a repetition priming experiment after a break of at least 1 h. Experimental procedures were analogous to those described for experiment 1, with the following exceptions: Semantic PWs were presented as an additional familiarity condition, and no catch trials were presented. The prime stimulus in each trial was preceded by 800 ms of hash mark presentation. The intertrial interval varied between 800 and 1200 ms. Furthermore, the repetition probability was varied across the three experimental blocks. Fifteen participants first completed a block with 25% repetition probability, followed by 50% in the second and 75% in the last block; the remaining nine participants completed the blocks in the reverse order. Participants were informed about the repetition probabilities at the start of each block. Their task was to silently read the presented letter strings and to indicate as accurately and quickly as possible, for the second letter string in each trial, whether they could explicitly associate a meaning with it or not (button presses on a keyboard with left/right index finger; dominant vs non-dominant hand for yes response: 13 vs 11 participants, respectively). This task was chosen to elicit the same response for semantic PWs as for words. Each letter string (i.e., word or PW) was presented once per block, either in the repetition or in the non-repetition condition. In total, 240 trials (60 per condition) were presented in each block. Letter strings were used at maximum twice for non-repetition trials; in this case, they were combined with two different letter strings. Before the task, eight practice trials were completed. The total duration of the priming task was around 45 min.

Analyses
Analogous to the analysis of the PW familiarization procedure, behavioral data of the repetition priming task were analyzed using LMMs allowing random effects of both participant and items (prime and target stimulus) on the intercept, as well as analysis of imbalanced data (Baayen et al., 2008). We mainly focused on response times of correct responses, but also investigated accuracies using generalized LMMs with a binomial link function. Response times were log transformed to account for their skewed ex-Gaussian distribution.
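To illustrate why response times are log transformed before model fitting, the following minimal sketch compares the skewness of a right-skewed set of response times before and after the transform. The response times are hypothetical and the skewness helper is an assumption of this sketch; the LMM fitting itself is not reproduced here.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Hypothetical response times in ms with a long right tail (not study data).
rts_ms = [420, 450, 470, 480, 500, 520, 560, 640, 810, 1250]
log_rts = [math.log(rt) for rt in rts_ms]

print(f"raw skewness: {skewness(rts_ms):.2f}")
print(f"log skewness: {skewness(log_rts):.2f}")
```

The log transform compresses the long right tail, so the residuals of a linear model fitted to log response times are typically closer to normal than those of a model fitted to raw response times.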
We first performed an analysis with factors repetition congruency (repetition vs non-repetition trials), reflecting the main manipulation of context effects in experiment 2, and repetition probability (25%, 50%, or 75%). To assess pre-lexical and semantic contributions to behavioral context and familiarity effects, we investigated the four-way interaction between repetition congruency, repetition probability, pre-lexical, and lexical familiarity. The latter two were manipulated orthogonally, such that familiarity was entered as two factors coding pre-lexical (0: novel PWs and words; 1: familiar PWs with and without semantics) and lexical familiarity (0: novel and familiar PWs without semantics; 1: semantic PWs and words). Since context provided by the prime stimulus might override familiarity effects (Kretzschmar et al., 2015), we additionally investigated the three-way interaction between pre-lexical familiarity, lexical familiarity, and repetition probability in non-repetition trials only (i.e., in the absence of valid contextual information). Note that for repetition priming analyses, we set behavioral responses from the first block (i.e., with 75% repetition probability) of one participant to NA, because she reported a misinterpretation of the task instruction that was clarified for the final two blocks.
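The orthogonal coding of the two familiarity factors can be sketched as follows. The 0/1 codes follow the description above; the condition labels, the dictionary layout, and the helper function are hypothetical names introduced for illustration.

```python
# (pre_lexical, lexical) codes for each letter-string condition,
# following the coding described in the text.
FAMILIARITY_CODES = {
    "novel_pw": (0, 0),     # novel PWs
    "familiar_pw": (1, 0),  # PWs familiarized without semantics
    "semantic_pw": (1, 1),  # PWs familiarized with semantics
    "word": (0, 1),         # words
}


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


pre_lexical = [codes[0] for codes in FAMILIARITY_CODES.values()]
lexical = [codes[1] for codes in FAMILIARITY_CODES.values()]

# With equal trial numbers per condition, the two factors are uncorrelated.
print(covariance(pre_lexical, lexical))
```

Because each combination of codes occurs in exactly one condition, the two factors are uncorrelated in a balanced design, which allows pre-lexical and lexical familiarity to enter the same model as separate factors. Excluded trials (misses, errors) can break this balance slightly, which is one reason to use models that handle imbalanced data.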
All (generalized) LMMs included the interactions of all fixed effects described so far. Since not all trials entered the analyses (due to miss trials and for the response time analysis due to exclusion of trials with incorrect responses), which might have affected the match across letter string conditions, OLD20 and number of syllables were included as additional fixed effects. All fixed effects were centered and z-transformed. For each significant interaction, pairwise differences between conditions were investigated using post hoc LMMs including only the relevant conditions. Behavioral data and analysis scripts are published under https://osf.io/fc69p/.
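Centering and z-transforming a fixed effect can be sketched as below. The function name and the OLD20 values are hypothetical, and the use of the population standard deviation is an assumption of this sketch (the sample standard deviation would differ only by a constant scaling factor).

```python
from statistics import mean, pstdev


def z_transform(values):
    """Center values on their mean and scale them to unit standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


repetition_probability = [0.25, 0.50, 0.75]
old20 = [1.6, 1.9, 2.3, 2.8]  # hypothetical OLD20 values, not study data

z_probability = z_transform(repetition_probability)
z_old20 = z_transform(old20)
print([round(z, 3) for z in z_probability])
```

After this transformation, each estimate refers to a change of one standard deviation in the predictor, and lower-order effects in models with interactions are evaluated at the mean of the other predictors rather than at an arbitrary zero point.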

Results
In the semantic association judgments of the repetition priming experiment, average accuracies for repetition trials were high, albeit not at ceiling, across all repetition probabilities (86.9%, 85.8%, and 83.9% for 75%, 50%, and 25% repetition probability, respectively; Extended Data Fig. 6-1A), as well as across all familiarity conditions with the exception of familiarized PWs with semantic associations (90.7%, 88.1%, 72.2%, and 91.1% for novel PWs, familiarized PWs without and with semantic associations, and words, respectively; Extended Data Fig. 6-1B). The lower accuracy for semantic PWs indicates that participants established a semantic association with the majority of PWs but not with all of them, which is also consistent with their performance in the final naming session (see Analyses section and Fig. 3F). As a consequence, we only used correct trials for the response time analysis. Accuracies in non-repetition trials were overall lower (82.6%) compared to repetition trials (88.5%; Extended Data Fig. 6-1A). Statistical analyses of accuracies can be found in Extended Data Tables 10-1, 10-2. In contrast to the MEG analysis, we included familiarity effects related to pre-lexical and lexical familiarity as two separate factors, since pre-lexical familiarity was manipulated orthogonally to lexical familiarity (cf. Materials and Methods). In the following, we report the effects on response times most relevant for our hypotheses, while Tables 7-10 provide a detailed overview of all statistical results.

Repetition priming
For a first manipulation check we investigated the influence of repetition probability on context effects irrespective of familiarity conditions (Fig. 6A and statistics in Table 7). Here, context effects refer to the classical priming effect reflected in the repetition congruency contrast (repetition vs non-repetition). Response times showed a significant interaction between this priming effect and repetition probability. The interaction revealed a decrease in response times with increasing repetition probability (main effect of repetition probability: estimate = -0.060, SE = 0.0029, t = 20.43) which was stronger for repetition (estimate = -0.085, SE = 0.0044, t = 19.28) compared to non-repetition trials (estimate = -0.022, SE = 0.0037, t = 6.06); i.e., the priming effect (difference between repetition and non-repetition trials) was smaller for a repetition probability of 25% vs 50% (estimate = -0.040, SE = 0.0055, t = 7.36) and smaller for 50% vs 75% (estimate = -0.043, SE = 0.010, t = 4.26; Fig. 6A; Table 7). This finding indicates that context effects increase when they can be expected more reliably.

Familiarity effects
To investigate the influence of pre-lexical and lexical familiarity in the absence of valid contextual information, we focused on non-repetition trials. We observed a significant interaction between pre-lexical and lexical familiarity (Table 8). This interaction was driven by the strong difference in response times for the two semantic letter string groups: Semantic PWs showed the longest response times (all ts > 4 for post hoc contrasts of semantic PWs vs the other three conditions; for details, see Table 9), reflecting the specific difficulty of retrieving semantics for a newly acquired vocabulary, particularly in case of unfulfilled expectations. This notion is also in line with the accuracy data (Extended Data Fig. 6-1B). However, faster response times for words compared to novel (estimate = -0.039, SE = 0.0064, t = 6.06) and familiar PWs (estimate = -0.022, SE = 0.0063, t = 3.52; Table 9) indicate facilitated processing of letter strings with both fully established semantic associations and word status. In addition, response times were faster for familiar versus novel PWs (estimate = -0.016, SE = 0.0046, t = 3.49).

Influence of pre-lexical and lexical familiarity on context effects
Repetition probability did not interact with pre-lexical or lexical familiarity (all ts < 1, including the three-way interaction; see Table 10). However, a significant interaction between repetition congruency and lexical familiarity revealed stronger priming effects for letter strings with semantic associations (i.e., words and semantic PWs) in comparison to PWs without semantic associations (estimate = -0.033, SE = 0.0030, t = 11.13; Table 10). In repetition trials, the response times for PWs with associated semantics were lower than for the other PW conditions (pairwise post hoc contrasts: semantic vs novel PWs: estimate = -0.048, SE = 0.0059, t = 8.11; semantic vs familiar PWs: estimate = -0.051, SE = 0.0060, t = 8.52; Table 9). This indicates that the involvement of semantic information increases context effects dramatically, even reversing familiarity effects found in the absence of context-based facilitation.

Discussion Experiment 2
In general, the behavioral results replicated the main MEG findings. Context-based facilitation, here implemented as repetition congruency, on response times was stronger for words and PWs with semantic associations in contrast to PWs without meaning. The interaction of lexical familiarity and context replicated the modulation of lexical information on context-based facilitation found in the N400. We also observed that recently increased pre-lexical familiarity, in familiarized PWs without meaning, resulted in faster response times compared to novel PWs in non-repetition trials, i.e., in the absence of valid contextual information. This finding is compatible with the increased activation to familiar PWs within the left angular/supramarginal gyrus shown in the MEG. Finally, we found strong general priming effects that replicate the strong context effects demonstrated in the MEG results. Once more, we found no evidence for an interaction of pre-lexical familiarity and context-based facilitation.
A word status effect, i.e., words versus PWs with associated meaning, was also found in response times, but the difference in response times was much smaller when the contextual information was valid. This pattern indicates that even recently learned letter strings use semantic meaning to facilitate word recognition on a lexical level in a predictable context (cf. Tamminen and Gaskell, 2013; van der Ven et al., 2015). We also identified a strong repetition probability effect: a higher repetition probability resulted in faster response times in repetition trials, replicating previous studies with different visual stimuli (Olkkonen et al., 2017; Barbosa and Kouider, 2018). This finding is expected if context-based facilitation is implemented by a predictive coding mechanism. Thus, fatigue is ruled out as a mechanism for context-based facilitation.

Discussion
In the two experiments of the present study, we found evidence for context-based facilitation of visual word recognition within the lexical-semantic processing level based on a predictive coding mechanism. Most prevalent was the increased facilitation, reflected in reduced brain activation at the left anterior temporal cortex around 400 ms and faster behavioral responses, when semantic information was present (Fig. 7). We found no evidence for context-based facilitation through pre-lexical (i.e., orthographic and/or phonological) familiarity. Also, we could not detect evidence for top-down facilitation, as there was no influence of lexical information on context effects in earlier time windows (i.e., <400 ms) associated with visual and pre-lexical processing. At the level of visual processing, we found familiarity-unspecific repetition effects in the occipital cortex around 100 ms. Combined, we take this pattern as evidence for context-based facilitation within lexical and visual processing levels (Fig. 7).

Implications for neurocognitive models of visual word recognition
One of the main goals of this study was to investigate whether context-based facilitation of visual word recognition is implemented via top-down feedback from higher to lower processing levels, or restricted to within each processing level; i.e., assessing the architecture of context-based facilitation. The two alternative architectures were formalized in previous neurocognitive models of visual word recognition. Laszlo and Armstrong (2014) implemented a strictly feedforward model including within-level context-based facilitation. The model architecture brought forward by Carreiras et al. (2014) additionally included top-down connections. The present evidence favors the architecture implemented by Laszlo and Armstrong (2014). As described in the previous section, at the lexical processing level we could find within-level context-based facilitation but no evidence for cross-level top-down facilitation. The latter is a central component of the architecture described by Carreiras et al. (2014).
Despite not finding evidence for pre-lexical influences on context effects in the present study, we could identify an activation cluster in the left angular/supramarginal cortex, showing a pre-lexical familiarity effect. At the sensor level, the activation in response to familiarized PWs differed from novel PWs and words without an interaction with context. In behavior, we also observed facilitated recognition of pre-lexically familiarized in contrast to novel PWs when no valid contextual information was available. These findings reflect that pre-lexical familiarity has a central role in visual word recognition (Xue and Poldrack, 2007; Bermúdez-Margaretto et al., 2015; Glezer et al., 2015), but no evidence for an increased context-based facilitation within the pre-lexical processing levels, as proposed by Laszlo and Armstrong (2014), was found. One could argue that the learning paradigm did not build up a sufficiently strong pre-lexical representation to influence context-based facilitation. In our opinion, this is less likely for two reasons. First, in both experiments, we could find pre-lexical facilitation when no valid context-based expectations could be formed. This finding is a manipulation check showing that PW learning was successful. Second, in the behavioral experiment, the learned lexical-semantic information was successfully used to increase context-based facilitation (for similar results, see Tamminen and Gaskell, 2013; van der Ven et al., 2015). Both these findings strongly indicate that the learning paradigms used here were effective in influencing processing of the learned PWs. Thus, these findings underline the surprising result that additional pre-lexical familiarity did not increase context-based facilitation. In addition, a previous study found no reliable influence of OLD20 on context effects in sentences (Payne et al., 2015). Thus, learning-independent manipulations of pre-lexical familiarity also did not modulate context-based facilitation.
Previous studies using text- or sentence-based context manipulations found top-down influences on visual or pre-lexical processing (Dambacher et al., 2006; Kim and Lai, 2012; Lee et al., 2012; Brothers et al., 2015). However, time point (i.e., N170 vs P2 vs N200/250 component) and direction of these top-down effects were highly inconsistent (for review, see Nieuwland, 2019). One limitation of sentence studies might be that word predictability out of a sentence context, reflecting lexical-semantic top-down information, and pre-lexical familiarity (e.g., OLD20) are naturally confounded. For example, in the Potsdam Sentence Corpus used by Dambacher et al. (2006), sentence-level predictability and OLD20 (i.e., item-level word familiarity) correlated with an r of -0.24. Typically, these studies control for word characteristics like word frequency (r with predictability: 0.33), which is also associated with semantic word characteristics (e.g., r with the semantic neighborhood ~0.75; Yap et al., 2012; Goh et al., 2016), but not orthographic characteristics like OLD20. This confound pattern might indicate that only the combined availability of predictable sentence-level context and high orthographic familiarity enables early context effects, which should be explicitly investigated in future studies.

Mechanistic implementation of context-based facilitations
As pointed out in the previous section, the interaction pattern at the N400 component, in particular the reduction of the difference of words against PWs, was informative to determine that predictive coding was the most probable mechanism underlying the context-based facilitation phenomena. At the prime, words showed a stronger N400 in contrast to PWs. At the target, this difference was reversed. This pattern rules out sharpening, as in sharpening the expectation is a suppression of the noise in the neuronal signal (cf. Kok et al., 2012; Blank and Davis, 2016; Richter et al., 2018). Under sharpening, the difference between words and PWs should therefore be easier to detect at the target (i.e., stronger difference). The N400 activation shows the opposite pattern when comparing prime (words > PWs) and target amplitude (words < PWs; Fig. 7). We consider the change in effect direction as evidence against a sharpening mechanism. Evidence against a fatigue mechanism was the finding that a high repetition probability, across trials, resulted in stronger context effects in response times. This finding was only expected by predictive coding, and previous evidence from neuronal (Summerfield et al., 2008, 2011; Todorovic et al., 2011; Lau et al., 2013b; Grotheer and Kovács, 2014; Mayrhauser et al., 2014; Delaney-Busch et al., 2019) and behavioral investigations (Olkkonen et al., 2017; Barbosa and Kouider, 2018) came to similar conclusions.
Figure 7. Summary of our findings (top) and implications for the comprehensive connectionist model of visual word recognition (cf. Carreiras et al., 2014; bottom). In the upper part of the figure, we present MEG findings for each processing level separated for the letter string conditions (words and PWs). On the right, we present the behavioral pattern from response times. Note, all data figures are presented in a simplified form (combining individual data points and excluding participant outliers outside of 1.5 times the interquartile range above the upper quartile and below the lower quartile) allowing a more lucid presentation of identified effect patterns. For more details see Figure 5 for the MEG data and Figure 6 for behavior. Our evidence supports context-based facilitation within visual and lexical processing levels (bottom).
In addition, the interaction between with versus without context (i.e., prime vs target) and valid versus invalid context (i.e., repetition vs non-repetition) at the N400 also provides evidence against fatigue. A fatigue mechanism cannot explain the increased activation for unexpected targets in non-repetition trials compared to primes. At the same time, this increase fits well with predictive coding. In experiment 1, the repetition probability was 75%. As a consequence, a repetition was likely expected in every trial. Irrespective of repetition or non-repetition trials, this expectation is transformed into a prediction and, in case of a non-repetition trial, the prediction is not met, resulting in a prediction error. The increase in the N400 amplitude for non-repeated targets versus primes might indicate a higher prediction error for mispredicted versus unpredicted stimuli (Hsu et al., 2015, 2018). Thus, these findings indicate that a predictive coding mechanism offers the most appropriate explanation for context-based facilitation described here. The current interpretations concerning the architecture and mechanistic implementation of context-based facilitation are not necessarily compatible. First, our favored architecture (Laszlo and Armstrong, 2014) implemented a fatigue mechanism. As pointed out in Figure 1C, the expected patterns from fatigue and predictive coding are relatively similar. Only our repetition probability and repetition congruency manipulations allowed the differentiation of predictive coding and fatigue. We expect that the implementation of a fatigue mechanism will not be able to simulate the effect of repetition probability presented here.
Still, another incompatibility is prevalent. We could not find evidence for the assumptions concerning the processing levels involved in context-based facilitation and the architecture proposed by the predictive coding theory (Rao and Ballard, 1999; Friston, 2005). Predictive coding assumes that, at all processing levels, one integrates all available information before the presentation of a stimulus to facilitate later stimulus processing (Friston, 2005). As a consequence, one could expect the integration of pre-lexical familiarity of the learned PWs into the prediction process. If this were the case, we should have found an interaction of pre-lexical familiarity and context-based facilitation. Also, the predictive coding theory assumes a hierarchical architecture in which top-down predictions, in addition to within-level predictions, facilitate processing (Friston, 2005). Once more, we could not find evidence for an architecture that implements top-down facilitation. However, we provide evidence that a core mechanism of predictive coding (i.e., suppression of the informative part of expected sensory signals) is computationally implemented during visual word processing, to achieve context-based facilitation within, e.g., lexical processing levels.
Finally, one can speculate that specific lexical and pre-lexical information is transformed after completion of lexical access (i.e., after the N400 time window). The retrieved pre-lexical and lexical-semantic information might be used to predict the future stimuli already at the sensory level (Rao and Ballard, 1999). Note, predicting away information at the sensory level optimizes processing at later levels since, as prominently proposed in the predictive coding models, only the residual, i.e., unpredicted information, is processed at higher levels (Gagl et al., 2018). In line with this notion is that for visual and lexical processing a reduction of activation was found. Still, for the implementation of a sensory prediction as suggested here, the information has to be transformed, i.e., from lexical to visual information, and held active until the presentation of the next stimulus. At the prime, the late occipital context effect of higher activation for prime versus target (i.e., C6; Fig. 5) might reflect the result of such a transformation process. We speculate that at this point in time top-down information might be used to prepare visual processing levels for the upcoming target presentation. When the subsequent target is in accordance with the prediction, based on predictive coding one can expect that the neuronal activation at the visual level is low. This expectation is met by the early context effect found in the occipital cortex (i.e., C1 cluster). Here, we expect that future research, e.g., using explicit connectivity investigations or specifically investigating the interval between prime and target, might allow specifying the information content integrated at the prime to facilitate processing of the target.

Conclusion
In sum, our investigation of context-based and familiarity-based facilitation of visual word recognition indicated within-level facilitation at visual and lexical processing levels. We found no support for hierarchical, top-down facilitation from a predictive (higher-level) context (e.g., word semantics) to lower levels of processing (visual, pre-lexical). At a mechanistic level, we could identify predictive coding as the most likely candidate for the implementation of facilitation processes (as compared to fatigue and sharpening). A novel approach of our study was the explicit manipulation of pre-lexical (i.e., orthographic and phonological) familiarity, via a PW familiarization training procedure. We could not find support for context-based facilitation at the pre-lexical level but could identify a context-independent pre-lexical familiarity effect in the left angular/supramarginal gyrus. Thus, we conclude that context-based facilitation relies on information about visual and lexical-semantic features of upcoming words. Interestingly, in natural reading, visual information is typically available through parafoveal pre-processing (Schotter et al., 2012; Gagl et al., 2014), while lexical-semantic information is available through previous text or sentence context (Hawelka et al., 2010). This analogy might indicate that context-based facilitation in reading mainly operates on visual and lexical representations. Investigating this dichotomy further in future studies might provide exciting avenues for refining the understanding of contextual influences on efficient word recognition during reading.