Emerging neural specialization of the ventral occipitotemporal cortex to characters through phonological association learning in preschool children

ABSTRACT The ventral occipitotemporal (vOT) cortex serves as a core region for visual processing, and specific areas of this region show preferential activation for various visual categories such as faces and print. The emergence of such functional specialization in the human cortex represents a pivotal developmental process, which provides a basis for targeted and efficient information processing. For example, functional specialization to print in the left vOT is an important prerequisite for fluent reading. However, it remains unclear which processes initiate the preferential cortical activations to characters arising in the vOT during child development. Using a multimodal neuroimaging approach with preschool children at familial risk for developmental dyslexia, we demonstrate how varying levels of expertise modulate the neural response to single characters, which represent the building blocks of print units. The level of expertise to characters was manipulated firstly through brief training of false‐font speech–sound associations and secondly by comparing characters for which children differed in their level of familiarity and expertise accumulated through abundant exposure in their everyday environment. Neural correlates of character processing were tracked with simultaneous high‐density electroencephalography and functional magnetic resonance imaging in a target detection task. We found training performance and expertise‐dependent modulation of the visual event‐related potential around 220 ms (N1) and the corresponding vOT activation. Additionally, trained false‐font characters revealed stronger functional connectivity between the left fusiform gyrus (FFG) seed and left superior parietal/lateral occipital cortex regions with higher training performance.
In sum, our results demonstrate that learning artificial‐character speech–sound associations enhances activation to trained characters in the vOT and that the magnitude of this activation and the functional connectivity of the left FFG to the parieto‐occipital cortex depend on learning performance. This pattern of results suggests emerging development of the reading network after brief training that parallels network specialization during reading acquisition.

HIGHLIGHTS
Artificial character‐speech sound training induced preferred N1 and vOT activation.
N1 and vOT BOLD tuning depends on training performance in prereaders.
Functional connectivity of left FFG and SPL also depends on training performance.
Level of expertise to character types modulates the N1 and vOT BOLD activation.
Results suggest a phonologically guided N1 and vOT tuning in children.


Introduction
The development of functional specialization in specific cortical areas is critical for information processing in various domains (Houdé et al., 2010). The left ventral occipitotemporal (vOT) cortex is important for processing a variety of visual categories (Hasson et al., 2003;Kourtzi and Kanwisher, 2001;Yovel and Kanwisher, 2004), and its crucial role in orthographic processing is widely accepted and has been demonstrated in numerous studies (Baker et al., 2007a;Binder et al., 2006;Bruno et al., 2008;Cohen et al., 2002;Dehaene and Cohen, 2011;Dehaene et al., 2002, 2005;Glezer et al., 2009;Martin et al., 2015). The left vOT shows increasing visual specialization to print in readers along a posterior-to-anterior axis (Kronschnabel et al., 2013;Vinckier et al., 2007), and a specific area in the left mid fusiform gyrus, called the visual word form area (VWFA; Cohen et al., 2000, 2002), shows preferential activation to print (Baker et al., 2007b;Cohen et al., 2003;Vinckier et al., 2007) and whole word forms (Glezer et al., 2009;Glezer and Riesenhuber, 2013). As an electrophysiological correlate of this vOT activation, the characteristic occipitotemporal negativity in the event-related potential (ERP) after around 170 ms (N1, also referred to as N170) shows similar specialization in adults (Bentin et al., 1999;Brem et al., 2006;Maurer et al., 2005), reflected in more pronounced amplitudes with print, especially over the left hemisphere. This specialization of the vOT/N1 to print is regarded as a key component for efficient and fluent reading, which develops with reading acquisition and is often reduced in impaired readers (Ben-Shachar et al., 2011;Paulesu et al., 2001;Shaywitz et al., 2002).

Functional specialization of the ventral occipitotemporal cortex (vOT)
Neuroimaging research has provided important insights into the functional specialization of the left vOT cortex to print during reading acquisition in children (Brem et al., 2010;James, 2010;Maurer et al., 2011;Saygin et al., 2016), adults (Dehaene et al., 2010;Pegado et al., 2014), and after symbol training in primates (Srihasam et al., 2012). Learning drives the expertise reflected in the preferential neural responses of the left vOT (Ben-Shachar et al., 2011;Boros et al., 2016;Olulade et al., 2015;Price and Devlin, 2011;Xue et al., 2006), the corresponding N1 (Maurer et al., 2006, 2011), and the visually evoked fields of the magnetoencephalogram (Caffarra et al., 2017;Parviainen et al., 2006) after around 150-250 ms. The process of learning print has been simulated in adults using unknown and artificial scripts (Hashimoto and Sakai, 2004;Maurer et al., 2010;McCandliss et al., 1997;Song et al., 2010) and pseudoword learning paradigms (Glezer et al., 2015). Functional magnetic resonance imaging (fMRI) studies have revealed stronger activation in the bilateral vOT for trained letters and characters than untrained ones in adults upon phonological (Hashimoto and Sakai, 2004;Xue et al., 2006;Xue and Poldrack, 2007) and visual object (Song et al., 2010) association training and in the right vOT for semantic training (Xue et al., 2006), but decreased activation upon specific visual form training (Xue and Poldrack, 2007). ERP training studies largely converge by showing a rapid development of visual N1 amplitudes over occipitotemporal regions for trained associations in adults (Maurer et al., 2010) within a few training sessions. In accordance with the results of word-form training, pseudoword-form training resulted in a selective decrease in the blood-oxygen-level dependent (BOLD) response of the VWFA to trained pseudowords compared to untrained ones using a rapid neural adaptation paradigm (Glezer et al., 2015).
The results of training studies in adults thus provide evidence that the neural tuning of the vOT is dependent on the type of training and show various contributions of visual familiarity, phonological and semantic processes in shaping its functional activation (Xue et al., 2006).
Most insights into the development of the network for print processing are derived from studies that presented either real words or letter strings. Recent findings in prereaders suggest that early structural connectivity precedes and determines the location of later functional specialization to print in the left vOT (Saygin et al., 2016). This specialization to new visual categories such as words emerges in formerly weakly specialized cortical regions of the visual system rather than in areas of the visual cortex with established cortical preferences, such as for faces (Dehaene-Lambertz et al., 2018). Print specialization of the N1 shows an inverted U-shaped learning and expertise activation curve during normal reading acquisition. The emergence of early coarse specialization, for instance between familiar letter strings and false-fonts, was confirmed by several studies showing changes after brief training (Brem et al., 2010), several months (Eberhard-Moscicka et al., 2015;Zhao et al., 2014), and years of reading experience (Coch and Meade, 2016;Maurer et al., 2006;Saygin et al., 2016). In contrast, fine levels of discrimination, for instance between letter and number strings, show a more protracted development, gradually emerging from age 7 (Coch and Meade, 2016;Eberhard-Moscicka et al., 2015). Varying levels of prereading skills have been associated with increased activation and involvement of occipitotemporal areas during processing of printed words (Specht et al., 2009), and the characteristic left-hemispheric distribution of the N1 to print has been related to individual phonological abilities in children (Sacchi and Laszlo, 2016). Even at preschool age, there is some evidence for early coarse categorical differentiation in an atypical right lateralization of the N1 that depends on letter knowledge and is related to future reading outcomes (Brem et al., 2013).
This atypical right-lateralized N1 activation in prereaders may reflect a degree of visual familiarity with a given visual category such as letters but as yet unestablished expertise and connections to corresponding linguistic information such as phonological representations (Maurer et al., 2010). Such a bilateral or right-lateralized N1 resembles nonlinguistic visual category processing such as occurs with faces and objects (Rossion et al., 2003). In the course of reading acquisition and emerging expertise with words, one would expect a more prominent impact of phonological associations on the visual N1 distribution and thus increasing lateralization of the N1 over the left hemisphere. Early signs of specialization related to children's letter knowledge have recently been shown in five-year-old prereading children in a specific oddball EEG response to letter vs. false-font strings over the left occipitotemporal cortex (Lochy et al., 2016) and in enhanced activation to letters compared to false-fonts with increased reading ability (Centanni et al., 2018).
Previous results thus suggest that the preferential response to print is likely to be initiated rapidly through suitable training and that the extent of specialization to print depends on reading performance, expertise, and risk factors for reading problems in children. What remains largely unclear so far is whether the vOT specialization is a purely perceptual process that can be explained by increased visual familiarity with a specific category, whether vOT specialization is mostly shaped by higher-order processing areas, or whether both visual familiarity and increased expertise with linguistic information contribute to the vOT specialization. Specialization of the vOT as a result of increasing visual familiarity would involve increased perceptual tuning to letters, strings, and whole words with exposure to print regardless of whether associations with phonology or lexical information have been established. Alternatively, such specialization may involve pre-existing connections with anterior language areas (Hannagan et al., 2015;Saygin et al., 2016) and phonologically guided tuning (Brem et al., 2010;Sandak et al., 2004;Schlaggar and McCandliss, 2007) or prediction errors resulting from matching and integrating feedforward sensory inputs and information from other areas (Price and Devlin, 2011;Stevens et al., 2017). The latter concept would require some expertise to connect the visual information to higher-level linguistic or magnitude information. Here, we aim to clarify to what extent letter-speech sound learning has an impact on activation of bilateral vOT and how prereading familiarity with letters and numbers drives initial functional specialization and the consequent preferential cortical activations observed in the left vOT during child development.

Preferential processing of single characters
Relatively few studies have examined the processing of the building blocks of printed words, i.e. single letters. One important step for later reading outcomes is early automatization of letter processing through accumulating familiarity with letters in the everyday environment prior to formal literacy acquisition. Indeed, letter naming is often impaired in children with developmental dyslexia and is one of the most promising predictors of preschool reading outcome (Semrud-Clikeman et al., 2000;Wolf et al., 1986). Moreover, selective neural processing of letters starts early in the development of visual processing (Herdman, 2011;Miller and Wood, 1995;Rey et al., 2009) and seems to recruit a distinct, more anterior, lateral area of the vOT (Flowers et al., 2004;Gros et al., 2001;James et al., 2005;Joseph et al., 2006) than letter strings, for which preferential activation was reported in a "letter form area" (LFA), posterior to the classic VWFA (James et al., 2005;Tagamets et al., 2000;Thesen et al., 2012). However, modulation of the classic VWFA has also been reported when categorizing single letters (Pernet et al., 2005;Polk et al., 2002). Similar to words, the visual N1 response to familiar letters showed stronger amplitudes than to unfamiliar ones, but disfluent letter typefaces also produced a stronger N1 than fluent ones, suggesting greater attentional demands for processing difficult-to-read scripts and pseudoletters (Herdman and Takai, 2013;Keage et al., 2014). However, changes in single-letter processing with reading acquisition and development have been less studied. Cantlon et al. (2011) reported coarse alphanumerical tuning when comparing letters and numbers with objects (faces, shoes), but no fine tuning between numbers and letters in four-year-old prereading children. Furthermore, sensorimotor letter training induced a stronger activation in bilateral fusiform gyri of prereading children with letters than with pseudoletters and shapes (James, 2010).
A recent study by Centanni et al. (2018) suggests that the greater specificity in the fusiform cortex activation to letters than to false-fonts is associated with reading ability and a reduction of the left fusiform face area in kindergarteners. Similar to studies on word reading (Maisog et al., 2008;Martin et al., 2015;Richlan et al., 2009), one study showed that the activation to single letters in the bilateral fusiform gyri of reading-impaired school children was lower than in typical readers.
When assessing the natural course of reading development by tracking the processing of written words by children in longitudinal or cross-sectional studies (Dehaene et al., 2010;Shaywitz et al., 2007;Turkeltaub et al., 2003), learning-related changes are likely to reflect mixed effects of age, general maturation, and emerging phonological, lexical, and semantic associations that develop partly in concert. Consequently, such studies cannot fully explain the extent to which visual familiarity or emerging feedforward and feedback circuits to and from phonological and semantic association areas drive functional brain specialization to print. We consider visual familiarity as the capacity to perceptually discriminate between specific and visually similar character categories (letters, numbers, false-fonts), facilitated by prolonged exposure to stimuli. Children usually gain a high level of familiarity during the course of development due to exposure and experience in their everyday environment. In addition, we consider various types of expertise that go beyond visual categorization, that result from formal instruction, practice and exposure, and that involve manipulation and association with other modalities. Expertise allows the coupling of the visual units (letters, numbers and trained false-fonts) with phonological, lexical, semantic and/or magnitude information and thus describes the effect of reciprocal interactions between the visual system and multiple top-down processes (Harel, 2016). Here, our first aim was to clarify the impact of learning new phonological associations on visual processing of characters. Through false-font speech–sound association learning, we simulated the first step of formal reading acquisition in prereaders, which is usually taught after school enrolment. After the training, we compared the brain activation to the trained false-font characters with that to passively viewed, visually matched false-font characters using high-density simultaneous EEG-fMRI recordings.
We expect emerging expertise only in the trained characters in contrast with the passively viewed characters (untrained false-font), but comparable levels of moderate visual familiarity for both trained and control false-font. Studying prereaders, who do not yet possess a functional reading network, overcomes undesirable interference with an existing reading system, which is an obstacle encountered in studies on literate adults (Hashimoto and Sakai, 2004;Maurer et al., 2010). Using false-font characters instead of real letters allows the effects of phonology on character processing to be clarified more directly due to the absence of semantic and lexical associations. The second aim of our study was to provide more detailed insights into how character types such as letters and numbers may be encoded in the brains of prereading children, who have accumulated a varying amount of visual familiarity and expertise. Given the age of these prereading children, we expect high visual familiarity with real letters and numbers, a high level of expertise with numbers, but only rudimentary expertise with letters' phonological associations and no expertise and moderate visual familiarity with passively viewed false-font characters. These diverging levels of expertise based on associations with phonological and semantic/magnitude information are assumed to shape the functional activation of the vOT and N1 ERP in preschool children.

Participants
A group of 31 native German-speaking, prereading kindergarten children (aged 6.7 ± 0.3 years, 16 males) completed four parts of an audiovisual target detection task in a simultaneous EEG/fMRI session. A core group of 18 children at varying risk for developmental dyslexia (8 female, mean age: 6.7 ± 0.36 years; Table 1) met data quality criteria for both EEG and fMRI analyses and was included in the main analysis. Additional analyses were performed with extended EEG (N = 23) and fMRI groups (N = 24, Inline Supplementary Table S1), who met data quality criteria in either the EEG or fMRI modality.
Children's risk for developmental dyslexia varied and was estimated from their parents' reading history assessed with the Adult Reading History Questionnaire (ARHQ; Lefly and Pennington, 2000). Parental values above 0.3 in the ARHQ indicate a familial risk for the child (Lefly and Pennington, 2000). Each child's individual familial risk score for developmental dyslexia was defined as the higher parental ARHQ value (0.53 ± 0.2) and was included in all statistical analyses as a covariate to control for familial risk.
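The risk-score definition above can be illustrated with a minimal sketch. The function names and the handling of a missing parental questionnaire are assumptions made for illustration only; the text specifies just the rule (higher parental value) and the 0.3 cutoff.

```python
from typing import Optional

# Cutoff reported in the text (Lefly and Pennington, 2000).
ARHQ_RISK_THRESHOLD = 0.3


def familial_risk_score(mother_arhq: Optional[float],
                        father_arhq: Optional[float]) -> float:
    """Return the higher of the available parental ARHQ values.

    Assumption: if one questionnaire is missing (None), the other
    parent's value is used. The text does not describe this case.
    """
    scores = [s for s in (mother_arhq, father_arhq) if s is not None]
    if not scores:
        raise ValueError("at least one parental ARHQ value is required")
    return max(scores)


def is_at_familial_risk(score: float) -> bool:
    """Values above the threshold indicate a familial risk."""
    return score > ARHQ_RISK_THRESHOLD
```

In the study the score enters the statistical models as a continuous covariate; the binary flag is shown only to make the cutoff explicit.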
Subjects' nonverbal intelligence scores (IQ > 85, 108 ± 13.3) were within or above the normal range, as estimated with the block design test of the Wechsler Intelligence Scale for Children (HAWIK-IV;Petermann and Petermann, 2007). All children had normal or corrected-to-normal visual acuity and no diagnosis of attention-deficit/hyperactivity disorder or other neurological or cognitive impairments. The parents gave written informed consent and the children gave oral assent. The local ethics committee of the Canton of Zurich and neighboring Cantons in Switzerland approved the study. All participants received vouchers and presents as compensation.

Behavioral assessment
The subjects' behavioral characterization and learning achievements were assessed in two separate sessions preceding the simultaneous EEG/fMRI measurement (Table 1). Reading status was tested with a short list of twenty simple, one- or two-syllable (2-5 letters) words frequently found in common first-grade textbooks and written in upper-case letters. In Switzerland, formal reading instruction starts with school enrolment at age 6-7, after the end of kindergarten. As usual at this age, most children were able to name and/or partly write the (upper-case) letters of their given names, and very few children were able to decode a few short words written in upper-case letters (mean 2.9 ± 3.1 words, see Table 1) despite not having received formal reading instruction. In contrast to letters, number instruction (counting from 0 to 20, magnitude comparisons, but not calculations) is given in kindergarten, so all children were able to name the numbers 1-6 used for the EEG-fMRI task with 100% accuracy. Letter and number knowledge was assessed by asking the children to pronounce all 26 upper- and lower-case letters of the Latin alphabet and to name twenty-one numbers including all single digits from one to nine. In addition, we assessed children's phonological awareness with a behavioral test battery (Test zur Erfassung der phonologischen Bewusstheit und Benenngeschwindigkeit, TEPHOBE; see Mayer, 2011) including four subtests (onset and rime synthesis, rhyming, initial sound categorization, phoneme synthesis), rapid automatized naming (RAN) of objects, nonword repetition (Mottier Test, see Wild and Fleck, 2013), and vocabulary (Marburger Sprachverständnistest für Kinder, see Elben and Lohaus, 2000; Table 1). According to the phonology and RAN screening (TEPHOBE), 3 out of 18 children exhibited an increased phonological risk and 6 out of 18 exceeded the risk score in RAN.

Artificial grapheme-phoneme correspondence (GPC) training
Between 1 and 5 days (mean: 2.3 d) prior to the simultaneous EEG/fMRI session, all participants trained the correspondences between six false-font characters and familiar speech sounds with an adapted version of the GraphoGame phonics training program (Lyytinen et al., 2007, 2009). To simulate reading acquisition, children were assigned to train the association between natural speech sounds and one of two false-font (FF) character sets using an adaptive randomization approach (Fig. 1a; training set 1, n = 10; training set 2, n = 8). During each trial of the training, the FF characters of the other set (untrained control FF) were presented on the upper part of the screen while the trained FF characters were presented in the middle of the screen. This design allowed us to visually familiarize children with both sets even though only one set was actively trained. Both sets of FF characters were based on six lower-case letters of the Latin alphabet. We rearranged parts of each letter (b, d, m, t, u, z) also appearing in the letter (LET) condition in Swiss School font with the Font Creator (Version V4.5) to form new "pseudoletters". These false-font characters were comparable in size and width to the LET characters. In addition, we included two mirrored FF characters in each FF set to match the difficulty of mirror-image letters (b-d), which also occur in the Latin alphabet. In this way, we aimed to keep the visual content of FF and LET constant. During the training, the visual stimuli were presented on a laptop positioned in front of the child. The sounds, spoken by a female voice, were presented over headphones. The children had to choose the correct FF grapheme corresponding to the heard phoneme. The number of visual distractors (1-3) changed according to the accuracy rate of the previous trial. The training consisted of a single session with 131 trials divided into ten training levels.
Struggling children completed supporting levels to train specific correspondences with a high error rate (84 ± 39.7 trials). The training lasted until each child was able to match the six speech sounds to their corresponding characters or until each supporting level had been repeated a maximum of three times.
Training duration and achievement were calculated for the complete training session. Training duration varied between children: faster learners needed less time to successfully learn the associations in the artificial GPC training. To account for the varying number of distractors, accuracy was calculated using a weighting factor defined as the number of presented items proportional to the maximum possible number of presented items (Karipidis et al., 2017, 2018). On the day of the neuroimaging session, all subjects repeated the learned associations (mean duration: 5 ± 0.9 min, weighted accuracy 79 ± 15%) to test whether the children remembered the previously trained associations. Weighted accuracy (81 ± 9%) and training duration in the training session (19 ± 3.9 min) were used to characterize the children's performance in the artificial GPC training (Lyytinen et al., 2009). Correlations of training parameters (duration, weighted accuracy) with behavioral assessments have already been described in Karipidis et al. (2017) but were recomputed for the specific subgroup of children analyzed in this article (see Inline Supplementary Table S2). Because the groups and results largely overlap, the results of these correlations are not further discussed, and the reader is referred to the corresponding articles (Karipidis et al., 2017, 2018).
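One possible reading of the weighting described above is sketched here. The text gives only the weighting factor (presented items relative to the maximum possible number of items). How the weights are combined into a single accuracy score is an assumption of this sketch, as is the maximum of four items (one target plus up to three distractors); the function names are hypothetical.

```python
# Assumption: a trial shows the target plus 1-3 distractors, so at most 4 items.
MAX_ITEMS = 4


def trial_weight(n_items: int, max_items: int = MAX_ITEMS) -> float:
    """Weighting factor: presented items relative to the maximum possible."""
    return n_items / max_items


def weighted_accuracy(trials) -> float:
    """Weighted proportion of correct trials.

    trials: iterable of (n_items_presented, was_correct) pairs.
    Assumption: accuracy = sum of weights of correct trials divided by
    the sum of all weights. The source does not state this formula.
    """
    total = 0.0
    earned = 0.0
    for n_items, was_correct in trials:
        weight = trial_weight(n_items)
        total += weight
        if was_correct:
            earned += weight
    if total == 0:
        raise ValueError("no trials given")
    return earned / total
```

Under this reading, a correct choice among four items counts twice as much as a correct choice among two, so children who reached harder trials are not penalized for errors made there.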

Audiovisual target detection task
Participants performed an implicit audiovisual target detection task (Fig. 1), which was divided into four parts of 375 s each to maintain the attention of the young children (Karipidis et al., 2017, 2018). A pediatric protocol was used and the task was embedded in a story. Each part of the task included a different character type. To examine effects related to GPC training, two different sets of FF characters were used, of which one was trained prior to the EEG-fMRI session (TFF) whereas the other was only passively presented and served as control FF characters (CFF). The effect of expertise with different culturally meaningful character types was furthermore studied by presenting letters (LET) and digits (DIG) and comparing them with the control false-font (CFF; Fig. 1a). All stimuli were presented in unimodal visual and auditory, and audiovisual congruent and incongruent conditions using Presentation® software (Version 16.4, www.neurobs.com). Every part consisted of 16 blocks (4 blocks/condition), whereby unimodal and bimodal blocks (15 items/block) alternated pseudorandomly, separated by fixation periods of 6 or 12 s.
In addition to a total of 54 stimuli per condition, six targets requiring a button press were presented to maintain children's attention. Targets corresponded to the modality of the condition, i.e. either unimodal visual or unimodal auditory (cat, parrot, tortoise or shovel, or their sounds) or bimodal presentation of a picture with sound (see Fig. 1a). The stimuli within each block were presented pseudorandomly for 613 ms with an interstimulus interval of 331 or 695 ms (Fig. 1b). Here, we focused on the unimodal visual condition (for analyses of audiovisual conditions see Karipidis et al. (2017)). Visual information was presented using video goggles (VisuaStimDigital, Resonance Technology, Northridge, CA), auditory information over in-ear headphones (MR confon GmbH, Magdeburg). Characters were presented in black in the middle of a grey background (mean visual angles horizontally/vertically TFF: 2.9°/4.8°; CFF: 2.7°/4.8°; LET: 2.8°/4.8°; DIG: 3°/6.7°). In-scanner target detection accuracy (ACC) was high (89 ± 12.7%) and reasonable reaction times (RT) were recorded in all four parts and for all character types (Inline Supplementary Table S3). Performance did not significantly differ between the four parts (ACC: F(3,15) = 1.5, p = 0.227; RT: F(3,15) = 0.9, p = 0.466). Responses of two participants were not logged due to technical problems and were therefore not included in the response analysis.

EEG and fMRI acquisition
Using an MR-compatible 128-channel EEG system (Net Amps 400, EGI HydroCel Geodesic Sensor Net), simultaneous EEG-fMRI recordings were performed on a Philips Achieva 3 T scanner (Philips Medical Systems, Best, The Netherlands). Continuous EEG at a sampling rate of 1 kHz (DC-filter) was recorded with 128 scalp and two electrocardiogram (ECG) electrodes. To reduce gradient residuals during simultaneous EEG-fMRI recordings, the scanner clock and the EEG system were synchronized (Mandelkow et al., 2006). Electrode impedances were kept below 50 kΩ. The recording reference was located at Cz, the ground electrode (COM) posterior to Cz. Potential electrode vibration artifacts were minimized by covering the electrodes with a bandage retainer net and by turning off the helium pump of the MRI scanner during image acquisition.
A 32-element receiver head coil was used to acquire 189 volumes for each part of the task using a T2*-weighted whole-brain gradient echo-planar image sequence (EPI) with the following parameters: SofTone factor: 3, slices/volume: 31, repetition time (TR): 1.98 s, echo time (TE): 30 ms, slice thickness: 3.5 mm, slice gap: 0.5 mm, flip angle: 80°, field of view (FOV): 24 × 24 cm², in-plane resolution: 3 × 3 mm², sensitivity-encoding reduction factor: 2.2. Specific emphasis was placed on reducing scanner noise and improving auditory stimulation by using sound-absorbing over-ear headphones, a sound-absorbing mat in the MR bore and a SofTone sequence. A custom-made head pad for the EEG net was used to reduce head movement and to ensure comfort. Additionally, a field map scan was recorded to perform B0 correction. T1-weighted images were recorded with a 3D MP-RAGE sequence (slices: 176, TR/TE: 6.8/3.2 s, voxel size: 1 × 1 × 1 mm³, flip angle: 9°, FOV: 27 × 25.4 cm²).

ERP analyses
Analyses were conducted using VisionAnalyzer 2.1 (BrainProducts GmbH, Munich, Germany). Channels with an overall poor data quality were topographically interpolated (range: 0-5 channels, mean: 1.57 channels, SD: ±0.06). Due to continuous artifacts on the cheek electrodes, we excluded four electrodes from further processing and analyses (E43, E48, E119, E120). In addition, each data set was visually inspected and periods with major artifacts were manually excluded. After MR artifact removal using the average template subtraction method (Allen et al., 2000) and ballistocardiogram correction using sliding average template subtraction, the data was filtered (0.1-30 Hz and 50 Hz notch) and downsampled (500 Hz). Independent component analysis (ICA; Jung et al., 2000) was applied to exclude blinks, eye movements, and residual ballistocardiogram artifacts. After artifact correction, the data was re-referenced to the average reference (Lehmann and Skrandies, 1980). Trials with remaining artifacts exceeding ±200 μV or identified by visual inspection were excluded. The data was segmented from −102 ms to 498 ms relative to visual presentation and averaged character type-wise. Core group (n = 18) grand averages included a mean of 41 epochs per character type (means: TFF = 45, CFF = 45, LET = 48, DIG = 40; range: 19-54 epochs). Using the global field power (GFP) maxima of the mean ERPs over all four character types, the interval of the N1 was defined as ±30 ms (194-254 ms) around the GFP peak. The mean amplitude values within these intervals were further analyzed. In the inline supplementary analyses 1, we also report statistics regarding effects related to training and character type processing in the preceding P1 ERP interval (102-162 ms).
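The GFP-based definition of the N1 interval can be sketched as follows. This is an illustrative sketch and not the VisionAnalyzer procedure: GFP is computed as the spatial standard deviation across channels at each time point, and the search window used to locate the peak (`search_ms`) is an assumption, since the text does not state how the N1 maximum was separated from other GFP maxima. All function names are hypothetical.

```python
from statistics import pstdev


def global_field_power(erp):
    """GFP per time point: spatial standard deviation across channels.

    erp: list of time points, each a list of channel voltages (in microvolts).
    """
    return [pstdev(sample) for sample in erp]


def n1_window(erp, times_ms, search_ms=(150, 300), half_width_ms=30):
    """Return (start, end) in ms: GFP peak within search_ms, +/- half_width_ms.

    The search window is an assumption for illustration.
    """
    gfp = global_field_power(erp)
    candidates = [i for i, t in enumerate(times_ms)
                  if search_ms[0] <= t <= search_ms[1]]
    if not candidates:
        raise ValueError("no samples inside the search window")
    peak = max(candidates, key=lambda i: gfp[i])
    return times_ms[peak] - half_width_ms, times_ms[peak] + half_width_ms


def mean_amplitude(channel, times_ms, window):
    """Mean voltage of one channel (or cluster average) inside the window."""
    values = [v for v, t in zip(channel, times_ms)
              if window[0] <= t <= window[1]]
    return sum(values) / len(values)
```

With 500 Hz data segmented from −102 ms to 498 ms, `times_ms` would be `list(range(-102, 500, 2))`; a GFP peak at 224 ms then yields the 194-254 ms interval reported above.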

Electrodes of interest analyses
To examine print specific activations over the posterior scalp sites for the N1 interval, the mean amplitudes over a left (LOT), middle (MO), and right (ROT) electrode cluster were computed (Fig. 2a). These left, middle and right electrode clusters comprised of the following electrodes (LOT: E65, E68, E69, E70, E73; MO: E81, E82, E75, E74; ROT: E83, E88, E89, Fig. 1. Character types, artificial GPC false-font training sets, and task design. (a) The implicit audiovisual target detection task was divided into four parts, each including one character type: Trained (TFF), control false-fonts (CFF), letters (LET), digits (DIG). Children had to press the response button whenever a visual, auditory or audiovisual target (picture or sound of animal or tool) appeared (last column). During the GPC training, children learned to associate one set of six artificial graphemes to known phonemes: The stimulus sets (1/2) for TFF and CFF were counterbalanced across subjects (set 1, N ¼ 10, set 2, n ¼ 8). (b) Illustration of the sequence and timing of one visual stimulation block. Each part included four visual blocks among blocks of auditory and audiovisual stimulation. E90, E94). Statistical analyses for GFP and LOT/MO/ROT amplitudes were performed using linear mixed models (LMM), including repeated measurements within each subject (SAS 9.4, SAS Institute, Cary NC). To investigate training and character type effects random intercept models with fixed factors electrode cluster (LOT, MO, ROT), training (TFF, CFF) or character type (LET, DIG, CFF) and ARHQ as covariate of no interest, including the specific random intercept for each subject were computed (for details, see chapter 2.7.2). Post-hoc t-tests with Tukey-Kramer correction for multiple comparisons were performed. Correlational analysis (Fig. 2d) was performed to determine relations between trained false-font amplitudes and training performance (duration and weighted accuracy), using SPSS (Version 22.0). 
For normally distributed data Pearson correlation and for non-normally distributed data Spearman correlation was used.
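The choice between the two coefficients can be illustrated with a minimal sketch. It assumes that the normality decision has already been made elsewhere (the normality test is not shown); the function names are hypothetical and do not reproduce the SPSS procedure used for the reported analyses.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def correlate(x, y, normally_distributed):
    """Hypothetical helper: pick the coefficient from a normality flag."""
    return pearson_r(x, y) if normally_distributed else spearman_rho(x, y)
```

Because Spearman's coefficient only uses ranks, a monotonic but non-linear relation still yields a coefficient of 1, whereas Pearson's coefficient stays below 1.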

FMRI analyses
Data was preprocessed and analyzed using SPM12 on MATLAB R2015b. After field map correction, images were spatially realigned and unwarped, slice time corrected, coregistered, segmented, and normalized using the deformations derived from the segmentation and a pediatric brain template created with the Template-O-Matic toolbox for the age range 5.9-8.5 years (Wilke et al., 2008). After resampling (3 × 3 × 3 mm³), the data was smoothed with an isotropic 6 mm full width at half maximum Gaussian kernel. Volumes with more than 1.5 mm scan-to-scan movement were repaired by linear interpolation using the ArtRepair toolbox (Mazaika et al., 2011), and children with more than 10% repaired scans were excluded from analyses. None of the included data sets contained more than 6.35% repaired scans. Due to technical problems and excessive movement at the end of the task, the analyses of two children included only the first three visual blocks (instead of four blocks) for one of the parts. A random-effect generalized linear model (GLM) was calculated, including six predictors (auditory, visual, congruent, incongruent, target, and response) and six movement parameters for each participant and each part of the experiment (TFF, CFF, LET, DIG). We report results of second-level random-effect analyses: one-sample t-tests to characterize the general activation for each character type against baseline, and second-level t-tests based on first-level contrasts to determine differences between the visual character types of the experiment (Fig. 3a). Furthermore, we computed whole-brain voxel-wise correlations of the BOLD response to trained false-fonts with training duration (Fig. 3b). ARHQ was included as covariate of no interest in all second-level analyses.
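The motion criterion described above can be sketched as follows. This is a minimal illustration that assumes the scan-to-scan displacement (in mm) is already available per volume; how ArtRepair derives that value and performs the interpolation is not reproduced here, and the function names are hypothetical.

```python
REPAIR_THRESHOLD_MM = 1.5     # scan-to-scan movement above which a volume is repaired
MAX_REPAIRED_FRACTION = 0.10  # children above this fraction of repaired scans are excluded

def repaired_fraction(scan_to_scan_mm, threshold_mm=REPAIR_THRESHOLD_MM):
    """Fraction of volumes whose scan-to-scan movement exceeds the threshold."""
    flagged = [movement > threshold_mm for movement in scan_to_scan_mm]
    return sum(flagged) / len(flagged)

def exclude_child(scan_to_scan_mm, max_fraction=MAX_REPAIRED_FRACTION):
    """True if more than the allowed fraction of scans would need repair."""
    return repaired_fraction(scan_to_scan_mm) > max_fraction

# Hypothetical displacement series for two children (values are invented).
calm_child = [0.2] * 95 + [2.0] * 5
restless_child = [0.2] * 80 + [2.0] * 20
```

With these invented series, the first child has 5% repaired scans and is retained, whereas the second has 20% and would be excluded.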

Region of interest analyses
We performed ROI analyses based on the results of a meta-analysis (Martin et al., 2015) in the left ventral occipitotemporal cortex (lvOT) and its right hemispheric homologue (rvOT). Spheres with 8 mm radius (MNI coordinates x, y, z: ±52, −60, −14; Fig. 3c) were created using MarsBaR (Brett et al., 2002) with their centers in the inferior temporal gyri. Beta values of the bilateral ROIs were extracted and an LMM with fixed factors hemisphere and training or character type, and ARHQ as covariate of no interest, was computed, including a specific random intercept for each subject. To compute LMMs and correlations, the same procedure as for the EEG analyses was used. To examine a potential number-sensitive area in more detail, an additional analysis of a literature-based spherical ROI in the number form area (NFA; Abboud et al., 2015) of the right hemisphere is presented in the inline supplementary analysis 2 (MNI coordinates x, y, z: 55, −43, −20; r = 8 mm).
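The geometry of the bilateral spherical ROIs can be illustrated with a short sketch. It assumes a 3 mm voxel lattice aligned to the MNI origin and treats voxels exactly on the boundary as inside; MarsBaR's own boundary handling is not reproduced, and all names are hypothetical.

```python
import math

LVOT_CENTER = (-52.0, -60.0, -14.0)  # left vOT sphere center (MNI, mm)
RVOT_CENTER = (52.0, -60.0, -14.0)   # right-hemispheric homologue
RADIUS_MM = 8.0
VOXEL_MM = 3.0                       # assumed lattice spacing

def in_sphere(point, center, radius=RADIUS_MM):
    """True if the point lies within (or on) the sphere around center."""
    return math.dist(point, center) <= radius

def sphere_voxels(center, radius=RADIUS_MM, voxel=VOXEL_MM):
    """Lattice points (multiples of the voxel size) that fall inside the sphere."""
    index_ranges = [
        range(math.ceil((c - radius) / voxel), math.floor((c + radius) / voxel) + 1)
        for c in center
    ]
    voxels = []
    for i in index_ranges[0]:
        for j in index_ranges[1]:
            for k in index_ranges[2]:
                point = (i * voxel, j * voxel, k * voxel)
                if in_sphere(point, center, radius):
                    voxels.append(point)
    return voxels

left_roi = sphere_voxels(LVOT_CENTER)
right_roi = sphere_voxels(RVOT_CENTER)
```

Mirroring the x coordinate of every left-hemisphere voxel gives the right-hemisphere ROI, which is what the ± sign in the reported coordinates expresses.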

Linear mixed model (LMM) analyses in ERP and ROI data
As detailed in the corresponding method sections, we used linear mixed model (LMM) analyses and post-hoc t-tests (Tukey-Kramer corrected) to analyze ERP N1 and fMRI ROI data regarding effects of false-font training (TFF vs CFF) or character type (LET, DIG, CFF). The random effect consists of a subject-dependent random intercept. In LMMs, fixed and random effects explain differences between subjects and the variability within subjects, respectively. First, all LMMs included main effects and interaction terms. Because none of the models (N1, vOT ROI) yielded any significant interaction effects (for N1 amplitude: cluster*training: F(2,85) = 0.82, p = 0.4449; cluster*character type: F(4,133) = 0.09, p = 0.9849; for ROI vOT beta values: hemisphere*training: F(1,51) = 0.88, p = 0.3513; hemisphere*character type: F(2,81) = 0.08, p = 0.9193), we subsequently only included the main effects in the models (training/character type, cluster/hemisphere) in addition to the covariate ARHQ and report the results of these models in the main text. For the significant main effects only, we also provide the effect size (f²; Selya et al., 2012). For all LMM analyses, studentized conditional residuals were computed to identify and exclude potential outliers. To correct for variance inhomogeneity, an outlier cutoff of three standard deviations from the mean was used for all analyses (Roth et al., 2007). The number of outliers for each LMM analysis is given in the inline Supplementary Table S4. In addition, QQ-plots were inspected to check the assumption of normality, and plots of predicted versus conditional residuals were inspected to check homoscedasticity.
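For orientation, the local effect size f² can be written as a function of the variance explained by a model with and without the effect of interest. The sketch below assumes that both R² values have already been obtained from the fitted models; how they are derived from a mixed model is not shown, and the function name and example values are hypothetical.

```python
def cohens_f2(r2_full, r2_reduced):
    """Local Cohen's f2 = (R2_full - R2_reduced) / (1 - R2_full).

    r2_full:    variance explained by the model including the effect of interest
    r2_reduced: variance explained by the model without that effect
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)

# Invented example values, not taken from the reported analyses.
example_f2 = cohens_f2(r2_full=0.30, r2_reduced=0.25)
```

The value grows with the share of variance uniquely attributable to the effect, relative to the variance left unexplained by the full model.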

Connectivity analyses
Seed-to-voxel functional connectivity analysis was performed using weighted GLM as implemented in the CONN toolbox (Whitfield-Gabrieli and Nieto-Castanon, 2012). The normalized anatomical image of each participant was segmented into white matter (WM), grey matter (GM) and cerebrospinal fluid (CSF) masks. Preprocessed functional data was band-pass filtered from 0.009 to 0.08 Hz and influences of motion, WM and CSF were regressed out using the CompCor strategy (Behzadi et al., 2007). To examine functional connectivity associated with GPC training and character type differences, we defined a seed region (seed FFG, see Fig. 3d) within the anatomical left fusiform gyrus (FFG: Talairach Daemon (TD) database (Lancaster et al., 2000); WFU Pickatlas, version 2.4 (Maldjian et al., 2003)) that showed functional activation to either TFF or CFF (logical operation: FFG AND (TFF OR CFF)). The seed was defined using MarsBaR (Brett et al., 2002), and for functional activation we applied a cluster-level FWE-corrected threshold of p < 0.05 on a cluster-defining threshold (CDT) of p < 0.001. This left FFG seed was flipped to the right to also examine functional activation of the right hemispheric homologue.
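The logical operation that defines the seed, and the flip to the right hemisphere, can be sketched with sets of voxel coordinates. This is only an illustration under the assumption that each mask is given as a set of (x, y, z) coordinates in mm; the coordinates below are invented and the function names are hypothetical.

```python
def define_seed(anatomical_ffg, active_tff, active_cff):
    """Voxels satisfying FFG AND (TFF OR CFF)."""
    return anatomical_ffg & (active_tff | active_cff)

def flip_to_right(left_voxels):
    """Mirror a left-hemisphere mask across the midline (x -> -x)."""
    return {(-x, y, z) for (x, y, z) in left_voxels}

# Invented toy masks, for illustration only.
ffg_mask = {(-30, -50, -15), (-33, -50, -15), (-36, -50, -15)}
tff_active = {(-30, -50, -15), (-48, -70, -9)}
cff_active = {(-36, -50, -15)}

left_seed = define_seed(ffg_mask, tff_active, cff_active)
right_seed = flip_to_right(left_seed)
```

Voxels that are active but lie outside the anatomical FFG, and FFG voxels without activation to either false-font set, are both excluded from the seed.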
For seed-to-voxel analyses, the residual time course for each seed was extracted and used to generate first-level correlation maps by computing Pearson's correlation coefficients with the time courses of all other voxels. To perform second-level GLM analyses, the first-level correlation coefficients were converted to normally distributed z-scores using the Fisher transformation. The individual familial risk score was entered as a between-subject covariate of no interest. In addition, we created a grey matter mask using the tissue probability mask of grey matter of the pediatric brain template. All voxels with a probability >0.5 were defined as grey matter. Within the grey matter mask, a cluster-based FWE-corrected threshold of p < 0.05 was applied on a voxel-wise uncorrected threshold of p < 0.001. In analogy to the whole-brain correlational fMRI analysis for TFF, we examined the correlation of FFG seed-to-voxel connectivity with training duration (Fig. 3d). Furthermore, one-sample and paired t-tests were computed on regression coefficients to yield functional connectivity maps for each character type (TFF, CFF, LET, DIG; see inline Supplementary Fig. 4 and inline Supplementary Table 7) against baseline and their differences (training effect, character type differences).
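The Fisher transformation mentioned above is the inverse hyperbolic tangent of the correlation coefficient, z = atanh(r) = 0.5 · ln((1 + r) / (1 − r)). A minimal sketch, with hypothetical function names and independent of the CONN implementation:

```python
import math

def fisher_z(r):
    """Fisher transformation of a correlation coefficient with -1 < r < 1."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Back-transformation from z to a correlation coefficient."""
    return math.tanh(z)

# Invented first-level correlation coefficients, for illustration only.
first_level_r = [-0.2, 0.0, 0.5]
first_level_z = [fisher_z(r) for r in first_level_r]
```

The transformation leaves small coefficients almost unchanged and stretches values near ±1, which makes the sampling distribution approximately normal for second-level tests.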

Results
Modulation of the visual N1 ERP by false-font training and character type expertise

Effect of false-font training on N1 (Fig. 2b, inline Supplementary Fig. S1)
The LMM of the N1 (194-254 ms) GFP mean amplitude with fixed factor training (TFF, CFF) revealed no significant effect of training [F(1,17) = 0.01, p = 0.9298], suggesting no global amplitude differences. However, a significant negative correlation of training duration and the N1 GFP to TFF (r = −0.459, p = 0.043; Fig. 2d) was found, showing that shorter training duration was associated with an enhanced N1 negativity.
When examining local differences in the posterior N1 negativity, the [...]

Effect of character type on N1 (Fig. 2c, inline Supplementary Fig. S1)
The LMM of the N1 GFP mean amplitude with fixed factor character type (LET, DIG, CFF) revealed no significant main effect [F(2,34) = 1.86, p = 0.1719], suggesting no global amplitude differences. The analysis regarding character type based on an LMM for the three electrode clusters with the fixed factors character type (LET, DIG, CFF) and cluster (LOT, MO, ROT) revealed a significant character type effect [F(2,137) = 40.64, p < 0.0001, f² = 0.2034], but no effect of electrode cluster [F(2,137) = 1.56, p = 0.3572]. The character type effect was driven by the significantly stronger N1 for DIG than for the other types (post-hoc t-tests: DIG < LET: t(137) = 4.51, p < 0.0001; DIG < CFF: t(137) = 8.93, p < 0.0001).
For additional analyses of training and character type in the preceding P1 interval, please see inline supplementary analysis 1. The results of the extended group (n = 23), supporting the results of the core group regarding the training and the character type effects, are presented in the inline Supplementary Fig. S1.
To summarize, N1 ERP results in the core and extended groups show a pronounced training-related difference between TFF and CFF and a character type-dependent modulation of the N1 amplitude when comparing DIG, LET and CFF over the occipitotemporal scalp.

Modulation of the BOLD signal related to training and character type expertise
Next, we report results of second-level whole-brain voxel-wise random-effect analyses to characterize activation differences evoked by visual processing of the four character types using a cluster-based family-wise error corrected (FWE-corrected) threshold of p < 0.05 (on a cluster-defining threshold (CDT) of p < 0.001). As expected, all four character types showed pronounced occipitotemporal activation. This activation was bilateral for TFF, CFF and LET but only reached significance in the right hemisphere for DIG (Fig. 3a and inline Supplementary Table S5).

Effect of training
No cluster survived cluster extent correction for the contrast of TFF vs CFF at the whole-brain level, but because of a special focus on this contrast, the uncorrected results (p < 0.001) are reported and illustrated in the inline Supplementary Fig. S2 and Table S6.
The training duration of the artificial GPC training correlated significantly with the TFF BOLD response in the left vOT but in no other region, demonstrating that a higher activation was associated with faster learning (Fig. 3b, Table 2). Importantly, this whole-brain finding survived an even stronger correction (FWE-corrected p < 0.01, using a CDT of p < 0.001), was confirmed by the supportive analyses of the enlarged fMRI group (inline Supplementary Fig. S3), and can thus be considered a robust result, unaffected by potentially inflated false-positive rates (Eklund et al., 2016).

Effect of character type
Direct character type contrasts revealed only minor differences in whole-brain analyses, mostly driven by single letter processing: LET showed more activation than CFF in the right superior temporal gyrus, and more activation than DIG in the left middle frontal gyrus extending to superior frontal areas (Fig. 3a, Table 2).
To summarize the whole brain results, single characters activate predominantly bilateral occipitotemporal areas, training duration correlated with activation to trained false-font characters in the left vOT and direct character type contrasts yielded only minor differences in temporal and frontal areas.

ROI analysis in the left and right vOT
The emergence of preferential activation to print in the vOT was further investigated in two a-priori bilateral, literature defined ROIs. Similar to the analysis of the visual ERP N1 we performed two separate LMMs to examine effects related to false-font training and character type expertise.

Effect of false-font training on vOT BOLD
First, the LMM analysis with fixed factors hemisphere (lvOT, rvOT) and training (TFF, CFF) revealed a significant main effect for training [F(1,52) = 4.44, p = 0.0399, f² = 0.0833]. The training effect was driven by the significantly stronger activation for TFF compared to CFF (post-hoc t-test: t(52) = 2.11; see also inline Supplementary Fig. S3b for the extended fMRI group).
Additional analyses of the character type effect in a potential number form area ROI did not yield any differences between character types (inline supplementary analysis 2).
To summarize, in concordance with the ERP N1 results, the ROI results in the vOT also exhibited a training effect in the form of higher beta values for TFF than for CFF. An effect of character type in the vOT was specifically found for LET as compared with CFF. The ROI analysis furthermore pointed to enhanced activation in the right vOT. The effects of training and character type were confirmed by additional analyses of the extended fMRI group. Importantly, the analyses of the extended group also indicated a significant difference between DIG and CFF, suggesting initial alphanumeric tuning (see inline Supplementary Fig. S3b).

Enhanced functional connectivity to superior parietal/lateral occipital regions for trained false-fonts
Second-level random-effect results (FWE-corrected p < 0.05, using a CDT of p < 0.001) of bilateral FFG seed-based functional connectivity analyses are summarized in the inline Supplementary Fig. S4 and inline Supplementary Table 7 for each character type and the corresponding contrasts.
There was no significant difference in the functional connectivity from the left or right FFG seed regions between TFF and CFF. The functional connectivity of the left FFG seed region for TFF showed a significant negative correlation with training duration to a cluster in the left superior parietal gyrus/lateral occipital cortex (LOC) (Fig. 3d, Table 2).

Discussion
Studying the emerging functional activation preference to print in visual areas as a result of letter-speech sound training in the developing brain is highly relevant because it can inform about future reading outcomes of children. Such insights are especially important for children at an increased risk for developing reading problems, such as the group examined in this study. Preschool neural measures related to language processing and learning performance may critically contribute to the early identification of children with poor reading outcomes (Brem et al., 2013; Hoeft et al., 2007b; Karipidis et al., 2018; Maurer et al., 2009; Raschle et al., 2012) and to the provision of timely and efficient support for struggling learners.
Focused on the case of the well-known functional specialization of the left vOT/N1 to print in literates, this study aimed to clarify which processes drive the often reported preferential activation to print by comparing the effect of varying expertise and visual familiarity on the processing of characters in prereading children at varying risk for developmental dyslexia. Our approach included, first, an artificial GPC training that led to learning of phonological associations to print. After training, we examined the sensitivity of vOT/N1 activations to the trained false-fonts compared to control false-fonts that children had passively viewed but not actively associated with speech sounds. We further related the activation and functional connectivity to trained false-font characters to learning performance in the preceding training session. Secondly, we compared activation patterns of artificial false-font characters to culturally meaningful characters, i.e., letters and digits, for which preschool children exhibit different levels of expertise. This comparison yielded further insights into how different levels of expertise related to phonological, semantic and magnitude associations may shape the vOT and N1 specialization in the brain of preschool children.

Neural specialization to print after phonological association training and modulation by training performance
Learning to associate artificial characters to speech sounds was reflected in preferential activation for trained compared with passively viewed characters in the visual N1 ERP and in the vOT cortex BOLD response. This result demonstrates the development of preferential activation after short artificial GPC training and critically extends previous knowledge about the development of functional specialization to print in children (Brem et al., 2010; Centanni et al., 2018; Dehaene-Lambertz et al., 2018; Fraga González et al., 2016; Fraga González et al., 2014; James, 2010; Maurer et al., 2006). We show that a short (<30 min) artificial GPC training already induces rapid adaptation in the functional response of the ventral occipitotemporal cortex in preschool children. Changes in the vOT activity (Hashimoto and Sakai, 2004; Song et al., 2010; Xue and Poldrack, 2007) and stronger N1 activation have previously been reported when adults learn print-like stimuli (Maurer et al., 2010; McCandliss et al., 1997) and with increasing literacy (Dehaene et al., 2010; Pegado et al., 2014). Studies of poor-reading children within the first years of school related N1 sensitivity to words to the outcome of a relatively short grapheme-phoneme training (Fraga González et al., 2016) and to the children's response to reading intervention (Molfese et al., 2013). In addition, preferential activation to real words has also been shown in prereaders by training real grapheme-phoneme correspondences over several weeks, even though the children's reading skills were still rudimentary after training (Brem et al., 2010).
Such training effects suggest that learning grapheme-phoneme correspondences is a key factor of specialization processes and preferential activation to print in the vOT. In addition, here we quantified the ease of learning by the duration of the adaptive artificial GPC training. Accounting for the ease of learning in the GPC training revealed that the faster children learned new associations, the stronger was the corresponding preferential response of the vOT and N1 to single trained false-fonts and the functional connectivity of the left FFG seed region to the left SPL/LOC. This result supports the notion of expertise-dependent activation in the vOT (Maurer and McCandliss, 2007; Price and Devlin, 2011) emerging in the prereading brain through graphophonological association training. The preferential vOT activation to trained characters may indicate a process of building up abstract neural letter form representations, similar to the notion of the VWFA as a "prelexical hub" for computing and storing abstract orthographic word forms (Dehaene et al., 2005). Increasing visual familiarity through training may partly explain the results of this study. The negative correlations of the N1 and BOLD signals with training duration, however, clearly argue against a pure visual familiarity effect because longer training durations, and thus more exposure to the false-fonts, were associated with decreased and not increased brain responses to trained false-font characters in the children. Instead, the negative correlation of the N1 and vOT activation with training duration, reflecting performance in the artificial GPC training, supports the notion of a visual expertise effect related to learning of phonological associations to print.
Thus, the preferential activation to trained characters in the vOT may result from establishing novel circuits to higher-order cognitive areas in the prereading brain, which may, in turn, provide predictions about the content of the visual input (Price and Devlin, 2011). In such a model, the left vOT could be best described as an "integrative hub", matching and evaluating the information received from different brain regions (Carreiras et al., 2014; Price and Devlin, 2011). The negative correlation between training duration and the functional connectivity of the left FFG seed to the left superior parietal/lateral occipital cortex is in line with previous findings of character-speech sound association learning in adults, indicating that this region plays a critical role in cortical plasticity related to learning novel letters (Hashimoto and Sakai, 2004). Our results in young children show that this is also the case during the prereading stage. Together with the ERP and the functional activation findings, our connectivity results indicate a phonologically guided tuning of the preferential vOT/N1 activation.

vOT/N1 activation to character types with varying expertise
In addition to the experimental manipulation of expertise through phonological association training, we also compared character types (LET, DIG vs. CFF) naturally varying in their level of expertise regarding phonological, semantic and/or magnitude associations that is built up and refined over the course of child development. Our analyses revealed that different character types show distinct encoding in bilateral vOT and in the corresponding visual N1 at preschool age already.
First, the N1 ERP mean amplitude showed the most pronounced activation over all character types for digits. N1 amplitude modulations have been strongly associated with the level of expertise in specific visual categories (Rossion et al., 2002; Tanaka and Curran, 2001), including print (Maurer et al., 2005). The pronounced ERP to digits may thus simply reflect the advanced neural specialization to this visual character category, with high expertise at preschool age. However, purely visual specialization developed through increased familiarity and exposure would hardly explain why letters showed weaker N1 amplitudes than digits, as we would expect ample visual familiarity with both letters and digits at this age. Thus, the increased N1 responses of prereaders to digits seem to be driven by expertise related to phonological, semantic, and magnitude associations.
This coincides nicely with the description that gaining visual experience goes along with the recruitment and involvement of multiple high-level areas and the notion of visual expertise as an "enhanced, ecological form of visual object recognition that emerges from the reciprocal interactions between multiple top-down factors, such as semantic knowledge, attention, and task relevance, and the visual system" (Harel, 2016, p. 88). Unlike the ERP results, an equal BOLD response was detected for digits and letters in the vOT cortex. Our results are thus in line with the findings of Cantlon et al. (2011), showing no fine tuning between numbers and letters in prereaders in our implicit task. Neither was such fine tuning observed for digits and letters in a region specialized for number-form processing in adults or school children (Abboud et al., 2015; Shum et al., 2013). This suggests the absence of preferential activation to digits over letters or false-fonts in the putative number form area at this developmental stage. Our results coincide with those of Cantlon et al. (2011) in showing higher activation to alphanumeric characters (letters, digits) than to false-font characters. This alphanumeric tuning only approached a trend in the main analysis but was significant in the enlarged fMRI sample (see inline Supplementary Fig. S3b). Moreover, our pattern of alphanumeric tuning in bilateral vOT extends the findings of the former study in showing an even finer level of tuning, because the processing difference between objects and alphanumeric characters (Cantlon et al., 2011) may be expected to be greater than between the alphanumeric and matched false-font characters used in our study.
Given the absence of any processing difference between letters and digits in the vOT, it is difficult to disentangle the contribution to vOT specialization of visual familiarity from that of expertise with phonological and magnitude information. Visual familiarity clearly contributes to the specialization of vOT activation, but mere visual familiarity does not explain why letters and trained false-fonts resulted in similar vOT BOLD and N1 activation (additional direct comparisons of TFF and LET yielded no significant differences in either the N1 ERP or the vOT BOLD responses: both corrected p > 0.6). The children were exposed to the trained false-fonts for only about 20 min; it is unlikely that such brief training led to visual familiarity comparable to the accumulated visual exposure to letters over the first six years of life.
Alternatively, with regard to the suggestion of the vOT as an integrative hub, temporally co-occurring predictions from multiple cognitive systems such as phonology, semantics and/or magnitude may induce higher prediction errors as compared to character types where none (untrained false-fonts) or only weak (letters) predictions are expected given the low level of phonological or lexical expertise for such characters (Brem et al., 2010; Maurer et al., 2005; Park et al., 2014). Such a model would also account for the differential N1 amplitude to trained and untrained false-fonts reported in the previous section, because relatively stable phonological associations may initiate feedback projections from phonological areas.

Contribution of attentional mechanisms and lateralization of functional specialization
A series of previous studies suggest that the activation of the vOT or N1 is modulated by attentional mechanisms through its link to the dorsal attention network (Christophel et al., 2018;Cohen et al., 2008;Luck et al., 2000;Vogel et al., 2012;Vogel and Luck, 2000;Yoncheva et al., 2015). Vogel et al. (2012) used resting state functional connectivity analyses to show that the posterior part of the visual word form system is strongly connected to the dorsal attention system including the frontal and inferior parietal cortex. This connectivity suggests an important role in directing attention to the critical information within words and sentences. Importantly, this connectivity to the dorsal attention system was diminished in children and dependent on reading ability, indicating that the contribution of the attentional network becomes more important with fluent reading (Vogel et al., 2012).
Other studies examined the effects of attentional processes on the visual N1. The ERP study by Yoncheva et al. (2015) showed that attentional focus on either single grapheme-phoneme correspondences or whole words during training has an effect on how words consisting of such trained characters are explicitly processed in the N1 after training: A more left-lateralized N1 negativity was found after training graphophonological associations but not after whole-word association training. Our findings of activation preference to specific visual character types (TFF, DIG, LET) reflected in both the vOT and N1 ERP activation may thus also be attributed to the contribution of attentional mechanisms during the graphophonological association training or during the implicit task. The greater functional connectivity from the lvOT to the left SPL/LOC for trained false-font characters with shorter training duration may correspondingly indicate the increased involvement of general attentional mechanisms of a parietal network in processing the newly learned false-font characters. Previous studies emphasized the increased involvement of the posterior parietal cortex in preorthographic processing of multi-element character strings (Lobier et al., 2012) and showed higher connectivity from VWFA to SPL in normally reading children as compared to children with dyslexia (van der Mark et al., 2011). Importantly, effects on the N1/vOT in previous studies were especially pronounced when attention was specifically manipulated, such as in spatial attention allocation tasks, in working memory tasks with explicit recall of attended or unattended stimuli (Christophel et al., 2018), when comparing choice reaction to simple reaction time tasks, and in object expertise tasks with attended and unattended items (Harel, 2016). No such explicit manipulation of attentional focus on our core stimuli (TFF, CFF, LET, DIG) was done in our implicit target detection task.
Moreover, attention was directed to unrelated and qualitatively different target items (drawings and environmental sounds), and it is thus rather unlikely that activation differences in character type processing are solely attributable to attention allocation in our task. However, experimental paradigms that specifically manipulate control over attentional top-down influences during task performance would be useful in improving our understanding and clarifying the potential impact of attention on vOT specialization.

Neural specialization to characters beyond the vOT
In our fMRI analysis, direct character type contrasts at whole-brain level revealed only minor differences, mostly driven by single letter processing. First, the activation pattern in the right superior temporal gyrus (STG) was mainly driven by higher activation to letters than to passively viewed false-font characters. The STG has shown anatomical alterations in relation to reading skills, such as reductions in grey matter volume in dyslexic children (Richlan et al., 2013) and in at-risk children at prereading age (Black et al., 2012; Raschle et al., 2011a) and increased grey matter volume in subjects who learned to read in adulthood (Carreiras et al., 2009). Learning to read and increased reading performance are usually accompanied by increases in lateralization of occipitotemporal reading networks to the left hemisphere (Shaywitz et al., 2007; Turkeltaub et al., 2003). In our young children, BOLD analyses of the vOT and whole-brain analyses indicate some stronger engagement of the right than the left hemisphere. Whether or not this stronger right-hemispheric activation and the enhanced activation for letters in the right STG reflect some compensatory processing (Shaywitz et al., 2003) or atypical lateralization of language processing in children at heightened risk for dyslexia (Hoeft et al., 2011) needs to be clarified in future studies. Nevertheless, the STG region has been associated with the processing of speech and phonology, and the left STG in particular has been implicated in graphophonological decoding (Jobard et al., 2003). Given the lower hemispheric specialization in young children (Ossowski and Behrmann, 2015), increased activation in the right hemisphere could also indicate early attempts to match letter and speech-sound information (van Atteveldt et al., 2004).
Second, letters showed stronger activation in the left middle frontal gyrus (MFG) of the dorsolateral prefrontal cortex than did digits, which could indicate increased attentional resources and/or enhanced control functions implicated in processing characters with less expertise (Houdé et al., 2010).

Limitations and further questions
Several limitations of the present study should be discussed. The first is the rather small sample examined here. The application of simultaneous EEG and fMRI in children as young as kindergarteners is unique and allows the analysis and comparison of emerging specializations with high temporal and spatial resolution. This approach made visible convergent patterns of vOT BOLD and N1 ERP, such as enhanced activation to false-fonts after training and modulation by training duration. However, it also revealed divergent ERP/BOLD patterns such as the dominant N1 negativity over the posterior scalp to digits as compared with the more general alphanumeric tuning in the vOT BOLD signal. However, such simultaneous EEG-fMRI is also challenging: Motion artifacts distort both fMRI and EEG data quality, and the limited test time available with such young children affects the duration, number of trials, signal-to-noise ratio and robustness of the task. As a consequence, and given also the stringent motion artefact exclusion criteria that we applied, only 18 children with sufficient EEG and fMRI data quality remained for the core analyses, and this sample included two left-handed children. Despite a higher incidence of right dominant and bilateral language networks (Szaflarski et al., 2002) in left-handed subjects than right-handed ones, we still included the two left-handed children. Language lateralization is not expected to be fully mature in our preschool children and left-hemispheric language lateralization dominates in left-handed subjects (Somers et al., 2015;Szaflarski et al., 2002). We addressed these problems by providing additional analyses of extended ERP and fMRI groups including 23 or 24 children in the inline supplement and by verifying these analyses without the left-handed children. These analyses largely support the analyses of the smaller core sample.
Secondly, our preschool children all had a varying risk for developmental dyslexia, most of them either from the parents' or a sibling's reading history. A better understanding of graphophonological association learning and emerging neural specialization in such children is of special relevance for improving the early identification of children with poor reading outcomes. However, our focus on at-risk children raises the question of how representative our results are for the general population, given that at-risk children have shown alterations in brain structure and function in previous studies (Maurer et al., 2003; Raschle et al., 2011a). It should be noted that our children had a varying familial risk level, as can be seen in the widely differing ARHQ values. To address the varying risk level, we corrected for the individual familial risk score by including the maximum ARHQ risk factor in all statistical analyses as a covariate of no interest, and we also address the potential differences of such a risk group in the discussion section. Nevertheless, it is essential that future studies examine emerging specialization in a low-risk sample.
Finally, we acknowledge that it is difficult to quantify the degree of visual familiarity or expertise children gained with the control false-font characters during the training, because they were only passively presented and not actively trained. It may have been the case that some children did not attend to these stimuli and did not gain visual familiarity with these control characters. Therefore, we cannot fully disentangle the impact of phonological association learning from the effect of increasing visual familiarity by comparing trained and passively viewed false-font characters in this study. Even though it is unlikely, given our explanations in the previous sections, that the difference between trained and control false-fonts relied entirely on visual processes, this aspect should be tested in a future study by having children train two sets of characters, one with phonological associations and the other in a purely visual approach. Such a design, when also controlling for attentional factors, could further help to quantify the contributions of different aspects and processes to neural specialization of the vOT.

Conclusion
Here, we show altered visual character processing in the prereading brain after short artificial grapheme-phoneme correspondence training (<30 min) as a model mimicking one of the first steps in reading acquisition. Emerging functional specialization was reflected in more pronounced activation for trained false-font characters than for untrained ones, irrespective of imaging modality, and a strong relationship between the ease of learning and activation in the left vOT. Moreover, we show distinct expertise-dependent activation differences for the vOT and the visual N1, including emerging alphanumeric tuning in the visual word form system at that early age. As grapheme-phoneme learning is considered the core principle of acquiring alphabetic languages (Blomert, 2011), learning performance in such model training and the associated level of neural specialization in prereaders might indicate later success in reading acquisition (Karipidis et al., 2018). These novel insights into learning-dependent early specialization of the vOT to print may thus contribute to the identification of prereaders at risk for reading difficulties and hence allow the development of early targeted interventions. Finally, the current insights also pinpoint the parallel development of functional circuits and cortical specialization in the developing brain during learning.

Competing financial interests
The authors declare no competing financial interests. Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Silvia Brem (sbrem@kjpd.uzh.ch). Some restrictions apply for the sharing of raw and processed data for ethical reasons, because this would compromise participant confidentiality and privacy. The artificial grapheme-phoneme correspondence training was developed at the Department of Child and Adolescent Psychiatry and Psychotherapy, Psychiatric Hospital, University of Zurich using the GraphoGame platform provided by the University of Jyväskylä and is subject to copyright.

Inline Fig. S1. N1 ERP amplitude analyses with extended group (n = 23).
Verification of N1 EEG analyses with an enlarged group of 23 subjects (Inline Table S1). In this group, 12 children trained with FF set 1, and 11 children with FF set 2. a) The N1 amplitude analyses substantiate the results of the main text in showing the same character type and training effects. N1 potential field maps of the grand averages (first row, in μV) for each character type and the statistical t-maps (below), illustrating the differences between character types. Effect related to training: The LMM for the electrode clusters with fixed factor training (TFF, CFF) and cluster (LOT, MO, ROT) revealed a significant effect of training [F(1,112) = 9.64, p = 0.0024, f² = 0.086; TFF < CFF, t(112) = −3.1] but no effect of cluster [F(2,112) = 0.53, p = 0.592]. Effect related to character type: The LMM for the electrode clusters with fixed factor character type (LET, DIG, CFF) and cluster (LOT, MO, ROT) revealed a significant effect of character type [F(2,174) = 37.89, p < 0.0001, f² = 0.287] but no effect of cluster [F(2,174) = 2.06, p = 0.1308]. Contrasts between character types indicated significantly stronger N1 amplitudes to DIG as compared with LET (t(174) = 4.35, p < 0.0001) and CFF (t(174) = 8.69, p < 0.0001).
b) The GFP for the N1 TFF correlated marginally with training measures (training duration: Pearson r = −0.40, p = 0.073).

Inline Supplementary
Inline Fig. S2. Effect of training in MTG.
fMRI whole brain analysis of the core group (n = 18): Training effect. The whole brain analysis of the contrast TFF vs CFF showed a bilateral middle temporal gyrus (MTG) activation (MNI x = 49/−56, y = −12/−6, z = −15/−18, Inline Supplementary Table S6) which was more pronounced for trained as compared to control characters. Given that this activation did not survive the a priori defined cluster correction for multiple comparisons, this marginal effect should be interpreted with care. Activation in the MTG has been related to integration of speech sounds and print in the process of reading acquisition, and the bilateral activation could be explained by a diminished lateralization in language processing in young children (activation on axial and sagittal slices shown for p < 0.001 (uncorrected); cluster size left MTG k = 20, right MTG k = 17).
Inline Fig. S3. Whole brain correlation and vOT ROI analyses with extended group (n = 24).
To verify the training effect, fMRI data of an enlarged group, including 24 children (Inline Table S1), was analysed. In this group, 12 children trained with FF set 1 and the other 12 with FF set 2. This enlarged fMRI group consisted of data sets meeting data quality criteria for TFF and CFF. The additional analyses confirmed the results of the core group. (a) Correlation of BOLD responses on whole brain level to TFF in the left vOT ROI with children's training duration in the GPC training. Faster learning was correlated with significantly higher activation in the left vOT (small volume corrected p(FWEcorr) < 0.05). Below, the negative correlation of the significant vOT cluster activation is plotted against training duration for illustration purposes. (b) ROI analyses in the vOT. Top: The LMM with fixed factors hemisphere (l, r) and training (TFF, CFF) showed a significant main effect of training [F(1,70) = 8.43, p = 0.0049, f² = 0.1175]. When lowering our stringent motion quality inclusion criteria, we were able to rerun the same LMM for the character type effect (LET, DIG, CFF) and hemisphere (lvOT, rvOT) in the extended sample (n = 24), whereby one child did not conduct the DIG part of the experiment. This analysis showed a significant main effect of character type [F(2,107) = 6.02, p = 0.0033, f² = 0.1069] and no significant main effect of hemisphere [F(1,107) = 2.03, p = 0.1573]. Comparing character types revealed stronger activation for LET than CFF (t(107) = 2.72, p = 0.0209), and for DIG than CFF (t(107) = 3.2, p = 0.0051).
Inline Fig. S4. Functional connectivity results with left and right FFG seeds.

Inline Supplementary Analysis 1. P1 interval (102-162 ms)
In addition to differences between visual categories in the N1 ERP interval, previous studies also indicated some effects in the preceding posterior P1 (~100 ms) ERP. Category effects on the P1 have been demonstrated between words and objects (e.g. Schendan et al., 1998) and also between words and symbol strings in adults, though less consistently than in the visual N1. Based on previous findings on P1 differences, we also examined the P1 interval in the current study. In analogy to the N1 analyses, we used the global field power (GFP) maxima of the mean ERPs over all four character types to define the interval of the P1 as ±30 ms (102-162 ms) around the GFP peak. The same three posterior electrode clusters (LOT, MO, ROT) were used for analyses.
The LMM analyses of the P1 global field power (GFP) mean amplitude (102-162 ms) revealed neither a significant training effect [fixed factor training (TFF, CFF): F(1,17) = 0.14, p = 0.7112] nor a significant character type effect [F(2,34) = 1.98, p = 0.1543] and thus no global differences in the scalp potential related to graphophonological association training or character type. The LMM for the predefined posterior electrode clusters with fixed factor training and cluster (LOT, MO, ROT) revealed neither a training [F(1,87) = 2.60, p = 0.1103] nor a cluster effect [F(2,87) = 1.09, p = 0.3395]. The LMM for the predefined posterior electrode clusters with fixed factor character type and cluster (LOT, MO, ROT) showed a significant character type effect [F(2,140) = 4.57, p = 0.0120, f² = 0.045827] driven by a stronger P1 mean amplitude to LET compared to CFF [post-hoc t-tests: LET > CFF: t(140) = 3, p < 0.0089]. These results indicate some character type differences prior to the N1 in the posterior bilateral positivity P1 at around 130 ms. In contrast to the N1, no modulation of the P1 through phonological association learning was detected for false fonts.
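The P1 interval definition used in this analysis (GFP peak ±30 ms) can be illustrated with a minimal sketch. It assumes that GFP is computed as the spatial standard deviation across electrodes at each time point, that the input is an average-referenced grand-mean ERP, and that the peak is searched within a hypothetical 80-180 ms range; the function names and the toy three-electrode data are illustrative only and do not reproduce the processing pipeline of the study.

```python
from math import exp
from statistics import pstdev


def global_field_power(erp):
    """Return GFP per time sample.

    erp: list of time samples, each a list of electrode voltages (in µV).
    GFP is taken here as the population standard deviation across electrodes
    (assumption: data are average-referenced).
    """
    return [pstdev(sample) for sample in erp]


def peak_window(times_ms, gfp, search_ms, half_width_ms=30):
    """Return (start, end) of the interval centred on the GFP maximum.

    search_ms is a hypothetical (lo, hi) range in which the peak is sought.
    """
    lo, hi = search_ms
    candidates = [(g, t) for t, g in zip(times_ms, gfp) if lo <= t <= hi]
    _, peak_t = max(candidates)
    return peak_t - half_width_ms, peak_t + half_width_ms


# Toy grand-mean ERP: three electrodes whose spread peaks at 132 ms.
times_ms = list(range(0, 302, 2))
erp = [[a, 0.0, -a] for a in (exp(-((t - 132) / 20) ** 2) for t in times_ms)]

gfp = global_field_power(erp)
window = peak_window(times_ms, gfp, search_ms=(80, 180))
print(window)
```

With a GFP maximum at 132 ms, the resulting window matches the 102-162 ms interval reported above; in real data the search range would need to be chosen so that the N1 peak is excluded.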

Inline Supplementary Analysis 2. No character type effects in the number form area
To examine preferential activation to numbers in more detail, activation of a potential number sensitive area was analysed by determining the beta values to all four character types in a literature-based spherical ROI in the number form area (NFA; Abboud et al., 2015) of the right hemisphere (MNI coordinates x, y, z: 55, −43, −20; r = 8 mm). This additional ROI analysis in the NFA did not reveal number sensitive activation differences in our kindergarten children [character type: F(3,50) = 0.99, p = 0.4034].
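The construction of such a spherical ROI can be sketched as follows. This is a minimal illustration only: the 2 mm isotropic voxel size, the alignment of the voxel grid to the sphere centre, and the helper names `sphere_voxels` and `roi_mean` are assumptions, and an actual analysis would typically rely on a dedicated neuroimaging toolbox rather than this hand-written enumeration.

```python
def sphere_voxels(center_mm, radius_mm, voxel_mm):
    """List the voxel centres (in mm) lying within radius_mm of center_mm.

    Assumes an isotropic grid of spacing voxel_mm that is aligned to the
    sphere centre (a simplification for illustration).
    """
    cx, cy, cz = center_mm
    n = int(radius_mm // voxel_mm)
    r2 = radius_mm ** 2
    coords = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= r2:
                    coords.append((cx + dx, cy + dy, cz + dz))
    return coords


def roi_mean(beta_at, coords):
    """Average a beta map over the ROI; beta_at maps a coordinate to a value."""
    return sum(beta_at(c) for c in coords) / len(coords)


# Right-hemispheric NFA sphere from the text: centre (55, -43, -20), r = 8 mm.
nfa_voxels = sphere_voxels((55, -43, -20), radius_mm=8, voxel_mm=2)

# Hypothetical constant beta map, used only to show the call pattern.
mean_beta = roi_mean(lambda c: 1.0, nfa_voxels)
print(len(nfa_voxels), mean_beta)
```

One mean beta value per child and character type, extracted in this way, would then enter the statistical model for the ROI.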