Learning and retrieving holistic and componential visual-verbal associations in reading and object naming

Understanding the neural processes that underlie learning to read can provide a scientific foundation for literacy education but studying these processes in real-world contexts remains challenging. We present behavioural data from adult participants learning to read artificial words and name artificial objects over two days. Learning profiles and generalisation confirmed that componential learning of visual-verbal associations distinguishes reading from object naming. Functional MRI data collected on the second day allowed us to identify the neural systems that support componential reading as distinct from systems supporting holistic visual-verbal associations in object naming. Results showed increased activation in posterior ventral occipitotemporal (vOT), parietal, and frontal cortices when reading an artificial orthography compared to naming artificial objects, and the reverse profile in anterior vOT regions. However, activation differences between trained and untrained words were absent, suggesting a lack of cortical representations for whole words. Despite this, hippocampal responses provided some evidence for overnight consolidation of both words and objects learned on day 1. The comparison between neural activity for artificial words and objects showed extensive overlap with systems differentially engaged for real object naming and English word/pseudoword reading in the same participants. These findings therefore provide evidence that artificial learning paradigms offer an alternative method for studying the neural systems supporting language and literacy. Implications for literacy acquisition are discussed.


Background
In recent years the education literature has settled upon phonics as the only evidence-based method of teaching reading (Torgerson et al., 2006; Wyse and Goswami, 2008). Indeed, in the UK, the Rose Review (Rose, 2006) recommended that synthetic phonics, which involves explicit instruction in letter-sound decoding and blending, should underlie early reading instruction. This provides children with the primary skill of being able to translate print to sound. Whole-word methods of reading instruction instead argue for the primacy of meaning in reading, with knowledge of letter-to-sound mappings being acquired through exposure to meaningful text. In this case the primary skill of reading should not be translating print to sound, but instead print to meaning. Correspondingly, the focus of early learning in whole-word reading schemes is to recognise whole 'sight words', rather than decoding the letter-sound correspondences within each word. Thus, many are sceptical of whether, in line with the Rose Review, phonics should be taught "first and fast". While experimental data have an important role to play in this debate, Wyse and Goswami (2008) note that very few naturalistic studies comparing different methods of reading instruction meet rigorous experimental standards. In this paper we consider whether laboratory studies of holistic and componential visual-to-verbal learning may offer a way to address educational questions in a controlled manner.
The distinction between recognising whole words and decoding letter-by-letter in the educational literature is mirrored to a large extent by findings from cognitive research on reading. Cognitive models of reading, such as the Dual Route Cascaded (DRC) and triangle model, reflect the distinction between holistic and componential processing by suggesting that the meaning of a written word can be accessed in more than one manner (Coltheart et al., 2001; Plaut et al., 1996). For example, in the DRC model, words can be read componentially by decoding letter-by-letter (sub-lexical route), or can be mapped onto their pronunciations and meanings directly by recognising the whole word form (lexical route). It is the componential relationship between visual and phonological forms in alphabetic languages that means we can read pseudowords, e.g. 'spape', using our knowledge of letter-sound mappings. In contrast, to read an irregular word (e.g. 'pint') we must have whole-word knowledge to know that it does not sound similar to words that share the same orthographic components ('mint', 'hint', etc.). In the triangle model (Plaut et al., 1996) the mappings between written (orthographic) and spoken (phonological) forms are componential; this model does not contain whole-word, or lexical, representations of this information. However, in this model, the relationship between a familiar word's written form and its meaning is more holistic and item-specific, since the form-to-meaning mapping cannot be broken down into sub-components, at least for monomorphemic words (i.e. most monosyllabic words). Furthermore, this item-specific knowledge is proposed to be important for irregular word reading, helping them to be pronounced differently from similarly spelled regular words.
Thus, both the DRC and the triangle model propose that reading involves both componential and whole-word knowledge, with the former being more important for pseudowords or less familiar words, and the latter more important for words, in particular those with irregular spellings.
Although both componential (sub-lexical) and holistic (lexical) processes are involved in skilled reading it is not clear how the relative importance of these skills changes as we learn to read. The goal of the present study was to advance our understanding of the initial stages of reading acquisition by exploring the neural basis of componential and holistic processing. To do so we compared learning to read an artificial alphabetic orthography with systematic symbol-to-sound mappings with learning names for novel objects with arbitrarily associated names.

Neural bases of holistic and componential processes in reading
The ventral occipito-temporal (vOT) cortex, including posterior and anterior fusiform, inferior temporal, and lateral occipital regions, has been suggested to play an important role in visual processing of orthographic information (Cohen et al., 2000; Cohen et al., 2002; Dehaene et al., 2002). A variety of evidence suggests that these visual processes are hierarchically organised with componential representations of individual letters and letter sequences in posterior temporal and occipital regions, and more holistic representations of whole words in anterior temporal regions (Dehaene et al., 2005; Taylor et al., 2013). For example, Mechelli et al. (2005) found that posterior fusiform activation was greater for pseudowords than for irregular words such as 'pint', whereas anterior fusiform showed the reverse profile. Likewise, Vinckier et al. (2007) showed a hierarchy of neural representations of letter strings in vOT: more posterior vOT activated for all stimuli (including consonant strings and false fonts), whereas mid- to anterior-fusiform regions were only activated for letter sequences that contained familiar letter combinations. In addition, Seghier et al. (2008) found that adult readers who were slower at reading pseudowords than irregular words showed additional activation in both left inferior parietal and left posterior occipito-temporal cortices, reflecting increased effort in componential reading processes. In contrast, slower reading of irregular words was associated with increased activation in left anterior occipito-temporal and left ventral inferior frontal regions. These findings support the idea that posterior fusiform and occipito-temporal cortex process parts of words whereas anterior fusiform processes whole-word forms.
Debate continues concerning whether this vOT hierarchy includes brain regions that uniquely contribute to reading (Dehaene and Cohen, 2011), or are shared with other domains in which visual and phonological information is associated, e.g. object naming.
In addition to these posterior occipito-temporal regions, a number of other brain areas have been shown to contribute to componential reading processes, as highlighted by contrasting pseudoword and word reading (see review and meta-analyses by Taylor et al. (2013), Cattinelli et al. (2013)). Pseudoword relative to word reading activates left inferior frontal and precentral gyri, which are involved in phonological output processes, left inferior parietal cortex, which may be involved in mapping letters to sounds, and left posterior occipito-temporal cortex, which may contribute to sub-lexical analyses of written word forms. The reverse contrast of word relative to pseudoword reading, capturing holistic reading processes, activates left middle temporal and angular gyri, regions which may support semantic processing (see Taylor et al. (2013) for discussion).
In summary the componential and holistic processes that underlie reading appear to be supported by different neural systems (holistic reading in anterior vOT regions, componential reading in posterior vOT, inferior parietal cortex, and inferior frontal gyrus). However, as discussed at the outset, the relative role of holistic and componential processes in learning to read is not clear. Experimental evidence of the relative contribution of these neural systems in the initial stages of reading instruction might therefore contribute to a scientific understanding of debates between phonic and whole-word approaches to reading acquisition.

Neural contributions to learning to read
There are two broad methods by which neuroscientists have studied the brain changes associated with the emergence of literacy (see Dehaene et al. (2015) for a review). The first of these is to explore neural activity in children at different stages of learning to read. Activation in vOT to words has been shown in young children in tasks involving sub-lexical processing such as single letter naming (Turkeltaub et al., 2008) and associating letters with sounds (Brem et al., 2010), but also for lexical tasks such as single word reading (Church et al., 2008). Furthermore, a meta-analysis of 40 imaging studies showed that both child and adult readers showed activation in left vOT, inferior frontal, and posterior parietal regions (Martin et al., 2015). However, there were also age-related differences: activation was more consistently observed in posterior fusiform regions for adult than child readers, possibly reflecting increased sensitivity in adults to the differences between letters and control stimuli. Tracking neural changes in a single group of children over four years, Ben-Shachar et al. (2011) showed that the sensitivity of left vOT to written words increased as reading improved, and that this was correlated with sight word naming accuracy but not with measures of pseudoword reading. Furthermore, the spatial extent of the cortical region sensitive to visual words increased as children got older before decreasing until reaching adult level. This changing response may reflect the region initially becoming more engaged for orthographic inputs before later becoming more efficient as specialisation takes place, following an inverted-u shaped profile (Ben-Shachar et al., 2011;Price and Devlin, 2011). Taken together, these results suggest that vOT regions become more sensitive to orthographic information with increased age/proficiency but it is not clear whether this change is linked to holistic or componential reading processes.
Parietal activation in children has primarily been shown in tasks involving mappings between visual words and sounds (e.g., Bitan et al., 2006, 2007a, 2007b; Cao et al., 2006; Hoeft et al., 2007). For example, children making spelling (orthographic) or rhyme (phonological) judgements about visually presented words showed increased activation in bilateral inferior/superior parietal lobules for spelling compared to rhyme judgements (Bitan et al., 2007a). Likewise, Hoeft et al. (2007) found that activation in left inferior parietal lobes correlated with composite behavioural measures of phonics ability in children. Further evidence that parietal regions support the componential aspects of reading early in development comes from Cao et al. (2015), who compared adult and child English and Chinese speakers in a visual word rhyming task. Reading skill in English-speaking children was correlated with activation in left inferior parietal lobule. The same was not true for Chinese-speaking children, lending support to the idea that early reading in English, with its reliance on componential letter-sound mappings, engages left parietal regions more than logographic reading in Chinese readers.
One problem with studies comparing children and adults is that it can be difficult to distinguish neural changes due to increased proficiency from changes due to maturation. The second approach to studying literacy-related changes in the brain circumvents this problem by examining functional and structural changes in adults who learned to read later in life. Dehaene et al. (2010) found that, compared to illiterate adults, both adults reading from childhood and late-learners showed greater activation to written words in left fusiform gyrus as well as language regions such as left superior temporal sulcus and left inferior frontal gyrus. Furthermore, adults who learned to read later in life showed increased grey matter in angular gyri, dorsal occipital, middle temporal, supramarginal, and superior temporal gyri, in comparison to illiterate adults (Carreiras et al., 2009).
In summary, evidence from studies of beginning readers and ex-illiterate adults has shown increased contributions of vOT regions with increased reading skill. It remains unclear, however, whether these contribute to holistic or componential processing of written words. Evidence for componential processes seems to point to inferior parietal regions, which might play a preferential role in initial stages of acquisition. This might be taken as consistent with the componential, phonics-based educational literature introduced at the outset, which similarly suggests that initial stages of teaching should focus on componential decoding skills. One possible challenge, however, is that these studies with children and adults have only explored relatively late stages of reading acquisition. It would be extremely difficult to attempt to scan children in their first months of literacy learning (in the UK, this would require scanning 4-year-old children since reading instruction begins at that age). Therefore, in the present work we explore the initial stages of reading instruction for adults learning to read in an artificial orthography. To the extent that changes in vOT and parietal brain activity for holistic and componential learning parallel activation seen during reading development, we may be confident in attributing neural changes to the balance of these two underlying processes.

Using artificial orthographies to study the early stages of literacy acquisition
Laboratory-based learning paradigms offer a valuable method to address questions about reading, allowing a degree of experimental control impossible to achieve in naturalistic learning situations. Here we review previous studies that provide methods for distinguishing holistic and componential learning of artificial stimuli. One such artificial language learning study taught two groups of participants to read a single set of stimuli with either alphabetic (componential) or logographic (holistic) mappings to phonology over the course of eight days (Mei et al., 2013). Imaging results showed increased left-lateralisation for the more componential orthography, particularly in posterior fusiform regions. Likewise, ERP studies have reported left-lateralised fusiform responses for componential as opposed to holistic learning of artificial orthographies (Yoncheva et al., 2010, 2015). The current study adapts and extends the methods used by a previous study that compared the neural bases of learning holistic and componential visual-verbal mappings using artificial objects or artificial written words. Taylor et al. (2014a) taught participants to name artificial objects and to read words written in an artificial orthography, whilst neural activity was measured with fMRI. Imaging results showed that learning to name objects preferentially activated bilateral anterior fusiform gyri, whereas learning to read words activated bilateral inferior parietal cortices. Taylor et al. therefore suggested that anterior fusiform is associated with whole item visual-verbal associations whereas inferior parietal cortex is involved in componential visual-verbal mappings. However, this componential interpretation of parietal contributions remains controversial; Takashima et al. (2014) have suggested that extensive training on a small set of words written in Greek script provided fMRI evidence for holistic representations in brain regions close to the inferior parietal cortex (such as the angular gyrus, precuneus, and middle temporal gyrus). A key question from these results, then, is how, where and when do holistic representations of written words emerge?

The current study rationale
Here we set out to track the changes in neural activity that occur over the first two days of learning to read and name artificial words and objects. In doing so we build on earlier work that identified the neural systems involved in learning holistic and componential visual-verbal associations. We extend this earlier work by asking: (1) do we see a similar neural dissociation during retrieval of holistic and componential visual-verbal associations? Furthermore, we ask whether neural activity associated with retrieval of written words offers evidence of the emergence of whole-word representations, (2) in the context of overnight consolidation and (3) by comparison with untrained words. Finally, we assess: (4) the extent to which neural activity associated with reading and naming artificial words and objects overlaps with regions involved in holistic and componential processing of real words, objects, and pseudowords. As we will explain below, these four elements substantially extend the findings of Taylor et al. (2014a) while using similar methods.
In the current study, we trained participants outside of the scanner to read artificial words written in an artificial orthography, and to name artificial objects. Critically, the artificial written words had a componential and systematic mapping between the visual (letters) and verbal (sounds) forms, whereas artificial objects had a holistic and arbitrary relationship between the visual and verbal forms. That is to say, if participants successfully learn the componential letter-sound mappings of the written words then they should be able to generalise this knowledge in order to read unfamiliar words. By contrast, because the relationship between the visual and verbal forms of an object is arbitrary and holistic, it is not possible to name an unfamiliar object.
Whilst Taylor et al. (2014a) focussed on measuring neural activity during the learning of holistic and componential visual-verbal mappings by training participants during scanning, here we focus on activation during retrieval of knowledge previously acquired outside of the scanner. This design allowed us to dedicate more time to testing neural activity associated with reading or naming of trained items during scanning. We were also able to adopt an event-related fMRI design in which written words and objects were intermixed, in contrast to the blocked design used by Taylor et al. (2014a). This may be more sensitive in detecting activation differences between words and objects (Josephs and Henson, 1999) and increases the likelihood that any activation differences between reading written words and naming objects are due to the immediate demands of processing componential and holistic visual-verbal associations, and not due to longer-term differences in the strategy adopted in different testing blocks.
In order to investigate the time-course over which holistic representations of written words may emerge, we adapt the train twice, test once design used by Davis et al. to explore the neural effect of overnight consolidation for spoken words. Following the method used by Davis et al., half of the words and objects in the current study were learned on day 1 and the other half on day 2, with scanning taking place following day 2 training. This design allows an opportunity for overnight consolidation of day 1 but not day 2 items. In line with complementary learning systems accounts, we might anticipate differences between neural responses for items scanned following a night of sleep as compared to items learned on the same day as scanning. This design does not allow us to distinguish whether any consolidation effects come about due to the processes of sleep or due to time elapsed since learning. Nevertheless, the design allows us to compare initial and longer-term changes during the earliest stages of learning to read. It will be for future research to determine the underlying cause(s) of any changes observed.
In addition to reading or naming all the trained words and objects from days 1 and 2, participants also read three sets of untrained artificial words during scanning. These conditions therefore permitted a comparison of trained and untrained words, allowing us to assess activation differences that might parallel those seen between English words and pseudowords. Two sets of these untrained words were the written forms of the object names learned on days 1 and 2. The spoken forms of these items were therefore familiar (trained) while the orthographic forms remained unfamiliar (untrained) (Fig. 1). This manipulation allowed us to determine if familiarity with the phonological form of a word (in the context of object naming) results in differential activation as compared to the third set of untrained words, which were completely unfamiliar. For these comparisons, we can therefore assess the possibility (raised by research on spoken word learning) that holistic lexical representations are enhanced during overnight consolidation and hence that these effects may differ for words learned on day 1 or day 2. These comparisons would be very difficult to achieve in a naturalistic setting.
In order to validate the use of this laboratory-learning approach in relation to the brain systems ordinarily engaged for word reading and object naming we also included a functional localiser scan in which participants read English words and pseudowords and named familiar objects. This allowed us to ask whether the neural systems that support reading of artificial words and naming of artificial objects are the same as those used for real word reading and object naming. This may help us in ascertaining whether a neural distinction between holistic and componential processing applies equally in artificial and real reading and object naming.
In summary then we address four major questions in this study: 1) Are the same neural systems involved in the retrieval of holistic and componential visual-verbal associations as previously shown to support learning? 2) How does overnight consolidation impact on neural representations of recently learned words and objects? 3) What do comparisons of reading trained and untrained words suggest about neural systems for whole word representation and generalisation to pseudowords? 4) To what extent do the neural systems involved in reading and naming artificial words and objects overlap with the systems involved for real words and objects?

Participants
Twenty-five right-handed native English-speaking adults aged 18-40 took part in a study approved by the Cambridge Psychology Research Ethics Committee. No participants reported having dyslexia, speech, or language impairment or any pre-existing neurological condition that would preclude participation in functional MRI. Five participants were excluded due to poor performance during scanning (<20% correct on any condition), leaving 20 participants in the main analyses. An additional participant did not complete the functional localiser run and so the localiser analysis reports the results from 19 participants.

Experimental Stimuli
Three sets of 36 monosyllabic consonant-vowel-consonant (CVC) pseudowords were used in the experiment (e.g., "pag", "zon"). Each pseudoword set was assigned to one of three conditions in a counterbalanced manner over participants: objects, words, and an additional set of untrained words (Appendix A). These pseudowords were constructed from the same set of 12 consonants (b, d, f, g, k, m, n, p, s, t, v, z) and four vowels (ɒ, ɛ, æ, ʌ) as used in Taylor et al. (2014a). Segment position was matched across stimulus groups, with each consonant appearing three times in onset position and three times in coda position, while each vowel appeared 9 times in each item set. Each of the three stimulus sets was further split into two groups of 18 items to be trained on days 1 and 2. Unfamiliar visual symbols (artificial letters) were mapped to the 16 phonemes in a one-to-one manner, meaning that the written forms of items had consistent letter-sound mappings (see Fig. 1D for examples). Spoken forms of the pseudowords were recorded by a female native English speaker in a soundproof booth and digitised at a sampling rate of 44.1 kHz.
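The balancing constraints described above (12 consonants each used three times in onset and three times in coda, four vowels each used nine times, 36 unique CVC items) can be illustrated with a minimal sketch. This is not the authors' stimulus-generation procedure, which the text does not describe; the function name `make_balanced_set` and the shuffle-and-retry approach are assumptions made purely for illustration.

```python
import random
from collections import Counter

CONSONANTS = list("bdfgkmnpstvz")   # the 12 consonants listed in the text
VOWELS = ["ɒ", "ɛ", "æ", "ʌ"]       # the 4 vowels (third vowel appears as "ae" in some renderings)

def make_balanced_set(rng, n_reps=3):
    """Hypothetical helper: build 36 unique CVC items in which every consonant
    occurs n_reps times as onset and n_reps times as coda, and every vowel
    occurs equally often."""
    onsets = CONSONANTS * n_reps                       # 36 onset slots
    codas = CONSONANTS * n_reps                        # 36 coda slots
    vowels = VOWELS * (len(onsets) // len(VOWELS))     # 9 of each vowel
    while True:
        # Shuffling codas and vowels preserves the per-segment counts;
        # retry only if two items happen to be identical.
        rng.shuffle(codas)
        rng.shuffle(vowels)
        items = list(zip(onsets, vowels, codas))
        if len(set(items)) == len(items):
            return items

items = make_balanced_set(random.Random(0))
onset_counts = Counter(onset for onset, _, _ in items)
coda_counts = Counter(coda for _, _, coda in items)
vowel_counts = Counter(vowel for _, vowel, _ in items)
```

The sketch covers a single item set only; keeping the three sets non-overlapping and splitting each into day 1 and day 2 halves would need further constraints that are not modelled here.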
Stimuli for the functional localiser were 120 monosyllabic items chosen from the updated Snodgrass and Vanderwart item set (Magnié et al., 2003). These were randomly assigned on a participant-by-participant basis to appear either as objects for naming (60 pictures) or as words for reading (60 written words). This prevented potential priming effects that would occur if all items appeared as both words and objects for each participant. 98 of the 120 items (81.7%) had grapheme-to-phoneme correspondences that are classified as regular according to the DRC model of reading (Rastle and Coltheart, 1999, Appendix C), and the mean log frequency of the items based on the Zipf scale (Van Heuven et al., 2014) was 4.47 (SD=0.56), i.e. relatively high frequency. 120 monosyllabic pseudowords were generated from the ARC nonword database (Rastle et al., 2002) and were pairwise matched to these Snodgrass and Vanderwart items for letter length and orthographic neighbourhood size (S & V item: mean length (SD)=4.09 (0.82), orthN mean (SD)=8.30 (5.14); Pseudowords: mean length (SD)=4.20 (0.75), orthN mean (SD)=8.27 (4.39)). Snodgrass and Vanderwart items and pseudowords were further matched pairwise for initial phoneme as this factor has been reported to have the most impact on reading and naming latencies (Rastle et al., 2005). 60 of these pseudowords were randomly selected for each participant. Localiser items (words, pseudowords, and objects) were therefore matched at a group level but not for each individual participant.
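To give a sense of what a mean Zipf value of 4.47 implies, the sketch below converts between the Zipf scale and occurrences per million words, using the standard definition Zipf = log10(frequency per million words) + 3. The function names are illustrative and not taken from the source. Because the reported figure is a mean of log values, the back-converted number is a geometric-mean frequency, not an arithmetic mean.

```python
import math

def zipf_from_per_million(freq_per_million):
    """Zipf value for a word occurring freq_per_million times per million words."""
    return math.log10(freq_per_million) + 3.0

def per_million_from_zipf(zipf):
    """Inverse conversion: occurrences per million words for a given Zipf value."""
    return 10 ** (zipf - 3.0)

mean_zipf = 4.47  # mean reported for the localiser items
approx_per_million = per_million_from_zipf(mean_zipf)
print(f"Zipf {mean_zipf} corresponds to about {approx_per_million:.0f} occurrences per million words")
```

On this scale the localiser items average a little under 30 occurrences per million words.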

Experimental procedure
The experiment used a train twice, scan once design in which behavioural and neural responses to items learned on day 1 and day 2 (hereafter day 1 and day 2 items) can be compared in a single scanning session performed on day 2 (Fig. 1A). Participants learned different items over two days and then completed a combined fMRI and behavioural testing session following training on the second day. Because participants are tested only once, this efficient scanning design removes effects of practice on neural responses (as would arise in a longitudinal design) and avoids the neural variability that would be caused by testing on two occasions or scanning two different groups of participants.

Training
Training took place over two days with 36 spoken pseudowords being associated with 18 artificial written forms and 18 artificial object pictures on each day. Participants completed eight runs of training on each day (four each of word and object training) consisting of alternating word and object runs (Fig. 1B). Within each run 18 items were presented across 6 blocks that alternated between training and testing. During trials in the training block participants passively viewed the visual form of each item onscreen for 3500 ms and then heard the spoken phonological form 500 ms after the visual onset. Six items therefore appeared in each training block. During the testing blocks, the same six items appeared in a different order (Fig. 1B). The visual forms were presented and participants read/named the item aloud during the 3500 ms in which the item was onscreen. Responses were recorded and scored offline and were deemed correct if the participant's response included all three phonemes in the target item in the correct order. At the end of the training on day 2, participants completed a short practice session of the task used in the scanner with real words and objects.
Fig. 1 (panels B-F). (B) Train/test structure used during the training sessions with alternating periods spent learning and being tested on words and objects. Within each run participants were trained then tested on 6 items at a time until all 18 items had been tested. (C) Timeline of fMRI scanning runs including testing runs, top-up, and localiser runs. In testing runs 1 and 2 participants were presented with each of the trained items twice in a see-think trial and once in a see-speak trial. Half of the untrained items were similarly presented across these two testing runs. Testing runs 4 and 5 followed the same structure with all trained items presented again, and the other half of the untrained items. (D) Timeline showing the structure of the two trial types presented during the scanning runs: see-think and see-speak trials. Word and object trials were presented in a random order for each participant with the constraint that see-speak trials always followed a see-think trial for the same item. However, see-think trials were also presented in isolation so that activity for these two trial types can be separated at the analysis stage. See-speak trials were identical to the see-think trials except that the green background cued participants to say the appropriate word/object name aloud rather than covertly. (E) Timeline showing trial structure of the localiser run. Familiar words, objects, and pseudowords were presented onscreen. Participants read/named all items aloud during the silent interval between scans. (F) The goal of each of the reported analyses of the neuroimaging data and the conditions compared.
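The scoring rule used for the training responses (a response counts as correct if it includes all three target phonemes in the correct order) can be expressed as an ordered-subsequence check. The sketch below is one possible reading of that rule, not the authors' scoring script; in particular it assumes that responses have already been transcribed into phoneme lists and that additional phonemes in the response are tolerated, neither of which is stated in the text.

```python
def is_correct(response, target):
    """Return True if every phoneme of `target` occurs in `response`
    in the same relative order (an ordered-subsequence test)."""
    remaining = iter(response)
    # `phoneme in remaining` advances the iterator, so each target phoneme
    # must be found after the position where the previous one was matched.
    return all(phoneme in remaining for phoneme in target)

target = ["p", "æ", "g"]                              # illustrative transcription of "pag"
exact = is_correct(["p", "æ", "g"], target)           # all phonemes, correct order
reversed_order = is_correct(["g", "æ", "p"], target)  # right phonemes, wrong order
missing = is_correct(["p", "æ"], target)              # final phoneme omitted
```

Under a stricter reading, in which the response must match the target exactly, the check would reduce to `response == target`.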

MR data acquisition
Functional magnetic resonance imaging data were acquired using a 3 T Siemens Trio scanner (Siemens Medical Systems, Erlangen, Germany) with a 32 channel head coil. Responses were recorded with a dual-channel MRI microphone (FOMRI II, Optoacoustics). Audio stimuli were processed using the Sensimetric EQ Filtering 2.1 software for presentation over Sensimetric S14 headphones in the scanner. Visual stimuli were presented using a monitor mounted at the rear of the scanner bore, viewed via an angled mirror attached to the head coil.
We used a rapid sparse imaging event-related design with a repetition time (TR=3500 ms) longer than the acquisition time (TA=2000 ms), which allowed a gap of 1500 ms during which spoken responses could be recorded in the absence of scanner noise. This silent period between scans meant that participants could hear their own voice when speaking, and additionally reduced the impact of motion-induced artefacts on the acquired images (Peelle, 2014; Perrachione and Ghosh, 2013). Each of the four word/object test runs involved acquisition of 195 images (including 6 initial dummy scans to allow for T1 equilibrium). Image acquisition consisted of 32 transverse oblique axial slices, angled to avoid the eyes. Each slice was 3 mm thick and consisted of a 64×64 matrix of 3×3 mm voxels. There was a 0.75 mm gap between adjacent slices such that the total image volume allowed for whole brain coverage including the cerebellum, except for a few cases in which the very top of the parietal lobe was not covered. To assist in anatomical normalisation, we also acquired a T1-weighted structural volume using a magnetisation prepared rapid acquisition gradient-echo protocol (repetition time=2250 ms, echo time=2.99 ms, flip angle=9°, 1 mm slice thickness, 256×240×192 matrix, resolution=1 mm isotropic).

Scanning procedure
The scanning session consisted of 6 scanning runs lasting 72 min in total (Fig. 1C). Four of these runs tested artificial word reading and object naming. After two of these runs, participants completed a 'top-up' run where they were reminded of the items in the same manner as in the training sessions (i.e. paired presentations of visual and verbal forms of each item). This allowed another opportunity to learn the items and so increased the number of correct responses included in the analysis. However, neural data from this top-up run will not be reported in this manuscript. A functional localiser run that involved reading real words and pseudowords and naming real objects completed the scanning session. At the start of scanning a high-definition MPRAGE structural scan image was also acquired (see above for details).
Participants completed four reading/naming runs of 11 min duration while in the scanner (Fig. 1C). Each run contained 9 testing blocks of 63 s each with a rest period of 10.5 s between blocks. 18 trials appeared in each block of testing, made up of 12 see-think trials and 6 see-speak trials. All 36 trained words and 36 trained objects, along with half of the untrained words (n=18) and half of the written forms of trained objects (n=18) were split to appear across runs 1 and 2. The remaining half of the untrained words and written objects appeared in runs 3 and 4, along with a second presentation of all 36 trained words and 36 objects. Consequently untrained items were not repeated and so remained novel, while participants had two opportunities to name each of the trained items.
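The run structure described above is internally consistent with the 195 volumes acquired per run. The sketch below checks this; it is illustrative only, the names are ours, and it assumes that a 10.5 s rest follows each of the nine blocks (the assumption under which the volume count matches).

```python
# Consistency check of the run structure, using values quoted in the text.
TR_S = 3.5
BLOCKS_PER_RUN = 9
TRIALS_PER_BLOCK = 18   # 12 see-think + 6 see-speak
REST_S = 10.5
DUMMY_SCANS = 6
RUNS = 4

block_s = TRIALS_PER_BLOCK * TR_S                       # duration of one block
volumes_per_block = round((block_s + REST_S) / TR_S)    # block plus assumed rest
volumes_per_run = DUMMY_SCANS + BLOCKS_PER_RUN * volumes_per_block
total_trials = RUNS * BLOCKS_PER_RUN * TRIALS_PER_BLOCK

# Item presentations across the four runs (trained items are shown twice).
presentations = {
    "trained words": 36 * 2,
    "trained objects": 36 * 2,
    "untrained words": 18 + 18,
    "written objects": 18 + 18,
}
total_presentations = sum(presentations.values())
trials_per_presentation = total_trials / total_presentations

print(block_s, volumes_per_run, total_trials, trials_per_presentation)
```

The ratio of trials to item presentations comes out at three, which fits a design in which each presentation of an item contributes one isolated see-think trial plus one see-think/see-speak pair.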
Critical to our design was that half of the see-think trials were followed by a see-speak trial in which the same item was presented (Fig. 1D). During see-think trials the items appeared on a white screen and participants were instructed to recall but not articulate the spoken form. For the see-speak trials items appeared on a green screen and participants were instructed to say the phonological form of the item aloud. Each trial lasted 3500 ms starting with a visual item presented for the first 1500 ms followed by a single functional brain volume being acquired in the remaining 2000 ms. Including see-think and see-speak trials was important to the design for several reasons. First, the time between scans was not long enough for participants to read a novel word and say it aloud. As the see-speak trials always followed immediately after a see-think trial, participants had already retrieved the item pronunciation on the previous trial and could articulate its spoken form in the short period between scans. Second, this design ensured that the majority of trials were not affected by head movements due to articulation (as there were double the number of see-think trials as see-speak trials), and prevented anticipation of articulation on the subsequent trial, since participants could not predict whether a see-think trial would be followed by a see-speak trial. Third, as articulation only took place on see-speak trials, subtraction of see-speak from see-think trials will remove activation associated with articulation, and reveal activation associated with covert phonological retrieval on see-think trials. Finally, it is possible that activation differences between words and objects may in part be driven by visual differences between these two types of stimuli. As the same visual form was presented on successive see-think and see-speak trials, subtraction of see-speak from see-think trials may also reduce the impact of these visual differences. We return to this issue in the discussion.
During the localiser task participants were presented with 60 items in each of three conditions: written English words, pseudowords, and real objects. The same event-related sparse imaging design was used (TR=3500 ms, TA=2000 ms). Items were randomised and appeared onscreen for the 1.5 s of silence between scans using the same block structure as above; a 63 s block containing 18 trials followed by 10.5 s rest. 186 EPI images were acquired (~13 min scanning time).

MRI preprocessing
Image processing and analysis of all EPI data were performed using SPM8 (Wellcome Trust Centre for Functional Neuroimaging, London, UK) in conjunction with AA software version 4 (Cusack et al., 2014). The first six volumes of each scanning run were discarded to allow for equilibration effects. Images for all scanning runs for each participant were realigned to the first image in the first scanning run (Friston et al., 1995) and the resulting mean image was co-registered to the T1 structural image. Normalisation of structural images to standard MNI space was calculated using tissue probability maps (Ashburner and Friston, 2005), and these warping parameters were then applied to all functional images for that participant. Normalised functional images were resampled to 2 mm isotropic voxels and spatial smoothing was applied using a kernel with a full-width-half-maximum of 8 mm. For the functional imaging analyses described below we used an event-related analysis implemented in the SPM8 software. Accordingly, event times were convolved with the SPM8 canonical hemodynamic response function following the recommendations of Perrachione and Ghosh (2013). Movement parameters estimated at the realignment stage of preprocessing were added as regressors of no interest. All analyses used a voxelwise threshold of p < 0.001 combined with a cluster extent-based FWE-corrected threshold of p < 0.05 unless otherwise stated.

Artificial words and objects analysis
First level models were constructed from all event types seen during testing runs (see-think and see-speak events for each of 7 conditions: words day 1, words day 2, objects day 1, objects day 2, written objects day 1, written objects day 2, untrained words). The events were additionally split according to whether the response for each trial was correct or incorrect, leading to 28 event types in the first level model. Although all event types seen during testing were modelled, second level analyses focussed only on trials in which participants responded correctly, to ensure a fair comparison between conditions even if differences in accuracy were observed. In order to assign accuracy to see-think trials (in which there was no behavioural response) we assumed that accuracy in the see-speak trials could be applied to the corresponding see-think trials for that same item. A second level model was constructed in SPM using 8 conditions derived for trained items that were responded to correctly involving three factors: trial-type (see-think vs see-speak), item type (words vs objects), and day of learning (day 1 vs day 2) (Henson and Penny, 2003, Technical Report). To assess neural responses for generalisation items (which were only presented in written form), a further second level model was constructed in which we compared responses for trained words (day 1 vs day 2), with the written form of object names (day 1 vs day 2), and untrained words leading to a 5-level factor that was crossed with trial-type (see-think vs see-speak). F-contrasts were constructed to identify significant main effects and interactions and, where significant effects were found, t-contrasts were used to explore the specific effects.
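The counts of event types and second level cells follow directly from crossing the factors listed above. As a minimal sketch (the labels and variable names are ours, not taken from the analysis scripts), the design cells can be enumerated as follows:

```python
from itertools import product

TRIAL_TYPES = ["see-think", "see-speak"]
CONDITIONS = [
    "words day 1", "words day 2",
    "objects day 1", "objects day 2",
    "written objects day 1", "written objects day 2",
    "untrained words",
]
ACCURACY = ["correct", "incorrect"]

# First level: every trial type x condition x accuracy combination.
first_level_events = list(product(TRIAL_TYPES, CONDITIONS, ACCURACY))

# Second level (trained items): correct trials for words and objects only.
trained_cells = [
    (trial, cond)
    for trial, cond, acc in first_level_events
    if acc == "correct" and cond.split(" day")[0] in ("words", "objects")
]

# Second level (generalisation): correct trials for the five written conditions.
written_cells = [
    (trial, cond)
    for trial, cond, acc in first_level_events
    if acc == "correct" and not cond.startswith("objects")
]

print(len(first_level_events), len(trained_cells), len(written_cells))
```

The three lengths correspond to the 28 first level event types, the 8 cells of the trained-item model, and the 5-level factor crossed with trial type.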

Functional localiser analysis
We used an overt reading/naming task in order to overcome the challenge posed by Price and Devlin (2011) that passive viewing of words may induce greater covert naming than passive viewing of objects. To reveal the neural systems involved in holistic as opposed to componential visual-verbal processing of real world stimuli, we used the contrast [objects > words]. To ensure that this comparison revealed engagement of different neural representations, as opposed to differences in processing effort, this contrast was conducted after taking response time differences into account, using the approach proposed by Taylor et al. (2014b). This involves building a regression model that includes one parametric modulator to model the effect of RT on BOLD signal (irrespective of condition), and additional parametric modulator(s) representing the different stimulus conditions. Activation associated with this second parametric modulator then reflects the differences in neural response between conditions over and above effects due to response time differences.
The contrast [pseudowords > words] was included to reveal the neural systems involved in componential as opposed to holistic mapping between visual and verbal representations. Unlike the contrast between words and objects, we did not partial out the effects of response time when comparing words and pseudowords. Following the framework of Taylor et al. (2014b), both words and pseudowords should engage the same reading-related brain regions, but pseudowords take longer to read aloud, and should therefore drive greater activity in these regions due to greater processing effort (cf. Taylor et al., 2013, 2014b). As this contrast was intended to reveal this additional processing effort during pseudoword reading, removing response time differences would have removed the effect of interest. Reading familiar words is an automatic process that is relatively effortless compared to both naming objects and reading pseudowords, leading to much shorter response times for the former than either of the two latter conditions. Given this, it is perhaps not surprising that contrasts of both [words > objects] and [words > pseudowords] showed activation throughout the default mode network (Gusnard and Raichle, 2001; Raichle, 2015). We therefore chose not to use these contrasts as these regions are unlikely to contribute directly to word reading (see Table 2 for details of these contrasts).
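The logic of adjusting a condition contrast for response time can be illustrated with a toy trial-level regression. This is a simplified sketch, not the SPM implementation: it omits convolution with the haemodynamic response, all response times and signal values are hypothetical, and the helper names are ours. It shows that a raw difference between conditions mixes the RT effect with the condition effect, whereas a model containing both regressors recovers the condition effect alone.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical response times in seconds: objects are named more slowly.
rt_words = [0.50, 0.55, 0.60, 0.65]
rt_objects = [0.80, 0.90, 1.00, 1.10]
rts = rt_words + rt_objects
cond = [0] * 4 + [1] * 4                 # 0 = word, 1 = object

# Hypothetical noise-free signal: baseline + RT effect + condition effect.
TRUE_RT, TRUE_COND = 2.0, 0.5
signal = [1.0 + TRUE_RT * rt + TRUE_COND * c for rt, c in zip(rts, cond)]

# Raw difference in mean signal confounds RT with condition.
raw_diff = sum(signal[4:]) / 4 - sum(signal[:4]) / 4

# Regression with intercept, RT regressor, and condition regressor.
X = [[1.0, rt, float(c)] for rt, c in zip(rts, cond)]
intercept, beta_rt, beta_cond = ols(X, signal)

print(round(raw_diff, 3), round(beta_rt, 3), round(beta_cond, 3))
```

In this toy example the raw objects minus words difference is more than twice the true condition effect, while the condition coefficient from the joint model matches the value used to generate the data.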

Artificial words and objects during training
Due to problems with audio recording equipment, responses for 4 participants were not collected during all of the training runs. For the remaining 16 participants we had full data from both days of training. These were scored for accuracy and entered into behavioural analysis of the training runs. Reading accuracy was better for words learned on day 2 than on day 1, whereas accuracy at naming objects was similar for day 1 and day 2 items (Fig. 2A, Table 1). Data were entered into a repeated measures ANOVA with the factors day of training (day 1 or day 2) and item type (word or object). There was no overall effect of item type on performance, F(1, 15)=0.5, ns. The effect of day was significant with day 2 items showing higher performance than day 1 items, F(1, 15)=8.86, η_p²=0.371, p < 0.01. Furthermore there was an interaction between item type and day, F(1, 15)=8.81, η_p²=0.370, p=0.01, with performance for objects remaining similar on both days but with improved word reading performance on the second day.
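The effect sizes reported above can be recovered from the F statistics and their degrees of freedom using the standard identity for partial eta squared, F·df_effect / (F·df_effect + df_error). The short sketch below (function name ours) applies it to the two significant effects from the training ANOVA:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values and degrees of freedom as reported for the training data.
effects = {"day": 8.86, "item type x day": 8.81}
for label, f_value in effects.items():
    print(label, round(partial_eta_squared(f_value, 1, 15), 3))
```

Both values agree with the reported effect sizes to three decimal places.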

Results during scanning
Real words and objects in the localiser
Naming performance in all three conditions was very high (Table 1) but response times were faster for words than objects, t_p(18)=21.11, p < 0.001, and for words than pseudowords, t_p(18)=8.32, p < 0.001. As described in the methods, the contrast [objects > words] was conducted after taking between-condition differences in response time into account. This was to ensure that we could localise regions more engaged by holistic as opposed to componential processing rather than regions that responded more strongly to object naming as it was more effortful than word reading. Objects showed more activation than words in bilateral inferior temporal gyrus, as well as middle and inferior occipital gyri (Fig. 3, Table 2). We next contrasted pseudoword with word activation. As explained before, we did not account for response time differences in this analysis, since we wanted to observe neural activity associated with more effortful componential reading of pseudowords compared to reading of words (see methods and Taylor et al., 2014b). Pseudowords activated bilateral middle and inferior occipital gyri, left superior temporal gyrus, bilateral precentral gyri, bilateral postcentral gyri, left supplementary motor area, and left inferior frontal gyrus (triangularis), more than words (Fig. 3, Table 2).
In order to compare activation for real words and objects with the artificial words and objects we constructed four spherical regions of interest based on peaks found in the functional localiser that corresponded most closely to peaks found in a meta-analysis of word and pseudoword reading (Taylor et al., 2013). The ROIs had a 10 mm radius and were centred on: (1) left anterior fusiform gyrus (−28, −50, −16), determined by exploring sub-peaks of the left occipital/temporal cluster reported for objects > words in Table 2 Table 2 (including a sub-peak of the cluster in superior temporal and frontal regions). These coordinates are respectively 17.2 mm, 7.48 mm, 6.63 mm, and 11.49 mm distant from comparable peaks in Taylor et al. (2013).
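For readers unfamiliar with spherical ROIs, the sketch below shows how such a region and the distance between two peaks can be computed. It is illustrative only: the function names are ours, the voxel grid is assumed to be aligned to the ROI centre (real tools sample the image grid), and the comparison peak is a hypothetical coordinate, not a peak from Taylor et al. (2013).

```python
import math

def distance_mm(a, b):
    """Euclidean distance between two coordinates given in mm."""
    return math.dist(a, b)

def sphere_voxels(centre, radius_mm=10.0, voxel_mm=2.0):
    """Voxel centres within radius_mm of centre, on a grid aligned to centre."""
    steps = int(radius_mm // voxel_mm)
    offsets = [i * voxel_mm for i in range(-steps, steps + 1)]
    return [
        (centre[0] + dx, centre[1] + dy, centre[2] + dz)
        for dx in offsets
        for dy in offsets
        for dz in offsets
        if dx * dx + dy * dy + dz * dz <= radius_mm ** 2
    ]

# ROI centre reported in the text for left anterior fusiform gyrus.
anterior_fusiform = (-28, -50, -16)
roi = sphere_voxels(anterior_fusiform)

# Hypothetical comparison peak, for illustration only.
hypothetical_peak = (-38, -50, -16)
print(len(roi), distance_mm(anterior_fusiform, hypothetical_peak))
```

With 2 mm voxels, as used for the normalised functional images, a 10 mm sphere contains a few hundred voxels, so each ROI average pools over a substantial neighbourhood around the peak.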

Reading artificial words and naming artificial objects during scanning
Accuracy was higher for reading words than naming objects (Fig. 2B, Table 1). Whilst words from days one and two were read with equivalent accuracy, objects learned on day 1 were named less accurately than objects learned on day 2. A repeated measures ANOVA confirmed that accuracy was higher for reading words than naming objects, F(1, 19)=37.535, η_p²=0.664, p < 0.001, and higher for items trained on day 2 compared to day 1 items, F(1, 19)=5.939, η_p²=0.238, p=0.025. There was also an interaction between day of training and item type, F(1, 19)=8.657, η_p²=0.313, p=0.008. Follow-up tests confirmed that naming accuracy for day 1 objects during scanning was significantly worse than for recently learned day 2 objects.
In the scanning session, in addition to reading the artificial words that were trained on days 1 and 2 (visually and phonologically familiar), participants also read the written forms of the object names trained on days 1 and 2 (visually unfamiliar, phonologically familiar), as well as a set of completely novel untrained words (visually and phonologically unfamiliar). These 5 conditions were entered into a one-way repeated measures ANOVA, allowing us to ask whether visual learning, phonological learning, and/or a period of offline consolidation leads to improved reading performance. Accuracy in all of these conditions was very similar (Table 1) with no significant difference between any of the conditions, F(4, 76)=0.694, ns.

Whole-brain imaging analyses
To examine neural activity associated with reading trained words versus trained objects, and for items learned on days 1 and 2, we conducted a 2×2×2 repeated measures ANOVA to compare the effect of item type (word or object), day of learning (day 1 or day 2) and trial type (see-think or see-speak). We first examined the main effect of item type (words versus objects), collapsed across day of learning and trial type. The contrast words > objects showed extensive activation in bilateral parietal cortices, as well as peaks in middle and inferior occipital gyri (posterior vOT regions), right supramarginal, bilateral precentral, and bilateral middle frontal gyri, cerebellum, left supplementary motor area, and right hippocampus (Fig. 4, pink, Table 3).
The reverse contrast objects > words revealed clusters in bilateral fusiform gyri (anterior vOT regions), bilateral angular gyri, left middle occipital cortex, left precuneus, left middle and anterior cingulate cortices, right inferior parietal cortex, bilateral superior and middle frontal gyri, left inferior frontal gyrus, left middle temporal gyrus, right cerebellum, right inferior temporal gyrus, right calcarine fissure, and left superior temporal pole (Fig. 4, light blue, Table 3).
As discussed in the methods, the main effect of word reading versus object naming does not rule out the possibility that low-level visual differences (such as differences in visual complexity, or retinal extent) between words and objects may be driving differences in activation. We therefore examined the interaction between item and trial type to reveal changes in object versus word retrieval-related activity during repetition suppression (i.e. additional activity for see-think trials, compared to the see-speak trials that immediately followed and that contained the same visual form). The contrast (words [see-think − see-speak] > objects [see-think − see-speak]) revealed no clusters at an FWE cluster-corrected threshold of p < 0.05 (Fig. 4, Table 3). However, the contrast (objects [see-think − see-speak] > words [see-think − see-speak]) revealed clusters of activation in bilateral fusiform gyri and left cuneus (Fig. 4, dark blue, Table 3). As shown in Figs. 4A and 4B, differential activity in left and right fusiform gyrus for objects compared to words is significantly greater during see-think than during see-speak trials. This might suggest that fusiform activity for objects is associated with initial identification and/or retrieving their names rather than being due to low-level visual differences. We will return to this point in the discussion.
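The interaction contrasts described above correspond to a simple set of weights over the four item type by trial type cells. The sketch below is for illustration only; the cell ordering, function names, and cell means are hypothetical and are not taken from the analysis scripts.

```python
# Cell order assumed here: item type varies slowest, trial type fastest.
CELLS = ["word see-think", "word see-speak", "object see-think", "object see-speak"]

def kron(a, b):
    """Kronecker product of two weight vectors (first factor varies slowest)."""
    return [x * y for x in a for y in b]

def contrast_value(weights, means):
    """Weighted sum of cell means for a given contrast."""
    return sum(w * m for w, m in zip(weights, means))

item = [-1, 1]    # objects minus words
trial = [1, -1]   # see-think minus see-speak

# objects [see-think - see-speak] > words [see-think - see-speak]
interaction = kron(item, trial)

# Hypothetical cell means (arbitrary units): a larger repetition effect for objects.
means = [1.0, 0.8, 2.0, 1.2]
print(interaction, round(contrast_value(interaction, means), 3))
```

A positive value indicates that the reduction from see-think to see-speak trials is larger for objects than for words; the reverse contrast simply flips the sign of every weight.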
The main effect of day showed no clusters surviving correction at whole-brain level. In addition, there was no interaction between day and item or trial type. The main effect of trial type was not of particular interest in this study, but is reported for similar, previous data in Taylor et al. (2014a).

ROI analyses based on the real objects and words localiser
C. Quinn et al. Neuropsychologia 98 (2017) 68-84
1) The ROIs defined in the localiser allow us to ask whether activation for trained artificial words and objects overlapped with that for real words and objects. Four regions of interest were defined based on the localiser, one in left anterior fusiform where objects show more activation than words, and three showing greater activation for pseudowords than words in left posterior vOT, left precentral gyrus, and left superior parietal lobe (the white circles in Fig. 3 show where the ROIs were defined while the white circles in Fig. 4 show how the ROIs relate to the whole-brain activation for the artificial words and objects). In each ROI we conducted a 2×2 repeated measures ANOVA, with the factors item type and trial type, collapsed across day (Table 4). In the anterior fusiform ROI where real objects showed greater activation than real words, artificial objects also showed greater activation than artificial words and this was more pronounced for see-think than see-speak trials (similar to the interaction profile plotted in Fig. 4A). In the left posterior vOT ROI (similar to Fig. 4H), the left precentral ROI (similar to Fig. 4F) and in the left superior parietal ROI (similar to Fig. 4G) we saw activation consistent with contributions to reading artificial words with a main effect of item-type (words > objects), and of trial type (see-think > see-speak) but no interaction between these factors. Thus, ROIs which showed more activation for English pseudowords than words, also showed more activation for artificial words than objects, consistent with contributions to componential reading processes. Our confidence that these left hemisphere activation differences were driven by processes involved in recalling spoken from visual forms, as opposed to purely visual differences, is greater for objects in anterior fusiform, than for words in posterior vOT, precentral gyrus, and superior parietal cortex.
2) We used these same four ROIs to examine whether visual or phonological familiarity influenced neural activity during word reading. In each ROI, we conducted a one-way repeated measures ANOVA with the five word reading conditions (collapsed across trial-type): trained artificial words from day 1 and day 2, the written forms of artificial objects trained on days 1 and 2, and a set of untrained artificial words. This comparison allows us to ask whether visual and phonological familiarity (as for the trained words), or purely phonological familiarity (as for the written objects), or a period of offline consolidation (as for the day 1 items) is necessary to support more word-like representations. There was no significant difference between the five conditions in any of the four regions of interest (Table 4).
At the suggestion of a reviewer we additionally analysed these comparisons based on (1) whether the written words are visually familiar (trained words from days 1 and 2) or unfamiliar (written forms of objects from days 1 and 2, as well as untrained words), and (2) whether the written words are phonologically familiar (trained words from days 1 and 2 as well as written forms of objects from days 1 and 2) or unfamiliar (untrained words). Comparison of these conditions in each of the four regions of interest confirmed no difference due to either visual or phonological familiarity (Anterior fusiform: visual familiarity, t

Hippocampal ROI analysis
Complementary learning systems (CLS) accounts of word learning suggest a key role for the hippocampus, and indeed previous studies have shown changes in hippocampal responses to recently learned spoken words (Breitenstein et al., 2005; Takashima et al., 2014). As there is strong a priori reason to expect an effect of day of learning on hippocampal activity, region of interest analyses were conducted using two separate AAL masks for the left and right hippocampi. Activation values were combined across see-think and see-speak trials and entered into a 3-way repeated measures ANOVA with factors day of training (day 1 or day 2), item type (word or object) and lateralisation (left or right hippocampus). Hippocampal activation was greater for day 2 than day 1 items (Fig. 5).

Discussion
Both educational and cognitive perspectives on reading have highlighted a critical distinction between holistic and componential processing; i.e., between recognising whole-word forms and decoding words letter-by-letter. Here we have compared an artificial learning paradigm to object naming and word reading of familiar, real language stimuli in order to identify the neural systems that support holistic and componential visual-verbal mappings and their acquisition. Our findings combine to demonstrate both behavioural and neural dissociations between holistic and componential mappings, which we will discuss in turn. We will start by summarising behavioural evidence for this distinction as shown by differences in learning profiles and generalisation, before moving on to discuss ventral occipito-temporal, parietal, and frontal contributions to reading and naming of both real and artificial items. We will conclude by returning to the educational issues that we introduced at the outset and consider the broader implications of our findings.

Behavioural results show holistic and componential learning
Behavioural results during training confirm that learning to read artificial words involved componential learning whereas learning to name artificial objects involved holistic learning. Participants became better at naming objects across four runs of training on day 1, but when they returned on day 2 to learn 18 more objects their learning profile was essentially the same as on day 1. This pattern is due to the holistic and arbitrary relationship between the visual and verbal form of an object; knowledge of object-names from day 1 does not help to name objects learned on day 2. In contrast, the componential and systematic relationship between the visual and verbal forms of written words means that letter-sound mappings learned on day 1 are also effective in supporting reading of items learned on day 2. Hence, reading performance at the start of day 2 was substantially better than at the start of day 1.
The distinction between holistic and componential learning is further borne out by behavioural performance during scanning. Participants were significantly worse at naming objects learned on day 1 compared to those learned on day 2, due either to forgetting of more distantly learned object names or interference from object names learned immediately prior to scanning on day 2. The present data do not distinguish these two possibilities (Anderson, 2003; Mensink and Raaijmakers, 1988). Whichever explanation we invoke, however, interference or forgetting arises from the holistic and arbitrary nature of visual-verbal mappings for object names; items learned on day 2 do not support, and might even interfere with, items acquired on day 1. In contrast, reading performance in the scanner was equally good for words learned on both days. Thus, the significant interaction between day and item type is again consistent with componential knowledge in reading words aloud; since items learned on day 2 contained the same letter-sound correspondences, this knowledge supported successful reading of words learned on day 1. Finally, the ability of participants to read untrained words accurately further demonstrates their ability to generalise this letter-sound knowledge to novel written words.
Hence our participants have acquired the ability to decode written words. This is the same skill as is taught to beginning readers through phonics instruction. These findings therefore support our use of functional neuroimaging of artificial language learning to explore the neural basis of learning to read. To the extent that neural activity overlaps for real and artificial items, we can further argue for parallels between the processes that support skilled reading/naming, and processes recruited for reading/naming our artificial materials. This will be the focus of the next two sections of discussion.

VOT contributions to reading and naming
A variety of evidence reviewed in the introduction suggests hierarchical organisation of visual processing in vOT regions. Results of the functional localiser to some extent support these proposals as the componential reading contrast (pseudowords > real words) showed activation in lateral and posterior vOT regions (replicating Mechelli et al. (2005); and others, see Taylor et al. (2013) for a meta-analysis). However, the holistic contrast of real objects > real words also showed activation throughout vOT, including posterior as well as mid- and anterior vOT regions. This finding appears more compatible with views of vOT specialisation that propose association with phonological representations as a key factor rather than specialisation for alphabetic forms (Cohen et al., 2000; Cohen, 2007, 2011). Our use of an overt naming task might be critical in explaining this observation. Covert reading/naming may induce greater phonological processing for word reading than object naming since phonological access is automatic for written words (Hagoort et al., 1999; MacLeod, 1991; Price et al., 1996; Song et al., 2010; Twomey et al., 2011). Nonetheless, we also observed that mid- and anterior vOT regions showed a greater response to objects than to words, consistent with a contribution to processing holistic visual forms in hierarchically higher levels of the vOT.
Activation for the artificial words and objects matches the hierarchical organisation of vOT responses seen both in the localiser and in previous literature. We observed greater activation for reading artificial words than naming artificial objects in bilateral posterior vOT, whereas the reverse profile of greater activation for naming artificial objects than reading artificial words was observed in bilateral anterior vOT. Furthermore, this differential response in anterior vOT interacted with trial type, such that object > word activation was more pronounced for see-think than see-speak trials.
We preceded see-speak trials with a see-think trial for the same item primarily for pragmatic reasons: (1) it avoids articulation on the majority of trials, reducing head movement, because see-think trials occurred twice as often as see-speak trials; (2) it would otherwise have been difficult for participants to produce item names in the short silent interval between two scans; and (3) it permits separation and comparison of neural activity during covert (see-think) and overt (see-speak) articulation. That participants were able to respond fast enough on see-speak trials can be seen as a form of behavioural priming by which articulation of an item name is faster if it has been presented on an immediately preceding trial. Studies of neural repetition suppression have shown reduced activity for repeated items in a similar anterior fusiform region to that showing repetition suppression for artificial objects but not written words in the present study (Fig. 4A, cf. Kherif et al., 2011; Glezer et al., 2009). In previous work we have argued that the reduction in activity on see-speak compared to see-think trials reflects the fact that phonological retrieval primarily occurs on see-think trials (Taylor et al., 2014a). Hence, we proposed that anterior fusiform plays a greater role in processing holistic object-name associations than componential written word pronunciations. However, at the suggestion of reviewers, some more careful consideration leads us to acknowledge that there may also be visual contributions to repetition suppression. It is possible that anterior fusiform regions contribute to visual configural processing unique to objects and that this process, instead of, or as well as, phonological retrieval, is reduced on repeated trials (Vuilleumier et al., 2002; James et al., 2002). While other studies were able to rule this out (for example, Kherif et al. (2011) showed repetition suppression following pairs of non-identical pictures with the same name), our paired presentations involved the same visual form as well as the same name. Further investigations using names for objects that are depicted in multiple different pictures, and/or written words in multiple fonts, might help assess whether anterior fusiform is primarily involved in holistic relative to componential visual configural processing, or in the retrieval of holistic rather than componential visual-to-verbal associations.
One effect that we failed to see for the artificial orthography, which we had anticipated based on findings for reading real words and pseudowords, was differential activation for trained versus untrained words. This null effect is notable considering the comparison involved 216 trials for trained words with 216 trials for untrained words (108 trials for the written forms of objects and 108 for completely untrained words), while the localiser showed differences between words and pseudowords with only 60 trials of each. This outcome, in conjunction with frontal and parietal activation for trained words relative to objects that we will discuss subsequently, might suggest that trained words were still being read componentially. It may be the case that our design included an insufficient number of training presentations (4 presentations of each word) to produce whole-word representations and that more intensive or longer-lasting training is required in order to generate holistic representations for written words in an artificial orthography. Future work will address this possibility. It might also be the case that adding irregular spelling-sound mappings would increase the necessity for these whole-word representations. Note, however, that cognitive models of reading, such as the DRC model, propose that whole-word representations develop for both regular and irregular forms.
If holistic word representations were to emerge with further training we might expect trained items to activate anterior fusiform regions more than untrained items, in a similar manner to real as well as trained objects relative to words in the current study. In contrast, untrained words might activate left posterior occipitotemporal, parietal, and frontal regions more than trained words, in a similar way to pseudowords relative to real words. This would support neuroimaging studies that, in line with cognitive models of reading, have suggested a distinction between sub- and whole-word processes in dorsal versus ventral brain regions (Taylor et al., 2013). Note, however, that although the DRC model proposes lexical orthography-to-phonology mappings, the triangle model (Plaut et al., 1996) proposes that such item-specific mappings primarily emerge for the mapping between visual/phonological word forms and their meanings. Thus, this model might predict activation for trained relative to untrained items in anterior fusiform only if artificial words were trained with meanings.

Parietal and frontal contributions to reading and naming
In addition to vOT contributions, there is substantial evidence for frontal and parietal regions supporting componential reading processes. For example, we saw extensive activation of parietal and frontal networks for the contrast of pseudoword > word reading in the localiser scan. This replicates a large number of previous observations in the functional imaging literature (see Taylor et al., 2013 for a meta-analysis). We will discuss the implications of these findings separately, first for parietal and then for frontal regions.
The observed activation increase in inferior parietal cortices for word reading over object naming replicates results previously reported by Taylor et al. (2014a). However, our results go beyond these in two ways. First, we demonstrate substantial overlap between activation contrasts for artificial and real language stimuli (as scanned during a functional localiser). Second, we show that differences in parietal involvement can be seen in an event-related design in which trials presenting artificial words and objects are randomly intermixed, rather than in the blocked presentation used by Taylor et al. (2014a). This suggests that activation differences can be evoked solely by stimulus differences in the absence of the more strategic effects that are possible for blocked designs.
Unlike Taylor et al. (2014a), we did not find an interaction between item type (words vs objects) and trial type (see-think vs see-speak) in parietal regions. We cannot therefore be certain that parietal involvement in reading words reflects the retrieval of written word pronunciations, independent of the perceptual differences between words and objects. Indeed, some have primarily associated parietal activity during reading with perceptual processes. For example, Cohen et al. (2008) using fMRI, and Rosazza et al. (2009) using EEG, argued that parietal regions are primarily active when readers are forced into an "attention-based serial reading strategy" (p 361 of Cohen et al.) by changes in the visual form (orientation or degradation) of written words. A similar serial visual processing interpretation was offered by Protopapas et al. (2016), who obtained length effects in this region during pseudoword reading.
However, other data argue against a purely visuo-spatial attention account of parietal activation during reading. Carreiras et al. (2014) demonstrated selectivity in left inferior and superior parietal regions for coding the identity and positions of letters, relative to symbols and numbers, suggesting a role for this region in processing stimuli that have linguistic associations. Parietal activation has also been shown to be greater when participants make judgements about spelling-to-sound mappings, over and above judgements about spellings or sounds in isolation, again implicating this region in cross-modal processing (Booth et al., 2003, 2007). Thus, in line with the proposal made by Taylor et al. (2013, 2014a), we suggest that engagement of parietal regions for pseudoword relative to word reading, and for artificial word reading relative to object naming, reflects their role in the translation of component letters into sounds.
In our study, posterior frontal (precentral gyrus) and parietal regions largely showed the same response profile. Both regions were activated in the localiser for pseudowords vs real words, and in the contrast of artificial words vs objects. Previous studies have implicated these frontal regions (specifically the precentral gyrus) in the selection and assembly of phonological outputs (Bookheimer, 2002; Devlin et al., 2003; Gough et al., 2005). Left precentral gyrus not only shows increased activation for pseudoword compared to word reading (Taylor et al., 2013, 2014b), but also reduced activation following consolidation of new spoken words ). Furthermore, a recent fMRI study has dissociated posterior frontal regions (such as the precentral gyrus) that contribute to phonological output from more anterior inferior frontal regions, which may contribute to phonological selection (this latter process is particularly engaged by reading irregular words, Taylor et al., 2014b).
This distinction between phonological selection and phonological assembly is consistent with our finding of greater activation for naming artificial objects than for reading artificial words in left IFG orbitalis, and the reverse profile in left precentral gyrus.

Emergence of holistic representations for consolidated items
As mentioned above, even after a night of consolidation for day 1 words, we did not find differences between trained and untrained words either in behavioural performance or in the regions of interest defined from the functional localiser. Nonetheless, there was some evidence of whole-word learning since the hippocampus was differentially active for items learned on days 1 and 2. Consistent with the predictions of CLS accounts, there was more hippocampal activation for both words and objects learned on the same day as scanning than for those learned on the previous day. As the words from days 1 and 2 share the same letters, this reduced activation for day 1 words would not be possible unless a whole-word representation of some form existed. Hence, the effect of consolidation on hippocampal activation provides evidence for some form of holistic representation for the day 1 words. However, given the lack of reliable activation differences between trained and untrained written words, we cannot be confident about where these representations reside. Further studies should extend and adapt the time period of training and scanning to answer this question. Further work could then also determine the relative roles of sleep and time in the consolidation of orthographic and lexical representations. This question concerning the role of item-level consolidation in early stages of learning to read has implications for the role of consolidation in literacy learning more generally. Although lexical consolidation has been shown for school-age children (Henderson et al., 2013), this has primarily been in the context of learning spoken and not written words.

Validation of laboratory-based learning paradigms and implications for education
The extensive overlap of activation between the functional localiser and the artificial items, combined with the behavioural evidence, shows that laboratory studies of reading can engage holistic and componential learning mechanisms. This is a striking finding when we consider that we are comparing words and objects that have been used since childhood with artificial items that have been learned at most one day prior to scanning. This outcome suggests that laboratory-based learning studies can activate the same neural systems as are engaged in more ecologically-valid paradigms (e.g. in studies of beginning readers). However, our artificial orthography studies have the advantage of maintaining strict experimental control. Neuroimaging results for real language stimuli may be sensitive to a wide variety of linguistic features (word frequency, age-of-acquisition, etc.), which can be readily controlled in laboratory learning studies. Similarly, real language stimuli may be subject to individual differences in terms of language and literacy exposure that can be readily controlled in a laboratory setting. Finally, educational questions about early literacy acquisition are dependent on a wide range of external factors that may influence classroom outcomes. Consequently, we suggest that laboratory-based approaches offer an important complement to more naturalistic studies.
In education, the relative importance of holistic and componential reading strategies is much debated, with componential phonics-based approaches to reading instruction being dominant. Our results are consistent with the importance of componential learning during the earliest stages of reading acquisition: we see activation in posterior vOT, parietal, and frontal regions for the contrast of artificial word reading greater than object naming. However, in contrast to imaging findings with skilled readers and English words, we see relatively little activation evidence for holistic representations of artificial written words despite abundant evidence for holistic representation of novel objects.
While fronto-parietal activation has been implicated in many studies of word reading, the relative contributions of dorsal and ventral regions across different stages of learning to read have remained relatively unclear, with several studies citing the need to further investigate the relationship between parietal and vOT contributions (Carreiras et al., 2014; Cohen et al., 2008; Price, 2012; Reilhac et al., 2013). Although fMRI studies of children have shown parietal involvement (Bitan et al., 2007a; Cao et al., 2006; Hoeft et al., 2007), children old enough to undergo scanning already have several years of exposure to written words. Our laboratory-based approach offers a way to identify the contributions of parietal and vOT regions over the very earliest stages of learning to read. That these areas show extensive activation prior to neural evidence for holistic representations speaks to the important role of componential mechanisms in early stages of acquisition. Given functional imaging and neuropsychological evidence for parietal contributions to spatial encoding of written strings (Carreiras et al., 2014; Cohen et al., 2008), our findings motivate further work to explore parietal contributions to successful and unsuccessful literacy acquisition (Peyrin et al., 2011; Reilhac et al., 2013).