Dissociating the functions of superior and inferior parts of the left ventral occipito-temporal cortex during visual word and object processing

During word and object recognition, extensive activation has consistently been observed in the left ventral occipito-temporal cortex (vOT), focused around the occipito-temporal sulcus (OTs). Previous studies have shown that there is a hierarchy of responses from posterior to anterior vOT regions (along the y-axis) that corresponds with increasing levels of recognition - from perceptual to semantic processing, respectively. In contrast, the functional differences between superior and inferior vOT responses (i.e. along the z-axis) have not yet been elucidated. To investigate, we conducted an extensive review of the literature and found that peak activation for reading varies by more than 1 cm in the z-axis. In addition, we investigated functional differences between superior and inferior parts of left vOT by analysing functional MRI data from 58 neurologically normal skilled readers performing 8 different visual processing tasks. We found that group activation in superior vOT was significantly more sensitive than inferior vOT to the type of task, with more superior vOT activation when participants were matching visual stimuli for their semantic or perceptual content than producing speech to the same stimuli. This functional difference along the z-axis was compared to existing boundaries between cytoarchitectonic areas around the OTs. In addition, using dynamic causal modelling, we show that connectivity from superior vOT to anterior vOT increased with semantic content during matching tasks but not during speaking tasks whereas connectivity from inferior vOT to anterior vOT was sensitive to semantic content for matching and speaking tasks. The finding of a functional dissociation between superior and inferior parts of vOT has implications for predicting deficits and response to rehabilitation for patients with partial damage to vOT following stroke or neurosurgery.


Introduction
Many functional imaging studies of reading have reported activation in the left ventral occipito-temporal (vOT) cortex, centred around the middle part of the left occipito-temporal sulcus. Because of its important role in reading, this region has often been referred to as the "visual word form area". However, as the same region is activated during many tasks other than reading, it also appears to play a more general role in integrating visual inputs with the language system (see Price, 2012 for review). Given the size of the vOT region, and variability in where reading activation is reported, there are likely to be multiple subdivisions with dissociable functions. This has already been documented in the posterior-to-anterior direction (y-axis; Lerma-Usabiaga et al., 2018; Seghier and Price, 2011; Vinckier et al., 2007) and the lateral-to-medial direction (x-axis; Gao et al., 2017), but not in the inferior-to-superior direction (z-axis). For example, posterior vOT is more sensitive than anterior vOT to stimulus-bound perceptual features, while anterior vOT is more sensitive than posterior vOT to abstractions that support object recognition (Simons et al., 2003; Price and Mechelli, 2005; Levy et al., 2009; Seghier and Price, 2011). Although much less is known about functional divisions in the superior-to-inferior direction, high variability along the z-axis has been seen in previous neuroimaging reviews (Jobard et al., 2003; Bolger et al., 2005; Taylor et al., 2013). The aim of the current study was to investigate what drives variation in activation along the z-axis (superior versus inferior).
In an extensive review of the literature (see Table 1), we found that reading activation localised to left vOT or the "visual word form area" varied by 1.2 cm in the z-direction (from z = -8 mm to z = -20 mm). These differences have rarely been acknowledged or discussed (e.g. Nestor et al., 2012; Gauthier et al., 2000; James et al., 2005) but, in the bilingual literature, Bolger et al. (2005) reported activation peaks in inferior vOT for alphabetic writing and in superior vOT for logographic (Chinese and Japanese) writing. Given the contrasting nature of the visual word forms in alphabetic and logographic scripts, our first hypothesis is that superior and inferior vOT regions might be sensitive to different types of visual processing. Alternatively, our second hypothesis is that, because alphabetic scripts have closer links to phonological representations while logographic scripts have closer links to semantic representations, inferior vOT might have stronger connections to phonological regions and superior vOT might have stronger connections to semantic regions. A third hypothesis is that alphabetic and logographic scripts require different degrees of attention to detail and this might selectively influence activation in either the inferior or superior parts of vOT cortex.
Our literature review of vOT activation during reading studies did not identify clear evidence to support any of the above hypotheses. There were 52 studies that met our inclusion criteria and we split these studies according to whether they reported (A) superior vOT activation that was less than 15 mm below the AC-PC line (z = 0 to -14) or (B) inferior vOT activation that was more than 14 mm below the AC-PC line (z = -15 to z = -22). The functional border between superior and inferior vOT was set here to around -15 mm (i.e. equivalent to the median of all reported z-coordinates across the selected 52 studies). This functional border ensured that half the foci (26/52) were located in superior vOT and the other half were located in inferior vOT, see Fig. 1. There was no corresponding division in the type of stimuli and tasks used by the study (see Table 1 for more details). We therefore investigated our own data to determine how superior and inferior vOT subregions respond when the demands on visual and semantic processing were varied by changing (i) the stimuli (pictures versus letters), (ii) their familiarity or semantic content (semantically meaningful or not) and (iii) the task (speaking versus matching). We also used dynamic causal modelling (DCM) to investigate condition-dependent connectivity to and from superior and inferior vOT subregions. Our rationale was motivated by previous studies that have shown convergence of different ventral and dorsal inputs to vOT (e.g. Rauschecker et al., 2011; Yeatman et al., 2012).
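The median split described above can be illustrated with a short sketch. The z-coordinates below are made-up placeholders, not the values in Table 1, and the function name `split_by_border` is hypothetical; only the rule (peaks at or below the border count as inferior) follows the text.

```python
from statistics import median

# Hypothetical z-coordinates (mm, MNI) of left vOT peaks; NOT the Table 1 values.
peak_z = [-8, -10, -12, -14, -16, -18, -20, -22]

def split_by_border(z_values, border):
    """Label peaks as superior (above the border) or inferior (at or below it)."""
    superior = [z for z in z_values if z > border]
    inferior = [z for z in z_values if z <= border]
    return superior, inferior

# Use the median of the reported peaks as the superior/inferior border.
border = median(peak_z)
superior, inferior = split_by_border(peak_z, border)
print(border, len(superior), len(inferior))
```

Using the median as the border splits the foci into two equal halves by construction, which is why the split on its own cannot show that the two halves differ functionally.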

Methods
This study was approved by the London Queen Square Research Ethics Committee. Full details of our experimental design and stimuli have been reported previously (Josse et al., 2008; Seghier and …), but the data have not previously been used to investigate visual and semantic effects in different parts of left vOT cortex.

Subjects
Fifty-eight right-handed healthy subjects participated in the functional MRI paradigm. All were native English speakers, had normal or corrected-to-normal vision, no history of neurological or psychiatric disorders and no awareness of developmental disorders (e.g. dyslexia), and gave written informed consent to participate in the study. Three different age groups were included: 20 teenagers aged 13-18 years (11 female, 9 male), 24 young adults aged 19-33.7 years (13 female, 11 male) and 14 older adults aged 50-73.6 years (10 female, 4 male). Overall, there were 34 females and 24 males with a mean age of 30 years. All the teenagers had normal or above normal verbal and nonverbal IQ as measured using the Wechsler Intelligence Scale for Children (WISC-III). The effects of age and gender on behavioural and fMRI responses were investigated (see results) but did not explain our results. Three of the subjects were excluded from the connectivity analyses because they did not show activation in one of the regions of interest (p < 0.05 uncorrected in at least 5 voxels).

Table 1. Previous studies reporting left vOT activation during reading.
A MEDLINE search was conducted (from January 2000 to October 2018) using the keywords (i) 'Reading', (ii) 'fMRI' or 'magnetic resonance imaging' and (iii) 'occipitotemporal', 'occipito-temporal', or 'visual word form area' to identify papers that had reported activation during reading in left vOT. Relevant references within these articles also directed us to other papers that were considered in the literature review. Altogether, we identified 213 articles. We then excluded: (i) reviews and meta-analyses (i.e. those not reporting original research), (ii) effects from subjects who were not neurologically or psychiatrically "normal" adults, or who had atypical learning, (iii) effects that were not related to visually presented words or pseudowords, (iv) effects not reported in standardized coordinates, (v) results of contrasts that compared visual stimuli to rest or fixation (because it was impossible to determine the level of cognitive processing that was driving activation), (vi) single case studies, (vii) co-ordinates related to laterality indices, (viii) effects in predefined regions of interest (region-based analyses), and (ix) studies published in non-English journals. Where appropriate, stereotactic Talairach coordinates were converted into Montreal Neurological Institute (MNI) space. For each study, we reported the location of the left vOT activation peak. The median of all vOT peaks is [x = -43 mm, y = -58 mm, z = -14.5 mm]. Activation contrasts were categorised as being related to: (1) changes in task demands where subjects performed different tasks with the same set of stimuli or (2) changes in stimulus demands where subjects performed the same task with different sets of stimuli. Task driven contrasts were further categorised into those primarily driven by visual (e.g. letter detection versus phoneme detection), semantic (e.g. semantic versus identity one-back matching), or general demands (e.g. one-back matching versus passive viewing). Stimulus driven contrasts were further categorised into those primarily driven by visual differences (e.g. written words versus pictures of objects), linguistic content (e.g. words versus false fonts), a combination of visual differences and linguistic content (e.g. words versus checkerboards), semantic content (e.g. high versus low imageable words), general demands (e.g. unfamiliar versus familiar words), or stimulus primes (i.e. less activation when stimuli were preceded by identical ones). In some papers, superior peaks at z ≥ -12 mm were labelled as inferior occipital gyrus instead of vOT.

Experimental design
There were four separate scanning sessions. In two sessions, subjects performed speaking tasks: reading aloud written words, naming aloud objects depicted in pictures, and saying "1,2,3" in response to seeing meaningless Greek letters and nonobjects. In the other two sessions (hereafter referred to as matching tasks), subjects made a finger press response to make semantic matching decisions on written words and pictures of objects as well as perceptual matching decisions on Greek letters and nonobjects. None of our subjects spoke Greek and our tasks discouraged subjects from associating Greek letters with specific (i.e. mathematical) concepts. The same stimuli were used across different sessions. Within each session, there were (i) 4 blocks of written words, (ii) 4 blocks of pictures of objects, (iii) 2 blocks of meaningless Greek letter strings, (iv) 2 blocks of meaningless pictures of nonobjects and (v) 6 blocks of fixation. The order of conditions was counter-balanced within and across sessions. The experimental design is summarised in Fig. 2.

Stimuli
Stimuli were presented in the scanner by a video projector, a front-projection screen and a system of mirrors fastened to the MRI head coil. All stimuli were presented in same-format triads with one item above two other items (see Fig. 2). In the semantic and perceptual matching tasks, the item above was the target and the two items below provided the matching and non-matching choices that the subject was required to select from. In the speaking tasks, subjects were required to first attend to (i.e. read, name, or say "1,2,3" to) the item above, then the lower left and then the lower right items. Each triad remained on the screen for 4.32 s, followed by 180 ms of fixation. There were four stimulus triads of the same type per block (18 s per block) and each block was preceded by 3.6 s of instructions to indicate the type of response required. Fixation periods of 14.4 s length were interleaved after every two stimulus blocks. Across the experiment, subjects were presented with a total of 192 pictures of objects and the 192 corresponding 3-6 letter written names of objects (i.e. the semantic content and names were matched in the picture and word conditions). Words that were read were presented as pictures in the semantic matching condition and pictures that were named were presented as words in the semantic matching condition.

Fig. 1. vOT coordinates from Table 1.
Coordinates above or below the median z = -15 mm are shown in red ('x') or blue ('o') respectively. The y-axis (z = 0 mm) and the z-axis (y = 0 mm) are shown as grey lines, with their intersection at the AC point in black. The background of this figure is a sagittal view of the SPM tissue probability map of CSF at x = -44 mm.

Fig. 2. Experimental design.
Our experimental paradigm manipulated the factors "stimulus type" (letter strings versus pictures), "familiarity" (familiar words and objects versus unfamiliar Greek letter strings and nonobjects) and "task" (matching versus speaking). In all trials, three stimuli were simultaneously presented as a "triad", with one stimulus above and two stimuli below.
In the matching tasks, participants made a finger press response to make semantic matching decisions on words and objects (i.e. matching 'Piano' to 'Harp' rather than 'Oven') and perceptual matching decisions on the unfamiliar stimuli (based on physical identity). In the speaking tasks, participants read/named aloud familiar words and objects and said "1,2,3" in response to seeing the unfamiliar stimuli.

MRI acquisition
The experiment was performed on a 1.5-T Siemens system (Siemens Medical Systems, Erlangen, Germany). In the functional MRI (fMRI) sessions, imaging consisted of a single shot gradient Echo Planar Imaging (EPI) sequence (repetition time/echo time/flip angle = 3600 ms/50 ms/90°, field of view = 192 mm, matrix = 64 × 64, 40 axial slices, 2 mm thick with a 1-mm gap). Functional scanning was always preceded by 14.4 s of dummy scans to ensure tissue steady-state magnetization. A generalised reconstruction algorithm was used for data pre-processing to avoid ghost-EPI artefacts. An anatomical scan was also acquired for each subject and used for spatial normalisation. This was a 3D T1-weighted, modified equilibrium Fourier transform sequence with the following parameters: TR = 12.24 ms, TE = 3.56 ms, TI = 530 ms, FOV = 256 mm × 224 mm, acquisition matrix = 256 × 224, 1 mm slice thickness for 1 mm isotropic voxels.
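As a quick consistency check on the reported timing and geometry, the following sketch derives a few quantities that the text implies but does not state (in-plane voxel size, slab extent, volumes per block). The variable names are illustrative only, and times are kept in integer milliseconds to avoid floating-point rounding.

```python
# Values as reported in the text; times in milliseconds.
TR_MS = 3600                   # repetition time
FOV_MM, MATRIX = 192, 64       # in-plane field of view and matrix size
N_SLICES, SLICE_MM, GAP_MM = 40, 2, 1
TRIAL_MS = 4320 + 180          # triad display plus fixation
TRIALS_PER_BLOCK = 4
DUMMY_MS = 14400               # dummy scans before each functional run

in_plane_mm = FOV_MM / MATRIX                               # acquired in-plane voxel size
extent_mm = N_SLICES * SLICE_MM + (N_SLICES - 1) * GAP_MM   # slab extent, gaps between slices only
block_ms = TRIALS_PER_BLOCK * TRIAL_MS                      # duration of one stimulus block
volumes_per_block = block_ms / TR_MS                        # EPI volumes acquired per block
dummy_volumes = DUMMY_MS / TR_MS                            # discarded volumes per run

print(in_plane_mm, extent_mm, block_ms / 1000, volumes_per_block, dummy_volumes)
```

Under these figures each block, each instruction period and each fixation period spans a whole number of volumes, so block boundaries coincide with volume boundaries.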

Functional MRI data processing and analysis
Image processing and statistical analyses were performed using standard procedures in SPM (Wellcome Trust Centre for Neuroimaging, London, United Kingdom, http://www.fil.ion.ucl.ac.uk/spm/). All functional volumes were spatially realigned, un-warped, and normalised to MNI space using the unified normalisation-segmentation procedure. The normalised volumes were written out at a voxel size of 2 × 2 × 2 mm. These volumes were smoothed with an isotropic 6 mm full width at half maximum (FWHM) Gaussian kernel to compensate for residual anatomical variability and to permit application of Gaussian random field theory for statistical inference (Friston et al., 1994).
Statistical analyses of the functional data were performed within a mixed effect model framework. In the subject-specific first-level models, the pre-processed functional volumes were submitted to a fixed-effects analysis using the general linear model at each voxel. Each stimulus onset was modelled as an event in condition-specific "stick-functions" lasting 4.32 s per trial and having a stimulus onset interval of 4.5 s. The resulting stimulus functions were convolved with a canonical hemodynamic response function that provided regressors for the general linear model. For each task condition we included 3 regressors: instruction, correct trials, and incorrect trials. Time-series from each voxel were high-pass filtered (1/128-Hz cut-off) to remove low-frequency noise and signal drift. From each subject's first-level analysis, we extracted eight contrast images (one for each of the eight conditions' correct trials relative to fixation). These subject-specific images were then used for the second-level random-effects group analysis. This comprised a repeated measures ANOVA with eight conditions corresponding to the eight first-level contrast images (see below).
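To make the regressor construction concrete, here is a minimal sketch that builds one condition regressor for a single block. It is not the SPM implementation: the double-gamma shape values (6 and 16) and the 1/6 undershoot ratio are assumed, commonly used defaults, trials are modelled as 4.32 s epochs, and the function names are hypothetical.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density at time t (seconds); zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def double_gamma_hrf(t):
    """Assumed double-gamma response: early positive lobe minus a smaller late undershoot."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def build_regressor(onsets_s, duration_s, n_scans, tr_s, dt=0.1):
    """Convolve trial epochs with the HRF on a fine grid, then sample once per scan."""
    n_fine = int(round(n_scans * tr_s / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = min(n_fine, int(round((onset + duration_s) / dt)))
        for i in range(start, stop):
            boxcar[i] = 1.0
    # HRF kernel covering 32 s, scaled by dt so the sum approximates an integral.
    kernel = [double_gamma_hrf(k * dt) * dt for k in range(int(round(32.0 / dt)))]
    conv = [0.0] * n_fine
    for i, b in enumerate(boxcar):
        if b:
            for k, h in enumerate(kernel):
                if i + k < n_fine:
                    conv[i + k] += b * h
    step = int(round(tr_s / dt))
    return [conv[s * step] for s in range(n_scans)]

# One block: four trials with onsets 4.5 s apart, each lasting 4.32 s; TR = 3.6 s.
regressor = build_regressor([0.0, 4.5, 9.0, 13.5], 4.32, n_scans=12, tr_s=3.6)
```

In a full first-level model, one such column per condition (plus instruction and incorrect-trial columns) would form the design matrix that is fitted at each voxel.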

Activation effects of interest
This paper is only concerned with effects located along the left occipito-temporal sulcus (OTs), which is known to be an important reading area (and is often referred to as the visual word form area); see Fig. 1. The OTs is present in 100% of adults (Malikovic et al., 2012). In the y-axis, the OTs extends anteriorly to y = -45 mm and posteriorly to y = -75 mm, an extent that covers the range of potential vOT localisation in healthy adults (Cachia et al., 2018). The mid-point is therefore y = -60 mm, with the sulcus extending 15 mm anteriorly and 15 mm posteriorly. In the z direction, the mid-point was located at z = -15 mm, because this point corresponds to: (i) the z-coordinate at which the probabilities of being in cytoarchitectonic regions FG2 or FG4 are equal at y = -60 mm (Weiner et al., 2017), according to Eickhoff et al. (2005), (ii) the median of the z-coordinates in our extensive literature review (Table 1), and (iii) the midpoint between the most superior (z = -10) and most inferior (z = -20) parts of the occipito-temporal sulcus at y = -60 mm. In the x direction, our ROI is located between x = -38 and x = -46 mm. Considering x, y and z co-ordinates together, the mid-point of our vOT ROI was set at [-42, -60, -15].
Within this region, we report main effects of task (i.e. speaking versus matching tasks), familiarity (i.e. familiar words and objects versus unfamiliar nonobjects and letter strings), and stimulus type (i.e. pictures versus letter strings). The comparison between familiar/meaningful stimuli and unfamiliar/meaningless stimuli (i.e. the familiarity effect) also reflects the semantic content of our stimuli. In addition, we report interactions between these variables when significant, and the main effect of all conditions > fixation where there was also significant activation for each of the 8 conditions compared to rest at a statistical threshold of p < 0.001 uncorrected (this was achieved using the inclusive masking option in SPM). Correction for multiple comparisons was based on a sphere of 15 mm radius centred on this mid-point. In the x and z dimensions, this sphere conservatively included areas beyond the occipito-temporal sulcus, to ensure that we did not impose harsh or false boundaries on any apparent functional subdivisions. We report the spatial location of different effects along the y-axis and the z-axis with reference to the current probabilistic definitions of cytoarchitectonic areas of the Anatomy toolbox in SPM (Eickhoff et al., 2005).
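The search volume can be sketched as follows: the mid-point is the centre of the x, y and z extents given above, and a peak lies inside the search volume if its Euclidean distance from that mid-point is at most 15 mm. The four peaks are those reported later in the Results; the helper name `within_sphere` is hypothetical.

```python
import math

# Extents of the region of interest as stated in the text (mm, MNI space).
x_range, y_range, z_range = (-46, -38), (-75, -45), (-20, -10)
centre = tuple(sum(r) / 2 for r in (x_range, y_range, z_range))

def within_sphere(peak, centre, radius_mm=15.0):
    """True if the peak lies inside the spherical search volume."""
    return math.dist(peak, centre) <= radius_mm

# Peak coordinates reported in the Results section.
peaks = {
    "superior vOT": (-46, -58, -10),
    "inferior vOT": (-44, -60, -18),
    "anterior vOT": (-44, -50, -16),
    "posterior FG2": (-40, -74, -14),
}
inside = {name: within_sphere(p, centre) for name, p in peaks.items()}
distances = {name: round(math.dist(p, centre), 1) for name, p in peaks.items()}
print(centre, inside, distances)
```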
The anatomical definition used here was in standard MNI coordinates, after spatial normalisation, because we wanted to establish findings that generalised over groups of subjects, as in most group level functional neuroimaging studies. Alternatively, this might not be possible given potential variability from subject to subject in the size, shape, location and depth of the occipito-temporal sulcus (Kim et al., 2008; Ma et al., 2011; Malikovic et al., 2012; Ono et al., 1990), as well as in the number of its segments, side branches and cytoarchitectonic zones/areas (Caspers et al., 2013). For instance, Destrieux et al. (2010) have shown that the (lateral) occipito-temporal sulcus is discontinuous and difficult to differentiate, and that it can connect to surrounding sulci including the anterior collateral transverse sulcus, the anterior occipital sulcus, or the inferior temporal sulcus.

Dynamic causal modelling
We used dynamic causal modelling (DCM; Friston et al., 2003) to investigate neuronal interactions (i.e. effective connectivity) among four different regions in the occipito-temporal sulcus that were each observed to be activated in our voxel-based analyses (see Fig. 3 and Section 3.3). DCM analyses were carried out using DCM12 as implemented in SPM12. In brief, DCM estimates three sets of parameters: 1) input/extrinsic parameters that quantify how brain regions respond to external stimuli, 2) endogenous parameters reflecting the latent connectivity that characterizes the context-independent coupling between regions, and 3) modulatory parameters that measure changes in effective connectivity induced by experimental conditions. In our case the input was modelled as a regressor representing all task conditions of the session. Therefore, the endogenous parameters reflect the average connectivity during the matching or speaking tasks. We were also interested in the modulatory effects of familiar words and objects. These parameters reflect connectivity changes over and above that for the unfamiliar items (i.e. Greek letter strings and pictures of nonobjects). Finally, we also tested for differences between words and pictures of objects.
All connectivity parameters are expressed in units of Hertz within the DCM framework. Positive values indicate that, as activity increases in the originating region, the rate of change in activity in the target region also increases. Negative values indicate the reverse (i.e. as activity increases in the originating region, a decrease in the rate of change in activity occurs in the target region). All parameters (endogenous and modulatory) of the DCM model and their posterior probabilities were then assessed with Bayesian inversion by means of an expectation-maximization algorithm (Friston et al., 2003). More details about DCM can be found elsewhere (Friston et al., 2003;Seghier et al., 2010;Stephan et al., 2010).
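For orientation, the three parameter sets described above map onto the bilinear neuronal state equation of Friston et al. (2003). The symbols below follow the conventional notation and are not defined in this section: z is the vector of regional neuronal states, u_j are the experimental inputs, A holds the endogenous connections, B^(j) the modulatory effects of input j, and C the driving inputs.

```latex
\dot{z}(t) = \Bigl( A + \sum_{j} u_j(t)\, B^{(j)} \Bigr)\, z(t) + C\, u(t)
```

Because the left-hand side is a rate of change, the entries of A and B^(j) are rate constants, which is why they are expressed in Hertz.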

Data extraction
To conduct the DCM analyses, we extracted the summarised time-series (i.e. principal eigenvariates) for each of the four regions of interest (ROI) for each separate experimental session from each first-level analysis. The extracted ROI time-series were adjusted based on a subject-specific F-contrast that retained the experimental effects of interest (i.e. correct trials of all 8 conditions) and regressed out variance caused by factors of no interest (i.e. incorrect trials, instructions). The time-series were then separately concatenated for (a) the two speaking task sessions and (b) the two matching task sessions and entered into the DCM analyses.

Parameter estimation
The exact mechanisms behind the differential activation responses that we observed were unknown. It was therefore important to specify a range of alternative models and search for useful models in the model space (Richardson et al., 2011; Seghier and Price, 2011). In the current study we did not have strong prior expectations about how our regions would interact. Since there are thousands of possible models that can be constructed with four ROIs, estimating the parameters of all models would be computationally inefficient. We therefore used Bayesian Model Reduction (BMR), which involves estimating only a full (or parent) model containing all the parameters of interest and then using the posterior estimates of this full model to derive the posterior estimates (and model evidence) of reduced models, in which one or more parameters are systematically removed. BMR gives similar results to the standard approach but is more computationally efficient (Rosa et al., 2012). In our case the full model included input to the most posterior region and full connections (forward and backward) between all regions with the exception of any direct connections between the input region and the most anterior vOT region (i.e. visual information was assumed to initially flow forward from posterior vOT regions). Using the BMR approach, the connectivity parameters are estimated by taking a weighted average of the parameters under each model (Rosa et al., 2012). This approach, called Bayesian Model Averaging (BMA), accommodates uncertainty about the underlying model structure in the parameter estimates.
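The weighted average at the heart of BMA can be sketched in a few lines. This is a simplified point-estimate version under equal model priors, not the SPM routine, and it ignores the posterior uncertainty of each estimate; the log evidences and parameter values are hypothetical.

```python
import math

def model_posteriors(log_evidences):
    """Posterior model probabilities under equal priors (softmax of log evidences)."""
    peak = max(log_evidences)  # subtract the maximum for numerical stability
    weights = [math.exp(le - peak) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]

def bayesian_model_average(param_estimates, log_evidences):
    """Evidence-weighted average of one connection's estimate across models."""
    probs = model_posteriors(log_evidences)
    return sum(p * theta for p, theta in zip(probs, param_estimates))

# Hypothetical values for three reduced models (connection strength in Hz).
estimates = [0.30, 0.10, 0.00]   # 0.0 where the connection is switched off
log_ev = [-100.0, -100.0, -103.0]
averaged = bayesian_model_average(estimates, log_ev)
print(model_posteriors(log_ev), averaged)
```

Models with low evidence contribute little to the average, so a connection that is absent from the best models is shrunk towards zero rather than being removed outright.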

Connectivity effects of interest
At the group level, we used one-sample t-tests to evaluate whether a parameter (i.e. an endogenous or modulatory connection) is non-zero. We were also interested in modulatory connections (i.e. connectivity changes for familiar words and objects over and above that for the unfamiliar items). This allowed us to assess condition-dependent connectivity to and from superior and inferior vOT subregions.
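The group-level test can be illustrated with a minimal one-sample t-test. The parameter values below are hypothetical and the sample is much smaller than the one analysed here; with the 55 subjects retained for the connectivity analyses (58 minus the three excluded) the test would have 54 degrees of freedom.

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu0=0.0):
    """Return (t, degrees of freedom) for H0: population mean equals mu0."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(values) - mu0) / standard_error, n - 1

# Hypothetical modulatory parameters (Hz), one per subject.
params = [0.12, 0.08, 0.15, -0.02, 0.10, 0.05, 0.09, 0.11]
t_value, dof = one_sample_t(params)
print(round(t_value, 2), dof)
```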

In-scanner behaviour
Mean accuracy for all 8 tasks was above 90% and lowest accuracy was above 75% for all 8 tasks (see Table 2). fMRI activation (below) is reported for correct trials only. Matching response times were significantly slower (p < 0.05, 2 tailed t-test) for the 14 older participants (50-74) than the 44 younger participants (13-34) but there was no significant difference in response times for the teenagers (13-18) and young adults (19-34).

Activation results
The main effect of task: matching vs. speaking
The voxel-wise analysis identified a main effect of task with higher activation for matching than speaking tasks at MNI coordinates [-46 -58 -10] with a Z score of 4.2 (p = 0.004 corrected for multiple comparisons within a region of 15 mm radius centred on [-42 -60 -14], the mid-point of the occipito-temporal sulcus; see Section 2.6). This effect was located at the most superior edge of the middle part of the occipito-temporal sulcus, with a high probability of being within area FG4, according to Weiner et al. (2017); see Fig. 3A. Activation in this superior vOT region was higher for matching than speaking tasks when the stimuli were words (Z = 4.6) and unfamiliar stimuli (Nonobjects: Z = 3.4; Greek letters: Z = 3.2) but not objects (Z < 1), which strongly activated superior vOT irrespective of task (Fig. 3B). This resulted in a weak task by semantics by stimulus type interaction (Z score = 2.45, p < 0.01).
Greater activation for matching than speaking in this superior vOT region [-46 -58 -10] was observed for each of the three age-groups tested (p < 0.05 for 20 teenagers, 24 young adults and 14 older adults) and for both males and females (p < 0.05 for both). There was no significant difference between the size of the effect for any pair of participant groups (p > 0.30, two tailed t-tests).
There was no significant effect (p > 0.05 uncorrected) in the opposite direction (i.e. speaking more than matching tasks) anywhere in our ROI along the occipito-temporal sulcus (from y = -75 to y = -45).

The main effect of stimulus familiarity: familiar > unfamiliar stimuli
The voxel-wise analysis showed a main effect of stimulus familiarity with higher activation for familiar (words and objects) than unfamiliar stimuli (Greek letters and non-objects) at the anterior end of OTs (anterior vOT). Peak activation was observed at [-44 -50 -16] (Z > 8), and extended superiorly (Z score > 8.0 at [-44 -50 -8]) and inferiorly (Z score = 7.8 at [-36 -42 -24]) even when the statistical threshold (p < 0.05) was corrected for multiple comparisons across the whole brain, and limited to voxels where the effect of semantics was significant for both letter strings (Z = 6.0) and pictures (Z > 8). Both inferior and superior regions fell within the cytoarchitectonic area FG4, see Fig. 3A. Therefore, there was no evidence for a distinction between superior and inferior parts of anterior vOT in either our data or cytoarchitecture.

The effect of stimulus type (letter strings versus pictures)
There were no parts of our ROI that were more activated for pictures of nonobjects compared to Greek letters (or vice versa), even when the threshold was reduced to p < 0.05 uncorrected. However, stimulus type interacted with semantic content in all regions of interest (p < 0.001) because activation was higher for objects (pictures with semantic content) than for all other conditions (see Fig. 3B).

The effect of all stimuli more than rest
All conditions (relative to fixation) activated extensive regions of the occipital cortex (see Fig. 3) that included activation at both inferior and superior ends of OTs. Peak activation (across conditions) was identified in the posterior part of cytoarchitectonic area FG2, at the tail end of OTs at [-40, -74, -14] with a Z score of 9.5.

Differences in the response properties of superior, inferior and anterior vOT regions
To compare the response properties of the superior, inferior and anterior vOT regions, we extracted subject-specific responses for each condition in each of these regions and tested for region by condition interactions using IBM SPSS Statistics for Windows, Version 22.0 (Armonk, NY: IBM Corp). The effect of interest was Region, and how this interacted with the effects of task, familiarity and stimulus type in a 2 × 2 × 2 × 2 repeated measures ANOVA. The ANOVA was run twice: once when the regions were superior and inferior vOT and once when the regions were superior and anterior vOT.
Superior vOT responses were extracted at the peak co-ordinates for matching more than speaking [-46 -58 -10]. Inferior vOT was positioned at [-44 -60 -18] after searching for the voxel with the highest activation for all conditions relative to fixation, within 4 mm of [x = -46, y = -58, z = -20], the most inferior part of the occipito-temporal sulcus directly beneath the superior vOT region. Anterior vOT responses were extracted at the peak co-ordinates for familiar more than unfamiliar stimuli [-44 -50 -16].
When the superior and inferior vOT responses were compared, there was (i) a main effect of region (p < 0.0001) because responses were higher in inferior than superior vOT; and (ii) an interaction between region and task (p = 0.009) because the effect of task was significantly greater in superior than inferior vOT. Region did not interact with familiarity or stimulus type and there were no three way or four way interactions (p > 0.05).
When the superior and anterior vOT responses were compared, there was (i) a main effect of region (p = 0.002) because responses were higher in superior than anterior vOT; (ii) an interaction between region and task (p = 0.01) because the effect of task was higher in superior than anterior vOT, and (iii) an interaction between region, familiarity and stimulus type (p = 0.046) because the effect of semantics on words was greater in anterior than superior vOT. There were no other significant interactions with region (p > 0.05).

Table 2. In-scanner accuracy and response times.
Mean accuracy (and standard deviation) are reported for all 8 tasks. Response times were only available for the matching tasks (post-decision finger press speed) but not for the speaking tasks due to difficulties extracting voice onset from the noise of the scanner.

Following the voxel-based analysis, we tested whether functional differences would also be observed in how superior and inferior parts of vOT interacted with posterior and anterior areas. Specifically, we compared effective connectivity in two different pathways: a superior vOT pathway and an inferior vOT pathway. For both pathways, the input area was the posterior FG2 area [-40, -74, -14] where peak activation was observed for all stimuli compared to fixation (see Section 3.2.4 above) and the end point of the pathway was the anterior vOT area (i.e. FG4) [-44, -50, -16] where peak activation was observed for familiar compared to unfamiliar stimuli. Connections between these two regions were routed via either (i) the superior vOT area [-46, -58, -10] (i.e. posterior-superior FG4) identified for the main effect of matching more than speaking, or (ii) the inferior vOT area [-44, -60, -18] (i.e. anterior-inferior FG2) identified in Section 3.2.5. A full report of the DCM findings can be found in Table 3. Here we highlight the results that distinguish the functions of the superior and inferior pathways.

Average connectivity
Over all stimuli, the strength of connectivity was (i) significantly stronger in the superior pathway than the inferior pathway for speaking (t = 5.53, p < 0.001), with (ii) a trend for the opposite direction (more for the inferior pathway than the superior pathway) for matching (t = 1.83, p = 0.07). This resulted in a significant pathway (superior versus inferior vOT) by task (matching versus speaking) interaction (F(54) = 21.86, p < 0.001).
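For readers less familiar with this type of test, a pathway by task interaction in a 2 x 2 within-subject design can be expressed as a one-sample t-test on each subject's difference of differences, with F equal to t squared. The sketch below illustrates that logic on synthetic data only. It is not necessarily the procedure used here; the function name `interaction_test`, the helper `simulate`, the simulated means and the number of subjects are all assumptions made for illustration.

```python
import math
import random
import statistics

def interaction_test(sup_match, inf_match, sup_speak, inf_speak):
    """Pathway-by-task interaction for a 2 x 2 within-subject design.

    Each argument is a list of per-subject connection strengths.
    Returns (t, F, df), where F = t**2 with (1, df) degrees of freedom.
    """
    # Per-subject difference of differences:
    # (superior - inferior) for speaking minus (superior - inferior) for matching.
    contrast = [
        (ss - isp) - (sm - im)
        for sm, im, ss, isp in zip(sup_match, inf_match, sup_speak, inf_speak)
    ]
    n = len(contrast)
    t = statistics.mean(contrast) / (statistics.stdev(contrast) / math.sqrt(n))
    return t, t * t, n - 1

# Synthetic data only: all values below are invented for illustration.
rng = random.Random(0)
n_subjects = 20  # arbitrary, not the study's sample size

def simulate(mean):
    """Draw one hypothetical connection strength per subject."""
    return [rng.gauss(mean, 0.05) for _ in range(n_subjects)]

t, F, df = interaction_test(
    simulate(0.10), simulate(0.12), simulate(0.20), simulate(0.10)
)
print(f"t({df}) = {t:.2f}, F(1, {df}) = {F:.2f}")
```

The p-value would then be read from the t distribution with `df` degrees of freedom; that step is omitted to keep the sketch free of external dependencies.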

Semantic versus unfamiliar stimuli
Connectivity in the inferior pathway increased with semantic content for both the matching and the speaking tasks (Fig. 4). In the superior pathway, connectivity increased for semantic content during matching tasks but not during speaking tasks (Table 3). Consequently, during the speaking tasks, the effect of semantic content was less in the superior pathway than the inferior pathway (t = 2.3, p < 0.05). This resulted in a significant pathway (superior versus inferior) by task (matching versus speaking) interaction for the modulatory effects (F(54) = 6.55, p = 0.014).

Discussion
This study investigated whether activation and connectivity differed, according to experimental task or stimuli, in superior and inferior parts of ventral occipito-temporal (vOT) cortex. An extensive review of the literature did not generate any clear hypotheses for a functional dissociation in these regions, although it was clear that the co-ordinates of peak vOT activation vary considerably along the z-axis (see Table 1). Our aim was to identify which task and stimulus variables influenced activation along the z-axis while excluding explanations of this variability in terms of random inter-subject variability in functional anatomy or insufficient spatial resolution in fMRI data.
To search for functional differences in the response of superior and inferior vOT subregions, we tested for effects of task (matching versus speech production), familiarity (familiar versus meaningless items) and stimulus type (letter strings versus pictures) when 58 right-handed healthy subjects who all spoke English as a first language performed 8 different visual processing conditions. The results show condition-dependent effects that dissociate the function of superior and inferior parts of vOT at y = -60 mm. The superior region is most likely to be part of fusiform region FG4. The inferior region is most likely to be part of fusiform region FG2 (see Fig. 3). Our discussion of these results below considers the function of these two regions and the implication of our findings for future studies of vOT function in neurologically normal and clinical populations.
The most striking finding was that activation in superior vOT regions depends on the nature of the task. More attention-demanding tasks increased superior vOT activation, even when stimuli were held constant. This was demonstrated by increased vOT activation in the most superior part of the occipito-temporal sulcus when participants were attending to unfamiliar stimuli (i.e. Greek letter strings and pictures of nonobjects) and making perceptual matching decisions compared to when the same participants said "1,2,3" in response to the same stimuli, a task that does not require them to pay attention to the perceptual content of the stimuli. Greater superior vOT activation was also found for semantic matching decisions on written words compared to the more familiar task of reading aloud. Although the demand on semantic processing could explain the superior vOT responses to written words, it cannot explain the superior vOT response to perceptual matching decisions on unfamiliar stimuli. We therefore suggest that both task effects can be more parsimoniously explained by increased attention to visual input enhancing activation in superior vOT more than inferior vOT.

The DCM analyses provided further evidence for a dissociation between superior and inferior vOT pathways. While both superior and inferior pathways were found to drive activation in anterior OTs, the relative contribution of the two pathways depended on the nature of the task. In the superior pathway, connectivity strength increased with semantic content during matching but did not depend on semantic content during speaking. In contrast, connectivity strength in the inferior pathway increased with semantic content during both matching and speaking tasks.

Table 3. Connection strengths (in Hz): strength of endogenous and modulatory connections during matching (A) and speaking (B) tasks. Abbreviations: Pos = input region in posterior FG2, Sup = superior (middle) vOT (posterior-superior FG4), Inf = inferior (middle) vOT (anterior-inferior FG2), Ant = anterior vOT (FG4).
Our findings of a dissociation in activation and connectivity in superior and inferior vOT regions have implications for research into the interaction between left vOT and brain regions subserving higher-order language function (Price and Devlin, 2011; Woodhead et al., 2013; Perrone-Bertolotti et al., 2017). More specifically, our results lead us to predict that superior and inferior vOT subregions might be differentially sensitive to top-down interactions from higher-order language and attention areas (Gilbert and Li, 2013). This would complement previous studies showing that different frontal regions interact with posterior and anterior vOT subregions (Seghier and Price, 2013). It would also elaborate more specifically on many emerging studies that illustrate how vOT responses are influenced by higher-order language processing and attention (Kawabata Duncan et al., 2013; Schurz et al., 2014; Vandenberghe et al., 2013; Yoncheva et al., 2009; Vogel et al., 2011; Kay and Yeatman, 2017). Our specific prediction is that frontal and parietal regions involved in the control of attention will exert their influence on superior more than inferior vOT subregions. In contrast, frontal and temporal areas involved in linguistic processing will exert influences on inferior more than superior regions. These hypotheses need to be tested in future studies; however, there is already evidence that white matter tracts to vOT vary along the z-axis (Yeatman et al., 2012), with temporal-occipital connections being dorsal to ventral-occipital connections (Rauschecker et al., 2011). Our DCM findings suggest that the strength of these different dorsal and ventral inputs is modulated by task and stimulus type.
The dissociation of the superior and inferior vOT pathways adds to the growing body of evidence demonstrating that reading is supported by multiple pathways operating in parallel (Iwata, 1984; Ischebeck et al., 2004; Sakurai, 2004; Valdois et al., 2006; Ben-Shachar et al., 2007; Schlaggar and McCandliss, 2007; Kherif et al., 2009; Levy et al., 2009; Rosazza et al., 2009; Wilson et al., 2009; Jobard et al., 2011; Richardson et al., 2011; Seghier et al., 2012; Yvert et al., 2012). Such findings have implications for explaining variability in the symptoms of patients with left vOT damage (Sakurai et al., 2000; Leff et al., 2001; Cohen et al., 2003; Sakurai, 2004; Henry et al., 2005; Gaillard et al., 2006; Newhart et al., 2007; Pyun et al., 2007; Ino et al., 2008; Tsapkini et al., 2011; Seghier et al., 2012). Our study motivates future investigations into how performance differs in patients who have damage to either the superior or inferior vOT. Our knowledge of vOT function may also be enhanced by electrical or magnetic brain stimulation (McKeefry et al., 2009; Duncan et al., 2010) or intracranial recordings (Allison et al., 1994; Nobre and McCarthy, 1995; Jung et al., 2008; Hamamé et al., 2013) directed to superior and inferior vOT subregions. Such studies may eventually lead to more efficient classification of alexia, which could have implications for selecting the appropriate course of rehabilitation.
Our findings also have many implications regarding the functional properties of vOT in object perception and recognition. For instance, they suggest that differences in demands on perceptual discrimination need to be considered when designing control/baseline conditions for word or picture stimuli. Such differences in perceptual processing may also explain other previous findings, for instance the response in superior vOT subregions to letter strings versus single letters (James et al., 2005) and the impact of visual crowding on familiar letter processing in vOT (Freeman et al., 2012). Moreover, characterizing differences in activation along the z-axis may help to better understand the many interactions that vOT entertains with other brain regions. For instance, distinct connectivity profiles of neighbouring regions around the occipito-temporal sulcus have been reported (Yeo et al., 2011), showing that a dorsal cluster (at z = -2 mm) was correlated with superior parietal cortex and frontal eye field, whereas a ventral cluster (at z = -14 mm) was correlated with the inferior parietal lobule (cf. Figures 30 and 31 of Yeo et al., 2011). These differences in intrinsic connectivity suggest that vOT subregions may participate in distributed networks that are embedded within largely parallel circuits (see discussion in Yeo et al., 2011).
In summary, by examining the influence of multiple experimental variables, our findings show functional differences in superior and inferior vOT activation that have implications for the design and interpretation of visual processing studies. The variability in reading activation along the z-axis was observed here using group-level statistics and smoothed data. Therefore, it was not a consequence of random inter-subject variability in functional anatomy. Future research is needed to investigate: developmental and retinotopic aspects of vOT function; how involvement of inferior and superior vOT changes with experience; how the inferior and superior vOT subregions are structurally and functionally connected to other brain regions; and how this varies over subjects.

[...] Modulatory (words and objects > unfamiliar stimuli) connections between the four regions of interest included in the dynamic causal modelling (DCM) analysis. Solid lines: significant modulations (p < 0.05); dashed lines: no significant modulations; plus '+' sign: positive modulations; minus '-' sign: negative modulations; blue dots: stronger modulations for word than picture stimuli; red dots: stronger modulations for picture than word stimuli (see Table 3 for a list of all effects). (C) Task by connection interaction. Bars represent average modulatory connection strengths (in Hz) from superior to anterior vOT and from inferior to anterior vOT during the matching versus the speaking tasks. Error bars represent ± 1 standard error of the mean.