The position of visual word forms in the anatomical and representational space of visual categories in occipitotemporal cortex

Abstract Recent reviews emphasized the need to investigate the complexity of multiple subareas of word selectivity, and how this selectivity relates to that for other visual categories, at the individual level and at a high spatial resolution (without normalization or smoothing). To investigate this, both on the brain surface and in the representational space of the occipitotemporal cortex, we presented 19 participants with images of 20 different categories during 7T fMRI. These categories included several word-like conditions and together covered many of the dimensions that have been suggested to define object space, such as animacy and real-world size. In the left hemisphere, we found three subareas of the visual word form area (VWFA) and one extra subarea around the pFus face-selective area. We also observed several areas of selectivity to hands that could consistently guide the localization of word and face areas. No clear predictive anatomical landmarks were found. Results for the right hemisphere were less clear, in part due to weaker word selectivity. In the representational space, word selectivity stood out from other categories. It had multiple neighboring categories at a similar distance (e.g., faces, bodies, hands, cars), so no special relationship was found with, for example, faces. These results enable a consistent and reliable way to locate subareas of word selectivity and may inspire future research into words in the representational space of the occipitotemporal cortex.

Within OTC, category selectivity is organized according to dimensions on which categories vary. For example, in ventral OTC, areas responsive to animate categories are located more laterally than those responsive to inanimate categories (Grill-Spector & Weiner, 2014). This dimension also organizes representational spaces as detected through MVPA (for a classic example, see Kriegeskorte, 2008). Other examples of organizing dimensions are real-world size (Konkle & Oliva, 2012) and body topography (Orlov et al., 2010). Low- and mid-level visual dimensions like retinotopy also influence OTC's functional organization (Malach et al., 2002). For example, both face and word areas prefer visual information provided by the fovea (Grill-Spector et al., 2017; Hasson et al., 2002; Le et al., 2017; Lerma-Usabiaga et al., 2021; Rauschecker et al., 2012).
The location of some category-selective areas can be predicted from certain anatomical landmarks and/or from the location of other areas of selectivity. For example, Weiner et al. (2014) found that the anterior tip of the mid-fusiform sulcus, which separates the fusiform gyrus into a lateral and a medial part, predicts the location of fusiform face area 2 (FFA-2), also called mFus (middle fusiform face area). As another example, Weiner and Grill-Spector (2010) observed that face- and limb-selective areas alternate in a consistent pattern, enabling a reliable way to define face areas (mFus, pFus, and IOG) based on their spatial relation to limb-selective areas (along the OTS and ITG). This alternation appeared more consistent across subjects in the right hemisphere. The authors suggested that this might be influenced by the presence of word-selective areas in the left hemisphere, although they could not test this due to the absence of a word form category in their study. Weiner and Grill-Spector's (2010) suggestion, that the face-limb alternation is less consistent in the left hemisphere because of words, is grounded in research proposing a competition for cortical territory in OTC between faces and words during development. This research is based on the neuronal recycling theory (Dehaene & Cohen, 2007; Dehaene et al., 2010), which states that neural circuits for evolutionarily older skills (face processing) are repurposed for recently invented skills (written language) (Dehaene & Cohen, 2007). The specifics of how this recycling unfolds are a topic of debate. Some studies indicated a destructive competition (Dehaene et al., 2010), whereas others indicated no competition but rather a fine-tuning of visual object recognition (Hervais-Adelman et al., 2019). A recent study adds another category on top of faces and words: Nordt et al. (2021) found increased selectivity for faces and words and decreased selectivity for limbs while children learned to read and write. This suggests an interplay between destructive competition and visual fine-tuning, and challenges the specificity of the proposed competition between faces and words.
The word selectivity in ventral OTC is often called the visual word form area (VWFA), extending from the posterior occipitotemporal sulcus (OTS) to about the midpoint of the fusiform gyrus (Yeatman et al., 2021). Two recent studies aimed to subdivide this large region based on functional differences. Lerma-Usabiaga et al. (2018) applied two types of contrasts (perceptual and lexical), revealing that the anterior region (mOTS) was more sensitive to lexical characteristics of the stimuli and was structurally more connected to language areas than the posterior region (pOTS). The authors stated that the mOTS corresponds to the central VWFA (as described in Cohen & Dehaene, 2004; Cohen et al., 2002; Dehaene et al., 2002; Vogel et al., 2012) and the pOTS to the posterior VWFA (as described in Ben-Shachar et al., 2007; Vogel et al., 2012). White et al. (2019) investigated selective spatial attention in the VWFA and concluded that VWFA-1 could process multiple words in parallel, whereas VWFA-2 processed only one word at a time (after an integration of hemifields). VWFA-1 and VWFA-2 correspond to pOTS and mOTS, respectively. Yeatman et al. (2021) proposed that VWFA-1 lies next to FFA-1 and VWFA-2 next to FFA-2, with a body/limb-selective area lying in between (see also Grill-Spector & Weiner, 2014). These studies divided the large VWFA into two subareas based on functional properties rather than clear anatomical boundaries.
Several researchers recently emphasized the need for high anatomical precision at the level of the individual brain (without spatial normalization or smoothing) when studying the location of the VWFA and possible subareas (Caffarra et al., 2021; Yeatman et al., 2021). Such single-subject precision matters for accurate descriptions of spatial organization: the VWFA may seem like one large region merely due to a lack of such precision (Caffarra et al., 2021). With higher precision, the VWFA may be subdivided based on anatomical boundaries instead of (only) on functional differences. Thus, in our study, we mapped word selectivity (obtained through a general linear model) on the anatomy of the ventral brain surface of each of the 19 subjects scanned with 7T fMRI. This allowed a higher spatial resolution, investigating each subject separately, without applying normalization or smoothing preprocessing steps. This type of study allows for consistency in defining subareas of the VWFA and facilitates studies investigating functional distinctions between these subareas (Caffarra et al., 2021; Yeatman et al., 2021).
In addition to the words category, our study included various potentially relevant categories. This allows word selectivity to be related to other category selectivity. Such a relation may be expected based on findings from, for example, Weiner and Grill-Spector (2010), who found consistent spatial relations between face- and limb-selective areas (but did not include word-selective areas). Research on neuronal recycling suggests that the categories of faces, hands, and bodies are relevant for the functional neuroanatomy of word selectivity. Building on Yeatman et al. (2021), we also expected bodies and hands (a type of limb) to be relevant because they might lie between two subregions of the VWFA. While one recent study by Boring et al. (2021) did explore faces compared to words at a high spatial resolution in individual subject space, our study goes further by placing several other relevant visual categories (faces, hands, bodies, and other character categories: fake script and numbers) on the word selectivity map and relating them to each other. In addition, we ran a split-half analysis to replicate category selectivity and to gain insight into the response profile of these selective areas. Uncovering a spatial relation between word- and other category-selective areas can enhance the reliability of future studies in locating category-selective areas.
Our study included many categories that vary on the multiple dimensions that organize object space (such as animacy, object size, and retinotopy), and can thus contribute to investigating word selectivity in the representational space using MVPA. Based on the competition between words and faces proposed by the neuronal recycling theory, our study investigates whether a special relation between these categories also exists in the representational space. To construct this space, our study included a total of 20 categories varying on several dimensions, such as animacy and object size.
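The representational-space analysis sketched above can be illustrated with a minimal, hypothetical example (all data and variable names here are illustrative, not the study's actual pipeline): a representational dissimilarity matrix (RDM) built from correlation distances between category response patterns, from which the nearest neighbors of the words category can be read off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one mean response pattern per category across the
# voxels of an OTC region of interest (20 categories x 500 voxels).
n_categories, n_voxels = 20, 500
patterns = rng.standard_normal((n_categories, n_voxels))

# Representational dissimilarity matrix: 1 - Pearson correlation between
# every pair of category patterns (correlation distance).
rdm = 1.0 - np.corrcoef(patterns)

# Distances from one category (say, row 0 = words) to all others; sorting
# them reveals which categories are its nearest neighbors in this space.
words_distances = rdm[0]
neighbors = np.argsort(words_distances)[1:4]  # the three closest categories
```

With real data, equidistant neighbors in such an RDM (e.g., faces, bodies, hands, cars all at a similar distance from words) would indicate that words hold no special relationship with any single category.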

Subjects
Nineteen subjects participated in the study (mean age: 30.1 ± 6.8 years, range 23-45; sex: 11 males, 8 females). Sixteen subjects were right-handed and three were left-handed (subjects 2, 7, and 17). All subjects had normal or corrected-to-normal visual acuity. Every subject gave informed consent. All procedures were approved by the ethics committee of Vrije Universiteit Amsterdam and adhered to the guidelines of the Declaration of Helsinki.

Stimuli
Stimuli (500 x 500 pixels, 4.7 degrees visual angle) were presented on a 32ʹʹ LCD screen (69.8 × 39.3 cm, 120 Hz) designed for use in an MRI environment (BOLDscreen, Cambridge Research Systems, UK). The screen resolution was 1920 × 1080 pixels. The screen was positioned at the end of the bore and viewed through a mirror (distance from screen: 220 cm) mounted on the head coil. All stimuli were presented with MATLAB (MathWorks, Inc.) and the Psychophysics Toolbox Version 3 (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997). On every trial, stimuli were presented at a semi-random position around the middle of the screen (center of the stimulus maximally 33 pixels/0.32 degrees visual angle away), to avoid low-level visual confounds (the same motivation as for applying the SHINE toolbox to the stimuli, described below) and to make the one-back task less trivial. A fixation dot was always present in the middle of the screen, presented over the stimuli.
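The positional jitter described above can be sketched as follows (the study's presentation code was MATLAB/Psychtoolbox; this Python sketch is illustrative, and reading the 33-pixel limit as a per-axis displacement is an assumption):

```python
import random

# Screen resolution and jitter limit from the setup described above.
SCREEN_W, SCREEN_H = 1920, 1080
MAX_JITTER = 33  # pixels, ~0.32 degrees visual angle

def jittered_center(rng=random):
    """Hypothetical helper: a stimulus centre displaced from the screen
    centre by at most MAX_JITTER pixels along each axis."""
    dx = rng.randint(-MAX_JITTER, MAX_JITTER)
    dy = rng.randint(-MAX_JITTER, MAX_JITTER)
    return SCREEN_W // 2 + dx, SCREEN_H // 2 + dy
```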
The experiment was designed to be useful for different research purposes. We chose 20 conditions (19 categories + 1 scrambled control condition) to obtain a rich and varied set according to several criteria. The set included both natural and artificial objects; known and unknown shapes (cubies and smoothies, see Op de Beeck et al., 2006); animate and inanimate shapes; objects with different real-world sizes; and objects differing in how they are used and in whether they are tools. The categories could be grouped together in several ways along such dimensions, depending on the exact purpose of the research. Within each category, different viewpoints/angles on the objects were included and there was sufficient variability in the identity of the stimuli. The following is an alphabetical list of all the categories, each with an example depicted in Figure 1: bodies, buildings, cars, cats, chairs, cubies, faces, fake script, fish, flowers, hammers, hands, musical instruments, numbers, scissors, scrambled, smoothies, trees, vegetables, words. For the purposes of this study, certain categories were of particular interest, as described in detail in the introduction (although all of them were of interest for constructing the representational space of OTC): words; faces; characters other than words (numbers and fake script); human-related and thus animate categories other than faces (bodies and hands); and a typical object category (chairs), to include an inanimate category in the set that could serve as a comparison and reference landmark. We did not choose the buildings category for this role, as it would activate parts of ventral OTC too medial to serve as a control for the categories of interest. In Figure 1, the character group is framed in purple, the relevant animate categories in red, and the reference inanimate category in green.
In the creation of the stimuli, we attempted to minimize the role of low- and mid-level features in the differences between categories. For eight categories (bodies, buildings, cars, cats, chairs, faces, fish, hammers), stimuli were provided by Cohen et al. (2017). They ensured high within-category variability by, for example, including images of items at different angles and in different positions (Cohen et al., 2017). As described above, we also followed these guidelines in the creation of the other categories' images. The following categories (partially) consist of images from the Bank of Standardized Stimuli (BOSS): musical instruments, flowers, vegetables, scissors (Brodeur et al., 2010). To add more images to these categories, and to create stimuli for other categories (hammers, hands, trees), we created stimuli ourselves using freely available images on the internet.
The words, fake script, and numbers categories consisted of two, three, or four letters/characters with a variable aspect ratio, achieved by an ascending or descending angle of the string. They were presented in a bold font filled with a random dotted pattern. Half of the stimuli of each category were presented on a white background and the other half on a black background, ensuring comparability in retinotopic envelope with the other categories. The words lacked semantic meaning. All words contained at least one vowel (except one of the 30 stimuli) and were pronounceable. For simplicity, we refer to this category as words rather than letter strings or pseudowords. The fake script stimuli were created using a combination of two fonts to ensure that the letters would take on varied shapes (e.g., short, long, blocked, curvy).
The control category (scrambled images) was created by applying the randblock function, which is freely available for MATLAB (MathWorks, Inc.), to all other images in the stimulus set. From all these scrambled images, we chose 30 images (including at least one from each of the other 19 conditions) in which the middle of the image contained more variability than the borders, as this resembles the general way in which the other categories' images were organized.
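Block scrambling of this kind can be sketched in a few lines (a simplified Python stand-in for the MATLAB randblock function; the block size here is an illustrative choice, not the one used in the study):

```python
import numpy as np

def block_scramble(img, block=20, rng=None):
    """Shuffle the non-overlapping square blocks of an image, destroying
    global shape while preserving local statistics. `img` is an (H, W)
    or (H, W, C) array with H and W divisible by `block`."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    tiles = [img[i:i + block, j:j + block]
             for i in range(0, h, block) for j in range(0, w, block)]
    order = rng.permutation(len(tiles))
    out = np.empty_like(img)
    k = 0
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i:i + block, j:j + block] = tiles[order[k]]
            k += 1
    return out
```

Because the blocks are only permuted, the scrambled image contains exactly the same pixel values as the original, so overall luminance and contrast are untouched.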
Fig. 1. For each of the 20 categories, an example stimulus is shown. The category the stimulus belongs to is indicated on top of the image. The character categories are framed in purple, the relevant animate (human-related) categories in red, and the typical object category used as a reference in green.
For the unknown objects we used the cubies and smoothies stimuli described in the study by Op de Beeck et al. (2006). Both object types could vary on four different shape dimensions, with values from 0 to 5 on each dimension. We chose 16 shapes for both cubies and smoothies that took extreme positions in this 4D space (e.g., 0505, 0005) and four more with a central position within this space (e.g., 2222, 2323). Ten of those 20 stimuli were duplicated before any further processing. All 30 stimuli were then rotated by a random angle, making sure the duplicates were angled differently than their original versions, to end up with 30 unique stimuli. During creation, the objects' positions on the grey background were ensured to not always be in the center.
Several conditions required extra control to match low- and mid-level features between categories, on top of ensuring within-condition variability in viewpoints and identity. Hammers and scissors are often elongated in shape. To avoid retinotopic confounds, we translated and rotated exemplars, included different viewpoints, and selected scissors that were open instead of closed. We controlled for the specific shape of trees by using different kinds of trees, by sizing them smaller/larger, and by not always having them appear in the middle of the background but also in the upper/lower left/right part. We replaced the images of the buildings category that contained trees with other images of buildings without trees, as trees constitute a separate category. The bodies' images were also adapted so they would have no visible hands, as hands form a category of their own.
In the end, 30 stimuli were available for each category. All stimuli were then matched for average luminance, contrast, and spectral energy with the SHINE toolbox (Willenbockel et al., 2010), inspired by the study of Cohen et al. (2017). Afterward, we verified that the identity of each stimulus remained at least moderately clear/visible to viewers. We conducted several additional checks: a mean grey-value image, a standard-deviation grey-value image, and an edge image were created for each category and then compared between categories to ensure they did not differ.
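The core of luminance/contrast matching can be sketched as follows (a simplified stand-in for the SHINE toolbox, which additionally matches spectral energy; the function name and target values are illustrative):

```python
import numpy as np

def match_luminance_contrast(images, target_mean=128.0, target_std=40.0):
    """Rescale each greyscale image to a common mean luminance and
    contrast (standard deviation of grey values). A simplified sketch of
    one step of SHINE-style matching; spectral-energy matching is omitted."""
    matched = []
    for img in images:
        img = np.asarray(img, dtype=float)
        z = (img - img.mean()) / (img.std() + 1e-12)  # z-score the image
        matched.append(z * target_std + target_mean)  # impose common stats
    return matched
```

After such matching, a univariate difference between categories can no longer be driven by mean luminance or overall contrast.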
All images, and in addition all data (including raw data of all participants in the surface space) relevant to the analyses presented in this study, are available in the following GIN repository: https://doi.org/10.12751/g-node.96eqnl (Pillet et al., 2024).

Experimental design
The task was explained to subjects before they entered the scanner. They were asked to focus on the fixation dot in the center of the screen. A dual task was used. In the first task, a one-back task, subjects pressed a button with the instructed hand (which hand differed across subjects) when an image was an identical copy of the image of the previous trial (see Fig. 2B, top row). Thus, if the same object was presented, but rotated or angled differently, they did not press the button. In the second task, subjects pressed a button with their opposite hand as soon as possible when a change in category occurred (see Fig. 2B, bottom row). This ensured that subjects were processing category information and not just lower-level features to perform the first task. To familiarize subjects with the categories, an example stimulus from each category was shown after the task explanation. Overall, participants showed a high performance in the category task, with a hit rate of 85% and a false alarm rate of 1%. The one-back task was more difficult, with a hit rate of 65% and a false alarm rate of 10%. The latter can be explained by the fast pace of the stimuli in a block (one every 0.67 seconds).
The subjects completed seven runs in the scanner (one subject completed only six, some subjects eight). Each run comprised 40 category blocks, each lasting 10.05 seconds and consisting of 15 trials/stimuli, along with 3 fixation-only blocks (at the start, middle, and end of the run) that lasted 15 seconds each. A run thus lasted 447 seconds (see Fig. 2A). After the first fixation block, the first half of the run presented 20 blocks (one per category) in a random order. After the middle fixation block, the second half of the run presented the 20 blocks in the reversed order. A trial lasted 0.67 seconds: for 40% of the trial (0.27 seconds) only the fixation dot was present, and for the other 60% (0.40 seconds) the stimulus was present. Stimuli within a block were always presented in a random order, with two stimulus repeats for the first task at random points within the block. This resulted in 140.7 seconds of data per category per subject with 7 runs. Two types of runs existed: type 1 featured images from the first half, while type 2 featured images from the second half of the full set of 30 images per category/block.
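The timing figures above are internally consistent, as a quick check confirms: 15 trials of 0.67 s give one 10.05 s block; 40 blocks plus three 15 s fixation blocks give a 447 s run; and two blocks per category per run over 7 runs give 140.7 s of data per category.

```python
import math

trials_per_block, trial_dur = 15, 0.67
block_dur = trials_per_block * trial_dur          # one category block
assert math.isclose(block_dur, 10.05)

run_dur = 40 * block_dur + 3 * 15.0               # 40 blocks + 3 fixation blocks
assert math.isclose(run_dur, 447.0)

per_category = 7 * 2 * block_dur                  # 7 runs x 2 blocks per category
assert math.isclose(per_category, 140.7)
```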

(f)MRI acquisition
Data were acquired with a 7T Philips Achieva MRI scanner (Philips Healthcare, Best, The Netherlands) with an 8-channel transmit coil and a 32-channel receive coil (Nova Medical Inc., Wilmington, United States) at the Spinoza Centre for Neuroimaging in Amsterdam (the Netherlands). Functional and anatomical imaging was carried out using universal pulses (for more explanation, see Gras et al., 2017). During brain imaging, respiratory data were collected using a respiration belt and cardiac data were collected using a pulse oximeter on a finger of the left hand.
We used a 3D-EPI sequence. The relevant sequence parameters are volume repetition time (TR) = 1.37 s, echo time (TE) = 16.9 ms, flip angle = 13°, voxel size = 1.79 x 1.79 x 1.8 mm, field of view (FOV) = 200 x 200 x 176 mm, and matrix size = 112 x 112 x 98. We also collected images with an identical sequence except for reversed phase-encoding blips, in other words, phase encoding in the opposite direction. Those images were used to correct for distortions. For all subjects an anatomical scan was collected using an MPRAGE sequence with parameters: TR = 10 ms, TE = 3.3 ms, flip angle = 8°, spatial resolution = 0.8 x 0.8 x 0.8 mm, and matrix size = 288 x 288 x 205.

(f)MRI data
The dataset was formatted according to BIDS (Gorgolewski et al., 2016) after converting the PAR/REC files to NIfTI files using dcm2niix (Li et al., 2016) and reorienting them to RAS+ with nibabel in Python. Preprocessing steps were carried out using fMRIPrep 20.2.0 (Esteban et al., 2018, 2019; RRID:SCR_016216), which is based on Nipype 1.5.1 (Gorgolewski et al., 2011, 2018; RRID:SCR_002502). In short, the 7T BOLD images underwent susceptibility distortion correction, realignment, coregistration to the T1-weighted image, and spatial normalization. Surface reconstruction with FreeSurfer was also conducted as part of fMRIPrep. Spatially normalized data were not used in our analyses, to preserve the individual subject brain space for maximal spatial and anatomical detail. fMRIPrep's report provides comprehensive details on the preprocessing steps and can be found below. In addition, we inspected tSNR images (average of the time series divided by its standard deviation) of every subject to assess the quality of the fMRI data.
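The tSNR map mentioned above is simple to compute from a 4D BOLD array (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def tsnr_map(bold):
    """Temporal signal-to-noise ratio: the voxelwise mean of the BOLD
    time series divided by its standard deviation over time. `bold` is
    an (x, y, z, t) array; voxels with zero temporal variance get 0."""
    mean = bold.mean(axis=-1)
    std = bold.std(axis=-1)
    return np.divide(mean, std, out=np.zeros_like(mean), where=std > 0)
```

Low-tSNR regions (e.g., near air-tissue interfaces at 7T) stand out immediately in such a map, which is what makes it a convenient per-subject quality check.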
Subjects 4 to 10 experienced a temporary coil issue, causing a small portion of the right hemisphere to appear darker than the left. To mitigate the impact of this on surface reconstruction of the anatomical data, we took several steps. For each anatomical scan, the intensity value distribution was assessed with ITK-SNAP (Yushkevich et al., 2006) and values were clipped accordingly. The values were then rescaled using ANTs (Tustison et al., 2014), and denoising and bias correction of the scans were done using SPM12 and CAT12. Surface reconstruction results (from an fMRIPrep anatomical-only run) were carefully reviewed and improved, ensuring successful reconstruction of all subjects' brain surfaces. After this, using these results, we ran fMRIPrep in full to also conduct the functional preprocessing steps.
The shadow did not impact functional data preprocessing. Preprocessing results were thoroughly examined using the fMRIPrep reports and by comparing the similarity of strength and size of contrasts (e.g., faces versus all other conditions) across different models: (1) a general linear model based on the functional space after realignment, (2) a model after realignment, susceptibility distortion correction, and coregistration to T1w space, and (3) a model after all these steps plus normalization to MNI space. This check ensured the effectiveness of all spatial preprocessing steps.
What follows is the detailed description provided by fMRIPrep. A total of 1 T1-weighted (T1w) image was found within the input BIDS dataset. The T1w image was corrected for intensity non-uniformity (INU) with N4BiasFieldCorrection (Tustison et al., 2010), distributed with ANTs 2.3.3 (Avants et al., 2008; RRID:SCR_004757), and used as T1w-reference throughout the workflow. The T1w-reference was then skull-stripped with a Nipype implementation of the antsBrainExtraction.sh workflow (from ANTs), using OASIS30ANTs as target template. Brain tissue segmentation of cerebrospinal fluid (CSF), white matter (WM), and gray matter (GM) was performed on the brain-extracted T1w using fast (FSL 5.0.9, RRID:SCR_002823; Zhang et al., 2001). Brain surfaces were reconstructed using recon-all (FreeSurfer 6.0.1, RRID:SCR_001847; Dale et al., 1999), and the brain mask estimated previously was refined with a custom variation of the method to reconcile ANTs-derived and FreeSurfer-derived segmentations of the cortical gray matter of Mindboggle (RRID:SCR_002438; Klein et al., 2017). Volume-based spatial normalization to one standard space (MNI152NLin2009cAsym) was performed through nonlinear registration with antsRegistration (ANTs 2.3.3), using brain-extracted versions of both T1w reference and the T1w template. The following template was selected for spatial normalization: ICBM 152 Nonlinear Asymmetrical template version 2009c (Fonov et al., 2009; RRID:SCR_008796; TemplateFlow ID: MNI152NLin2009cAsym).

For each of the six/seven/eight (depending on the subject) BOLD runs found per subject (across all tasks and sessions), the following preprocessing was performed. First, a reference volume and its skull-stripped version were generated using a custom methodology of fMRIPrep. A B0-nonuniformity map (or fieldmap) was estimated based on two (or more) echo-planar imaging (EPI) references with opposing phase-encoding directions, with 3dQwarp (Cox & Hyde, 1997; AFNI 20160207). Based on the estimated susceptibility distortion, a corrected EPI reference was calculated for a more accurate co-registration with the anatomical reference. The BOLD reference was then co-registered to the T1w reference using bbregister (FreeSurfer), which implements boundary-based registration (Greve & Fischl, 2009). Co-registration was configured with six degrees of freedom. Head-motion parameters with respect to the BOLD reference (transformation matrices, and six corresponding rotation and translation parameters) were estimated before any spatiotemporal filtering using mcflirt (FSL 5.0.9; Jenkinson et al., 2002). The BOLD time series were resampled onto the following surfaces (FreeSurfer reconstruction nomenclature): fsaverage, fsnative. The BOLD time series (including slice-timing correction when applied) were resampled onto their original, native space by applying a single, composite transform to correct for head motion and susceptibility distortions. These resampled BOLD time series will be referred to as preprocessed BOLD in original space, or just preprocessed BOLD. The BOLD time series were also resampled into standard space, generating a preprocessed BOLD run in MNI152NLin2009cAsym space. Several confounding time series were calculated based on the preprocessed BOLD: framewise displacement (FD), DVARS, and three region-wise global signals. FD was computed using two formulations, following Power (absolute sum of relative motions; Power et al., 2014) and Jenkinson (relative root mean square displacement between affines; Jenkinson et al., 2002). FD and DVARS are calculated for each functional run, both using their implementations in Nipype (following the definitions by Power et al., 2014). The three global signals are extracted within the CSF, the WM, and the whole-brain masks. Additionally, a set of physiological regressors were extracted to allow for component-based noise correction (CompCor; Behzadi et al., 2007). Principal components are estimated after high-pass filtering the preprocessed BOLD time series (using a discrete cosine filter with 128 s cut-off) for the two CompCor variants: temporal (tCompCor) and anatomical (aCompCor). tCompCor components are then calculated from the top 2% variable voxels within the brain mask. For aCompCor, three probabilistic masks (CSF, WM, and combined CSF + WM) are generated in anatomical space. The implementation differs from that of Behzadi et al. (2007) in that instead of eroding the masks by 2 pixels in BOLD space, the aCompCor masks are subtracted from a mask of pixels that likely contain a volume fraction of GM. This mask is obtained by dilating a GM mask extracted from FreeSurfer's aseg segmentation, and it ensures components are not extracted from voxels containing a minimal fraction of GM. Finally, these masks are resampled into BOLD space and binarized by thresholding at 0.99 (as in the original implementation). Components are also calculated separately within the WM and CSF masks. For each CompCor decomposition, the k components with the largest singular values are retained, such that the retained components' time series are sufficient to explain 50 percent of variance across the nuisance mask (CSF, WM, combined, or temporal). The remaining components are dropped from consideration. The head-motion estimates calculated in the correction step were also placed within the corresponding confounds file. The confound time series derived from head motion estimates and global signals were expanded with the inclusion of temporal derivatives and quadratic terms for each (Satterthwaite et al., 2013). Frames that exceeded a threshold of 0.5 mm FD or 1.5 standardized DVARS were annotated as motion outliers. All resamplings can be performed with a single interpolation step by composing all the pertinent transformations (i.e., head-motion transform matrices, susceptibility distortion correction when available, and co-registrations to anatomical and output spaces). Gridded (volumetric) resamplings were performed using antsApplyTransforms (ANTs), configured with Lanczos interpolation to minimize the smoothing effects of other kernels (Lanczos, 1964). Non-gridded (surface) resamplings were performed using mri_vol2surf (FreeSurfer). Many internal operations of fMRIPrep use Nilearn 0.6.2 (Abraham et al., 2014; RRID:SCR_001362), mostly within the functional processing workflow. For more details of the pipeline, see the section corresponding to workflows in fMRIPrep's documentation.
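The Power formulation of framewise displacement mentioned in the fMRIPrep description can be sketched as follows (an illustrative reimplementation, not fMRIPrep's own code; the conventional 50 mm head radius for converting rotations to displacements is an assumption of this sketch):

```python
import numpy as np

def framewise_displacement(motion_params, head_radius=50.0):
    """Framewise displacement (Power et al., 2014): the sum of absolute
    frame-to-frame changes in the six realignment parameters, with the
    three rotations (radians) converted to millimetres as arc length on
    a sphere of the given radius. `motion_params` is a (t, 6) array:
    three translations (mm) followed by three rotations (rad)."""
    params = np.asarray(motion_params, dtype=float).copy()
    params[:, 3:] *= head_radius          # rotation angle x radius = arc length
    diffs = np.abs(np.diff(params, axis=0))
    return np.concatenate([[0.0], diffs.sum(axis=1)])  # first frame: FD = 0
```

Frames whose FD exceeds the 0.5 mm threshold mentioned above would then be flagged as motion outliers.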

Physiological data
The physiological data were formatted according to BIDS (Gorgolewski et al., 2016) using scanphyslog2bids, created by Lukas Snoek and available through GitHub (https://github.com/lukassnoek/scanphyslog2bids). Correction for physiological noise was performed with RETROICOR (Glover et al., 2000; Hutton et al., 2011) using Fourier expansions of different orders for the estimated phases of cardiac pulsation (3rd order), respiration (4th order), and cardio-respiratory interactions (1st order) (Harvey et al., 2008). The corresponding confound regressors were created using the MATLAB PhysIO toolbox (Kasper et al., 2017), open-source code available as part of the TAPAS software collection (https://www.translationalneuromodeling.org/tapas). The physiological data collection was not successful for all participants: subjects 2, 3, 4, and 5, as well as the first functional run of subject 18, lacked physiological data.
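The Fourier expansions at the heart of RETROICOR can be sketched as follows (a simplified illustration; the study's actual regressors were generated by the PhysIO toolbox). Each harmonic of an estimated physiological phase contributes one cosine and one sine regressor, so a 3rd-order cardiac and a 4th-order respiratory expansion give 6 + 8 columns, and the 1st-order interaction terms (cosine and sine of the sum and difference of the two phases) add 4 more, consistent with the 18 physiological confounds used in the univariate analysis below.

```python
import numpy as np

def retroicor_regressors(phase, order):
    """Fourier expansion of an estimated physiological phase (radians):
    a cosine and a sine regressor for each harmonic up to `order`.
    Returns a (t, 2*order) block of the confound design matrix."""
    phase = np.asarray(phase, dtype=float)
    cols = []
    for k in range(1, order + 1):
        cols.append(np.cos(k * phase))
        cols.append(np.sin(k * phase))
    return np.column_stack(cols)
```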

Univariate analysis
The BOLD response of each subject, run, and voxel was modeled using a general linear model (GLM, also called univariate analysis) with the Statistical Parametric Mapping toolbox (SPM12, Wellcome Centre for Human Neuroimaging, London) in MATLAB (MathWorks, Inc.). Within this model, each condition was captured by a regressor with specified onsets and durations, yielding 21 regressors (20 categories + one fixation condition) per run. Several confound regressors from either fMRIPrep preprocessing or from the PhysIO toolbox were also included: the realignment parameters and their temporal derivatives (total: 12) to minimize motion effects on the data; additional motion outlier regressors (a binary regressor for each volume labeled as an outlier; their number differed per run and subject); and the confound regressors derived from the respiratory and cardiac data (total: 18), when available. A high-pass filter of 610 seconds (based on the design of the run) was applied. Contrasting each category against the average of all other categories (excluding fixation) generated functional neuroanatomy maps for each subject. In these maps, we focused on the key categories: faces, bodies, hands, words, numbers, fake script, and a typical object category (chairs) as a reference landmark. Contrasts were family-wise error corrected (p < .05).
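The one-versus-rest contrast logic can be made concrete in a short sketch (illustrative; in SPM these weights are entered over the condition regressors of the fitted GLM):

```python
import numpy as np

def one_vs_rest_contrast(index, n_conditions=20):
    """Contrast weights for one category versus the average of the other
    19 categories (fixation excluded): +1 for the category of interest
    and -1/19 for each remaining category, so the weights sum to zero."""
    c = np.full(n_conditions, -1.0 / (n_conditions - 1))
    c[index] = 1.0
    return c
```

Because the weights sum to zero, the contrast tests a pure difference between the category of interest and the mean of the others, independent of overall signal level.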
All the analyses were conducted within the subject-specific space, thus without performing spatial normalization. Following Weiner and Grill-Spector's advice (2013, summarized in their table 2), this approach preserves the gyral and sulcal patterns of a specific brain for a more accurate localization of activity. They also advise against spatial smoothing, to prevent inaccurate activity localization and averaging together regions that are in fact distant on the surface. We adhered to the pipeline outlined by Brodoehl et al. (2020), conducting the GLM analysis without spatial normalization or smoothing in volume space, with results projected onto the brain surface.
2.6.1.1. Split-half analysis. To quantify and demonstrate the replicability of category selectivity, we conducted a split-half analysis. First, we defined a region of interest (ROI) on the left and right hemisphere surface of each subject, encompassing the middle-anterior ventral surface of the OTC (activity anterior to the inferior occipital gyrus, including the fusiform gyrus and neighboring occipitotemporal sulcus, see Fig. 3A). This ROI was designed to include all second and third clusters of selectivity for words, faces, hands, bodies, numbers, fake script, and objects (chairs) (see Results). One subject (10) lacked these activity clusters in the right hemisphere, so no ROI was drawn there. We restricted these ROIs to the middle and anterior ventral surface in order to investigate the response profile of word areas (compared to other category-selective areas) that show more selectivity to words than to other characters (this was later confirmed by the results from this analysis). Based on previous research, posterior activity is less selective to words and more selective to characters in general (this is also visible on the surface maps in the Results section). Therefore, posterior activity was not included in any of the ROIs.
Second, for every subject, we ran two GLMs: one with the odd runs and one with the even runs. Using the odd-runs GLM, we computed the contrasts of interest, each comparing one category (faces, bodies, hands, words, numbers, fake script, or an objects category (chairs)) versus all other 19 categories. These contrasts were thresholded at an uncorrected p < .0005. Per subject, we intersected each functional contrast with the left and the right hemisphere anatomical ROI. As a result, 7 anatomical-functional ROIs were created per hemisphere for each subject: a ventral middle-anterior face, body, hand, word, number, fake script, and object area. It was not possible to create every ROI for every subject (because some intersections contained no significant voxels).
Third, using the even-runs GLM, we selected the beta values of all 21 conditions per ROI and per subject, and averaged them across all runs and all voxels within an ROI. This yielded one beta value per category, per ROI, and per subject. Finally, we averaged across subjects to create a bar graph depicting the averaged beta value per category per ROI and calculated the standard error for each category/bar. Within each ROI, we selected the main relevant category for the functional contrast defining that ROI (e.g., faces for the left ventral middle-anterior face area). Using paired t-tests, we compared this category to the other categories of interest, resulting in 6 tests per ROI (e.g., for the face area, faces against bodies, hands, words, fake script, numbers, and chairs). These tests were Bonferroni corrected per ROI (p < .008). We also performed a paired t-test on the difference between the first and second preferred category between the hand- and body-selective voxels in the left hemisphere. We additionally performed several paired t-tests on the difference of two categories between the left and right hemisphere ROIs: between the word areas for the difference between words and numbers and for the difference between words and fake script (Bonferroni-corrected p < .03), and between the hand areas for the difference between hands and words.
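The per-ROI paired comparisons above can be sketched in Python with numpy only (a hedged illustration; the function name and toy data are ours, and in practice the p-value would be looked up from a t distribution with n - 1 degrees of freedom and compared against a Bonferroni-adjusted alpha):

```python
import numpy as np

def paired_t(x, y):
    """Paired t statistic for two matched samples (one value per subject)."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# toy example: mean beta to the ROI-defining category vs. a competing
# category, one value per subject (illustrative numbers)
pref = np.array([2.0, 3.0, 5.0])
other = np.array([1.0, 1.0, 2.0])
t = paired_t(pref, other)
```

Because the same subjects contribute both values, the test is run on the within-subject differences, which is what makes it a paired rather than an independent-samples comparison.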

Multi-voxel pattern analysis (MVPA)
FreeSurfer performs an automated aparc parcellation using the Desikan-Killiany atlas. From this parcellation, we constructed a large OTC ROI separately for each hemisphere, by including the fusiform gyrus, inferior temporal, and lateral occipital cortical regions according to the aforementioned parcellation (e.g., Lerma-Usabiaga et al., 2018; Mattioni et al., 2020; see Fig. 3B). We used the CoSMoMVPA toolbox (Oosterhof et al., 2016) for MATLAB (Mathworks, Inc.). A multi-voxel pattern in response to each of the conditions for each run was constructed by using the beta coefficient estimates for all voxels present within the ROI. We used the cross-validated Mahalanobis distance (Walther et al., 2016), also called the linear discriminant contrast (LDC), to decode the condition for every possible pair of conditions (excluding the fixation condition), such as faces versus scenes, or words versus numbers. The code for this was written by J. Brendan Ritchie. The results were presented in a so-called dissimilarity matrix, where each point in the matrix reflected the distance between the condition linked to that row and the condition linked to that column of the matrix. The distance signified a measure of dissimilarity between the multi-voxel pattern of the ROI in response to the row-condition versus the column-condition. The higher the distance, the more dissimilar the patterns of these two conditions were. The distance estimates for each condition pair were created using a cross-validation scheme in which the data were split into training and testing folds according to the standard leave-one-run-out partitioner. Each run was used once as the testing fold while all others formed the training fold, generating as many distance estimates as there were runs available for that subject. These estimates were then averaged, and this average was placed in the appropriate spot in the dissimilarity matrix. Then, we normalized each subject matrix, per ROI, by dividing all of its values by the maximum value of that matrix. Subsequently, we averaged the matrix across subjects per ROI. To show the reliability of these matrices, per ROI, each subject matrix was correlated with the average matrix (calculated without that subject). We performed several paired t-tests (with a Bonferroni-corrected p-value) between different distances, to confirm observations made from the multidimensional scaling plots (described below).
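A minimal numpy sketch of the leave-one-run-out LDC computation for one condition pair is given below. For brevity the noise covariance is taken as the identity, whereas the actual analysis whitens by a covariance estimated from the GLM residuals; function names and the toy data are illustrative:

```python
import numpy as np

def ldc_distance(patterns, labels, runs, a, b):
    """Cross-validated (LDC-style) distance between conditions a and b.

    patterns: (n_samples, n_voxels) beta patterns, one row per condition
    per run; labels and runs are per-sample arrays. Each run serves once
    as the test fold; the discriminant weights come from the training runs.
    """
    dists = []
    for test_run in np.unique(runs):
        train = runs != test_run
        test = runs == test_run
        # discriminant direction from training data (identity covariance)
        w = (patterns[train & (labels == a)].mean(0)
             - patterns[train & (labels == b)].mean(0))
        # pattern difference in the held-out run
        delta = (patterns[test & (labels == a)].mean(0)
                 - patterns[test & (labels == b)].mean(0))
        dists.append(w @ delta)
    return float(np.mean(dists))

# toy data: 2 runs, 2 conditions, 2 voxels (illustrative values)
patterns = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
labels = np.array([0, 1, 0, 1])
runs = np.array([0, 0, 1, 1])
d = ldc_distance(patterns, labels, runs, 0, 1)
```

The averaged fold distances fill one cell of the dissimilarity matrix; dividing the full matrix by its maximum gives the normalization step described above. Because the weights and the test difference come from independent runs, the expected distance for identical patterns is zero, which is the point of the cross-validation.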
2.6.2.1. Multidimensional scaling (MDS) and Procrustes transformations. Multidimensional scaling (MDS) was used to visualize the main dimensions underlying the patterns in the representational dissimilarity matrices in a two-dimensional space, where the distance between points is a measure of how dissimilar they are: the higher the distance, the more dissimilar. We used the built-in MATLAB (Mathworks, Inc.) function mdscale with the default parameters, minimizing the default goodness-of-fit criterion (stress) and using 100 replicates of the scaling. We applied MDS to the matrices averaged across subjects, keeping left and right OTC separate. In addition, we also applied MDS to the normalized dissimilarity matrices of each individual participant. The MDS results of every subject, per ROI, were transformed to the average MDS results using a Procrustes transformation (using the built-in MATLAB (Mathworks, Inc.) function procrustes). We then visualized the average MDS results in a 2D space per ROI. In this space, for each category, 19 lines were drawn using the Procrustes-transformed individual-subject MDS position of that category. Each line started at the dot of a category (its group-average position) and ended at the coordinates (Procrustes-transformed MDS results) of that category for the individual participant.
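The two geometric steps (MDS embedding and Procrustes alignment) can be illustrated in Python with numpy. Note that this sketch uses classical (Torgerson) MDS rather than the stress-minimizing mdscale used in the paper, and a plain orthogonal Procrustes alignment, so it is an approximation rather than the actual pipeline:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed dissimilarity matrix D in k dims."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]             # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

def procrustes_align(X, Y):
    """Rotate/reflect the centered Y to best match X (least squares)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    return Yc @ (U @ Vt).T + X.mean(0)

# demo: recover a known 2D configuration from its distance matrix,
# then align a rotated copy back onto it (toy coordinates)
pts = np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
emb = classical_mds(D)                           # distances preserved
rot = pts @ np.array([[0., -1.], [1., 0.]])      # a rotated copy
aligned = procrustes_align(pts, rot)             # mapped back onto pts
```

This mirrors how each subject's 2D solution is rotated/reflected onto the group-average solution before the per-subject lines are drawn.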

Widespread and abundant activity to all categories of interest
We created a map of the functional neuroanatomy in response to the main conditions in our experiment. Each subject's brain surface was reconstructed, and results from various contrasts, calculated using the GLM, were projected onto this surface. Figure 4 shows four exemplary subject surfaces (surfaces of all subjects are available in Supplementary Fig. 2). On each surface, selective activation (contrast of one versus all other categories except fixation, p < .05, FWE corrected) to several categories was displayed: to faces (orange), to bodies (ochre), to hands (yellow), to words (dark blue), to fake script (medium blue), to numbers (light blue), and to a typical objects category (chairs, medium green) to provide a reference landmark.
Before focusing on words, we noted some general findings. First, category selectivity was present for each main category in all our subjects, thus for all types of characters and for all types of human-related categories (see Fig. 4). This category selectivity was extensive and widespread. The human-related and character category selectivity appeared on the lateral ventral surface, whereas object selectivity appeared more medially, except in the more posterior parts (posterior to the fusiform gyrus, within the inferior occipital gyrus) of the ventral surface (see Fig. 4). This category selectivity was grouped into separate areas (described in the following sections). The way these areas were grouped was based on a consistent pattern of selectivity found across participants. This pattern is described in detail in Section 3.1.2.1 and is also included in the Supplementary Results. Sometimes, participants showed some deviation from this standard pattern. In these cases, we used this standard pattern as much as possible to decide which activity to include in which area, but these decisions remain somewhat uncertain and subjective (examples are given in the Supplementary Results). This variability was most obvious when certain areas were missing, and this type of variability is made explicit in the following results sections by mentioning which subjects digressed from the standard (summarized in Supplementary Fig. 1).
Second, we noted several findings concerning overlap between category selectivities. In Figure 4, overlap between selectivity for different human-related categories is depicted in red, for different character categories in purple, and for human-related categories overlapping with character categories in brown. These types of overlap appeared all over the ventral surface in a predictable manner (see Fig. 4): where selectivity for different human-related categories was nearby, clusters of overlap between the different human-related category selectivities often emerged (shown in red). These seemed to act as a transition between the different category selectivities. The same held for overlap between selectivity for character categories (shown in purple), and for the overlap between selectivity for human-related and character categories (shown in brown): when selectivity for human-related and character categories appeared near each other, we often also found this type of overlap, suggesting a transition between the two kinds of selectivity. Overlap between selectivity for human-related categories and the objects category is depicted in light green, and between character categories and the objects category in dark green. These kinds of overlap appeared in small amounts in the posterior parts (posterior to the fusiform gyrus, within the inferior occipital gyrus) of the ventral surface (see, e.g., subject 14 in Supplementary Fig. 2). The overlap between all types of selectivity (human-related, character, and objects categories) is depicted in pink. As expected, given that each category selectivity was defined as one versus all other categories, this type of overlap was rarely present and, if so, very small (see, e.g., subject 1 in Fig. 4).

Several word-selective areas exist among areas selective to faces, hands, and bodies in the left hemisphere
To discern the spatial organization of word selectivity, we first investigated where word selectivity (also called the visual word form area (VWFA)) was located on the left hemisphere ventral surface and whether we could consistently divide this activity into separate regions. We then investigated whether this organization was consistent across subjects. Second, we located an important landmark in each subject (Weiner et al., 2014): the left hemisphere mid-fusiform sulcus (MFS). The anterior tip of this sulcus can predict the location of the FFA-2 or mFus face area, located on the middle fusiform gyrus (Weiner et al., 2014). By locating the MFS, we could investigate whether this landmark also proved important for locating word selectivity. Third, we explored whether the organization of word selectivity might be consistently related to the location of other category selectivity. We refer to Figure 4 for all anatomical descriptions written below.
First, we consistently observed three clusters of word selectivity on the left hemisphere ventral surface of almost every subject (subjects 4, 7, and 17 lacked the most anterior cluster, and subject 17 also lacked the posterior cluster; Supplementary Fig. 1 gives an overview of which areas were found in which subjects). The first cluster was located posterior to the fusiform gyrus, within the inferior occipital gyrus; the second more anterior, within the posterior part of the fusiform gyrus and the occipitotemporal sulcus; and the third even more anterior, within the anterior part of the fusiform gyrus, sometimes including the anterior part of the occipitotemporal sulcus. The second cluster appeared in between the first/posterior and third/anterior clusters but was often located more laterally than them, towards/on the occipitotemporal sulcus. To quantify the replicability and size of the word selectivity, we performed a split-half analysis focused upon the middle and anterior ventral surface. The region of interest thus covered the second and third clusters of selectivity to words, faces, and hands. We were able to identify (in 15 subjects) word-selective voxels using half of the scans. Across these voxels, activity (as quantified by the beta value, see Methods section) to words in the other half of the scans was significantly (i.e., p < .008 based on Bonferroni correction) higher than to any of the other main categories of interest (see Fig. 5A, for statistical details see Supplementary Table 1). The word-selective voxels showed a significantly higher response to words than to numbers and to fake script, even though the size of these differences was clearly smaller than when words were compared with objects categories. The division of this word selectivity into three subregions was helped by relating it to the selectivity for other categories, discussed in the following sections.
Second, we assessed the relevance of the mid-fusiform sulcus (MFS) for locating the different word-selective clusters, on top of its relevance for face areas. In most subjects, the selectivity for faces was plentiful and distributed over most of the ventral surface. First, we determined the location of the MFS (Weiner et al., 2014). Second, we identified the three face areas in the left hemisphere, as described in Weiner and Grill-Spector (2010): in the inferior occipital gyrus (IOG); FFA1 or pFus, located in the posterior fusiform gyrus; and FFA2 or mFus, located in the mid-fusiform gyrus. This was possible in almost all subjects (subject 13's anterior cluster seemed too anterior to be mFus, subject 14 lacked the pFus, and subjects 15 and 17 lacked the IOG). In some subjects, the separation between pFus and mFus was not straightforward (subjects 16 and 19). The replicability of face selectivity overall was tested with the split-half analysis. We were able to identify (in 14 subjects) face-selective voxels across the middle and anterior ventral surface with one half of the scans. Across these voxels, activity (as quantified by the beta value, see Methods section) to faces in the other half of the scans was significantly (i.e., p < .008 based on Bonferroni correction) higher than to any of the other main categories of interest (see Fig. 5B, for statistical details see Supplementary Table 1). Third, we investigated the role of the MFS in locating the subregions of the VWFA. Its contribution was not critical, but a few relations were often present. We noticed that the second/middle subregion of the VWFA was lateral and more posterior relative to the anterior tip of the MFS. Additionally, the anterior subregion of the VWFA appeared in the neighborhood of the mFus, and this mFus face area was often in line with the anterior tip of the MFS, as previously demonstrated by Weiner and Grill-Spector (2010). However, the anterior word area was often located further away (relative to the mFus face area) from the anterior tip of the MFS.

Fig. 4. Category-selective regions shown upon annotated right (RH) and left (LH) hemisphere ventral surfaces of four example participants, together with an annotation of how to structure this category selectivity into three clusters of hand, word, and face selectivity. Annotations were made using a circle, each linked to a square, with the number 1, 2, or 3 to indicate the first, second, or third hand (yellow), word (dark blue), and face (orange) area. They also include several arrows that point to the posterior object region (in green), the pFus word area (dark blue), the anterior tip of the mid-fusiform sulcus (white), a possible third body instead of the third hand area (ochre), and, lastly, arrows pointing to even more anterior areas (color depends on the category). Color legend on the right. Selective activation (contrast of one versus all other categories except fixation, p < .05, FWE corrected) to faces in orange, to bodies in ochre, to hands in yellow, to words in dark blue, to fake script in medium blue, to numbers in light blue, and to an objects category (chairs) in medium green. Overlap between selectivity for different human-related categories in red; overlap between selectivity for different character categories in purple; overlap of human-related categories with character categories in brown; overlap of human-related categories with the objects category in light green; overlap of character categories with the objects category in dark green; and overlap between selectivity for human-related, character, and objects categories in pink.
Third and finally, we looked at the selectivity to the other categories besides words and faces. We found that there were at least two, sometimes three (in subjects 1, 9, 10, 13, and 19), clusters of selectivity to hands. The first one appeared within the inferior occipital gyrus and lateral occipital sulcus. The second one was located along the middle of the fusiform gyrus, sometimes including the neighboring occipitotemporal sulcus. The third one, when present, appeared anteriorly in the fusiform gyrus. On the surfaces of some subjects (subjects 3, 7, 8, and 18), we could not determine the third anterior hand cluster, but they did instead show an anterior cluster of selectivity to bodies where we expected the third hand cluster. The replicability of hand selectivity overall was tested with the split-half analysis. We were able to identify (in 16 subjects) hand-selective voxels across the middle and anterior ventral surface. Across these voxels, activity (as quantified by the beta value, see Methods section) to hands was significantly (i.e., p < .008 based on Bonferroni correction) higher than to any of the other main categories of interest (see Fig. 5C, for statistical details see Supplementary Table 1). The hand-selective voxels showed a significantly higher response to hands than to faces and to bodies, but the size of these differences was clearly smaller than when hands were compared with character categories (with the exception of the words category) or objects categories.
Considering that body selectivity has often been used to obtain a detailed picture of face selectivity (e.g., Weiner & Grill-Spector, 2010), we looked at the body-selective areas and found them to lie separate from the hand areas. We also identified body-selective voxels across the middle and anterior ventral surface using the split-half analysis. Across these body-selective voxels (identified in 15 subjects), activity (as quantified by the beta value, see Methods section) to bodies was significantly (i.e., p < .008 based on Bonferroni correction) higher than to words, numbers, fake script, and objects (chairs) (see Fig. 5D, for statistical details see Supplementary Table 2). However, activity to bodies was not significantly different from faces (significant only at the uncorrected threshold: p < .05) or hands. In contrast, in the hand-selective voxels, the activity to hands was significantly higher than to faces or bodies (described above). To compare the activity to the preferred and second-preferred category, we compared the hand- and body-selective voxels/areas in a paired t-test. Specifically, we compared the difference between bodies and hands in the body area with the difference between hands and bodies in the hand area. We did not find a significant difference, although there was a non-significant trend (t(12) = -1.45, p = .17): the difference between bodies and hands in the body area tended to be smaller than the difference between hands and bodies in the hand area. This is all the more reason to consider hand selectivity an important property of ventral occipitotemporal cortex.
With the split-half analysis, we could also identify voxels selective to numbers and to fake script across the middle and anterior ventral surface, but there was no evidence for more selectivity to numbers than to words, or to fake script than to words, in the number and fake script areas. First, across these number-selective voxels (identified in 17 subjects), activity (as quantified by the beta value, see Methods section) to numbers was significantly (i.e., p < .008 based on Bonferroni correction) higher than to faces, bodies, hands, fake script, and objects (chairs) (see Supplementary Fig. 3A, for statistical details see Supplementary Table 2). It was not significantly different from words. Second, across these fake script-selective voxels (identified in 17 subjects), activity (as quantified by the beta value, see Methods section) to fake script was significantly (i.e., p < .008 based on Bonferroni correction) higher than to faces, bodies, hands, and objects (chairs) (see Supplementary Fig. 3B, for statistical details see Supplementary Table 2). It was significantly lower than to words and not different from numbers. Lastly, given that on the map of the ventral surface we used selectivity to a typical objects category (chairs) as a reference landmark, we also attempted to identify object-selective voxels using the split-half analysis. We identified these object-selective voxels in all 19 subjects and found that activity (as quantified by the beta value, see Methods section) to objects was significantly (i.e., p < .008 based on Bonferroni correction) higher than to words, numbers, and fake script, but not different (significant only at the uncorrected threshold: p < .05) from faces, bodies, and hands (see Supplementary Fig. 3C, for statistical details see Supplementary Table 2).
3.1.2.1. The location of hand selectivity serves as a reference point to locate word and face selectivity in the left hemisphere. The clusters of selectivity to hands were consistent in location and served as an important reference point to locate the subregions of the VWFA and the typical face areas (IOG, pFus, and mFus) in most subjects. First, for the posterior part of the ventral surface within the inferior occipital gyrus, sometimes including the lateral occipital sulcus, we found that the first/posterior hand region was important. This hand-selective area was previously described by Bracci et al. (2010) as a region separate from the extrastriate body area (EBA) and selective to hands more than to whole bodies and other body parts. Adjacent to this first hand region, more medially on the inferior occipital gyrus, we found a face-selective cluster (IOG) and a first word-selective cluster of activity around this IOG face area. In many of the subjects, on top of this word cluster around the IOG, there was even more activity to words and other character categories distributed across the posterior ventral surface (including the inferior occipital gyrus), but still on the more medial side of this first hand region. In some subjects, we also found that a cluster of selectivity to objects, and/or a cluster of overlap of object selectivity with character and/or human-related selectivity, separated this first hand cluster from the first face and word clusters (most clearly visible in subjects 1, 3, 6, 9, 11, 12, and 18).
Second, for the middle part of the ventral surface along the fusiform gyrus, sometimes including the occipitotemporal sulcus, the second/middle cluster of selectivity to hands also proved to be a useful reference landmark. In between the first and second clusters of hand selectivity, the second word-selective cluster of activity was located (along the middle occipitotemporal sulcus and fusiform gyrus), with the second face selectivity cluster (pFus) more medial to that word cluster. Due to the central location of the second hand selectivity cluster, we decided to examine this selectivity also in the volume space (Fig. 6). We found that all selectivity to categories on and around the location of this second hand cluster lined up with the anatomical structure of the brain. We also found that pFus/FFA1 seemed to be joined, or sometimes broken up into two, by another cluster of selectivity to words and/or other characters (visible in subjects 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, and 16). We called this the pFus word area. To investigate this more closely, we also looked at this area in the volume space (Fig. 7) in two of our subjects. In subject 1, the pFus word area seemed to break the pFus area in two, whereas in subject 11, the pFus word area adjoined the pFus area.
Third, we looked at the anterior part of the fusiform gyrus and occipitotemporal sulcus on the ventral surface. As mentioned above, in a couple of the subjects, there seemed to be a small third/anterior cluster of selectivity to hands and/or bodies. Around the second hand cluster, in between the second and the third hand selectivity cluster (when present), we found the third face (mFus) and word selectivity clusters. Looking even more anteriorly than these areas, we also regularly found very anterior face selectivity clusters: in subject 1 (accompanied by a hand selectivity cluster), 3, 4 (+ hand cluster), 5 (accompanied by a word selectivity cluster), 6, 9, 10, 12 (+ word cluster), 13 (if this is not in fact mFus, see above; + word cluster), 16, and 19.

Less word selectivity in the right hemisphere
Previous research observed the most prominent selectivity to words in the left hemisphere, and thus studies have often analyzed only the left hemisphere. After investigating the left hemisphere of our subjects, we set out to compare it to the selectivity found in the right hemisphere. Specifically, we examined whether the standard organization of the left hemisphere selectivity clusters was similar in the right hemisphere. This consisted of first checking whether all the same types of selectivity were present and whether the same number of clusters of each type of selectivity appeared. Second, we examined whether these areas were organized in a similar way. We refer to Figure 4 for all anatomical descriptions written below. Overall, we found the same types of selectivity in the right hemisphere as in the left hemisphere. Some types of selectivity were often less clear in appearance in the right versus the left hemisphere (e.g., smaller, more distributed, or in a somewhat strange or inconsistent location in many subjects).
We were able to consistently find the posterior and middle word areas (except in subjects 9, 10, 11, 16, 18, and 19) of the ventral surface. In several subjects, however, this activity did not appear selective to words specifically but rather to characters in general (posterior area: see subjects 1, 5, 8, 14, 16, and 18; middle area: see subjects 1, 2, 6, 8, and 14). This was also evidenced by the split-half analysis, described below, which showed no difference between selectivity to words, numbers, or fake script within the word-selective voxels of the middle and anterior ventral surface. The third word area could only be found in some subjects (subjects 5, 12, and 18). In subjects 2, 8, and 9, there was instead a small selectivity cluster to the numbers character category, and in subject 2, there was also a cluster of selectivity to numbers, but more lateral than expected based on the left hemisphere organization. The replicability of word selectivity overall was tested with the split-half analysis, and we were able to identify (in 11 subjects) word-selective voxels across the middle and anterior ventral surface (see Fig. 8A, for statistical details see Supplementary Table 1). Across these voxels, activity (as quantified by the beta value, see Methods section) to words was significantly (i.e., p < .008 based on Bonferroni correction) higher than to faces and bodies, but not different from (significant only at the uncorrected threshold p < .05) hands, fake script, and objects/chairs. It was also not significantly different from numbers. To look further into the lack of difference in selectivity between words, numbers, and fake script in these right-hemisphere word-selective voxels, as opposed to the left-hemisphere word-selective voxels, we performed two paired t-tests to directly compare the two hemispheres. We applied Bonferroni correction at p < .03. In the first test, we compared the difference between words and numbers between the left and right hemisphere word-selective voxels; in the second, we compared the difference between words and fake script. Of note for these comparisons is that in the left hemisphere, we could not identify word-selective voxels in 4 of the 19 subjects, whereas in the right hemisphere we could not identify such voxels in 8 of the 19 subjects. We found that in the left hemisphere, the difference between words and numbers (t(7) = 2.6, p = .04; marginally non-significant after correction) and between words and fake script (t(7) = 3.78, p = .007) was bigger than in the right hemisphere word-selective voxels.
As in the left hemisphere, we could indicate three clusters of face selectivity, suited to be defined as IOG, pFus, and mFus, in most subjects. Regarding the IOG, all but one subject (subject 10) showed the IOG cluster of face selectivity. Some subjects (1, 3, 4, 5, 10, 14) did not show a pFus area at all, or the activity seemed more likely to be part of the first face selectivity cluster (IOG) and/or the third face selectivity cluster (mFus). As in the left hemisphere, the separation from the third/mFus area was not always clear (e.g., subjects 7, 8, and 18). The third cluster of selectivity to faces (mFus) was detected in most subjects (except subjects 10, 11, and 17). With the split-half analysis, we were able to identify face-selective voxels (in 11 subjects) across the middle and anterior ventral surface (see Fig. 8B, for statistical details see Supplementary Table 1). Across these voxels, activity (as quantified by the beta value, see Methods section) to faces was significantly (i.e., p < .008 based on Bonferroni correction) higher than to any of the other main categories. As in the left hemisphere, we also located the mid-fusiform sulcus in the right hemisphere and found similar results.
As in the left hemisphere, we often found two, but rarely three, areas of selectivity to hands. The posterior hand area appeared in all the subjects in the right hemisphere. The second cluster of selectivity to hands was also apparent in most of the subjects (except subjects 4, 5, 6, 7, 10, and 11). The third cluster of selectivity to hands could not be determined in most of the subjects (only in subjects 9, 11, and maybe 17). Also as in the left hemisphere, we could instead find selectivity to bodies around this location in some subjects (subjects 3, 12, 14, and 16). With the split-half analysis, we were able to identify (in 16 subjects) hand-selective voxels across the middle and anterior ventral surface. Across these voxels, activity (as quantified by the beta value, see Methods section) to hands was significantly (i.e., p < .008 based on Bonferroni correction) higher than to faces, words, numbers, fake script, and objects (chairs). It was not significantly different from bodies (significant only at the uncorrected threshold: p < .05) (see Fig. 8C, for statistical details see Supplementary Table 1). A notable difference between the left and right hand areas was the strength of the selectivity for words. Indeed, the difference between hands and words was smaller in the left hand-selective voxels as compared to the right hand-selective voxels (t(14) = -2.78, p = .02), suggesting that the left hand area is more selective to words than the right hand area.
Regarding the body areas, we again found these separate from the hand areas on the ventral surface. We identified body-selective voxels (in 14 subjects) across the middle and anterior ventral surface using the split-half analysis. We found that activity (as quantified by the beta value, see Methods section) to bodies was significantly (i.e., p < .008 based on Bonferroni correction) higher than to faces and to hands, in contrast to the left hemisphere body-selective voxels. The activity to bodies was also significantly higher than to words, numbers, fake script, and objects (chairs) (see Fig. 8D; for statistical details see Supplementary Table 2).
With the split-half analysis, we could also identify voxels selective to numbers and to fake script across the middle and anterior ventral surface, but there was no evidence for more selectivity to numbers than to words, or to fake script than to words, in the number and fake script areas. First, across these number-selective voxels (identified in 14 subjects), activity (as quantified by the beta value, see Methods section) to numbers was significantly (i.e., p < .008 based on Bonferroni correction) higher than to faces, bodies, hands, fake script, and objects (see Supplementary Fig. 4A; for statistical details see Supplementary Table 2). It was not significantly different from words. Second, across these fake script-selective voxels (identified in 11 subjects), activity to fake script was significantly higher than to faces, bodies, hands, and objects (see Supplementary Fig. 4B; for statistical details see Supplementary Table 2). It did not differ significantly from words or from numbers. Lastly, given that on the map of the ventral surface we used selectivity to a typical objects category (chairs) as a reference landmark, we also attempted to identify object-selective voxels using the split-half analysis. Across these object-selective voxels (identified in 15 subjects), activity to objects was significantly higher than to faces and to numbers, but did not differ significantly (significant only at an uncorrected threshold: p < .05) from hands, words, and fake script. It was also not different from bodies (see Supplementary Fig. 4C; for statistical details see Supplementary Table 2).
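The comparison logic used throughout these split-half analyses — mean beta per category per subject from the held-out half, paired t-tests of the preferred category against every other category, and a Bonferroni threshold of .05 divided by the number of comparisons (here .05/6 ≈ .008) — can be sketched as follows. The data and function name are hypothetical; this is a minimal illustration, not the paper's actual pipeline.

```python
import numpy as np
from scipy import stats

def selectivity_tests(betas, preferred, alpha=0.05):
    """Paired t-tests (across subjects) of the preferred category's mean
    beta against each other category, with a Bonferroni-corrected threshold.

    betas: dict mapping category name -> array of per-subject mean betas,
           extracted from data independent of the voxel selection.
    Returns, per comparison category: (t, p, significant_after_correction).
    """
    others = [c for c in betas if c != preferred]
    threshold = alpha / len(others)  # e.g., .05/6 ~ .008 for six comparisons
    results = {}
    for cat in others:
        t, p = stats.ttest_rel(betas[preferred], betas[cat])
        results[cat] = (t, p, p < threshold)
    return results

# Illustrative (hypothetical) per-subject betas for a number-selective ROI
rng = np.random.default_rng(0)
betas = {
    "numbers": rng.normal(1.2, 0.2, 14),
    "words": rng.normal(1.1, 0.2, 14),
    "faces": rng.normal(0.6, 0.2, 14),
}
results = selectivity_tests(betas, preferred="numbers")
```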
3.1.3.1. The location of hand selectivity cannot serve as a reference point to locate word and face selectivity in the right hemisphere. We compared the locations of these areas in the right hemisphere to those in the left hemisphere and investigated whether the hand areas in this hemisphere could also consistently locate the VWFA subareas and the typical face areas, as in the left hemisphere. In conclusion, the hand areas could not serve as a reference landmark as they did in the left hemisphere, in part due to the more complex nature of the hand and word selectivity in the right hemisphere. For more details, we refer to the Supplementary Results.

The organization of word- and other category-selective areas in left-handed subjects
Three subjects in our subject pool were left-handed: subjects 2, 7, and 17. We refer to Supplementary Figure 2 for all results described below. Looking at the left hemisphere, subjects 2 and 7 showed the same organization as the right-handed subjects, as described above. However, subject 17 differed from this. In the left hemisphere, we found the first hand cluster, the second word cluster anterior to that, the pFus cluster medial to that, and the second hand cluster below it. An mFus area also seemed to be present in subject 17. All the other clusters were missing: the first (unless very small and distributed) and third word clusters, the IOG cluster, and the third hand cluster.
Looking at the right hemisphere of the left-handed subjects, in subject 17 we could find an organization of areas somewhat like the one found in the left hemisphere of our right-handed subjects. This subject showed all first and second hand, word, and face areas. In the right hemisphere of subject 2, we found the standard organization of selectivity like that of the left hemisphere in the right-handed subjects: the first hand cluster, the first word cluster, the IOG cluster, the second hand cluster, the second word cluster, pFus, and mFus. This means that we did not clearly see a third cluster of selectivity to words. The right hemisphere organization of selectivity of subject 7 seemed similar to, but less clear than, that of subject 2.

Character categories group together in representational space
To achieve a mapping of word selectivity relative to other visual categories, we investigated the data in two ways: the functional neuroanatomy and the representational space. For every subject, two dissimilarity matrices were created through MVPA for the OTC ROI: one for the left (average number of voxels: 3776.37, standard deviation: 795.02) and one for the right (average number of voxels: 2894.26, standard deviation: 848.64) hemisphere. Each of these matrices was normalized and then averaged across all subjects. These matrices can be found in Supplementary Figure 5. The matrices were consistent: every subject's matrix was significantly correlated with the average matrix (mean correlation for the left OTC ROI: r = .80; mean correlation for the right OTC ROI: r = .74).
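The normalize-average-correlate step above can be sketched as follows. The paper does not specify the normalization, so dividing each subject's matrix by its mean is one simple assumption; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr
from scipy.spatial.distance import squareform

def consistency_with_average(rdms):
    """rdms: array of shape (n_subjects, n_categories, n_categories),
    one representational dissimilarity matrix (RDM) per subject.

    Normalizes each subject's RDM (divide-by-mean is an assumed choice),
    averages across subjects, and correlates each subject's off-diagonal
    values with those of the group-average matrix."""
    normed = np.array([m / m.mean() for m in rdms])
    average = normed.mean(axis=0)
    # squareform extracts the condensed (off-diagonal) vector of a square RDM
    avg_vec = squareform(average, checks=False)
    rs = np.array([pearsonr(squareform(m, checks=False), avg_vec)[0]
                   for m in normed])
    return average, rs
```

Note that correlating each subject with an average that includes that subject slightly inflates the correlations; a leave-one-out average is a stricter variant.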
These matrices were then visualized in a two-dimensional space using MDS. We called this result the representational space of our categories (see Fig. 9). Within this 2D visualization, categories located close together shared a more similar neural representation than categories located further away from each other. Inside this space, we also visualized the Procrustes-transformed MDS results of every participant for each category, using one line per participant, and found that overall our subjects presented consistent results. We looked first at the results from the left OTC ROI. We noted some general findings: a grouping of character categories (words, numbers, fake script), a grouping of animate categories (faces, bodies, cats, hands), and a group of inanimate categories in the middle of the plot with the following categories: cars, fish, chairs, instruments, flowers, vegetables, hammers, scissors, buildings, trees, and cubies. Fish are animate shapes, but in MVPA results they appear close to inanimate categories (Connolly et al., 2012). Cubies and smoothies were shapes unknown to the participants. The cubies fell with the inanimate group, whereas the smoothies could be included within the animate group in the plot. This might be explained by differences in mid-level features between the two categories, based on previous research: animate categories have a higher curvilinearity than artifacts (Levin et al., 2001), perceived curvature predicts the classification of animacy of a texform, and the curvature differences between animals and artifacts can explain to some degree the mixed-animacy search advantage (Long et al., 2017). We found that the control condition (scrambled) was located separately from all other categories. When looking at the right OTC ROI, we found similar results.
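The MDS-plus-Procrustes visualization described above can be sketched as follows, assuming standard scikit-learn MDS on a precomputed dissimilarity matrix and SciPy's Procrustes alignment; the function name and workflow details are illustrative, not the paper's exact code.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial import procrustes

def embed_and_align(group_rdm, subject_rdms):
    """Project the group-average RDM into 2D with metric MDS, then
    Procrustes-align each subject's own 2D embedding to the group
    solution so per-subject category positions can be overlaid."""
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    group_xy = mds.fit_transform(group_rdm)
    aligned = []
    for rdm in subject_rdms:
        subj_xy = MDS(n_components=2, dissimilarity="precomputed",
                      random_state=0).fit_transform(rdm)
        # procrustes returns standardized versions of both inputs and a disparity
        _, subj_aligned, _ = procrustes(group_xy, subj_xy)
        aligned.append(subj_aligned)
    return group_xy, np.array(aligned)
```

Because the Procrustes transform only translates, scales, rotates, and reflects, between-category distance ratios within each subject's embedding are preserved after alignment.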
Then, we focused on understanding the position of words and the other character categories in particular inside this MDS space. Characters were grouped together. Using paired t-tests (i.e., p < .006 based on Bonferroni correction), we investigated whether numbers and fake script were indeed significantly more similar to words than other seemingly close categories like cars and faces. The distance (based on the normalized LDC value of every subject) between words and numbers was significantly smaller than the distance between words and cars in both the left (t(18) = -10.81, p = 2.64 × 10⁻⁹) and the right hemisphere (t(18) = -5.84, p = 3.32 × 10⁻⁵). The same was true when comparing the distance between words and numbers with the distance between words and faces in both the left (t(18) = -11.06, p = 1.86 × 10⁻⁹) and the right hemisphere (t(18) = -12.92, p = 1.53 × 10⁻¹⁰). To conclude, the neural activation pattern of numbers was significantly more similar to that of words than were the patterns of the other categories that seemed closest in the MDS space. We then investigated the same for the other character category: fake script. The distance between words and fake script was significantly smaller than the distance between words and cars in both the left (t(18) = -7.07, p = 1.37 × 10⁻⁶) and the right hemisphere (t(18) = -4.18, p = .0006). The same was true when comparing the distance between words and fake script with the distance between words and faces in both the left (t(18) = -10.09, p = 7.82 × 10⁻⁹) and the right hemisphere (t(18) = -12.26, p = 3.55 × 10⁻¹⁰). In conclusion, both numbers and fake script were significantly closer to words than other categories were. Character categories thus grouped together in the representational space of the OTC.
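The distance comparisons above follow one recurring pattern: a paired t-test across subjects on two sets of per-subject category-pair distances, against a Bonferroni-corrected threshold (.05/8 ≈ .006 for eight such comparisons). A minimal sketch, with hypothetical data and function name:

```python
import numpy as np
from scipy.stats import ttest_rel

def compare_distances(dist_a, dist_b, n_tests=8, alpha=0.05):
    """Paired t-test across subjects on two per-subject distance sets
    (e.g., words-numbers vs. words-cars), Bonferroni-corrected.
    A negative t means dist_a is the smaller (more similar) pair."""
    t, p = ttest_rel(dist_a, dist_b)
    return t, p, p < alpha / n_tests  # e.g., .05/8 = .00625 ~ .006

# Hypothetical per-subject normalized LDC distances for 19 subjects
rng = np.random.default_rng(1)
words_numbers = rng.normal(0.8, 0.1, 19)
words_cars = rng.normal(1.2, 0.1, 19)
t, p, significant = compare_distances(words_numbers, words_cars)
```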
Second, we explored whether faces, compared to other relevant human-related categories, showed a special relation to the position of words in the representational space of the OTC, given that these categories have previously been implicated to compete for cortical territory during development. We performed paired t-tests (i.e., p < .01 based on Bonferroni correction) to investigate whether the neural activation pattern of faces was more similar to that of words than those of bodies and hands. In Figure 9, we observed that bodies and hands were further away than faces. The distance between words and faces was not significantly smaller than the distance between words and bodies in the left hemisphere (t(18) = -1.18, p = .25), but it was significantly smaller in the right hemisphere (t(18) = -4.31, p = .0004). When comparing the distance between words and faces to the distance between words and hands, we found that it was not significantly smaller in the left hemisphere (t(18) = -0.38, p = .71), nor in the right hemisphere (t(18) = 1.63, p = .12). To summarize, in the left hemisphere, faces was not the closest category to words: bodies and hands were as close to words as faces was. In the right hemisphere, faces was also not the closest category to words: faces was closer to words than bodies, but hands was as close to words as faces was. This suggests that in the representational space, no special relationship existed between words and faces specifically, compared to other human-related categories.
Third, we sought out the closest inanimate category to words in the MDS space (cars) and compared its distance to words with that of the closest, most representationally similar animate category (faces). The distance between words and faces was not significantly different from the distance between words and cars (left hemisphere: t(18) = 3.94, p = .22; right hemisphere: t(18) = 6.07, p = .15). This suggested that faces, compared to a seemingly close inanimate category, was not more or less similar to words in terms of the neural activity pattern. This was further evidence against a special relationship between words and faces in the representational space of the OTC.
Finally, we compared the left to the right hemisphere. Looking at the functional neuroanatomy of the ventral OTC, we found that the right hemisphere showed weaker selectivity to words than the left hemisphere. Word-selective areas, identified using the split-half analysis, were not significantly more responsive to words than to other types of characters in the right hemisphere. In addition, in most subjects we could not localize the third/anterior cluster of word selectivity on the right ventral brain surface. On the other hand, the organization of the representational space of the left and right hemisphere (Fig. 9A, B) seemed very similar. We sought to understand whether the left and right hemisphere processed words and other characters in a similar way in terms of representational similarity. To this end, we compared the normalized LDC value of each subject for two pairs of categories between the left and right hemisphere: words versus numbers and words versus fake script. One of the two tests was nearly significant (i.e., p < .03 based on Bonferroni correction): words versus numbers (t(18) = 2.25, p = .04). The other test, words versus fake script, was significant (t(18) = 2.53, p = .02). Thus, the neural pattern of words differed more from the patterns of numbers and fake script in the left hemisphere than in the right hemisphere. This was in accordance with the results from the split-half analysis, where the left hemisphere differentiated between words and other characters, but the right hemisphere did not.

DISCUSSION
In this study, we investigated where visual word forms are located on the ventral surface map of the functional neuroanatomy of the occipitotemporal cortex (OTC). We related these findings to other category-selective areas implicated by the neural recycling theories and the proposal by Yeatman et al. (2021, see Introduction): other character categories (numbers and fake script) and human-related categories (faces, hands, and bodies). In addition, we explored word selectivity in the representational space and whether there was a special relationship there between words and faces (and by extension possibly also hands and bodies). We scanned 19 participants with 7T fMRI and presented them with shapes of 20 different categories. Our analyses were conducted at the level of the individual brain (thus, without spatial normalization or smoothing).
We mapped the functional neuroanatomy of OTC by visualizing the selective activation to characters and human-related categories on the ventral brain surface. We also assessed replicability and the strength of category selectivity through a split-half analysis. To provide a reference point, we included a typical objects category (chairs). We found abundant and widespread selectivity to the categories of interest across the ventral surface.
In the left hemisphere of ventral OTC, we consistently identified three clusters of word selectivity across participants. One cluster was situated posterior to the fusiform gyrus within the inferior occipital gyrus, one more anterior in the posterior part of the fusiform gyrus and occipitotemporal sulcus, and one even more anterior within the anterior fusiform gyrus and occipitotemporal sulcus. In addition, we found an extra cluster of word selectivity that we called the pFus word area because it was positioned adjacent to or within the pFus face-selective area. Comparing these word areas to recent functional subdivisions, we suggest that the middle/second and the anterior/third word areas correspond with pOTS/VWFA-1 and mOTS/VWFA-2, respectively (Lerma-Usabiaga et al., 2018; White et al., 2019). The posterior/first word area aligns with posterior regions sometimes defined separately from the VWFA, as these regions exhibit selectivity for letters without a specific preference for word formations (James et al., 2005; Strother et al., 2015; Vinckier et al., 2007; Wong et al., 2009; Yeatman et al., 2021). This is also supported by the observation of activity to not just words, but also fake script and numbers, in this posterior part of ventral OTC, whereas more anteriorly the activity was more specific to words. Notably, the designation of the pFus word area as a subarea of the VWFA appears novel. Since previous research has predominantly subdivided the VWFA based on functional differences instead of anatomically determined separations, it could be that the pFus word area could not be discriminated from pOTS/VWFA-1, and/or that these studies did not localize the separate face areas. A recent study by Boring et al. 
(2021) localized word and face selectivity at high spatial resolution in each individual subject. Comparing our findings to theirs, we suggest that the pFus word area, along with the posterior, middle, and anterior word areas, could be defined in most of their subjects. In about 75% of their subjects, the authors observed selectivity to words medial to selectivity to faces along the mid-fusiform sulcus. We believe we can find (some) evidence for this in a large part of our participants (2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16), and this medial selectivity could often be linked (but was not necessarily limited) to what we have termed the pFus word area (2, 3, 5, 6, 7, 8, 9, 11, 12, 16). Word selectivity within the left hemisphere was confirmed through the split-half analysis, showing robust selectivity to words in the identified voxels. This selectivity was significantly higher compared to all other categories of interest, particularly evident when contrasting words with non-character categories. While some selectivity to other character-related categories (numbers and fake script) was observed in the word-selective areas, it was notably weaker than the selectivity observed for words.
Considering the importance of the mid-fusiform sulcus in identifying face-selective regions (Weiner et al., 2014), we examined its relevance to the word-selective areas in the left hemisphere of OTC. While it did not play a critical role in locating these word areas, some associations with the word areas were noted. The split-half analysis replicated the face selectivity, demonstrating stronger selectivity to faces in these areas compared to other categories of interest.
Surprisingly, in the left hemisphere, we also identified at least two clusters of hand selectivity: one lateral and posterior (within the inferior occipital gyrus and lateral occipital sulcus), which has been demonstrated before by Bracci et al. (2010), and one more in the middle of the ventral surface of OTC (along the middle of the fusiform gyrus, sometimes including the occipitotemporal sulcus). These hand-selective regions exhibited robust selectivity to hands, as confirmed by the split-half analysis, where hand activity surpassed that of all other categories of interest. Interestingly, the difference in activity between hands and words, faces, and bodies was smaller than between hands and objects, numbers, and fake script. This unexpected observation (since words are inanimate shapes, like the other character categories and objects) suggests a potential special connection between words/letters and hands, possibly due to their frequent association (both in a visual and a motoric sense) during typing, writing, and reading. This aligns with recent research by Nordt et al. (2021), indicating a competitive relationship between limb and word selectivity. Additionally, we observed body-selective regions in the ventral left hemisphere, with the split-half analysis confirming their replicability; their activity to bodies was strong, albeit not significantly stronger than to hands and faces. An extra test suggested a trend that the hand-selective areas showed a more distinct selectivity to their preferred category (hands) than the body-selective areas did to bodies.
Remarkably, the hand-selective regions served as reliable landmarks for locating word- and face-selective areas on the left hemisphere ventral surface of OTC. Medial to the first hand area were the first word and face areas in the inferior occipital gyrus. The second hand area appeared more anterior than the first, around the middle of the surface along the fusiform gyrus and occipitotemporal sulcus. In between the first and second hand areas lay the second word area and, more medial to that, the pFus face area, which was joined or broken up by what we called the pFus word area. In some subjects, there was also a third hand area even more anterior on the ventral OTC surface. Below/around the second hand area, and in between the second and third hand areas, we found the mFus face area and the third word area (within the anterior fusiform gyrus and occipitotemporal sulcus). These findings align with the proposal by Yeatman et al. (2021) (which was in part based on Grill-Spector & Weiner, 2014), suggesting that VWFA-1 and VWFA-2 are positioned adjacent to the pFus and mFus face areas, with a body/limb area subdividing VWFA-1 and VWFA-2. Interestingly, we observed that the hand areas, more so than the body areas, provided guidance for the subdivisions of the word (and face) areas. Moreover, we identified not just one hand area between VWFA-1 and VWFA-2, but two (or even three), contributing to the subdivision between the first/posterior word area and VWFA-1 (the second/middle word area).
Regarding the right ventral surface of OTC, we could consistently identify the first and often also the second word-selective cluster. We could rarely identify the third/anterior word area, which aligns with White et al. (2019), who could only define VWFA-2 in the right hemisphere in a minority of their subjects. Unlike the left hemisphere, selectivity within these right hemisphere word areas leaned more towards characters in general (including numbers and fake script) rather than specifically towards words. The split-half analysis confirmed this result, revealing that while word-selective voxels were identifiable, their activity to words did not significantly surpass that to other character categories, consistent with prior research (Dehaene et al., 2004; Vinckier et al., 2007). Given their response to various letter forms, naming these areas "word-selective" may be debatable. Nor could we identify the pFus word area. As in the left hemisphere, the mid-fusiform sulcus did not play a critical role in locating the word areas, but did show certain relations to them. We replicated face selectivity with the split-half analysis. We could define the first and often also the second hand area, and this selectivity was replicated in the split-half analysis. Interestingly, left hand-selective voxels showed greater selectivity to words compared to right hand-selective voxels. Distinct body areas were found separate from the hand areas, with the split-half analysis revealing their stronger activity to bodies compared to faces and hands, unlike the left hemisphere body area. Due to the more complex nature and location of word and hand selectivity in the right hemisphere, the hand areas could not serve as reliable reference points for locating word and face selectivity. All these findings from the right hemisphere are consistent with the idea of the left hemisphere being language dominant. Notably, among the three left-handed 
participants, two showed organization similar to right-handed participants in their left hemisphere, while the third displayed typical left-hemisphere organization in their right hemisphere.
In the representational space of both the left and right hemisphere OTC, character categories formed a distinct group separate from both animate and inanimate categories. Numbers and fake script were notably closer to words than faces or cars were. We examined whether faces were significantly closer to words than other human-related categories, aiming to discern any special relationship between words and faces, as one might expect based on the observed competition for territory between these categories during development, as proposed by the neuronal recycling theory. However, faces did not exhibit significantly closer proximity to words compared to bodies and hands. Furthermore, faces did not lie significantly closer to words than cars (the closest inanimate category to words in the plot). This suggests that, although these categories may compete for cortical territory during development, within the representational space of the occipitotemporal cortex there is no special relationship between words and faces. Finally, the visualization of the representational space revealed a similar organization in both hemispheres. However, further analysis showed that the neural patterns for words and other character categories were more similar in the right hemisphere than in the left hemisphere, consistent with the functional neuroanatomy findings in ventral OTC.
The word forms used in this study were strings of letters, akin to the fake script and numbers categories, which consisted of strings of letters of an unknown alphabet and of digits, respectively. In our split-half analysis results, word-selective areas in the left hemisphere exhibited significantly higher selectivity to words compared to numbers and fake script, suggesting differentiation between letters of the Roman alphabet and other characters. Our letter strings were not pseudowords (although all except one contained at least one vowel and all were pronounceable) and we did not include any real words (no semantic meaning). Importantly, word selectivity has in the past been localized with a variety of different contrasts (for a detailed overview, see Caffarra et al., 2021). For example, some studies have differentiated between words, pseudowords, letter strings, and individual letters (e.g., Vinckier et al., 2007), whereas others have compared text in general to false fonts or even non-linguistic categories such as faces (e.g., Cohen et al., 2002; Dehaene et al., 2010). Only studies that differentiate between several word-like categories can isolate lexical-sensitive areas (Caffarra et al., 2021). Lerma-Usabiaga et al. 
(2018) confirmed this: only specific types of contrast could isolate anterior from posterior word selectivity and vice versa. However, they did find that with more general contrasts (such as word-fixation) they could activate both posterior and anterior word selectivity, although of course no functional distinctions between subregions could be made in this way. In our study, we identified the VWFA by a contrast between letter strings and numerous other categories (some word-like and many not word-like). We then divided the VWFA into subregions based on clustering we found within the single-subject high-resolution anatomy of the ventral OTC. We could not make functional distinctions between the subregions found, due to a lack of word-like categories and thus of different contrast types. Another limitation of this study pertains to the results of the functional neuroanatomy, more specifically the organization of the areas that we identified. Across all subjects, we could deduce a standard organization of word, face, and hand areas in the left hemisphere, but not every subject showed this clearly. For example, three subjects did not show the third word area in the left hemisphere. This complicated the process of identifying and localizing areas. Supplementary Figure 1 contains an overview of the areas found per subject. However, small deviations from a standard organization are to be expected based on individual variability. While we observed this consistent pattern of different category-selective areas in relation to each other, we could not describe any consistently predictive anatomical landmarks across participants (although the mid-fusiform sulcus was described), as the variability of the anatomical location of the word areas was too high. This is an important limitation of our study, given that such a finding would have further improved the reliability of localizing the word areas in this study and in future studies. Based on our study, it remains important to 
include other categories than just words to localize the word areas.
This study used 7T fMRI, offering several advantages. First, it provided data of high quality and spatial resolution, as evidenced, for example, by the observation that in volume space, activity was nicely restricted to grey matter (see Figures 6 and 7). Second, it allowed us to work at the level of the individual, without spatial normalization or smoothing of the data, preserving the unique gyral and sulcal patterns of each brain and thus allowing a more accurate localization of activity (Weiner & Grill-Spector, 2013), which was a main aim of this study. Spatial smoothing can also lead to averaging together regions that actually lie distant from each other on the surface (Weiner & Grill-Spector, 2013), which would also have been detrimental to the aim of our study. Group averaging can likewise have detrimental effects: it lacks spatial precision (especially when areas vary in size and/or location between subjects) and boundaries between areas can get mixed (Glezer & Riesenhuber, 2013; Wandell et al., 2012). Following recommendations by Brodoehl et al. (2020), our analysis pipeline employed a general linear model analysis projected onto individual brain surfaces without normalization or spatial smoothing. This approach aligns with recent calls to investigate the VWFA at the individual level (Caffarra et al., 2021; Yeatman et al., 2021). Third, the use of 7T allowed us to present each subject, in just one scan session, with many different categories, enabling a more holistic view of the functional organization of the OTC.
This study represents, to our knowledge, the first high spatial resolution mapping of word selectivity and delineation of its subareas alongside other potentially relevant categories. Both Caffarra et al. (2021) and Yeatman et al. (2021) recently emphasized the importance of investigating where precisely, at the level of the individual brain, word selectivity is located in OTC, and the potential existence of anatomically divisible subareas of word selectivity. Across participants, we consistently identified three subareas of word selectivity and an additional area, termed the pFus word area, on the ventral surface of the OTC in the left hemisphere. These findings provide a foundation for future studies to consistently define subregions of the visual word form area (VWFA). Furthermore, by relating these word areas to other areas of selectivity, we discovered hand areas that were instrumental in locating these word areas and the well-known face areas (IOG, pFus, and mFus), offering an avenue for future studies to incorporate hand categories in localizing VWFA subareas. We also observed that categories other than faces may be relevant to the competition with words for cortical territory, given that the hand areas provided a reference landmark for the word areas. This aligns with the results of Nordt et al. (2021). Additionally, we explored the potential special relationship between words and faces within the representational space of the OTC. These categories compete for cortical territory during development according to the neuronal recycling theory (Dehaene & Cohen, 2007; Dehaene et al., 2010). In the representational space, we found no evidence for a special relationship between words and faces.
Future research could explore differences in functional connectivity between subareas of the VWFA and other brain regions. While studies have examined the anatomical connectivity of the VWFA (for an overview, see Caffarra et al., 2021; Yeatman et al., 2021), investigating the functional connectivity of VWFA subareas could provide novel insights. Furthermore, it would be valuable to investigate potential functional differences among the subareas observed in this study. For example, a processing gradient may exist from the posterior to the anterior subareas of the VWFA. Based on many different studies (e.g., Lerma-Usabiaga et al., 2018; Vinckier et al., 2007; White et al., 2019; Woolnough et al., 2020), Caffarra et al. (2021) proposed a posterior-to-anterior processing model, suggesting that more posterior regions represent perceptual information of written language while more anterior portions are sensitive to linguistic aspects of words. The authors also considered the temporal dynamics and anatomical connectivity found in left ventral OTC in this word processing model. Additionally, a similar study incorporating pseudowords (considering the natural characteristics of language) and real words could explore functional distinctions between the VWFA subregions identified in this study, potentially augmenting our understanding of the results observed in the representational space. Lastly, future research on the VWFA should consider the inclusion of categories beyond faces, such as hands, not only to explore the competition posited by neural recycling theories, but also to investigate the location and function of category-selective subareas.

Fig. 2. (A) A timeline of the block design of a run of the experiment. (B) An example of when the participant had to press the button with the instructed hand for the one-back task (top row), or the category task (bottom row).

Fig. 3. (A) The region of interest (ROI) for the split-half analysis for both the left and right hemisphere, shown as a pink outline on the brain surface of one of the participants. The ROI includes all middle and anterior selectivity to words (dark blue on the surface), faces (orange), and hands (yellow). (B) The region of interest (ROI) for the multi-voxel pattern analysis, shown in pink for both the left and right hemisphere of the occipitotemporal cortex (OTC) on the brain surface of one of the participants. The left and right OTC ROIs were created separately by combining the lateral occipital cortex, inferior temporal gyrus, and fusiform gyrus from the Desikan-Killiany atlas included in FreeSurfer.

Fig. 5. The response in word- (A), face- (B), hand- (C), and body- (D) selective voxels of the left hemisphere to seven categories (faces, bodies, hands, words, numbers, fake script, and objects: chairs). Responses were calculated using data independent from the data used to select the voxels (see Methods). Error bars represent the standard error, and the lines and stars indicate which paired t-tests between conditions were significant (p < .008).

Fig. 6. Visualization of the second/middle hand area of subject 1 (top) and subject 10 (bottom): the left panels show the surface space, the middle panels a zoomed-in view of the mid-ventral surface, and the right panels the corresponding volume space in axial and sagittal views. Black arrows on the zoomed-in surface indicate the location used to visualize the volume space on the right (this location is also marked by black arrows on the volume).

Fig. 7. Visualization of the pFus word area of subject 1 (top) and subject 11 (bottom): the left panels show the surface space, the middle panels a zoomed-in view of the mid-ventral surface, and the right panels the corresponding volume space in two adjoining slices in coronal view. Black arrows on the zoomed-in surface indicate the location used to visualize the volume space on the right (this location is also marked by black arrows on the volume).

Fig. 8. The response in word- (A), face- (B), hand- (C), and body- (D) selective voxels of the right hemisphere to seven categories (faces, bodies, hands, words, numbers, fake script, and objects: chairs). Responses were calculated using data independent from the data used to select the voxels (see Methods). Error bars represent the standard error, and the lines and stars indicate which paired t-tests between conditions were significant (p < .008).

Fig. 9. Visualization of the average MVPA matrix of the (A) left OTC ROI and (B) right OTC ROI, using multidimensional scaling (MDS). MDS was also applied to each individual MVPA matrix, and these individual MDS results were then Procrustes-transformed to the average MDS result. The individuals' Procrustes-transformed MDS results are depicted by one line per subject for every category within the space.
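The MDS-plus-Procrustes procedure described in the Fig. 9 caption can be sketched as follows. This is a minimal illustration, not the authors' code: random symmetric matrices stand in for the actual MVPA dissimilarity matrices, and scikit-learn's MDS and SciPy's Procrustes routine are assumed as the implementations.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
n_categories = 20  # number of stimulus categories in the study
n_subjects = 19    # number of participants in the study


def random_dissimilarity(rng, n):
    """Placeholder for an MVPA dissimilarity matrix: symmetric, zero diagonal."""
    d = rng.random((n, n))
    d = (d + d.T) / 2
    np.fill_diagonal(d, 0.0)
    return d


# Hypothetical per-subject dissimilarity matrices (categories x categories).
subject_rdms = [random_dissimilarity(rng, n_categories) for _ in range(n_subjects)]
average_rdm = np.mean(subject_rdms, axis=0)

# 2-D MDS on the group-average matrix; dissimilarity='precomputed' tells MDS
# the input is already a distance matrix rather than raw feature vectors.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
average_embedding = mds.fit_transform(average_rdm)

# MDS on each individual matrix, then Procrustes alignment (translation,
# scaling, rotation/reflection) of each individual embedding to the average.
# Note scipy's procrustes standardizes both inputs; the second return value
# is the transformed version of the second argument.
aligned = []
for rdm in subject_rdms:
    individual_embedding = mds.fit_transform(rdm)
    _, individual_aligned, disparity = procrustes(average_embedding,
                                                  individual_embedding)
    aligned.append(individual_aligned)
```

After alignment, each subject's 2-D coordinates for a given category can be plotted around the group-average position of that category, which is what the per-subject lines in Fig. 9 convey.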