Spatial organization of face part representations within face-selective areas revealed by high-field fMRI

Regions sensitive to specific object categories, as well as organized spatial patterns sensitive to different features, have been found across the ventral temporal cortex (VTC). However, it remains unclear how specific feature representations are organized within each category-selective region to support object identification. Are object features, such as object parts, represented in a fine-scale spatial organization within category-selective regions? Here we used high-field 7T fMRI to examine the spatial organization of neural tuning to different face parts within each face-selective region. Our results show a spatial organization that is consistent across individuals: within the right posterior fusiform face area (pFFA) and the right occipital face area (OFA), the posterior portion of each region was biased toward eyes, whereas the anterior portion was biased toward mouth and chin stimuli. These results demonstrate that, within the occipital and fusiform face-processing regions, there is a systematic spatial organization of neural tuning to different face parts that could support further computations combining them.


Introduction
The ventral temporal cortex (VTC) supports our remarkable ability to recognize objects rapidly and accurately from visual input in everyday life. Identity information is extracted from the visual input through multiple stages of representation. To fully understand the neural mechanisms of object processing, it is critical to know how these representations are physically mapped onto the anatomical structure of the VTC. Previous studies suggest that neurons within face-selective regions are tuned to different dimensions of a face feature space (8,9). Human fMRI studies also found that the neural response patterns in the FFA and OFA could distinguish different face parts (10), suggesting that voxels within the same face-selective region may have different face-feature tuning. In addition, a previous study suggests that the spatial distribution of a face feature may be related to the physical location of that feature in a face (11).

The face-selective regions in the VTC are relatively small, spanning about 1 cm. To investigate the potential spatial organization within each face region, high-resolution fMRI with sufficient sensitivity and spatial precision is necessary. With high-field fMRI, fine-scale patterns have been observed in early visual cortex, such as columnar-like structures in V1, V2, V3, V3a, and hMT (12-17). These findings validate the feasibility of using high-field fMRI to reveal fine-scale (several mm) structures in the visual cortex.

Here we used 7T fMRI to examine whether category-specific feature information, such as object parts, is represented in a systematic spatial organization within object-selective regions. With faces as stimuli, high-field fMRI allowed us to measure fine-scale response patterns to different face parts within each face-selective region.

Results

Figure 1. (A) The face parts were generated from 20 male faces. Each stimulus was presented around the fixation, and participants performed a one-back task during the scan. (B) Average fMRI responses to different face parts in each face-selective region. Generally, eyes elicited higher responses than nose, hair, and chin in most of the regions. No significant difference was observed between the eyes and mouth responses. Error bars reflect ±1 SEM.

We first normalized the fMRI response patterns to eyes and mouth within each region to remove the overall amplitude difference between conditions. We then directly contrasted the two patterns and projected the difference onto the inflated brain surface. A spatial pattern was observed in the right pFFA consistently across all participants (Figure 2). Along the dimension parallel to the mid-fusiform sulcus (MFS), the posterior portion of the right pFFA was biased to respond more to eyes, whereas the anterior portion was biased to respond more to mouth. Note that in participant S2, the direction of the MFS was more lateral-medial near the position of the right pFFA, and interestingly the eyes-mouth contrast map was oriented in the same direction, even though S2's map may initially appear oriented differently from those of the other participants. This suggests that the anatomical orientation of the MFS is closely related to this functional spatial organization of face parts.

To further demonstrate this relationship, and to provide a quantitative description of the spatial organization of face parts within the right pFFA, we grouped voxels based on their location along the direction parallel to the MFS and averaged the voxel responses at each location to generate a response profile along this posterior-anterior dimension (Figure 3A; see Methods for details).
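To make this profile analysis concrete, the sketch below (illustrative Python with hypothetical inputs, not the code used in the study) projects each voxel's surface position onto an axis traced along the MFS and averages a response measure within position bins to produce a posterior-to-anterior profile.

```python
# Illustrative sketch (not the study's code): project ROI voxels onto an axis drawn
# along the mid-fusiform sulcus (MFS) and average a response measure within position
# bins to obtain a posterior-to-anterior response profile. Inputs are hypothetical.
import numpy as np

def response_profile(voxel_xy, voxel_resp, mfs_start, mfs_end, n_bins=10):
    """voxel_xy: (N, 2) surface coordinates of ROI voxels/vertices.
    voxel_resp: (N,) response amplitudes (e.g., eyes-minus-mouth betas).
    mfs_start, mfs_end: endpoints of a line traced along the MFS."""
    axis = np.asarray(mfs_end, float) - np.asarray(mfs_start, float)
    axis /= np.linalg.norm(axis)                       # unit vector along the MFS
    pos = (np.asarray(voxel_xy, float) - np.asarray(mfs_start, float)) @ axis
    edges = np.linspace(pos.min(), pos.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(pos, edges) - 1, 0, n_bins - 1)
    return np.array([voxel_resp[bin_idx == b].mean() if np.any(bin_idx == b)
                     else np.nan for b in range(n_bins)])

# Synthetic demo: 200 voxels with a response bias that grows along a diagonal axis.
rng = np.random.default_rng(0)
xy = rng.normal(size=(200, 2))
resp = xy @ np.array([0.7, 0.7]) + rng.normal(scale=0.5, size=200)
print(response_profile(xy, resp, mfs_start=(-2, -2), mfs_end=(2, 2)))
```

The same logic applies to the OFA, with the axis drawn along the inferior occipital sulcus, as described in Methods.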
Figure 2. Contrast maps between normalized fMRI responses to eyes and mouth in the right pFFA, illustrated in the volume (upper) and on the inflated cortical surface (lower) of each participant. On the surface, the mid-fusiform sulcus is shown in dark gray with an orange outline. The blue line outlines the right pFFA identified with an independent localizer scan. Aligned with the direction of the mid-fusiform sulcus, the posterior part of the right pFFA shows a response bias to eyes (warm colors), while the anterior part shows a mouth bias (cool colors). The posterior-to-anterior pattern is generally consistent across participants.

The group-averaged results clearly showed that the difference between the eyes and mouth signals changed consistently along the posterior-anterior direction in the right pFFA (Figure 3B). To quantify this trend, we further calculated, for each participant, the correlation coefficient between the eyes-mouth response differences and the position index along the posterior-anterior dimension (i.e., more posterior locations were assigned smaller values). The group result revealed a significant negative correlation (t(5) = 8.36, p = 0.0004, Cohen's d = 3.41), confirming that, consistently across participants, the posterior part of the right pFFA was biased to eyes and the anterior part was biased to mouth.

The contrast map highlighted the differences between the eyes and mouth responses. However, the original response patterns elicited by eyes and mouth share the same underlying general "face-related" pattern, which was subtracted out when the two response patterns were contrasted. To extract the response profile of individual face parts, we used the independently obtained response pattern to whole faces as the general face-related pattern and regressed it out from the eyes and mouth response patterns. With the general pattern regressed out, we observed distinct spatial profiles elicited by eyes and mouth in the right pFFA (Figure 3D, top panel). The eye-biased voxels were located more posteriorly than the mouth-biased voxels, consistent with the contrast map shown in Figure 2.

Removing the general pattern helped to reveal the pattern of voxel biases for individual face parts. While removing the face-related general pattern achieved this goal, it is possible that doing so distorted the part-generated response patterns, since the part and whole-face stimuli share high-level visual information (i.e., face and eyes stimuli are both face-related). Therefore, it is important to check whether the part-specific patterns can still be seen when a common, face-independent signal distribution is removed instead. In five of the six participants, data were also obtained while they viewed everyday objects. Indeed, non-face objects generated significantly lower but spatially similar patterns of activation compared with faces across the right pFFA (Figure 3C). This result suggests that there is a general intrinsic BOLD sensitivity profile in the pFFA regardless of the stimuli. We therefore used the response patterns to either faces or non-face everyday objects to regress out this intrinsic baseline profile from the eyes and mouth response patterns, and plotted the face-part-specific patterns along the posterior-anterior dimension (Figure 3D).
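A minimal sketch of this regression step, assuming the patterns are stored as one beta value per voxel (variable names are illustrative, not the study's pipeline): the whole-face or object pattern is used as a predictor across voxels, and the residuals are kept as the part-specific pattern.

```python
# Illustrative sketch (assumed variable names, not the study's pipeline): regress the
# general pattern (whole faces or objects) out of a face-part pattern across voxels
# and keep the residuals as the part-specific pattern.
import numpy as np

def regress_out(part_pattern, general_pattern):
    """part_pattern, general_pattern: (n_voxels,) beta values within one ROI."""
    X = np.column_stack([np.ones_like(general_pattern), general_pattern])  # intercept + general pattern
    coef, *_ = np.linalg.lstsq(X, part_pattern, rcond=None)                # ordinary least squares
    return part_pattern - X @ coef                                         # residual = part-specific pattern

# Synthetic demo: eyes and mouth patterns that both ride on a shared general pattern.
rng = np.random.default_rng(1)
general = rng.normal(size=300)                       # e.g., whole-face pattern across 300 voxels
eyes = 0.8 * general + rng.normal(scale=0.3, size=300)
mouth = 0.8 * general + rng.normal(scale=0.3, size=300)
eyes_specific, mouth_specific = regress_out(eyes, general), regress_out(mouth, general)
```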
Figure 3. (C) The spatial profiles of whole faces and everyday objects in the right pFFA. Both profiles showed similar patterns, though whole-face responses were generally higher than object responses. (D) The spatial profiles of individual face-part responses, after regressing out the general fMRI response pattern elicited by either whole faces (upper) or everyday objects (lower). In both cases, distinct spatial profiles were observed between eyes and mouth in the right pFFA.
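The group-level test reported above (a per-participant correlation between the eyes-mouth difference and the posterior-anterior position, followed by a test across participants) can be sketched as follows; the input format and bin count are assumptions for illustration.

```python
# Illustrative sketch of the group statistic described above: correlate each
# participant's eyes-minus-mouth profile with its posterior-to-anterior bin index,
# then test the correlation coefficients against zero across participants.
# The input format (one 1-D profile per participant) is an assumption.
import numpy as np
from scipy import stats

def group_profile_test(profiles):
    rs = []
    for prof in profiles:
        pos_index = np.arange(len(prof))             # smaller index = more posterior
        r, _ = stats.pearsonr(pos_index, prof)
        rs.append(r)
    t, p = stats.ttest_1samp(rs, 0.0)                # one-sample t-test across participants
    return np.array(rs), t, p

# Synthetic demo: six participants whose eyes-mouth difference decreases anteriorly.
rng = np.random.default_rng(2)
demo = [np.linspace(1, -1, 10) + rng.normal(scale=0.3, size=10) for _ in range(6)]
print(group_profile_test(demo))
```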
To control for a potential contribution of retinotopic bias across the different face-part conditions, all stimuli in our experiment were presented at the fixation with a 1.3º horizontal jitter, alternately to the left or to the right in different trials within a block. Even though the stimuli were centered on the fixation, because of the nature of the face parts (e.g., the two eyes are apart, the chin depicts the outline of the face), there were still small (less than 3º) retinotopic differences between the eyes and mouth conditions. To further rule out a retinotopic contribution, as well as to replicate our finding, we ran two control experiments. In the first control experiment (Control Experiment 1), data were obtained with a single eye or a mouth presented at either a near-central (1.3º) or a near-peripheral (3.1º) location during the scan (see Figure S1A). This 2 x 2 (face part x location) design allowed us to contrast fMRI response patterns between face parts (single eye vs. mouth) regardless of stimulus location, or between locations (near-central vs. near-peripheral) regardless of the face part presented. Data from six participants were collected in Control Experiment 1, two of whom (S1 and S5) also participated in the main experiment. In all participants, the eye vs. mouth contrast revealed spatial patterns in the right pFFA very similar to those in the main experiment (Figure S1B). In contrast, contrasting fMRI responses between the near-central and near-peripheral locations regardless of face part failed to reveal consistent patterns across participants (Figure S1C). These results further support that the different fMRI response patterns we observed in the right pFFA were driven by face-feature differences rather than by retinotopic bias. In the second control experiment (Control Experiment 2), we used the top and bottom parts of the face as stimuli and counterbalanced the stimulus location to verify the spatial organization in the right pFFA.

In addition, in a pRF experiment with three of the participants, we estimated each voxel's preferred position along the horizontal (x) and vertical (y) axis in the visual field. Although more voxels in the right pFFA were generally biased to the left visual field, consistent with previous reports (27,28), we observed no consistent spatial pattern in either the x or the y map of the right pFFA across participants (Figure S3B).

We next performed the same analyses in the other face-selective regions. Among these regions, the right OFA also had distinct response patterns for eyes and mouth along the posterior-anterior dimension (Figure 4), similar to what we observed in the right pFFA. At the group level, a negative correlation was also observed between the eyes-mouth difference and the posterior-anterior position in the right OFA (Cohen's d = 1.48). A similar pattern was also observed in the control experiments. While the right OFA and right pFFA have been considered sensitive to facial components and whole faces, respectively, in our data they showed similar spatial profiles of eyes and mouth responses along the posterior-anterior dimension. This is consistent with, but adds some constraints to, the idea that the right pFFA may receive face-feature information from the right OFA for further processing (29,30). In the other face-selective regions, no consistent pattern was observed, as the correlations between the eyes-mouth difference and the posterior-anterior location were not significant (ts < 1.09, ps > 0.32; see Figure 4A).

Besides the anterior-posterior dimension, the spatial representation of parts could be organized along other spatial dimensions, such as the lateral-medial dimension in the VTC, or even in more complex nonlinear patterns.
However, the right pFFA was located within the mid-fusiform sulcus in most of our participants, such that voxels far apart on the surface along the lateral-medial dimension could be spatially adjacent in volume space; this made it difficult to accurately reconstruct the spatial pattern along the lateral-medial dimension within the sulcus. Nevertheless, the finding of an anterior-posterior organization of face parts is sufficient to demonstrate the existence of a fine-scale feature map within object-selective regions.

Our stimuli also included nose, hair, and chin images, giving us the chance to examine their spatial profiles in each face-selective ROI as we did for eyes and mouth, although their neural responses were generally lower than those to eyes and mouth. Chin and mouth elicited similar response patterns along the anterior-posterior dimension in the right pFFA and right OFA after regressing out the general spatial patterns (Figure 5A). By directly contrasting the fMRI response patterns between eyes and chin, similar spatial profiles were revealed in the right pFFA and right OFA, with the posterior part biased to eyes and the anterior part biased to chin (ts > 5.30, ps < 0.01; see Figure 5B). We also observed a similar, though less pronounced, profile in the left FFA.

Discussion

In the current study, five face parts (i.e., eyes, nose, mouth, hair, and chin) were tested, with eyes and mouth showing the most distinct spatial organizations in the right pFFA and right OFA. No obvious spatial pattern was observed for nose and hair in face-selective regions, but it would be premature to conclude that there is no fine-scale spatial organization of their neural representations. For one, the nose and hair stimuli elicited lower fMRI responses than the eyes and mouth stimuli, making it more difficult to detect potential spatial patterns. The observation that eyes and mouth elicited the most differential patterns is consistent with their providing more information about faces than other features in face processing (22,23). The dominance of eyes and mouth in face-selective regions could be considered a form of cortical magnification of more informative features, a common principle of functional organization in sensory cortex.

The discovery that some face parts are represented in an organized spatial layout within the face-processing regions raises the question of how such an organization relates to the spatial configuration of a face. A previous study tested this idea of "faciotopy", namely that there are cortical patches representing different face features within a face-selective region and that the spatial organization of these feature patches on the cortical surface reflects the physical relationships of the face features (11). Their results showed that in the OFA and FFA, the differences between the neural response patterns of face parts were correlated with the physical distances between those parts in a face. Our results support the existence of a stable organization of face features in the right OFA and right pFFA, especially for eyes and mouth. A possible mechanism underlying such a faciotopy organization is local-to-global computation: physically adjacent face parts interact more than parts far apart from each other during the early stages of processing, so it is more efficient for their neural representations to be located near each other. However, in the current study, we did not find the posterior bias pattern for hair as we did for eyes, even though hair and eyes are spatially adjacent,
which could be caused by hair being generally less invariant and less informative for face identification.

One possibility is that feature representations in the VTC are organized at multiple spatial scales: for general object features, the representation manifests at a large spatial scale across the whole VTC (e.g., large/small, animate/inanimate), whereas for more specific features such as face parts, it manifests at finer spatial scales within specific object-processing regions. Under this view, we would expect more fine-scale feature organizations to be revealed with more advanced neuroimaging tools, which are critical for fully understanding the neural algorithms of object processing in the VTC.

Materials and Methods
Participants
Six human participants (3 females) were recruited for the main experiment. Six participants (5 females), two of whom also participated in the main experiment, were recruited for Control Experiment 1. Three participants (2 females) from the main experiment completed the pRF experiment. Ten participants (1 female) were recruited for Control Experiment 2, but the right pFFA could not be localized in two of them, so these two participants were excluded from the analyses. All participants were between 21 and 27 years of age, right-handed, and had normal or corrected-to-normal visual acuity. They were recruited from the Chinese Academy of Sciences community, gave informed consent, and received payment for their participation. The experiment was approved by the Committee on the Use of Human Subjects at the Institute of Biophysics, Chinese Academy of Sciences.

Stimuli and Experimental design
In the main experiment, 20 unique front-view Asian male face images were used as face stimuli. Each face image was gray-scaled and further divided into five parts (i.e., eyes, nose, mouth, hair, and chin; see Figure 1A). Twenty unique gray-scaled everyday objects were used as comparison stimuli. The full face and object images subtended about 5º x 7º on average. For the localizer scans, video clips of faces, objects, and scrambled objects were used (for details see (42)).

There were a total of seven stimulus conditions (i.e., eyes, nose, mouth, hair, chin, whole face, and object). Each main experimental run contained two blocks of each stimulus condition. In the scan of participant S6, the object condition was not included. Each stimulus block lasted 16 sec and contained 20 images of the same type. Each image was presented for 600 msec at fixation, followed by a 200-msec blank interval. There was a 16-sec blank fixation block at the beginning, the middle, and the end of each run. Participants performed a one-back task in which they pressed a button whenever two successive images were identical. To balance the spatial properties of the different images in the visual field, each image was presented at a slightly shifted location, 1.3º either to the left or to the right of fixation, alternately in different trials within a block. Participants were instructed to maintain central fixation throughout the task.
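For concreteness, the sketch below lays out the timing of one such run as described above; it is an illustration (in Python) rather than the actual presentation script, and the block-ordering scheme is an assumption.

```python
# Illustrative sketch (not the presentation script): the timing of one main-experiment
# run as described above -- 7 conditions x 2 blocks, 16-s blocks of 20 images
# (600 ms on + 200 ms blank), 1.3-deg left/right jitter, and 16-s fixation blocks at
# the beginning, middle, and end. The block-ordering scheme is an assumption.
import random

CONDITIONS = ["eyes", "nose", "mouth", "hair", "chin", "whole_face", "object"]

def build_run(seed=0):
    rng = random.Random(seed)
    blocks = CONDITIONS * 2                           # two blocks per condition
    rng.shuffle(blocks)                               # assumed ordering scheme
    mid = len(blocks) // 2
    schedule, t = [], 0.0
    for i, cond in enumerate(blocks):
        if i in (0, mid):                             # fixation at start and middle
            schedule.append(("fixation", t, 16.0, 0.0))
            t += 16.0
        for trial in range(20):                       # 20 images per 16-s block
            side = -1.3 if trial % 2 == 0 else 1.3    # alternate 1.3 deg left/right
            schedule.append((cond, t, 0.6, side))     # 600-ms image
            t += 0.8                                  # plus 200-ms blank interval
    schedule.append(("fixation", t, 16.0, 0.0))       # fixation at the end
    return schedule                                   # (condition, onset_s, dur_s, x_offset_deg)

run = build_run()
print(len(run), run[-1])                              # 283 events; the run ends at 272 s
```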

Each localizer run contained four 18-sec blocks of each of the three stimulus conditions (i.e., faces, everyday non-face objects, and scrambled objects), shown in a balanced block order. The 12 stimulus blocks were interleaved with three 18-sec fixation blocks inserted at the beginning, middle, and end of each run. Each block contained six video clips of a given stimulus category, each presented for 3 sec. Participants were asked to watch the videos without any task. No fixation point was presented during the scan.

The eight experimental runs and the two localizer runs were completed within the same scan session for each participant.

In Control Experiment 1, we used a block design similar to that of the main experiment, with a single eye or a mouth presented at either the near-central (1.3º) or the near-peripheral (3.1º) location (see Figure S1A).

In the pRF experiment, we adopted stimuli and analysis code from the analyzePRF package (http://kendrickkay.net/analyzePRF/). There were a total of four conditions (i.e., clockwise wedges, counterclockwise wedges, expanding rings, contracting rings). The angular span of the wedges was 45º, and they revolved over 32 seconds per cycle. In the ring conditions, the rings swept for 28 seconds per cycle, followed by 4 seconds of rest. Colored object images were presented on the wedges or rings. The rings and wedges were presented within a radius of 10º. In each run there was a 22-sec blank fixation block at the beginning and the end. Participants performed a change-detection task in which they pressed a button whenever the fixation color changed. In each run, only one of the four conditions was presented.
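As an illustration of this stimulus configuration (not the analyzePRF stimulus code), the sketch below builds binary aperture frames for a revolving 45º wedge and an expanding ring within a 10º radius; grid size, TR, and ring width are assumptions.

```python
# Illustrative sketch (not the analyzePRF stimulus code): binary aperture frames for a
# revolving 45-deg wedge (32-s cycle) and an expanding ring (28-s sweep), both within a
# 10-deg radius. Grid size, TR, and ring width are assumptions for illustration.
import numpy as np

def prf_apertures(n_pix=101, fov_deg=20.0, tr=2.0, wedge_cycle=32.0, ring_sweep=28.0):
    half = fov_deg / 2.0
    xs = np.linspace(-half, half, n_pix)
    x, y = np.meshgrid(xs, -xs)                       # y positive = upper visual field
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)       # polar angle
    ecc = np.hypot(x, y)                              # eccentricity in degrees
    wedges, rings = [], []
    n_wedge = int(wedge_cycle / tr)
    for f in range(n_wedge):                          # revolving 45-deg wedge
        start = 2 * np.pi * f / n_wedge
        ang = np.mod(theta - start, 2 * np.pi)
        wedges.append((ang < np.deg2rad(45)) & (ecc <= half))
    ring_width = 2.0                                  # assumed ring thickness (deg)
    n_ring = int(ring_sweep / tr)
    for f in range(n_ring):                           # expanding ring
        inner = half * f / n_ring
        rings.append((ecc >= inner) & (ecc < inner + ring_width) & (ecc <= half))
    return np.array(wedges), np.array(rings)

w, r = prf_apertures()
print(w.shape, r.shape)                               # (16, 101, 101) and (14, 101, 101)
```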

Data analysis
Anatomical data were analyzed with FreeSurfer (Cortechs Inc., Charlestown, MA) and custom MATLAB code. To enhance the contrast between white and gray matter, T1-weighted images were divided by PD-weighted images (45). Anatomical data were further processed with FreeSurfer to reconstruct the cortical surface models.

Functional data were analyzed with AFNI (http://afni.nimh.nih.gov), FreeSurfer, fROI (http://froi.sourceforge.net), and custom MATLAB code. Data preprocessing included slice-timing correction, motion correction, removal of physiological noise using respiration and pulse signals, distortion correction with reversed phase-encoding EPI images, and intensity normalization. For the localizer runs only, spatial smoothing was applied (Gaussian kernel, 2 mm full width at half maximum). After preprocessing, functional images were co-registered to the anatomical images of each participant. To obtain the average response amplitude of each voxel in each stimulus condition for each individual observer, voxel time courses were fitted with a general linear model (GLM) in which each condition was modeled by a boxcar regressor (matched in stimulus duration) convolved with a gamma function (delta = 2.25, tau = 1.25). The resulting beta weights were used to characterize the averaged response amplitudes.
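The GLM step can be sketched as follows; the exponent-2 gamma-variate shape below is a common parameterization and an assumption here (the exact form depends on the analysis package), and the onsets and data are synthetic.

```python
# Illustrative sketch of the block-design GLM: one boxcar per condition convolved with
# a gamma HRF (delta = 2.25, tau = 1.25; the exponent-2 gamma-variate form below is an
# assumption -- the exact shape depends on the analysis package), with beta weights
# estimated by ordinary least squares. Onsets and data are synthetic.
import numpy as np

def gamma_hrf(t, delta=2.25, tau=1.25):
    s = np.clip((t - delta) / tau, 0, None)
    return s ** 2 * np.exp(-s)

def block_glm(onsets_by_cond, block_dur, n_vols, tr, data):
    """onsets_by_cond: {condition: [onset_s, ...]}; data: (n_vols, n_voxels)."""
    t = np.arange(n_vols) * tr
    hrf = gamma_hrf(np.arange(0.0, 30.0, tr))
    regs = []
    for onsets in onsets_by_cond.values():
        boxcar = np.zeros(n_vols)
        for on in onsets:                              # boxcar matched to block duration
            boxcar[(t >= on) & (t < on + block_dur)] = 1.0
        regs.append(np.convolve(boxcar, hrf)[:n_vols]) # convolve with the HRF
    X = np.column_stack(regs + [np.ones(n_vols)])      # add a constant regressor
    betas, *_ = np.linalg.lstsq(X, data, rcond=None)
    return betas                                       # (n_regressors, n_voxels)

rng = np.random.default_rng(3)
onsets = {"eyes": [16.0, 112.0], "mouth": [48.0, 144.0]}
demo = rng.normal(size=(100, 5))                       # 100 volumes x 5 voxels
print(block_glm(onsets, 16.0, 100, 2.0, demo).shape)
```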

The face-selective ROIs were identified by contrasting functional data between the face and everyday-object conditions in the localizer runs. Specifically, the FFA and OFA were defined as the sets of contiguous voxels in the fusiform gyrus and the inferior occipital gyrus, respectively, that showed significantly higher responses to faces than to objects (p < 0.01, uncorrected). We were able to identify the right pFFA, right anterior FFA (right aFFA), right OFA, and left FFA in all six participants. The left OFA was successfully identified in five of the six participants. In each ROI, to remove vein signals from the functional data, voxels whose signal change to face stimuli exceeded 4% were excluded from further analysis.

For the main experimental data, to remove the general fMRI response pattern shared among the different face parts, the response pattern to whole faces or everyday objects was regressed out from the response pattern of each individual face part. The whole-face or object response in each voxel was used to predict the individual-part response with linear regression, and the residuals across voxels were taken as the individual-part response pattern with the general pattern removed. To extract the trend of the fMRI response pattern along the anterior-posterior dimension in the FFA, we first drew a line along the mid-fusiform sulcus on the cortical surface of each participant. For all vertices within the FFA ROI, we calculated their shortest (orthogonal) distances to this line, projected the neural responses of all voxels in the FFA ROI onto the line, and averaged the responses at each point along the line to obtain the response profiles (see Figure 3A). A similar analysis was performed for the OFA, with the line drawn along the inferior occipital sulcus.

For the control experiments, the same data processing steps as in the main experiment were applied to extract the spatial patterns of the different conditions. For the pRF data, the fMRI response time course of each voxel was fitted with the compressive spatial summation (CSS) model (http://kendrickkay.net/analyzePRF/). To determine the center location (x, y) of each voxel's population receptive field, the CSS model uses an isotropic 2D Gaussian and a static power-law nonlinearity to model the fMRI response. For each voxel, model fit was quantified as the coefficient of determination between model and data (R²). We only included pRF results from voxels with R² higher than 2%.
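As a minimal illustration of the CSS model (simplified relative to analyzePRF, e.g., no HRF convolution or parameter fitting shown), the sketch below computes the model prediction for a given pRF and the R² between model and data; the stimulus grid here is synthetic.

```python
# Illustrative sketch of the compressive spatial summation (CSS) pRF model, simplified
# relative to analyzePRF (no HRF convolution or parameter fitting shown): an isotropic
# 2-D Gaussian pRF, a static power-law nonlinearity, and a gain, with fit quality
# summarized by R^2. The grid and stimulus here are synthetic.
import numpy as np

def css_prediction(stim, x0, y0, sigma, n, gain, xgrid, ygrid):
    """stim: (n_frames, H, W) apertures; xgrid, ygrid: (H, W) positions in degrees."""
    prf = np.exp(-((xgrid - x0) ** 2 + (ygrid - y0) ** 2) / (2 * sigma ** 2))
    drive = (stim * prf).sum(axis=(1, 2))              # spatial summation per frame
    return gain * drive ** n                            # static power-law nonlinearity

def r_squared(data, pred):
    ss_res = np.sum((data - pred) ** 2)
    ss_tot = np.sum((data - data.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic check: a voxel generated from the model plus noise should yield a high R^2.
rng = np.random.default_rng(4)
xs = np.linspace(-10, 10, 61)
xg, yg = np.meshgrid(xs, xs)
stim = (rng.random((40, 61, 61)) > 0.8).astype(float)   # toy binary apertures
pred = css_prediction(stim, 2.0, -1.0, 1.5, 0.5, 1.0, xg, yg)
data = pred + rng.normal(scale=0.1 * pred.std(), size=pred.shape)
print(round(r_squared(data, pred), 3))
```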