A Head View-Invariant Representation of Gaze Direction in Anterior Superior Temporal Sulcus

Summary Humans show a remarkable ability to discriminate others' gaze direction, even though a given direction can be conveyed by many physically dissimilar configurations of different eye positions and head views. For example, eye contact can be signaled by a rightward glance in a left-turned head or by direct gaze in a front-facing head. Such acute gaze discrimination implies considerable perceptual invariance. Previous human research found that superior temporal sulcus (STS) responds preferentially to gaze shifts [1], but the underlying representation that supports such general responsiveness remains poorly understood. Using multivariate pattern analysis (MVPA) of human functional magnetic resonance imaging (fMRI) data, we tested whether STS contains a higher-order, head view-invariant code for gaze direction. The results revealed a finely graded gaze direction code in right anterior STS that was invariant to head view and physical image features. Further analyses revealed similar gaze effects in left anterior STS and precuneus. Our results suggest that anterior STS codes the direction of another's attention regardless of how this information is conveyed and demonstrate how high-level face areas carry out fine-grained, perceptually relevant discrimination through invariance to other face features.

Supplemental Results Relating to Figure 3
(A) Spatial extent of the right STS anatomical mask, shown overlaid on the sample's mean T1 volume.
(B) Independently estimated correlation between the view-invariant gaze direction predictor and response pattern dissimilarities in anterior and posterior right STS regions. Regions of interest (ROIs) were defined using a leave-one-set-out procedure. We carried out a group analysis (similar parameters to the main analysis) separately for the ROI-defining data in each unique split (4 of 5 sets) of the data to identify response pattern dissimilarities that were explained by view-invariant gaze direction. Responses to each set were estimated in five separate first-level models, with 7 discarded volumes (17.43 s) separating each model to ensure independent estimates. Statistical thresholds for ROI definition varied between splits (p < 0.01 to p < 0.05, uncorrected). The only regions that appeared consistently across splits were anterior STS (mean [33.2, 10.0, ] mm MNI, standard deviation [1.0, 5.1, 1.0]) and posterior STS (mean [46.4, 4.0] mm MNI, standard deviation [3.2, 3.5, 4.2]). To better accommodate alignment errors across participants, we identified the participant-specific peak within a 10 mm radius of each group peak using ROI-defining data only. Subsequent tests of the identified ROIs were carried out separately for each split (e.g., ROIs defined using sets 1-4 were tested using set 5). We generated the illustrated response pattern dissimilarities for anterior and posterior STS by first averaging each participant's dissimilarities for each ROI across the 5 independent test splits, and then averaging the resulting ROI dissimilarity matrices across participants. Both anterior and posterior STS showed consistent effects of view-invariant gaze direction in the independent test data (p values were derived from a permutation test in which the condition order of the matrices was shuffled without replacement 10,000 times [1]).
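The permutation scheme described above (shuffling the condition order of one dissimilarity matrix and recomputing its correlation with the predictor) can be sketched as follows. This is an illustrative Python reimplementation only (the original analyses used Matlab), and the function name and inputs are our own:

```python
import numpy as np
from scipy.stats import spearmanr

def rdm_permutation_test(model_rdm, data_rdm, n_perm=10000, seed=0):
    """Spearman correlation between two representational dissimilarity
    matrices (RDMs), with a permutation test in which the condition
    order of the data RDM is shuffled (rows and columns jointly)."""
    rng = np.random.default_rng(seed)
    n = model_rdm.shape[0]
    tri = np.triu_indices(n, k=1)  # use off-diagonal entries only
    observed = spearmanr(model_rdm[tri], data_rdm[tri])[0]
    null = np.empty(n_perm)
    for i in range(n_perm):
        order = rng.permutation(n)  # shuffled condition order
        shuffled = data_rdm[np.ix_(order, order)]
        null[i] = spearmanr(model_rdm[tri], shuffled[tri])[0]
    # one-tailed p value: proportion of null correlations >= observed
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p
```

The +1 terms follow the standard convention that the observed statistic counts as one member of the null distribution, so with 10,000 permutations the smallest attainable p value is 1/10,001.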
(C) Fine-grained gaze direction codes in right STS. Regions with consistent pattern responses (partial Spearman correlation) across participants (n = 18, p < 0.05 FWE). View-invariant gaze direction responses in anterior and posterior right STS remain when the influence of a qualitative distinction between gaze left/direct/right is removed.
(D) Gaze direction discrimination. Median Spearman correlations (bars 1, 5-7) and median partial Spearman correlations (bars 2-4) across participants (±95% bootstrap confidence intervals). The participants' gaze discrimination performance was most strongly correlated with the view-invariant gaze direction predictor. Although performance was also moderately correlated with physical image features and head view, the strength of the relationship between discrimination performance and the view-invariant gaze direction predictor was relatively unaffected by partialling out the influence of these alternative predictors. Peak MNI coordinates are shown with p values FWE-corrected for regions as indicated by the analysis column.
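A partial Spearman correlation, as used in panels (C) and (D), amounts to rank-transforming the two vectors, residualising each against the rank-transformed covariates, and correlating the residuals. A minimal Python sketch assuming this standard construction (the original analyses were carried out in Matlab):

```python
import numpy as np
from scipy.stats import rankdata

def partial_spearman(x, y, covariates):
    """Partial Spearman correlation between x and y, controlling for
    one or more covariate vectors (rows of `covariates`)."""
    xr = rankdata(x)
    yr = rankdata(y)
    Z = np.column_stack([rankdata(c) for c in np.atleast_2d(covariates)])
    Z = np.column_stack([np.ones(len(xr)), Z])  # add intercept column
    # residualise the ranked vectors against the ranked covariates
    res_x = xr - Z @ np.linalg.lstsq(Z, xr, rcond=None)[0]
    res_y = yr - Z @ np.linalg.lstsq(Z, yr, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]
```

Partialling out a covariate that fully explains the x-y relationship drives the estimate toward zero, while an irrelevant covariate leaves it essentially unchanged; this is the logic behind the bar comparisons in panel (D).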

Stimulus Design and Presentation
We used Poser 6 (Curious Labs, Inc., Santa Cruz, CA) to create greyscale face images of two identities, each displaying 25 head-gaze configurations. Each face varied in horizontal head view (5 increments from 20° left to 20° right), horizontal eye position relative to the head (same increments as for head view), and identity (2 faces). The faces were processed in Matlab (The MathWorks, Inc., Natick, MA) to achieve similar luminance histograms, and were cropped to ensure that each face appeared in a similar retinal area. Cropping was achieved with a smooth border, and the resulting face was superimposed on a background texture that varied across conditions and across repetitions of the same face (Figure 1a). The background textures were created by Fourier-scrambling each of the 50 faces separately. The background texture served both to reduce the influence of low-level physical differences between the conditions and to increase the difficulty of the one-back behavioral task. Stimuli were back-projected onto a screen in the scanner, which participants viewed via a tilted mirror. The stimuli subtended approximately 6° of visual angle including the background texture, and approximately 3° horizontally by 4° vertically without it. The experiment was controlled using Matlab and the Psychophysics Toolbox [2].
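The exact scrambling routine is not specified; Fourier-scrambling conventionally randomises an image's phase spectrum while preserving its amplitude (power) spectrum, so the texture retains the face's spatial-frequency content without any recognisable structure. A numpy sketch under that assumption (the original processing was done in Matlab):

```python
import numpy as np

def phase_scramble(image, seed=None):
    """Fourier-scramble a greyscale image: keep each frequency's
    amplitude but add a random phase offset, preserving the power
    spectrum and mean luminance."""
    rng = np.random.default_rng(seed)
    f = np.fft.fft2(image)
    # Taking the phase of the FFT of a real random field gives a
    # conjugate-symmetric phase offset, so the output stays real.
    random_phase = np.angle(np.fft.fft2(rng.random(image.shape)))
    scrambled = np.abs(f) * np.exp(1j * (np.angle(f) + random_phase))
    return np.real(np.fft.ifft2(scrambled))
```

Because only phases change, the scrambled background has the same power spectrum as the source face, which is what makes it effective at equating low-level image statistics across conditions.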

Behavioral Performance
Participants carried out a one-back face matching task whilst in the scanner. The task instruction was to respond to any repetition of the same face (same identity and head view/eye position configuration) while ignoring the scrambled backgrounds, which did not repeat. Accuracy was relatively high across the sample (mean 77%, standard error 3%), with low false alarm rates (mean 4.5% of trials, standard error 2.3%) and high sensitivity (mean d' 2.52, standard error 0.13). The large number of different head view/eye position configurations (25) relative to the number of response trials (75 per participant) meant that there was insufficient behavioral data to model each of the 25 configurations separately. We therefore pooled the available response trials according to gaze direction and calculated accuracy scores for each of the 9 gaze directions. Repeated-measures ANOVA revealed no significant effect of gaze direction on accuracy (p > 0.23), suggesting that attentional or performance differences did not confound our fMRI analysis.
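The text does not state how d' was derived from the one-back responses; a common construction from hit and false-alarm counts, sketched here in Python with an assumed log-linear correction for extreme rates, is:

```python
from scipy.stats import norm

def dprime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity d' from response counts, using a
    log-linear correction (add 0.5 to each count) to avoid infinite
    z-scores when hit or false-alarm rates are exactly 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)
```

With rates close to the reported sample means (77% hits, 4.5% false alarms), this construction yields a d' in the vicinity of the reported 2.52.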

Eye Tracking
All participants' eye movements were monitored in the scanner using an infrared video-based eye tracker (50 Hz acquisition, SensoMotoric Instruments, Germany). Successful calibrations were obtained for 10 participants out of the final sample of 18; the remaining participants were excluded from the eye tracking analysis. On-line visual inspection of the eye tracking monitor suggested that these participants were maintaining their gaze at the fixation cross. Eye tracking data were analysed using custom Matlab code.
To measure stimulus-induced eye movements, we analysed how the horizontal and vertical fixation position shifted between the start and the end of each stimulus presentation. This fixation shift was analysed separately for each participant using a one-way ANOVA in which the faces were labelled according to perceived gaze direction. One participant showed an effect of gaze direction on horizontal fixation shifts (F(8,711) = 2.37, p = 0.016). This participant was removed from further analyses of the fMRI data. No other horizontal or vertical