Seeing things differently: Gaze shapes neural signal during mentalizing according to emotional awareness

Studies on social cognition often use complex visual stimuli to assess neural processes attributed to abilities like "mentalizing" or "Theory of Mind" (ToM). During the processing of these stimuli, however, eye gaze shapes neural signal patterns. Individual differences in neural operations on social cognition may therefore be obscured if individuals' gaze behavior differs systematically. These obstacles can be overcome by the combined analysis of neural signal and natural viewing behavior. Here, we combined functional magnetic resonance imaging (fMRI) with eye-tracking to examine effects of unconstrained gaze on neural ToM processes in healthy individuals with differing levels of emotional awareness, i.e., alexithymia. First, as previously described for emotional tasks, people with higher alexithymia levels look less at eyes in both ToM and task-free viewing contexts. Further, we find that neural ToM processes are not affected by individual differences in alexithymia per se. Instead, depending on alexithymia levels, gaze on critical stimulus aspects reversely shapes the signal in medial prefrontal cortex (MPFC) and anterior temporoparietal junction (TPJ) as distinct nodes of the ToM system. These results emphasize that natural selective attention affects fMRI patterns well beyond the visual system. Our study implies that, whenever using a task with multiple degrees of freedom in scan paths, ignoring the latter might obscure important conclusions.


Introduction
Most people form impressions about their emotional condition or the affective state of others with ease. Some people, however, have difficulties with the perception of both their own feelings and those of their counterparts (Bagby and Taylor, 1997). To describe these difficulties in emotional awareness, Sifneos (1973) introduced the term "alexithymia", which literally means "no words for feelings". Alexithymia refers to a personality construct composed of the factors difficulty identifying feelings (DIF), difficulty describing feelings (DDF) and externally oriented thinking (EOT) (Nemiah et al., 1976). It is sometimes treated as a categorical concept, implying that alexithymia is either present or absent, and sometimes approached as a continuous variable with a linearly increasing manifestation. Considering alexithymia as a categorical concept, its prevalence has been reported as approximately 10% in their behavior. Whereas empathy and compassion imply embodiment of the other's state or emotion, ToM refers to the formation of an idea or knowledge about it (Kanske et al., 2015). The core neural network of ToM processing characteristically includes the posterior superior temporal sulcus (pSTS), the temporoparietal junction (TPJ) and the medial prefrontal cortex (MPFC) (Carrington and Bailey, 2009; Frith and Frith, 2003). Although a majority of previous studies suggest that people with high levels of alexithymia, in particular those emphasizing an external attribution style (i.e., with high load on EOT), have difficulties with ToM, results have been mixed (Demers and Koven, 2015; Luminet et al., 2011; Lyvers et al., 2017; Moriguchi et al., 2006; Neumann et al., 2014; Raimo et al., 2017; Redondo and Herrero-Fernández, 2018; Swart et al., 2009; but see, e.g., Chalah et al., 2017; Lane et al., 2015; Milosavljevic et al., 2016; Winter et al., 2017, who found no relationship between alexithymia and ToM skills).
Several reasons for these discrepancies are discussed. For instance, the conceptualization of numerous ToM tasks has been criticized (Schaafsma et al., 2015). Performance on the "Reading the Mind in the Eyes Test" (RMET) (Baron-Cohen et al., 2001a), a frequently used task in the literature, is for example suggested to depend on verbal skills since the vocabulary that participants are asked to assign to presented pairs of eyes is predetermined (Betz et al., 2019; Peterson and Miller, 2012). In general, the current definition of ToM is broad and distinct ToM tasks were shown to lack convergent validity (Schaafsma et al., 2015). Hence, one way to approach the relation between ToM abilities and alexithymia may be the decomposition of the particular task into its psychologically relevant constituents (e.g., into single features of complex stimuli, including their sensory dimension or response category) in order to reveal the level where difficulties occur.
How do typical individuals derive their inferences about intentions and feelings of others and what goes wrong in high alexithymics? Critical information about the mental state of others is provided, e.g., by goal-directed movement or by the eye region (Baron-Cohen, 1997; Frith, 2001). Hence, several ToM tasks resort to showing agents interacting in a dynamic manner or make use of pictures of faces or eyes (e.g., Abell et al., 2000; Baron-Cohen et al., 2001a; Kanske et al., 2015; Shamay-Tsoory and Aharon-Peretz, 2007). In this regard, the eyes seem to play a particular role for the development and application of affective ToM, i.e., for the interpretation of emotional states (Baron-Cohen and Cross, 1992; Shamay-Tsoory and Aharon-Peretz, 2007). Critically, healthy people with high levels of alexithymia were recently shown to spend less time looking at the eyes of presented face pictures (Fujiwara, 2018). Additionally, in a group of people with autism spectrum disorder (ASD), a disorder that is characterized by impairments in ToM and associated with high levels of alexithymia, it was alexithymia that predicted reduced dwell time on eyes and not severity of autistic symptoms (Bird et al., 2011). However, aberrant gaze behavior of high alexithymics does not seem to be restricted to a context in which eyes are emphasized. It was also found when high alexithymics were confronted with pictures with different emotional content (Wiebe et al., 2017).
Most importantly, eye gaze shapes neural signal not only in visual or sensory cortex but also in multimodal higher-order regions. Only recently, Hadjikhani and colleagues (Hadjikhani et al., 2017b) demonstrated an impact of gaze constraints on the signal in socially relevant brain areas of healthy individuals, including the core ToM network. Thus, observed signal differences between individuals with different gaze behavior may simply be the result of variations in scan paths, i.e., distinct allocation of visual attention. However, when constraining gaze instead of tracking natural viewing, important conclusions might be obscured. For one, neural signal related to natural gaze behavior might differ from that revealed during constrained viewing. Further, tracking neural responses during natural viewing not only allows controlling for signal differences that are related to different scan paths but also enables the examination of signal differences that emerge when looking at the same things, i.e., differences in neural operations when attending the same stimulus features. For example, computations in ToM regions might differ between individuals with high and low alexithymia levels because they gaze at different stimulus aspects (Hadjikhani et al., 2017b), but also even though they look at the same characteristics of the stimulus. Importantly, both differences in neural signal related to gaze at different stimulus aspects and differences in neural signal related to gaze at the same stimulus aspects can be modeled within the scope of a combined analysis of eye-tracking and functional magnetic resonance imaging (fMRI). To date, one fMRI study has investigated neural ToM processing in alexithymia (Moriguchi et al., 2006). In high alexithymics the authors found reduced activation of the right MPFC, a core region of the ToM network. However, this study did not control for eye gaze.
Thus, we cannot be sure that the results are actually due to alexithymia and not just explained by participants' attention to different task aspects.
By combining eye-tracking and fMRI acquisition, about 15 years ago Dalton and colleagues (Dalton et al., 2005) already showed that eye gaze was associated with different amygdala signal in ASD. To our knowledge, however, only a handful of fMRI studies on social cognition picked up on this by deploying similar techniques (Gamer et al., 2010; Gamer and Büchel, 2009; Jiang et al., 2016, 2020; Kliemann et al., 2012). In contrast, most imaging studies from the domain either restricted or guided eye fixation by instruction or task (Hadjikhani et al., 2017a, 2017b; Lassalle et al., 2017; Morris et al., 2008, 2007; Perlman et al., 2011; Spilka et al., 2019; Zürcher et al., 2013), tracked gaze behavior isolated from neural signal (Tottenham et al., 2014), refrained from combining the analysis of the two (Kujala et al., 2012), or even ignored potential bias through gaze. Thus, the control of gaze is of particular importance for the neural mapping of disorders in which aberrant gaze behavior is already known, including, for example, ASD, depression, social anxiety or schizophrenia (Toh et al., 2011). Moreover, evidence is accumulating that gaze behavior of healthy individuals is neither uniform nor exclusively shaped by the salience of stimulus characteristics but differs with respect to experience, personality, preferences or skills, such as curiosity or even alexithymia (de Haas et al., 2019; Gottlieb and Oudeyer, 2018; Henderson, 2017; Risko et al., 2012; Schomaker et al., 2017).
In the present study, we investigated differences in ToM processing associated with natural viewing behavior in alexithymia. Since high alexithymics' gaze behavior differs from that of typical individuals, we wanted to rule out the possibility of erroneously associating neural signal changes with alexithymia that actually relate to differences in visual attention. Furthermore, we were interested in neural operations associated with unrestricted eye gaze on specific features of the ToM stimuli. Therefore, we combined the analysis of simultaneously recorded eye-tracking and fMRI data. We applied the RMET (Baron-Cohen et al., 2001a), a common task to assess affective ToM. In this task, participants are asked to select the word, out of four, that best describes the mental state depicted by a photo of a person's eye region (Fig. 1). Critically, the design of the task, i.e., the arrangement of the task features, allowed us to discriminate fixations on the eyes from those on words (Fig. 2). First, we analyzed performance on the task. We hypothesized that high alexithymics, or those with a high score on EOT, perform worse than people with lower alexithymia, or EOT, levels (e.g., Demers and Koven, 2015; Raimo et al., 2017). Second, we examined whether high alexithymics also deploy reduced attention to eyes in the context of the RMET, i.e., during a mentalizing task in which only pairs of eyes instead of an entire face are presented. We hypothesized that people with high alexithymia levels look less at eyes (Fujiwara, 2018). Third, by controlling for the effects of eye gaze we evaluated whether the neural signal in core ToM regions (MPFC, anterior TPJ, posterior TPJ/pSTS) is affected by alexithymia per se (Koster-Hale and Saxe, 2013; Moriguchi et al., 2006). Furthermore, we investigated whether high alexithymics engaged in ToM "see things differently", i.e., whether they compute certain aspects of the stimulus, like eyes, in a different way. We hypothesized that alexithymia level is inversely related to neural changes in the MPFC during ToM (Moriguchi et al., 2006).
Because previous studies sometimes treat alexithymia as a categorical variable, using accepted cut-off scores, and sometimes as a continuous variable (e.g., Bird et al., 2010; FeldmanHall et al., 2013; Moriguchi et al., 2006), we carried out analyses for both approaches. The two approaches yielded only non-essential differences, mostly due to effects at trend level not reaching significance. Importantly, in addition to the exclusion of participants loading high on related dimensions such as depression, anxiety and autistic traits, we evaluated the impact of these and other covariates (verbal intelligence, age, gender) on dependent measures using LASSO regressions.

Fig. 1. Study design. Experimental and control stimuli lasted for 10 s and alternated with a 5 s fixation cross. Responses were made during stimulus presentation or during the succeeding fixation period, using four buttons of the response box. Position of the correct answer was counterbalanced within and between conditions. ToM stimulus: 1. despondent ["niedergeschlagen"] (correct answer), 2. relieved ["befreit"], 3. shy ["schüchtern"], 4. excited ["aufgeregt"]; control stimulus: 1. plucked eyebrows ["gezupfte Augenbrauen"] (correct answer), 2. bright strands ["helle Strähnen"], 3. crooked bulbous nose ["schiefe Knollennase"], 4. tattooed forehead ["tätowierte Stirn"].

Fig. 2. Overview of experimental setup, parametric modeling and basic results. During stimulus presentation, eye fixations (magenta and green circles on the stimuli) and fMRI signal were tracked. Regressors of interest for the ToM condition are exemplified by the graphs at the bottom right: the time from stimulus onset ("on") until button press (arrow) was modeled by a boxcar function (blue dotted line) convolved with the hrf (blue solid line) representing mean BOLD response during the ToM condition. Cumulated fixations on the eye region (magenta) and on words (green) were entered into the model as orthogonalized parametric modulators. The same regressors were entered for the control condition. Fixations on neither eyes nor words (red circles on the right stimulus) were not modeled. Additional regressors of no interest covered the time from button press (arrow) until stimulus offset ("off"). This time period was again modeled by a boxcar function (yellow dotted line) convolved with the hrf (not shown) and two parametric modulators (fixations on eyes and words) separately for ToM and control condition (not shown). ROI analysis demonstrated reverse relations of neural operations on ToM vocabulary for high (HA, light blue) and low (LA, purple) alexithymics in the MPFC. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Participants
We analyzed fMRI and eye-tracking data from 32 healthy participants (see Table S1 for participant characteristics). They were selected from a large sample of individuals (n = 550, 8.36% high alexithymics (HA), 16.73% intermediate alexithymics (IA), 74.91% low alexithymics (LA)) that was previously screened for alexithymia using the German 20-item version of the Toronto Alexithymia Scale (TAS, Bach et al., 1996; Bagby et al., 1994). An extensive description of the selection procedure can be found in the Supplementary Information (SI Participants). See also Fig. S1 for a CONSORT diagram of participant recruitment and inclusion. Besides treating alexithymia as a continuous variable and analyzing the 32 participants as one sample, we also divided them according to accepted cut-off scores (LA < 52, IA > 51 & < 61, HA > 60) into three groups (HA: n = 10, 5 female, mean age of 27.9 ± 10.1 y; IA: n = 10, 5 female, mean age of 28.5 ± 11.2 y; LA: n = 12, 7 female, mean age of 24.6 ± 5.3 y). The participants completed a comprehensive neuropsychological test battery assessing depression and anxiety (Hospital Anxiety and Depression Scale, HADS-a, HADS-d, Zigmond and Snaith, 1983, and Beck Depression Inventory, BDI, Beck, 1961), autistic traits (short version of the German Autism Spectrum Questionnaire, ASQ-s, Baron-Cohen et al., 2001b; Freitag et al., 2007) and verbal IQ and speech comprehension (WST, Schmidt and Metzler, 1992). Groups did not differ significantly with regard to sex, age or verbal IQ and speech comprehension (all p-values > 0.05), but they differed with respect to EOT and BDI scores (p < 0.05) (see Table S1 and Fig. S2). Participants with a history of psychiatric disease or exceeding cut-off values of the questionnaires were excluded from the analyses (see SI Participants). Furthermore, we used the test scores from the remaining participants as covariates in LASSO regressions in order to assess their predictive value for dependent measures (Tibshirani, 1996).
All participants were right-handed. Their first language was German. They gave written informed consent before the study. The study protocol was approved by the local ethics committee.
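The categorical grouping described above can be illustrated with a short sketch. It is only an illustration of the stated cut-offs (LA < 52, IA 52 to 60, HA > 60); the function name `tas_group` and the range check are assumptions introduced here and are not part of the original analysis code.

```python
def tas_group(tas_total: int) -> str:
    """Assign an alexithymia group from a TAS-20 total score.

    Cut-offs follow the text: LA < 52, IA 52-60, HA > 60.
    The function name and the range check are illustrative assumptions.
    """
    # TAS-20 has 20 items rated 1-5, so valid totals lie between 20 and 100.
    if not 20 <= tas_total <= 100:
        raise ValueError(f"TAS-20 total out of range: {tas_total}")
    if tas_total < 52:
        return "LA"
    if tas_total <= 60:
        return "IA"
    return "HA"


if __name__ == "__main__":
    for score in (45, 52, 60, 61):
        print(score, tas_group(score))
```

The same scores can be kept as a continuous predictor in parallel, which is what the continuous approach in this study does.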

Analysis strategy regarding the alexithymia concept
In the present study, we assessed the association between alexithymia and ToM processing using several dependent variables of interest (RMET performance, eye fixation, fMRI signal; see Table 1 for an overview). We applied two latent variable concepts of alexithymia, one based on a categorical perspective and the second modeling alexithymia as a continuous latent factor. For statistical analyses, we used Matlab R2016b and R version 4.0.3. Following the categorical approach, we compared HA, IA and LA by using analysis-of-variance (ANOVA) models. Main and interaction effects were tested at α = 0.05. Following the continuous approach, we calculated correlation coefficients of the dependent variable with TAS or EOT scores. Whenever the literature allowed for directional hypotheses regarding the relation between alexithymia and dependent measures, we conducted one-tailed significance tests (see Table 1 for details). Effect sizes were calculated with R or, for MR images, with PowerMap (Joyce and Hayasaka, 2012) and are reported as generalized eta squared (η²), standardized mean difference, i.e., Cohen's d, or Cohen's f-squared (f²). Additionally, for both the categorical and the continuous perspective we formally evaluated the contribution of covariates (ASQ-s, BDI, HADS-a, HADS-d, sex, age) using LASSO regressions (Tibshirani, 1996). The LASSO regression is an alternative to other model feature selection methods (e.g., stepwise regression) which has proven useful for small datasets with high-dimensional predictors (Finch and Finch, 2016; Hastie et al., 2009). Particularly when the ratio of sample size and number of predictors is small, in standard Ordinary Least Squares (OLS) or stepwise regression, r² and coefficients may be augmented while standard errors and p-values may be diminished, thus rendering these methods prone to overfitting. In such cases, the use of LASSO regression is endorsed (McNeish, 2015).
The method provides a reasonable balance between evading false positives and disclosing true positives even when predictors are highly correlated (Wang et al., 2020). LASSO regression is an extension of the Ordinary Least Squares (OLS) approach that adds a penalty term to the residual sum of squares in which the coefficients are combined with a regularization or shrinkage parameter λ. As λ increases, the coefficients shrink toward zero, thereby effectively removing the least associated variables from the model. In order to find the optimal λ, i.e., the final set of predictors, k-fold cross-validation is used (k = 5 in the present study). Since the folds of the cross-validation are selected at random, which may introduce some variation in selected predictors by repetition, we ran the cross-validation 100 times and averaged the error curves. We report those predictors that were chosen by the most accurate model (i.e., the model with minimal cross-validation error) or the sparse model (i.e., the model with a cross-validation error one standard error (1SE) above the minimum). The objective of the sparse model is the selection of the most parsimonious solution whose accuracy is commensurate with the most accurate model. In comparison with the accurate model, the sparse model may be less prone to overfitting (Krstajic et al., 2014). In addition to the above-listed covariates and EOT scores, we introduced either a categorical variable that represented the alexithymia group (LA, IA, HA) or TAS scores as predictors of LASSO models.
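The choice between the most accurate and the sparse model can be sketched as follows. This is a minimal illustration that assumes the common convention (as used, for example, in glmnet) in which the sparse model corresponds to the largest shrinkage parameter whose cross-validation error lies within one standard error of the minimum. The function names, the toy error values and the way the standard error is supplied are all assumptions for illustration and do not reproduce the original pipeline.

```python
from statistics import mean


def average_curves(curves):
    """Average cross-validation error curves point-wise over the lambda grid.

    `curves` holds one list of errors per repetition (e.g., 100 repetitions),
    all evaluated on the same grid of shrinkage parameters.
    """
    return [mean(column) for column in zip(*curves)]


def select_lambdas(lambdas, cv_error, cv_se):
    """Return (lambda_min, lambda_1se) for an averaged error curve.

    lambda_min: shrinkage parameter with minimal cross-validation error.
    lambda_1se: largest parameter whose error is within one standard error
                of that minimum (assumed 1SE convention).
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_error[i])
    threshold = cv_error[i_min] + cv_se[i_min]
    lambda_1se = max(
        lam for lam, err in zip(lambdas, cv_error) if err <= threshold
    )
    return lambdas[i_min], lambda_1se


if __name__ == "__main__":
    # Hypothetical grid and error values, for illustration only.
    grid = [0.01, 0.02, 0.04, 0.08, 0.16]
    errors = [0.020, 0.017, 0.018, 0.021, 0.030]
    ses = [0.002] * len(grid)
    print(select_lambdas(grid, errors, ses))
```

A larger shrinkage parameter removes more predictors, which is why the 1SE choice yields the more parsimonious model.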

Experimental design and performance analysis
Participants underwent fMRI scans while completing a face localizer task (modified version of Frässle et al., 2016) and a German version of the RMET (Fig. 1). The face localizer task was administered for a separate research question. In the following, we only describe RMET results. During the RMET, participants were asked to choose the word that, in their opinion, best fits the expression shown by a photo of a person's eye region. We introduced a control task showing the same photos but offering words that refer to physical aspects of upper face parts (e.g., tinted eyelashes, small lower crease, tightened eyelids, dark eyes). We used these physical descriptions instead of, e.g., asking for the sex of the person in the photo, and repeated none of those descriptions in order to increase complexity and relative difficulty of the control task compared with the ToM task (see SI Stimulus Design and Presentation for more details on stimulus creation). 36 ToM trials alternated with 36 control trials separated by a fixation cross (Fig. 1). Trial order was pseudo-randomized. After half of the trials a 30 s break was introduced. The MR scanner was not stopped during this break. Scanning time of the RMET was approximately 20 min (see Fig. 1 for more details on the design).
To test for group differences in RMET performance (i.e., the number of errors) when treating alexithymia as a categorical latent variable, we used a mixed ANOVA with alexithymia group (LA, IA, HA) as between-subject factor and experimental condition (ToM, control) as within-subject factor. In order to also account for a continuous concept of alexithymia, we calculated correlations between the number of errors in the ToM condition and TAS and EOT scores. Since verbal IQ was shown to be related to RMET performance (e.g., Peterson and Miller, 2012), we also included WST scores as explanatory variable in LASSO regressions with error rate as dependent variable and an underlying Poisson distribution.

Table 1 notes: Non-parametric (Spearman) correlations (α = 0.05, Bonferroni corrected). ‡ The literature allowed for directional hypotheses regarding RMET performance (the higher the alexithymia level, the lower the performance on the RMET, e.g., Demers and Koven, 2015; Lyvers et al., 2017; Raimo et al., 2017), dwell time on the eye region (the higher the alexithymia level, the shorter the fixation on eyes, Fujiwara, 2018) and MPFC responses during ToM (the higher the alexithymia level, the lower the MPFC signal, Moriguchi et al., 2006). Therefore, we applied one-tailed significance tests for analyses of these cases. However, IA has not been included in group comparisons on these issues yet. That is why we ran two-tailed significance tests for all categorical analyses including IA.

Eye-tracking data acquisition and analysis
Eye movements were recorded monocularly at 500 Hz with an MR-compatible EyeLink 1000 (SR Research Ltd., Ontario, Canada). Preprocessing of the eye-tracking data was accomplished with Matlab R2016b (see SI Eye-tracking data analysis). For each subject and each condition (ToM and control) we defined rectangles around the words and the eye region as areas of interest (Fig. S3). Dwell times within word or eye regions were identified and cumulated over each trial. Most of the time, participants looked at the eyes or the words. Those times when the participants were looking at neither the eyes nor the words were not included in the analyses (Figs. S4, S5).
We tested for group differences in dwell time using two time windows: from stimulus onset until decision making and from the time of decision making until stimulus offset. The latter accounts for differences of attentional foci independent of instructions, i.e., viewing preferences after the task has been completed. For analyses of the time window after decision making, ToM and control condition were collapsed, because task-free viewing starts after subjects' response in both experimental conditions (although it cannot be ruled out that previous instructions might still exert some influence on gaze behavior after decision, but see, e.g., Fig. S5 for an illustration of differences in visual selective attention before and after decision during the ToM condition). We calculated trial-wise ratios of cumulated fixations per time window (FIX/t). The ratios were forwarded to statistical testing.
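The trial-wise ratio of cumulated fixation time per time window (FIX/t) might be computed along the following lines. This is a sketch under stated assumptions: the class names, the pixel coordinates of the rectangular area of interest and the clipping of fixations that straddle the response are all hypothetical choices made for illustration, since the original preprocessing is described only in the Supplementary Information.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Rectangular area of interest in screen pixels (hypothetical layout)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class Fixation:
    """One fixation; times in seconds relative to stimulus onset."""
    x: float
    y: float
    start: float
    end: float


def dwell_ratio(fixations, aois, t_start, t_end):
    """Cumulated fixation time inside any AOI divided by the window length.

    Fixations are clipped to [t_start, t_end]; this clipping is an
    assumption about how fixations spanning the response are handled.
    """
    window = t_end - t_start
    if window <= 0:
        raise ValueError("time window must have positive length")
    total = 0.0
    for fix in fixations:
        if any(aoi.contains(fix.x, fix.y) for aoi in aois):
            overlap = min(fix.end, t_end) - max(fix.start, t_start)
            total += max(0.0, overlap)
    return total / window


if __name__ == "__main__":
    eyes = Rect(300, 100, 700, 250)  # hypothetical eye-region rectangle
    trial = [
        Fixation(500, 180, 0.2, 1.0),  # on the eye region
        Fixation(120, 400, 1.1, 2.0),  # elsewhere (e.g., on a word)
        Fixation(500, 170, 2.1, 3.5),  # on the eyes, spans the response
    ]
    response_time, stimulus_offset = 3.0, 10.0
    print(dwell_ratio(trial, [eyes], 0.0, response_time))
    print(dwell_ratio(trial, [eyes], response_time, stimulus_offset))
```

Normalizing by window length makes trials with different response times comparable, which matters because the pre-decision window ends at a different time on every trial.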
To test for group differences in gaze behavior (i.e., dwell times), we set up mixed ANOVAs with alexithymia group (LA, IA, HA) as between-subject factor and experimental condition (ToM, control) and fixation location (eye region, words) as within-subject factors. Although our hypotheses only accounted for differences in dwell time on the eye region, we included dwell time on words in an exploratory manner (note that the time participants looked at neither eyes nor words was not included in our models). We applied Bonferroni correction for multiple comparisons of dwell time on the eyes and on words at an overall α level of 0.05. We further calculated correlations between dwell time on the eye region in the ToM condition and TAS and EOT scores. We used LASSO regression for linear models to test for critical contributions of covariates to the prediction of dwell times.
Note that the analyses of eye-tracking data on the behavioral level evaluate differences in gaze behavior according to the level of alexithymia. Effects therefore relate to differences between participants that arise because they adopt different scan paths. In contrast, the MRI data analyses examine effects while participants deploy their attention to the same stimulus features. As such, the MRI models probe neural differences that emerge even though participants look at the same stimulus aspects.

MRI data acquisition and analysis
We collected the MRI data on a 3-Tesla Tim Trio MR scanner (Siemens Medical Systems) with a 12-channel head coil. Functional images were acquired continuously using a T2*-weighted echo planar imaging (EPI) sequence sensitive to blood oxygen level dependent (BOLD) contrast (64 × 64 matrix, 36 slices, in-plane resolution 3.3 mm, gap 15%, slice thickness 3 mm, TR = 1.81 s, TE = 30 ms, flip angle 90°, parallel imaging GRAPPA with an acceleration factor of 2). Slices covered the whole brain and were positioned transaxially parallel to the anterior-posterior commissural line (AC-PC). We discarded from analysis the initial 26 scans, which covered a practice example including a ToM and a control trial. For each subject, we additionally acquired a T1-weighted anatomical image (for details see SI MRI data acquisition and analysis).
Analyses of functional imaging data were performed using Matlab 2008b and standard routines of SPM12 version 6685 (Wellcome Department of Cognitive Neurology, London, UK). The functional images were realigned, normalized (resulting voxel size 2 × 2 × 2 mm³), smoothed (6 mm isotropic Gaussian filter) and high-pass filtered (cut-off period 128 s).
For each participant, fixed-effect analyses were set up as two general linear models (GLM), each containing six realignment parameters and a constant as regressors of no interest (see Fig. S11 for plots of realignment parameters over participants, illustrating their head movements during the task). All models included separate boxcar functions for ToM and control conditions convolved with the canonical hemodynamic response function (hrf). The duration of each trial during these conditions was modeled from stimulus onset until the participant's response. Because in most of the trials the response preceded stimulus offset, we included two additional boxcar regressors for ToM and control stimuli (convolved with the hrf) modeling the time from decision making until stimulus offset. In order to account for between-trial variance as a result of visual attention to different stimulus aspects, we appended two first-order parametric modulators (PMs) to each of the four boxcar regressors: per-trial dwell time on words (PMw) and on the eye region (PMe) (Fig. 2). Since we used the serial orthogonalization procedure employed as default in SPM12 when entering PMs, we set up two GLMs per participant that differed in the order of the two PMs. In this way, we ensured that variance due to the second PM (either fixations on the eye region or fixations on words) was fit independently from variance due to the first PM (either fixations on the eye region or fixations on words) and variance due to average trial effects. Only parameter estimates of the second PM of the respective model were used for group analyses.
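Why the order of the two parametric modulators matters can be illustrated with a simplified sketch of serial orthogonalization. It mean-centers both modulators and removes from the second the part that is linearly explained by the first, so that shared variance is credited to the first modulator only. This is a conceptual illustration with hypothetical dwell times, not SPM12's implementation, and the helper names are assumptions.

```python
def center(values):
    """Subtract the mean, making the vector orthogonal to a constant term."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(second, first):
    """Return `second` with the component explained by `first` removed.

    Both inputs are mean-centered first, mimicking orthogonalization with
    respect to the average trial effect (simplified illustration).
    """
    first_c = center(first)
    second_c = center(second)
    scale = dot(second_c, first_c) / dot(first_c, first_c)
    return [s - scale * f for s, f in zip(second_c, first_c)]


if __name__ == "__main__":
    # Hypothetical per-trial dwell times (seconds) for four trials.
    dwell_words = [2.0, 3.5, 1.0, 4.0]
    dwell_eyes = [5.0, 3.0, 6.5, 2.5]

    # Model A: words entered first, eyes second (eyes estimate is used).
    pm_eyes_second = orthogonalize(dwell_eyes, dwell_words)
    # Model B: eyes entered first, words second (words estimate is used).
    pm_words_second = orthogonalize(dwell_words, dwell_eyes)

    print(pm_eyes_second)
    print(pm_words_second)
```

Because only the second modulator is stripped of shared variance, estimating two models with swapped order gives, for each modulator, an estimate that reflects its unique contribution.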
We set up three one-way ANOVA models and three correlation analyses to evaluate between-subject effects. First, we compared the parameter estimates of the individual "ToM − control" contrasts, which allowed us to test for group differences in mean signal during the ToM condition. Then, we looked for group differences in neural signal during the ToM task that varied either with dwell time on the eye region, using the contrast "PMe ToM − PMe control", or with dwell time on ToM vocabulary, using the contrast "PMw ToM − PMw control".
In order to avoid type-II errors, we conducted hypothesis-driven family-wise error (FWE) corrected ROI analyses in core regions of the ToM network: MPFC, pSTS or posterior TPJ (TPJp), and anterior TPJ (TPJa) (Koster-Hale and Saxe, 2013; Moriguchi et al., 2006). ROIs were defined by creating a sphere with 14 mm radius around center MNI coordinates taken from the literature (Moriguchi et al., 2006: MPFC [13, 68, 22]; Mars et al., 2012: TPJp [54, −55, 26] and TPJa [58, −37, 20]; see SI MRI data acquisition and analysis for more details). To control for multiple comparisons, we applied Bonferroni correction that maintained an overall α level of 0.05 for each ROI. Since we sought to capture neural processes associated with mentalizing instead of task difficulty, word length or other confounding factors, we restricted our analyses to these core ToM regions. For results of exploratory ANOVAs adopting an uncorrected threshold see Tables S2-S4. To test for a relation of alexithymia and ToM processing when treating alexithymia as a continuous variable, we extracted contrast estimates (c.e.) of each regressor of interest from the individual subject models using MarsBaR software (http://marsbar.sourceforge.net/). Importantly, mean c.e. were calculated by averaging betas over all voxels included in the particular ROI ("mean c.e."). We then computed the contrast of the ToM and control condition ("mean c.e. ToM − mean c.e. control") and correlated it with both TAS and EOT scores. We also performed LASSO regressions using the contrast estimates summarized over the particular ROI ("mean c.e. ToM − mean c.e. control") as dependent variable.
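The ROI definition and the averaging of betas can be sketched in a few lines. The example below lists voxel centers on a 2 mm grid that fall within 14 mm of a center coordinate and averages hypothetical values over them. It assumes that the grid is aligned with the sphere center, which need not hold for real normalized images, and it does not reproduce MarsBaR's own sphere construction; the function names are illustrative.

```python
import math


def sphere_voxels(center, radius_mm=14.0, voxel_mm=2.0):
    """List voxel-center coordinates (mm) within `radius_mm` of `center`.

    Assumes an isotropic grid aligned with the sphere center
    (illustrative simplification).
    """
    cx, cy, cz = center
    steps = int(radius_mm // voxel_mm)
    coords = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= radius_mm ** 2:
                    coords.append((cx + dx, cy + dy, cz + dz))
    return coords


def roi_mean(values):
    """Average a per-voxel quantity (e.g., betas) over the ROI."""
    return sum(values) / len(values)


if __name__ == "__main__":
    mpfc_center = (13, 68, 22)  # MNI coordinate given in the text
    voxels = sphere_voxels(mpfc_center)
    print(len(voxels), "voxels in the spherical ROI")

    # Hypothetical per-voxel betas: here simply a constant for illustration.
    betas_tom = [0.5] * len(voxels)
    betas_control = [0.2] * len(voxels)
    print(roi_mean(betas_tom) - roi_mean(betas_control))
```

Averaging over all voxels of a fixed sphere yields one value per participant and ROI, which can then be correlated with TAS or EOT scores.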
Note that the analyses of MRI data evaluate differences in neural signal that emerge while participants pay attention to the same stimulus aspects, i.e., the models look at between-subject differences that occur although subjects see the same things. At the same time, by including dwell times, the models control for effects associated with different visual foci, thereby ruling out the possibility that neural differences arise because participants' scan paths differ.
LASSO regressions (accurate models) selected sex (dummy coded: males as '1', females as '2'), age, HADS-a and HADS-d but not verbal IQ, alexithymia group or TAS scores as important predictors of error rate ( Fig. 3a illustrates the spatial distribution of fixations for one exemplary ToM trial. Categorical analysis of the time from stimulus onset until response revealed a three-way interaction between alexithymia group, experimental condition and fixation location (F(2, 29) = 5.79, p = 0.008, η² = 0.02). A mixed two-way ANOVA showed an interaction between group and fixation location only for the ToM condition (F(2, 29) = 3.57, p = 0.041, η² = 0.16). Post-hoc T-tests suggested that HA paid less visual attention to eyes than LA (T(20) = 2.55, p = 0.009, d = 1.09) (Fig. 3b). With regard to post-hoc T-tests on word fixation, a trend indicated that HA looked longer at words than LA (T(20) = −2.28, p = 0.034, d = 0.97), which did not survive Bonferroni correction (Fig. 3b). Additionally, we found a significant negative correlation of dwell time on eyes during the ToM task with TAS scores (ρ = −0.38, p = 0.016) (Fig. 3d) and, after Bonferroni correction, a trend correlation with EOT scores (ρ = −0.34, p = 0.030).

Gaze behavior
For the categorical approach, LASSO regression (accurate model) on fixation time on the eye region during the ToM condition resulted in the selection of alexithymia group (r 2 = 0.08, = 0.02, MSE = 0.008, = − 0.011) as the only predictor. The respective sparse model and LASSO regressions treating alexithymia as a continuous latent variable did not result in the selection of any predictor.
The two-way ANOVA on gaze after decision making (representing gaze behavior in a task-free viewing context) revealed an interaction between the level of alexithymia and fixation location ( F (2, 29) = 4.27, p = 0.024, 2 = 0.15). Post-hoc T -tests indicated that LA dwelled more on the eye region than HA ( T (20) = 2.40, p = 0.013, d = 1.03) ( Fig. 3 c). No differences concerning fixations on words after decision making were found.
Regarding the continuous approach, we found trend correlations of fixations on the eye region after decision with TAS scores ( = − 0.28, p = 0.064) and with EOT scores ( = − 0.28, p = 0.077).
For the time after decision making, LASSO regressions (most accurate models) resulted in the choice of alexithymia group or TAS scores, respectively, as the only important predictor of dwell time on the eye region (categorical approach, Fig. 3 e: r 2 = 0.14, = 0.03, MSE = 0.017, = − 0.027; continuous approach: r 2 = 0.05, = 0.04, MSE = 0.018, = − 4.937e-4). Furthermore, LASSO regression following the continuous approach using the sparse model also chose TAS scores as only predictor of eye fixation in a task-free viewing context ( r 2 = 0.03, = 0.04, MSE = 0.018, = 1.152e-18), but note that model fit and estimated coefficient are rather small.
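The distinction between the "accurate" and the "sparse" model can be sketched as follows, assuming the usual convention that the accurate model uses the penalty with the lowest cross-validated MSE and the sparse model uses the largest penalty whose MSE lies within one standard error of that minimum. The function name and the cross-validation values are hypothetical and are not taken from the study.

```python
def select_penalties(penalties, cv_mse, cv_se):
    """Return (penalty_min, penalty_1se) from a cross-validation curve.

    penalty_min: penalty with the lowest mean cross-validated MSE.
    penalty_1se: largest penalty whose MSE is within one standard error
                 of the minimum (one-standard-error rule).
    """
    i_min = min(range(len(penalties)), key=lambda i: cv_mse[i])
    threshold = cv_mse[i_min] + cv_se[i_min]
    penalty_1se = max(
        pen for pen, mse in zip(penalties, cv_mse) if mse <= threshold
    )
    return penalties[i_min], penalty_1se

# Invented cross-validation results for four candidate penalties.
penalties = [0.001, 0.01, 0.1, 1.0]
cv_mse = [0.30, 0.25, 0.27, 0.40]
cv_se = [0.03, 0.03, 0.03, 0.03]

penalty_min, penalty_1se = select_penalties(penalties, cv_mse, cv_se)
```

Because the sparse model uses a stronger penalty, it shrinks more coefficients to zero, which is consistent with it retaining fewer predictors than the accurate model.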

FMRI data
See Figs. S8-S10 and Tables S5-S9 for brain regions involved in the task of the present study as a function of visual attention. When controlling for effects of visual attention, neither categorical nor continuous analyses revealed any association between alexithymia and mean signal during the ToM task in any ROI. Instead, we found alexithymia groups to differ in the effect of gaze on the eye region during the ToM task ( "PMe ToM -PMe control ") on signal in the right TPJa ROI (xyz = 54, − 24, 24, F (2, 29) = 18.70, p = 0.006, f 2 = 0.10) ( Fig. 4 a). This means that neural operations in the TPJa differ between alexithymia groups although they look at the same feature, i.e., the eye region (and not because they look at different aspects). Post-hoc T -tests showed that this effect was driven by a reverse association of visual attention to eyes and TPJa activity in IA compared to HA ( T (18) = 6.16, p = 0.005, d = 0.37) ( Fig. 4 b). A similar trend was observed for LA compared to HA ( T (20) = 4.18, p = 0.099, d = 0.25). Fig. 4 b shows that in HA the signal in the TPJa decreased with longer dwell time on eyes during ToM trials whereas in LA and IA it decreased with longer dwell time on eyes during control trials.
Continuous analyses using PMe contrast estimates averaged over the respective ROI revealed no significant effect in any ROI ( Fig. 4 c). This apparent discrepancy with the results of the categorical analysis is presumably due to only a small anterior portion of the TPJa ROI showing the significant effect in the ANOVA ( Fig. 4 d).
LASSO regressions on PMes averaged over the total TPJa ROI likewise did not result in the selection of any predictor.
In addition, we found alexithymia groups to differ in the effect of gaze on words that were substantial for ToM decisions ( "PMw ToM -PMw control ") in the MPFC ROI (xyz = 20, 56, 26, F (2, 29) = 13.62, p = 0.016, f 2 = 0.35) ( Fig. 5 a). This means that neural processing in the MPFC differs between alexithymia groups although they pay attention to the same stimulus features, i.e., ToM vocabulary (and not because they look at different aspects). Post-hoc T -tests showed that this effect was due to a reverse association between MPFC activity and visual attention to ToM-related words in LA compared to HA ( T (20) = 5.12, p = 0.006, d = 1.02) ( Fig. 5 b). Furthermore, correlation analyses revealed a significant association between PMw contrast estimates averaged over the MPFC ROI and TAS scores ( = − 0.48, p = 0.003) ( Fig. 5 c) and, after Bonferroni correction, a marginally significant association with EOT scores ( = − 0.36, p = 0.044).

Discussion
In the current study, we investigated mentalizing mechanisms in healthy individuals who differ in their ability to identify and describe emotions, i.e. with different levels of alexithymia. Since it was recently demonstrated that healthy people with high levels of alexithymia show atypical gaze behavior, we combined functional neuroimaging with eye-tracking. By this means, we, on the one hand, controlled for neural responses related to participants' natural viewing behavior and therefore ensured that differences in neural signal were not simply the result of different scan paths. On the other hand, we were able to explore whether neural responses of core ToM regions differed according to the level of alexithymia while participants freely looked at identical stimulus features. Our results suggest that alexithymia per se does not modulate the signal in core ToM regions, as had been proposed before. Instead, ToM processing was shaped by an interaction between alexithymia and eye gaze: the longer high alexithymics focused on the same critical stimulus features, the more neural signal in core ToM regions decreased. Since we observed reverse relations for people with lower levels of alexithymia, we conclude that high alexithymics who are engaged in a ToM task "see things differently ", i.e., some of their core ToM regions operate on the identical stimulus aspects in the opposite way. These results suggest a noteworthy influence of gaze behavior on neural patterns of social cognition. In the following, our findings will be discussed in detail.
Contrary to expectations, we found no evidence for an influence of alexithymia on RMET performance accuracy. Previously, multiple studies had shown an association of poor ToM skills and alexithymia, particularly with regard to affective ToM (e.g., Demers and Koven, 2015 ;Lyvers et al., 2017, 2019 ;Moriguchi et al., 2006 ;Redondo and Herrero-Fernández, 2018 ). However, others did not find any relation ( Chalah et al., 2017 ;Lane et al., 2015 ;Milosavljevic et al., 2016 ;Winter et al., 2017 ). Thus, the question arises why the behavioral results differ to such an extent. One reason for this might be that the various tasks used to measure ToM were repeatedly shown to lack convergent validity (e.g., Ahmed and Stephen Miller, 2011 ;Clemmensen et al., 2016 ). The definition of ToM is broad and allows ToM tasks to vary in many neuropsychologically relevant dimensions such as, for example, conceptualization (e.g., judging emotions vs. false-beliefs vs. anthropomorphization), stimulus complexity (e.g., photos vs. stories in words vs. dynamic geometric figures), or response category (e.g., multiple choice ratings vs. subjective associations) ( Schaafsma et al., 2015 ). Additionally, those different operationalizations of the ToM concept create ambiguity about the interpretation of ToM tasks. For example, Oakley et al. (2016) observed worse RMET performance in people with high levels of alexithymia compared to those with low levels, while at the same time the authors found no association of alexithymia levels with performance on another ToM task. However, they proposed that the RMET measures emotion recognition abilities rather than ToM and therefore took this result as support for their assumption that alexithymia is only weakly related to ToM.
Fig. 6. Trace plots of coefficient fit and plots of cross-validated MSE by LASSO regression for the categorical (top row) and continuous (bottom row) concept of alexithymia. Predicted variable was the effect of visual attention to words (PMw) on signal in the MPFC ROI. Left: Y -axis denotes the coefficients, which are estimated by LASSO using a single penalty term (log (λ) on the x -axis). The green dashed line represents λ min (accurate model), the blue dashed line illustrates λ 1SE (sparse model). Alexithymia group, or TAS score, and BDI were chosen as predictors for both the categorical and continuous approach to alexithymia applying the most accurate model. With regard to the sparse model in combination with the continuous approach, TAS score was the last explanatory variable to be shrunk to zero by increasing λ, rendering it the most important predictor (lower left). However, in this model both model fit and estimated coefficient are rather small. Right: Illustration of the cross-validated error at various levels of penalization. X -axis denotes values of log (λ). The green dashed line represents λ min (accurate model), the blue dashed line marks λ 1SE (sparse model). Circles indicate cross-validation errors for the accurate model (at λ min ) or the sparse model (at λ 1SE ). MSE: mean squared error (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.).
Another issue affecting the accuracy of mental inference, and also performance on the RMET, is that emotional judgments are strongly influenced by context, which often implies framing by emotional words. For example, when Betz et al. (2019) asked healthy participants to freely associate mental states to the eye stimuli of the RMET (instead of having multiple-choice options), participants' accuracy dropped dramatically. The authors therefore called into question the validity of the RMET for adequately mapping ToM abilities. They suggested that low RMET performance would rather reflect deficits in the application of emotion concepts or vocabulary. In the present study, however, alexithymia groups did not vary with respect to verbal IQ or speech comprehension, nor did participants' levels of those verbal skills predict RMET performance.
Given all these ambiguities about the ToM concept and its measures, it is all the more surprising, however, that certain brain regions, namely the MPFC and the TPJ, are robustly and reliably involved in the processing of ToM tasks in general ( Koster-Hale and Saxe, 2013 ). That is, ToM tasks seem to differ considerably in the set of cognitive functions required to complete them but they all converge in the involvement of core ToM regions. Therefore, in order to maximize the probability of mapping neural operations concerned with ToM processing, we confined our analyses to these core regions. As discussed in more detail below, we show that in people with different levels of alexithymia the computations of those regions are characteristically shaped by the attended stimulus feature. Thus, adding to the literature that suggests a systematic reconsideration of the ToM concept ( Schaafsma et al., 2015 ), our work motivates a more elaborate operationalization of its study by showing how decomposing neural mechanisms associated with different stimulus properties may yield more sophisticated insights into the neural processes summarized under ToM.
In summary, low performance on a ToM task may result from a failure of cognitive processes other than ToM. This might also relate to the ambiguous findings concerning the relationship between ToM abilities and alexithymia. However, in the present study, we did not find behavioral differences according to error rate on the ToM task, but differences regarding a more implicit behavior: natural selective attention.
The eye-tracking data of the present study largely support an association between alexithymia and atypical attention to the eyes. Both categorical and continuous approaches point in the same direction, although sometimes at a trend level. Furthermore, LASSO regressions corroborated the particular role of alexithymia for the explanation of atypical gaze on eyes, albeit with regard to gaze during the ToM condition only for the categorical approach. An association of alexithymia with reduced attention to the eyes had previously been demonstrated in the context of emotion tasks ( Bird et al., 2011 ;Fujiwara, 2018 ). For example, in the study by Fujiwara (2018) , healthy participants rated mixed emotional expressions. Similar to our study, healthy participants with high alexithymia levels, against expectations, performed as well as typical controls but showed reduced attention to the eye region of the faces. Furthermore, when high alexithymics looked at the eyes, accuracy of emotion ratings dropped. The author interpreted her findings as an indication of a general eye avoidance, similar to that described in people with ASD. Her suggestion was strengthened by the report of Bird et al. (2011) who found that alexithymia, and not autism symptom severity, explained the typical lack of eye fixation in an ASD sample. Note that results of the present study do support the role of alexithymia over autistic traits for the explanation of reduced dwell time on eyes since alexithymia was the only predictor chosen by LASSO regressions. One might speculate that the narrowed variance arising from the exclusion of participants with high load on autistic traits might have obscured potential effects, but we had to exclude only one such participant. Nevertheless, the overall level of autistic traits in our sample of healthy participants might be too low to draw conclusions on this topic.
Fujiwara (2018) further proposed that the eye avoidance of high alexithymics might prevent a disruption of top-down processing of emotional faces. Interestingly, the TPJ is thought to play an important role for top-down processing during ToM ( Teufel et al., 2010 ). Results of our study suggest that high alexithymics' attention to the eye region was associated with a decrement in anterior TPJ activity during ToM judgments. Thus, while the group of participants as a whole used posterior parts of the TPJ/STS for ToM (Fig. S8), the involvement of the anterior TPJ in people with high alexithymia levels ( Fig. 4 b) might indeed reflect perturbations of downstream processing during judgements about emotional states ( Fujiwara, 2018 ;Teufel et al., 2010 ). Since in people with lower alexithymia levels the anterior part of the TPJ processed physical aspects of the eye region (during the control condition, Fig. 4 b), one might speculate that high alexithymics overrate or misuse those physical aspects when trying to make ToM judgements. They might engage in a more 'technical' or 'analytical' application of ToM, deliberately scanning the eye structure for hints on emotional states rather than intuitively judging the eye region in a more holistic manner. Importantly, high alexithymics looked less at eyes. Nevertheless, they performed as well as low alexithymics on the ToM task of the present study. Along the lines of Fujiwara (2018) , those two behaviors might cohere: High alexithymics' automatic or implicit eye avoidance might prevent the disruption of top-down processing (i.e., an intense engagement of the TPJa) thereby facilitating deliberate response choices and benefiting ToM task completion.
This fits well with the recent dichotomization of the mentalizing system into two components: an acquired explicit one enabling conscious contemplation of others' states which usually requires verbal competencies and an implicit one accounting for rapid automatic responses that may not involve language ( Heyes and Frith, 2014 ). Intriguingly, studies on implicit ToM employ fixation duration as a measure of unconscious mentalizing ( Schneider et al., 2012 ). In this context, interference with TPJ function via brain stimulation was shown to influence first fixations and, thus, implicit ToM ( Filmer et al., 2019 ).
Of course, the task in our study was an explicit ToM task. However, no instructions were given on how to complete the task. We therefore argue that participants' gaze behavior was unintended and thus implicit. Further support for this notion is that high alexithymics' eye avoidance was also observed in a free viewing context, again indicating automatic behavior. That is, reduced eye fixations became evident independently of any task, i.e. also after ToM and control decisions had been made. Critically, the fact that high alexithymics' eye avoidance was not confined to the ToM condition raises the question of whether the behavior can still be attributed to implicit ToM. However, some researchers assume that people constantly engage in ToM by default ( Luyten and Fonagy, 2015 ).
Another mechanism that probably assists high alexithymics when engaging in ToM (alongside not looking too much at eyes) may be the employment of a different neural network, likewise promoting accuracy on ToM tasks. Group analyses we conducted in an exploratory manner (Tables S2-S4) indeed provide some evidence for a more distributed network associated with ToM processing in alexithymia, but, in our opinion, those exploratory analyses require larger group sizes to draw valid conclusions.
Since implicit ToM mechanisms seem to be altered in alexithymia, compensatory strategies may involve explicit ToM systems. However, we found that in low alexithymics the MPFC signal increased when they attended to ToM-related words. Hence, although the neural circuit underlying explicit ToM is currently debated ( Heyes and Frith, 2014 ;Van Overwalle and Vandekerckhove, 2013 ), we suggest that the MPFC operates on language-dependent explicit ToM processes. In high alexithymics, by contrast, MPFC signal decreased with dwell time on ToM vocabulary. Therefore, explicit ToM processing might also be altered in alexithymia.
To our knowledge, up to now, only one study investigated neural ToM processing in alexithymia ( Moriguchi et al., 2006 ). Moriguchi et al. applied a ToM task in which participants were presented with geometric figures interacting in a social manner. In their study, alexithymia correlated with decreased activity in the same subregion of the right MPFC as in the present study. However, we did not observe such a direct link between alexithymia and the MPFC signal when controlling for eye and word fixation. Instead, we found that whereas in people with lower alexithymia levels MPFC signal increased with fixations on words, it decreased with longer fixations in high alexithymics. Most interestingly, while completing the task in Moriguchi et al. (2006) , participants were asked to think about words that best described the social scenes they saw. Hence, in both studies on the neural correlates of ToM processing in alexithymia, MPFC signal ostensibly varied with cognitive operations on ToM vocabulary. Therefore, both free sampling of and forced-choice decisions on ToM-related words seem to involve closely located neuronal populations of a core ToM region, possibly relying on very similar mechanisms.
The idea that the MPFC supports representations of mental and affective states of the self and of others is well-accepted (although the distinct role of the MPFC during ToM processing is still controversial) ( Frith and Frith, 2003 ;Kosakowski and Saxe, 2018 ;Schurz and Perner, 2015 ). Some researchers have directly linked the MPFC with operations on words for feelings. For example, Lieberman et al. (2007) proposed that during the labeling of emotions the MPFC mediates the information transfer from ventrolateral frontal cortex to the amygdala. Thus, it may be that the altered function of the MPFC also plays a role in the characteristic inability of people with alexithymia to find expressions for their own and others' feelings ( Nemiah et al., 1976 ). However, this proposition remains speculative since we did not check whether high alexithymics' scores on the DDF subscale were high and, more importantly, since the participants of the present study had no problems with the assignment of emotional words to mental states.
Finally, it should be noted that besides alexithymia group, or TAS score, BDI score was selected as an explanatory variable of neural changes in the MPFC by LASSO regressions. This is of particular importance because scores on the BDI differed significantly between alexithymia groups (Fig. S2). Furthermore, BDI scores were correlated with TAS scores (Table S1). Since according to LASSO regressions both BDI and TAS score contribute their unique share of variance to explain the MPFC activation pattern, we suggest that future studies investigating MPFC function during ToM consider the effect of both depressive mood and the level of alexithymia.
In general, our study suggests a critical impact of individual gaze on signal in the social brain. Although it was proposed more than a decade ago that gaze behavior plays a non-negligible role during social cognition (e.g., Dalton et al., 2005 ;Morris et al., 2007 ), up to now only a minority of fMRI studies from the domain have picked up on this topic (e.g., Gamer and Büchel, 2009 ;Hadjikhani et al., 2017a, 2017b ;Kliemann et al., 2012 ;Morris et al., 2008 ;Perlman et al., 2011 ;Pfeiffer et al., 2014 ;Schilbach et al., 2010 ;Wilms et al., 2010 ;Zürcher et al., 2013 ). This may partly be due to the fact that many theories about visual attention assume that the context, i.e. stimulus features, rather than characteristics of the individual shape gaze behavior. However, recent evidence suggests a systematic influence of a person's experiences, skills or traits ( de Haas et al., 2019 ;Gottlieb and Oudeyer, 2018 ;Henderson, 2017 ;Risko et al., 2012 ;Schomaker et al., 2017 ). Hence, we propose that, in order to obtain a better understanding of neural mechanisms of social cognition, controlling for differences in visual attention may be vital. Since experimental manipulation of the scan path precludes inferences about neural processes associated with natural behavior, we suggest the simultaneous assessment and analysis of neural response patterns together with gaze behavior.

Limitations
When combining fMRI with eye-tracking one has to deal with some technical challenges that usually result in the exclusion of participants without eye-tracking data of sufficient quality. Moreover, time resolution of the two methods differs ( Peitek et al., 2018 ). For this reason, we used a simple modeling approach implemented in common fMRI software packages and related cumulated fixations to the fMRI signal. Recently, further approaches have been introduced ( Henderson and Choi, 2015 ;Marsman et al., 2012 ).
Another limitation of the study is the small group sizes when adopting the categorical approach, which limits the power of our study. Our results using this approach must therefore be interpreted with due caution. Nevertheless, analyses applying the continuous approach included all 32 subjects, which is a common sample size for MRI studies. Results of both approaches essentially converge.
Finally, we used self-ratings to assess alexithymia. Although high levels of agreement between self-ratings and observer-based ratings of alexithymia were demonstrated ( Bagby et al., 2020 ), in individual cases self-ratings diverge from ratings of a third party, which might mask effects especially with regard to implicit measures.

Conclusion
We co-examined behavioral and neural ToM mechanisms underlying difficulties with the description and classification of one's own and others' emotions, i.e. alexithymia. We show that in high alexithymic individuals visual attention to salient stimulus features such as the eyes is diminished. Furthermore, core ToM regions are differentially modulated by gaze behavior, leading to the suggestion that in high alexithymics both implicit and explicit ToM mechanisms may be altered and complementary strategies during completion of ToM tasks may be applied. Thus, by combining eye-tracking and fMRI data, we are able to explicitly associate attended stimulus features with simultaneously recorded neural signal, leading to more precise insights into the underlying brain function. In this way, we can improve our understanding of neural ToM mechanisms in alexithymia. Our results more generally imply that whenever trying to investigate brain mechanisms using a task with multiple degrees of freedom in scan paths, ignoring the latter might obscure important conclusions.

Declaration of Competing Interest
None.