Neural signatures of individual variability in context-dependent perception of ambiguous facial expression

How do we incorporate contextual information to infer others' emotional states? Here we employed a naturalistic context-dependent facial expression estimation task in which participants estimated the pleasantness of actors' ambiguous facial expressions as the actors appeared to sniff different odor cues (e.g., urine, fish, water, and rose). Based on their pleasantness ratings, we placed participants on a context-dependency continuum and mapped individual variability in context-dependency onto neural representations using a representational similarity analysis. We found that individual variability in the context-dependency of facial expression estimation correlated with activity levels in the pregenual anterior cingulate cortex (pgACC) and the amygdala, and was also decoded from the neural representation of the ventral anterior insula (vAI). Dynamic causal modeling revealed that participants with higher context-dependency exhibited greater modulation of the connection from the vAI to the pgACC. These findings provide novel insights into the neural circuitry associated with individual variability in context-dependent facial expression estimation, and the first empirical evidence for individual variability in predictive accounts of affective states.


Introduction
Theory of mind (ToM) can be defined as the ability to represent and attribute cognitive and affective mental states to oneself and others in order to correctly understand and predict behavior (Abu-Akel and Shamay-Tsoory, 2011). There is an ongoing debate on whether emotion recognition and ToM are governed by separate modules (Adolphs, 2003; Blair, 2005) or share common systems (Beer and Ochsner, 2006; Chakrabarti and Baron-Cohen, 2006). A close interplay between the two is required especially when one tries to estimate others' emotional states, a pivotal tool for maintaining successful social relationships in humans (Eisenberg and Miller, 1987). Classical theories of emotion perception proposed universal basic emotions accompanied by unique patterns of autonomic physiological responses (Ekman, 1992; Ekman and Cordaro, 2011). However, this notion has been challenged by more recent studies reporting that emotion perception can be influenced by various non-facial cues, including body expression (Meeren et al., 2005) and other external contextual information such as emotional labels, verbal descriptions, or visual scenes, especially when the facial emotional cues are unclear (Kim et al.). This line of work suggests that communication between the prefrontal cortex, particularly its medial section, and the amygdala serves a crucial role in context-dependent facial emotion recognition.
Some studies have investigated individual differences in the degree of context-dependent facial emotion processing. For example, participants' interpretations of ambiguous facial expressions can be influenced by their approach and inhibition tendencies (Lee et al., 2012), suggesting that contextual information can produce individual differences in estimating others' facial information, especially in ambiguous settings. However, little is known about the key neural features associated with individual differences in context-dependent processing of ambiguously-valenced faces.
A recent data-driven study suggested that individuals have their own specific emotional physiological fingerprints that do not coincide with established emotion categories (Azari et al., 2020). This finding raises the question of whether individuals who interpret others' emotional expressions similarly also share similar neural representations of affective states, especially in ambiguous settings.
Estimating another person's affective state can be theoretically understood as formulating interoceptive prediction signals to synchronize with the target affective state based on previous experiences (Gendron and Barrett, 2018). Given this, we hypothesized that individuals with greater context-dependency would incorporate contextual cues more strongly to simulate the appropriate affective state of others, as a form of interoceptive prediction applied to ambiguous facial information. The anterior insula (AI) is known to integrate multiple streams of sensory information, including interoceptive signals such as visceral (Craig, 2002) and nociceptive (Mazzola et al., 2009) signals, and exteroceptive signals such as somatosensory (Pugnaghi et al., 2011), auditory (Bamiou et al., 2006), and olfactory (Mak et al., 2005) signals (for a detailed review, see Uddin et al., 2017). It has been strongly implicated in interoception and in individual differences in interoceptive sensitivity (Critchley et al., 2004). Because interoceptive sensitivity is associated with individual differences in empathy (Grynberg and Pollatos, 2015) and with accuracy in estimating others' affective states from facial information (Terasawa et al., 2015; Dal Monte et al., 2013), context-dependent emotional literacy towards others may depend critically on the neural circuitry for interoceptive representation.
Through its intimate functional coupling with the AI, the medial prefrontal cortex (MPFC) appears to be an ideal locus for interoceptive prediction (Barrett and Simmons, 2015; Seth, 2013). The MPFC has been widely implicated in emotion recognition and regulation (Lindquist et al., 2012; Etkin, Egner, and Kalisch, 2011). In addition, Klumpp et al. (2017) reported that coupled pgACC and amygdala activity was associated with decreased self-reported negative feelings among patients with social anxiety during an implicit emotion regulation task. Based on these findings, one can predict that the degree of functional coupling between these regions is associated with individual differences in context-dependent perception of ambiguous facial expressions, but this prediction has not been formally tested.
In the present study, we developed a novel task in which participants, after being presented with an odor cue (e.g., urine, fish, rose, or water), watched an actor sniffing one of the supposedly unpleasant, pleasant, or neutral odors and estimated the affective state (i.e., pleasantness) of the actor. This task differs from most previous studies, which used predefined sets of emotions (e.g., fear and happiness) and therefore confined individual responses to a set that lies within those emotion categories (Wicker et al., 2003; Hoemann and Barrett, 2019). The rationale of the current design was two-fold: first, to use more naturalistic stimuli to study affective information processing, and second, to step aside from issues related to predefined emotion categories and instead capture individual variability in estimating others' affective states. To do so, we fitted the pleasantness rating data with a linear regression model to measure individual context-dependency. We then searched for neural circuitry associated with individual variability in context-dependency using univariate analyses of the neural responses at the video and response phases of each trial. In addition, using an inter-subject representational similarity analysis (van Baar et al., 2019), we aimed to identify brain regions whose activity patterns reflect individual variability in context-dependency. Lastly, to assess effective connectivity among the observed regions of interest, we adopted dynamic causal modeling (DCM) and parametric empirical Bayes (PEB).

Participants
Thirty-nine healthy participants (all female; age range = 22-44; mean age = 30.82) were recruited for the study. A power analysis for a repeated-measures ANOVA (within factors) indicated that the sample size needed to achieve a power of 0.95 with an α of 0.05 and a medium effect size of 0.25 (Cohen's f) was 36. Expecting the loss of participants due to technical difficulties, excessive head movement, or misunderstanding of the experimental instructions, we recruited 39 participants. Given that recent studies of empathy have suggested gender differences in the neural response to empathy-related stimuli, we recruited only female participants to maximize sample homogeneity (Derntl et al., 2010; Schulte-Rüther et al., 2008). One of the 39 participants was excluded due to artifacts in the neuroimaging data; the remaining 38 participants were included in the final analyses (age range = 22-44; mean age = 30.56). All participants were right-handed, had normal vision, had no history of psychiatric or neurological disease, and had no structural brain abnormalities. Informed consent approved by the Institutional Review Board of Korea University was obtained from all participants prior to the experiments.
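The sample-size calculation can be reproduced approximately in a few lines. The sketch below follows the G*Power convention for a one-way repeated-measures (within-factor) ANOVA; the assumed correlation among repeated measures (ρ = 0.5) and nonsphericity correction (ε = 1) are common defaults, not values reported here, so the result should be read as an approximation rather than an exact replication of the authors' calculation.

```python
import numpy as np
from scipy import stats

def rm_anova_power(n, f=0.25, m=4, rho=0.5, alpha=0.05, eps=1.0):
    """Power of a one-way repeated-measures ANOVA for n subjects,
    m repeated measurements, effect size f (Cohen's f), and correlation
    rho among measurements (G*Power 'within factors' convention)."""
    df1 = (m - 1) * eps                      # numerator df
    df2 = (n - 1) * (m - 1) * eps            # denominator df
    lam = f ** 2 * n * m * eps / (1 - rho)   # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)

# smallest n reaching 95% power
n = 3
while rm_anova_power(n) < 0.95:
    n += 1
print(n)  # lands near the reported sample size of 36
```

Under these assumptions the minimum n comes out within a subject or two of the 36 reported above; the small discrepancy depends on the assumed ρ and ε.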

Stimuli
The actors in the videos were asked to sniff a cup containing water while maintaining a neutral facial expression, and four seemingly identical videos were recorded from each actor, to be randomly assigned to the four odor conditions; participants, however, were told that the actors were sniffing the odor presented before the video. This procedure ensured that participants performed the task based on subjective interpretations of the visual stimuli (i.e., videos) constructed from the contextual information (i.e., odor cues) rather than on the visual information alone. Although the actors were explicitly asked to pose neutral expressions, participants, instructed to guess how the actors in the videos would feel, must have voluntarily engaged in deciphering the meaning behind the neutral facial expression to resolve the ambiguity arising from the conflict between the neutral expression and the contextual information. We believe this assumption is ecologically valid because people often do not reveal their emotions on their faces in real life. For this reason, we refer to the neutral facial expression as the ambiguous facial expression for the rest of the study. Four context cues (i.e., urine, fish, water, and rose) were selected to achieve a sufficient range of valence ratings, so that individual context-dependency could be expressed as a linear function of the valence ratings of the context cues. From an independent sample (N = 19), we obtained valence ratings of each odor word and video stimulus, and the average valence rating of the videos (mean = 3.09 ± 0.36) confirmed their emotional neutrality. The average valence ratings of the videos and odor words were termed the Video Independent Rating (VIR) and Word Independent Rating (WIR), respectively, and were later used to estimate individual context-dependency (see Behavioral data analysis).

Procedure
All subjects were checked for their eligibility for MRI scanning upon arriving at the experimental room. Prior to scanning, all participants received instructions and performed several practice trials to become familiar with the experimental procedure. During the scanning session, all condition trials were presented pseudo-randomly in an event-related design. In each trial, participants were first presented with one of four odor-related words (i.e., urine, fish, rose, water) and then watched a 4 s video clip of an actor sniffing an object in a cup, followed by a fixation of one to three seconds (Fig. 1). Lastly, participants estimated the pleasantness level of the actor in the video. In this response phase, participants were presented with a question ("How does he/she feel?") prompting their answer on a five-point Likert scale (1: very unpleasant, 2: unpleasant, 3: neutral, 4: pleasant, 5: very pleasant). The response cursor started in the middle (i.e., neutral), and participants moved it to the left or right to select the desired pleasantness level. Participants were informed of the five-point pleasantness scale prior to the experiment and were asked to answer as accurately and quickly as possible. Each scanning session included 80 trials (word type (4) × actor (20)). At the end of the experiment, all participants were debriefed about the goal of the experiment and paid approximately 40,000 KRW ( ≈ $38) for participation.

Fig. 1. A schematic diagram of a typical trial of the main behavioral task. In each trial, participants were first presented with one of four odor-related words (i.e., urine, fish, rose, water) for 2 s (Context cue), then with a 4 s video clip (Video phase). Finally, participants were presented with a question ("How does he/she feel?") prompting their answer on a five-point Likert scale (1: very unpleasant, 2: unpleasant, 3: neutral, 4: pleasant, 5: very pleasant), and were asked to estimate the actor's feeling accurately and quickly (Response phase).

Behavioral data analysis
For the main purpose of this study, we estimated the degree to which each participant's pleasantness ratings of the actors were influenced by contextual information while the visual information (i.e., facial expression) was held neutral across conditions. To test the main effects of odor type on pleasantness ratings and reaction times, we performed repeated-measures one-way ANOVAs with odor type (i.e., urine, fish, rose, and water) as the independent variable and pleasantness ratings and reaction times as dependent variables. For the post-hoc analyses, the pleasantness rating and reaction time of each odor cue were compared against those of the water condition. We then examined individual differences in the degree of context-dependency and video-dependency in estimating others' pleasantness. Both WIR and VIR were fitted to each participant's responses to generate subject-specific indices of context-dependency (βWIR) and video-dependency (βVIR). This approach measured the degree of dependency on contextual information (βWIR) after controlling for the individual propensity to incorporate the facial expression itself into the pleasantness ratings (βVIR). We also examined the possibility that contextual information processing competes with visual facial expression processing by checking the correlation between the two beta coefficients.
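The per-participant fit just described can be sketched as an ordinary least-squares regression of trial-wise ratings on the normative WIR and VIR values. Everything below (function and variable names, the toy data) is illustrative rather than the authors' code; the key point is that βWIR is estimated with βVIR in the same model, so context-dependency is measured after controlling for video-dependency.

```python
import numpy as np

def dependency_betas(ratings, wir, vir):
    """OLS fit of rating ~ intercept + WIR + VIR for one participant;
    returns (beta_wir, beta_vir)."""
    X = np.column_stack([np.ones(len(wir)), wir, vir])
    betas, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return betas[1], betas[2]

# toy participant who tracks the odor cue and ignores the video
rng = np.random.default_rng(0)
wir = np.tile([1.2, 1.8, 3.1, 4.4], 20)    # hypothetical odor-word valence norms
vir = rng.normal(3.0, 0.3, size=wir.size)  # near-neutral video norms
ratings = 0.9 * wir + 0.0 * vir + 0.3      # a fully context-driven rater
b_wir, b_vir = dependency_betas(ratings, wir, vir)
# b_wir recovers 0.9; b_vir is ~0
```

The correlation between βWIR and βVIR across participants then provides the test of whether contextual and facial processing compete.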

Image preprocessing
The neuroimaging data were preprocessed using the default preprocessing pipelines of the CONN toolbox 2018b ( www.nitrc.org/projects/conn , RRID:SCR_009550; Whitfield-Gabrieli and Nieto-Castanon, 2012 ). The images were first realigned, distortion corrected, centered to (0, 0, 0) coordinates, and slice-time corrected, in that order. The resulting images were then spatially normalized to the standard Montreal Neurological Institute EPI template and resampled to a 2 × 2 × 2 mm³ voxel size. Finally, all resulting images were smoothed with an 8 mm full-width at half-maximum (FWHM) Gaussian kernel. This smoothing level has been shown to improve inter-subject functional alignment while retaining sensitivity to mesoscopic activity patterns that are consistent across individuals ( Shmuel et al., 2010 ).

Neuroimaging data analysis: univariate approach
First-level analyses: For the first-level analyses, we analyzed the preprocessed neuroimaging data using SPM12 (Wellcome Department of Imaging Neuroscience, London, United Kingdom). To examine the effects of context on neural representations during the estimation of others' affective states, we designed a general linear model (GLM1) consisting of four onset regressors for the video phase (one per context cue) and four onset regressors for the response phase (one per context cue), for a total of eight onset regressors. In addition, GLM2 was designed to identify brain regions associated with individual dependency on contextual information at the video and response phases. Nine regressors of interest were included in the model and convolved with the hemodynamic response function. The GLM2 first-level analysis included two regressors at the context-cue phase (context-cue onset and a parametric modulator of WIR), three regressors at the video phase (video onset and parametric modulators of WIR and VIR), and four regressors at the response phase (response onset, actual response ratings, and parametric modulators of WIR and VIR).
To assess the unique contributions of WIR and VIR to BOLD signals, we performed two different GLM analyses with the orthogonalization option turned on, in which either WIR or VIR was the last parametric modulator of the onset regressor at the response phase. For example, in the GLM designed to assess the effect of WIR, the order of the parametric modulators added to the response-phase onset regressor was 1) actual response, 2) VIR, and 3) WIR. We ran these two separate analyses for WIR and VIR because, in SPM, a GLM with multiple parametric modulators can be problematic when the orthogonalization option is turned on: the option differentially affects the parameter estimates of the modulators, and their interpretation, depending on the order of the modulators ( Mumford et al., 2015 ).
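The order-dependence that motivates running two GLMs can be illustrated with SPM's serial orthogonalization scheme, in which each parametric modulator is residualized against all earlier ones, so that any shared variance is credited to the earlier regressors. This is a schematic numpy re-implementation of that behavior, not SPM code.

```python
import numpy as np

def serial_orthogonalize(columns):
    """Serially orthogonalize regressors: column j is replaced by its
    residual after regressing it on columns 0..j-1 (all mean-centered
    first), mimicking SPM's default treatment of multiple parametric
    modulators on one onset regressor."""
    M = np.column_stack([c - np.mean(c) for c in columns])
    for j in range(1, M.shape[1]):
        Q = M[:, :j]
        beta, *_ = np.linalg.lstsq(Q, M[:, j], rcond=None)
        M[:, j] -= Q @ beta
    return M

# toy correlated modulators (illustrative values only)
rng = np.random.default_rng(1)
response = rng.normal(size=80)
vir = 0.5 * response + rng.normal(size=80)
wir = 0.4 * vir + rng.normal(size=80)

# WIR last: its parameter estimate reflects only the variance not
# already explained by the response and VIR modulators
M = serial_orthogonalize([response, vir, wir])
```

Because only the last modulator in the chain is guaranteed to carry unique variance, the analysis is run twice, once with WIR last and once with VIR last.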
Six additional head motion regressors were modeled to capture movement-related effects in both GLMs. In each participant, the parametric modulation maps of WIR and VIR at both video and response phases were estimated to identify a neural circuitry associated with changes in valence of contextual information as well as the valence of the video stimuli.
Using a NeuroSynth customized meta-analysis map for regions of interest: The analysis of the first-level contrasts and parametric modulations was limited to grey matter, using a grey-matter mask defined by the automated anatomical labeling (AAL) atlas. Based on our a priori hypotheses, we further restricted the regions of interest to areas associated with affective information processing for self and others. We used Neurosynth ( www.neurosynth.org ) to select 2200 studies containing the keywords 'theory of mind', 'valence', and 'emotion', and subfields of 'emotion' including 'emotional responses', in which the rate of keyword appearance in the main text was above 5%. We then generated a customized meta-analysis brain mask comprising voxels that survived the association test for these keywords at FDR p < 0.01. Key brain regions in the mask included the medial prefrontal cortex including the anterior cingulate cortex (ACC), the amygdala, the posterior cingulate gyrus, and the anterior insula (Supplementary Figure S2). A small-volume correction (SVC) with family-wise error at the cluster level (FWEc, p < 0.05) was applied to address multiple comparisons.
Second-level analyses: Using GLM1, we investigated brain regions reflecting the effects of the context cue within the search area of the customized Neurosynth meta-analysis map. A one-way ANOVA was performed between the four context-cue conditions at the video phase and at the response phase. In addition, using GLM2, we investigated brain regions reflecting individual variability in context-dependent facial expression processing, as indexed by βWIR. We performed exploratory whole-brain multiple regression analyses with the parametric modulation maps of WIR at both the response and video phases to identify brain regions whose WIR-dependent changes in BOLD signal correlated with βWIR across participants. We then conducted ROI analyses on the resulting brain regions to test the a priori hypotheses of the present study.

Neuroimaging data analysis: multivariate approach
We also employed multivariate pattern analysis to find brain regions that represent inter-subject variability in contextual dependency during facial emotion processing. Multivariate pattern analysis has been widely used in cognitive neuroscience to examine population codes and representations of psychological states or information processing ( Chang et al., 2015 ; Huth et al., 2016 ; Kamitani and Tong, 2005 ; Fournel et al., 2016 ). In particular, representational similarity analysis (RSA) can be used to identify brain regions that reflect the structure of psychologically relevant information ( Haxby et al., 2014 ; Kriegeskorte et al., 2008 ). In this study, we used an inter-subject RSA (IS-RSA) to identify brain regions associated with inter-subject variability in context-dependency ( van Baar et al., 2019 ).
To minimize false positives, we parceled the whole brain into functionally meaningful regions using an a priori 200-parcel map from the Neurosynth database, which clusters the whole brain based on meta-analytic functional coactivation patterns ( https://identifiers.org/neurovault.collection:2099 ). We excluded seventeen parcels that were not covered by our scans, leaving 183 parcels for the analysis. We expanded the ROI map used for the univariate analysis to the whole brain for the multivariate pattern analysis because previous studies have reported different statistical properties and implications for the two approaches ( Coutanche, 2013 ; Jimura and Poldrack, 2012 ; Davis et al., 2014 ). For example, multivariate pattern analysis is known to be more sensitive to voxel-level variability within subjects but less sensitive to subject-level variability in mean activation than univariate analysis ( Davis et al., 2014 ). The behavioral dissimilarity matrix representing individual differences in context-dependency was computed as the between-subject Euclidean distance of the βWIR parameter for all dyads of participants. For the neuroimaging data, we computed an inter-subject dissimilarity matrix for each of the 183 parcels from the neural activation maps at the onset of the video phase paired with the water cue, to measure inter-subject dissimilarity of mean activity patterns while watching videos. To do so, we built first-level GLMs specifically for the IS-RSA, composed of six onset regressors: 1) context-cue phase, 2) video phase with urine cues, 3) video phase with fish cues, 4) video phase with water cues, 5) video phase with rose cues, and 6) response phase, plus one parametric modulator of the WIR at the context-cue phase.
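The two dissimilarity structures entering the IS-RSA can be sketched as follows. The behavioral RDM uses the Euclidean distance on βWIR, as described above; for the neural RDM this excerpt does not pin down the metric, so correlation distance (1 − Pearson r) is used below purely as a common, illustrative choice.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def behavioral_rdm(beta_wir):
    """Subject-by-subject Euclidean distances of context-dependency."""
    b = np.asarray(beta_wir, dtype=float)[:, None]
    return squareform(pdist(b, metric="euclidean"))

def neural_rdm(patterns):
    """patterns: (n_subjects, n_voxels) activation patterns for one
    parcel. Returns pairwise correlation distance (1 - Pearson r),
    an assumed metric for illustration."""
    return 1.0 - np.corrcoef(patterns)

# toy data: 4 subjects, one 200-voxel parcel
rng = np.random.default_rng(2)
behav = behavioral_rdm([0.1, 0.4, 0.9, 1.5])
neural = neural_rdm(rng.normal(size=(4, 200)))
```

Both matrices are symmetric with zero diagonals, as required by the rank-correlation test that follows.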
We added the parametric modulator of the WIR to the cue-phase regressor to regress out any valence-related variability in BOLD signals time-locked to the cue phase. We believe the hemodynamic responses to the video phase were successfully deconvolved from those to the cue phase, because the IS-RSA results at the video phase differ from those at the cue phase (Supplementary Figure S9). We also performed the same analysis without the WIR modulator to assess the influence of context valence at the context-cue phase on the video phase (see Supplementary Figures S1 and S2). For this analysis, we focused on the water-cue condition at the video phase, which we considered most appropriate for examining individual differences in contextual dependency, because inter-subject comparisons are maximally controlled for irrelevant variables such as motor response preparation related to individual differences in ratings. The same IS-RSA was performed at the context-cue phase (see Supplementary Figure S10).
To identify specific brain regions whose neural representations are associated with behavior, the nonparametric Kendall's tau-a correlation was computed between each parcel's dissimilarity matrix and the behavioral dissimilarity matrix ( Nili et al., 2014 ). To test the significance of the resulting tau-a, the behavioral RDM was shuffled while holding the neural RDM fixed, and the statistic was recomputed 10,000 times to generate a null distribution for hypothesis testing. For example, after shuffling the behavioral RDM, the behavioral distance between subjects 1 and 5 could be paired with the neural dissimilarity between subjects 3 and 4 rather than with the neural dissimilarity between subjects 1 and 5. A Bonferroni correction over the number of parcels was applied to adjust p-values for multiple comparisons; parcels with corrected p-values below 0.05 were taken to show a significant relationship between inter-subject behavioral patterns of context-dependency and the parcel's neural representation patterns.
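A didactic version of this parcel-level test, assuming behavioral and neural RDMs of the kind just described: Kendall's tau-a is written out explicitly (scipy's `kendalltau` implements the tau-b variant, which handles ties differently), and the null distribution is generated by permuting subject labels of the behavioral RDM. The O(n²) pair loop is for clarity, not speed.

```python
import numpy as np
from itertools import combinations

def kendall_tau_a(x, y):
    """Tau-a = (concordant - discordant) / total pairs; ties add 0."""
    n = len(x)
    s = sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)

def isrsa_perm_p(neural, behav, n_perm=1000, seed=0):
    """One-sided permutation p-value: subject labels of the behavioral
    RDM are shuffled (rows and columns together), tau-a is recomputed on
    the upper triangles, and the observed value is ranked in the null."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(neural, k=1)
    obs = kendall_tau_a(neural[iu], behav[iu])
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(behav.shape[0])
        if kendall_tau_a(neural[iu], behav[np.ix_(p, p)][iu]) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction
```

In the analysis proper, 10,000 permutations were used per parcel and p-values were Bonferroni-adjusted by the number of parcels (183).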
We applied the same procedures to each of the other contextual cues (i.e., Urine, Fish, and Rose) at the video phase exclusively in the brain regions identified in the water condition to narrow down the parcels specific to each context.

Dynamic causal modeling
Dynamic causal modeling (DCM) infers effective connectivity (EC) between brain regions by constructing a model with two or more nodes and estimating the connection strengths between them ( Friston et al., 2003 ). Parametric empirical Bayes (PEB) generates a group-level model from individuals' DCMs, with covariates capturing inter-subject variability ( Friston et al., 2016 ). The PEB analysis reports the credibility of each connection and the effects of the covariates using a Bayesian approach (see Zeidman et al., 2019a and Zeidman et al., 2019b for details on DCM and PEB, respectively). To investigate the effective connectivity of the brain regions associated with individual variability in context-dependency, the DCM-PEB analysis was performed in SPM12 based on individuals' contrast maps from the parametric modulation analysis at the response phase and the multivariate dissimilarity maps at the video phase. The time series of the two brain parcels from the Neurosynth 200-parcel map and of two 4 mm spheres around the peak voxels [pgACC (MNI: x = 6, y = 38, z = 6), amygdala (MNI: x = −22, y = −8, z = −14)] were extracted and mean-centered. For the DCM specification, the default settings were used to generate a full model (VOI timing: 2 s; echo time: 0.04; modulatory effect: bilinear; states per region: one; stochastic: no; center input: no). For the PEB specification, the set of individuals' full-model DCMs was entered to generate a single PEB model. The primary purpose of the DCM-PEB analysis was to examine individual differences in the modulatory effects of the experimental perturbation (i.e., context-dependency) on effective connectivity among the ROIs. Thus, the PEB group-level design matrix consisted of two columns: one representing the average connectivity and the other representing individual context-dependency. The statistical threshold for the PEB was a posterior probability greater than 0.95.
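The group-level design matrix described here is simple enough to write down directly. The actual estimation was done with SPM's DCM/PEB routines in MATLAB; this numpy fragment only illustrates the two-column structure (group mean plus mean-centered context-dependency covariate), with toy βWIR values.

```python
import numpy as np

def peb_design_matrix(beta_wir):
    """Column 1: ones, modeling the group-average connectivity.
    Column 2: mean-centered individual context-dependency scores, so
    the first column keeps its interpretation as the group mean."""
    cd = np.asarray(beta_wir, dtype=float)
    return np.column_stack([np.ones_like(cd), cd - cd.mean()])

# toy context-dependency scores for three subjects
X = peb_design_matrix([0.2, 0.8, 1.4])
```

Mean-centering the covariate is what lets the first column be read as the average connection strength while the second captures how connections vary with βWIR.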

Behavioral results
Participants rated the actors in the video clips following negative odor cues (i.e., urine and fish) as more unpleasant, and those following the positive odor cue (i.e., rose) as more pleasant, than those following the neutral odor cue (i.e., water) ( Fig. 2 ). A repeated-measures one-way ANOVA revealed a significant main effect of odor type (i.e., urine, fish, rose, and water) on pleasantness ratings, F(1.376, 50.902) = 114.910, p < 0.001 (Greenhouse-Geisser corrected). Post-hoc analyses revealed that, compared to the mean rating of the control condition (i.e., water), the average ratings of all three odor conditions differed significantly (urine: t(37) = −13.943, p < 0.001; fish: t(37) = −9.739, p < 0.001; rose: t(37) = 7.737, p < 0.001). The average reaction times of participants ranged from 628 ms to 2547 ms, with a group average of 1336 ms. A repeated-measures one-way ANOVA on the reaction times also demonstrated a significant main effect of odor type, F(3, 111) = 17.740, p < 0.001 ( Fig. 2 ). Post-hoc t-tests revealed that the mean reaction times of all three odor conditions (urine: t(37) = 5.075, p < 0.001; fish: t(37) = 4.546, p < 0.001; rose: t(37) = 5.853, p < 0.001) differed significantly from that of the water condition. The context-dependency (βWIR) ranged from −0.028 to 1.591 (mean = 0.842 ± 0.401), and the video-dependency (βVIR) ranged from −0.202 to 0.372 (mean = 0.025 ± 0.128). A negative correlation between context-dependency and video-dependency was observed ( r(36) = −0.366, p = 0.024), indicating that the more individuals used the valence of the context cues in estimating others' pleasantness, the less likely they were to use the valence of the video stimuli.

Neuroimaging results: multiple regression analysis with individual differences in context-dependency and general effects of context cues
We first conducted a whole-brain regression analysis to identify brain regions whose activities covary with individual differences in contextual influence. With the first-level contrast maps of all parametric modulators against baseline at 1) the context-cue, 2) video, and 3) response phases, we conducted a second-level analysis using context-dependency as a covariate. The whole-brain analysis revealed that individual differences in context modulation at the response phase were negatively associated with activity in the left precuneus, anterior cingulate cortex, right supplementary motor area, right postcentral gyrus, left visual cortex, and left supramarginal gyrus (see Table 1 ). No significant voxels were found at the context-cue or video phases at the whole-brain level. As a post-hoc analysis, we restricted the search space of the second-level analysis using the customized brain mask from Neurosynth. At the video phase, this analysis revealed that activity in the inferior frontal gyrus ( x = 50, y = 28, z = 2, Z = 5.01, p < 0.05, one-tailed, SVC corrected unless otherwise stated) was positively associated with βWIR. At the response phase, activities of the pgACC ( x = 6, y = 38, z = 6, Z = 4.29, two-tailed, p < 0.05; Fig. 3 B) and the amygdala ( x = −22, y = −8, z = −14, Z = 4.32, two-tailed, p < 0.05; Fig. 3 D) were negatively correlated with βWIR. This negative relationship indicates that more context-dependent participants showed larger BOLD responses in both the pgACC and the amygdala to context cues with more negative valence ratings.
To address possible inefficient deconvolution due to the relatively short interval between the response phase and the context cue of the subsequent trial, we implemented two additional GLMs: one without the response regressor (no-response GLM) and one without the context-cue regressors (no-cue GLM). The no-response GLM revealed no significant brain regions (see Supplementary Information), whereas the no-cue GLM revealed the same regions as GLM2, confirming that the main findings were not due to inefficient deconvolution.
Furthermore, we investigated the general effects of the context cue by performing a one-way ANOVA between the context cues at the video and response phases. No brain regions survived the statistical threshold. Contrasting the context cues against baseline revealed a network of brain regions including the fusiform face area (FFA), ventromedial prefrontal cortex (VMPFC), pgACC, bilateral hippocampus, and temporal pole at the video phase (Fig. S3). At the response phase, the amygdala, pgACC, and VMPFC were significantly activated relative to baseline (Fig. S4).

A) The negative correlation between context-dependency and video-dependency (r(36) = −0.366, p = 0.024) indicated that more context-dependent individuals tend not to use the valence of the video stimuli in estimating others' affective states. B) The mean pleasantness ratings across conditions from the independent study group (red) and the main study group (green). A repeated-measures one-way ANOVA revealed that participants in the main study (green) rated video clips following negative odor cues (i.e., urine and fish) more negatively, and those following a positive odor cue (i.e., rose) more positively, than those following a neutral odor cue, F(1.376, 50.902) = 114.910, p < 0.001 (Greenhouse-Geisser corrected). Post-hoc analyses of the main study group (green) revealed that the average ratings of all three odor conditions differed significantly from the mean rating of the control condition (i.e., water).

Fig. 3. A) Two representative participants' pleasantness rating data (orange = context-dependent, blue = context-independent). The pleasantness ratings of individual responses were fitted to the odor valence to estimate individual differences in the dependency on contextual information (the WIR parameter). Multiple regression of the WIR parametric maps at the response phase showed that both the pgACC (B and C: x = 6, y = 38, z = 6, Z = 4.28) and the amygdala (D and E: x = −22, y = −8, z = −14, Z = 4.25) reflected individual differences in context-dependency when estimating pleasantness for others (red, p < 0.005; yellow, p < 0.001, FWE corrected at p < 0.05). All brain regions were p < 0.001, FWEc corrected at p < 0.05, two-tailed.

Fig. 4. Inter-subject representational similarity analysis. IS-RSA revealed that participants who employed similar strategies (i.e., similar WIR values, shown here as behavioral response histograms for graphical purposes only) in estimating other people's pleasantness levels displayed similar activity patterns in brain regions including the ventral anterior insula, the dorsal anterior insula, and the posterior lateral orbitofrontal cortex. The neural patterns at the video phase were extracted when the context cue was water, to maximally control for irrelevant neural activities (e.g., motor preparation) while leaving the relevant psychological properties intact.
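The per-participant estimate of context-dependency, obtained by fitting pleasantness ratings to odor valence, can be sketched as a simple least-squares slope. The valence codes and the two example participants below are illustrative assumptions, not the study's actual coding or data:

```python
import numpy as np

# Hypothetical odor-valence codes for the four context cues
# (values are illustrative, not the study's actual coding).
ODOR_VALENCE = {"urine": -1.0, "fish": -1.0, "water": 0.0, "rose": 1.0}

def context_dependency(trials):
    """Estimate a participant's context-dependency as the slope of a
    least-squares fit of pleasantness ratings onto odor valence.

    trials: list of (odor_name, pleasantness_rating) tuples.
    A larger slope means the ratings track the contextual odor cue
    more strongly.
    """
    x = np.array([ODOR_VALENCE[odor] for odor, _ in trials])
    y = np.array([rating for _, rating in trials])
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

# A strongly context-dependent participant: ratings shift with odor valence.
dependent = [("urine", 2.0), ("fish", 2.5), ("water", 5.0), ("rose", 7.5)]
# A context-independent participant: ratings stay flat across cues.
independent = [("urine", 5.0), ("fish", 5.1), ("water", 5.0), ("rose", 4.9)]

print(context_dependency(dependent) > context_dependency(independent))  # True
```

Placing every participant's slope on one axis yields the context-dependency continuum used in the subsequent brain analyses.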

Neuroimaging results: inter-subject representational similarity analysis (IS-RSA)
Significant inter-subject representational similarity effects were found in three brain parcels: the ventral anterior insula, the dorsal anterior insula, and the lateral orbitofrontal cortex ( Fig. 4 ). These results indicate that the inter-subject dissimilarity of neural activity patterns during facial processing paired with a neutral odor cue corresponded to the inter-subject distance pattern of context-dependency (i.e., the WIR parameter) ( Fig. 4 ). Considering that the pleasantness ratings in the water condition were almost identical across participants, we assume that these neural patterns reflect individual variability in the psychological processes underlying context-dependent facial emotion processing rather than individual variability in behavioral responses. Interestingly, none of the reported brain regions were significant in the standard univariate analysis. The same IS-RSA was repeated for each odor condition within the same three parcels; a significant effect in the rdAI was observed in the urine condition. The IS-RSA performed on the GLM without the WIR parameter added to the context cue onset regressor revealed larger clusters in the anterior insula ( Figure S1 ) and effects in the vAI across all conditions ( Figure S2 ).
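The IS-RSA logic — testing whether participants who are behaviorally closer in context-dependency also show more similar neural activity patterns — can be sketched on synthetic data. Everything below (subject count, voxel count, noise level, the choice of correlation distance and a Spearman rank correlation) is an illustrative assumption, not the study's actual pipeline:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical data: one context-dependency score and one voxel-pattern
# vector (within a parcel) per participant. The patterns are built to
# covary with the behavioral score so the illustration yields an effect.
n_subj, n_voxels = 38, 200
behavior = rng.normal(size=n_subj)                      # context-dependency per subject
signal = np.outer(behavior, rng.normal(size=n_voxels))  # behavior-linked pattern component
patterns = signal + rng.normal(scale=2.0, size=(n_subj, n_voxels))

# Inter-subject dissimilarity matrices, vectorized over subject pairs.
behav_dist = pdist(behavior[:, None], metric="euclidean")
neural_dist = pdist(patterns, metric="correlation")  # 1 - Pearson r between patterns

# IS-RSA statistic: rank correlation between the two pairwise-distance vectors.
# (In practice, significance is assessed with subject-label permutations.)
rho, _ = spearmanr(behav_dist, neural_dist)
print(round(rho, 2))
```

A positive rho means that behaviorally similar dyads of participants also show similar neural patterns, which is the relationship reported here for the insular and orbitofrontal parcels.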

Neuroimaging results: DCM-PEB analysis
The DCM-PEB analysis tested whether the modulatory changes in effective connectivity among the regions of interest differed from zero as a function of the experimental conditions and of individual variability in context-dependency ( Fig. 5 ). For each effective connection, the average connectivity change and its posterior probability are reported in the supplementary materials and methods ( Supplementary Table S1; Table 2 ).
The DCM-PEB analysis at the response phase explored the directional interactions among the brain regions obtained from the IS-RSA and the GLM analyses (Amyg: amygdala; pgACC: pregenual anterior cingulate cortex; rdAI: rostrodorsal anterior insula; vAI: ventral anterior insula). Credible positive correlations between modulatory changes and the odor valence were observed in the projection from the pgACC to the amygdala (beta = 0.62, pp = 1.0); no credible negative correlations with the odor valence were observed. Credible positive linear relationships between effective connectivity and individual context-dependency were observed in the projection from the vAI to the pgACC (beta = 1.02, pp = 0.97), whereas the effective connectivity from the vAI to the amygdala and from the pgACC to the rdAI decreased linearly with the WIR parameter. All brain regions were p < 0.001, FWEc corrected at p < 0.05, two-tailed. The customized meta-analysis mask was created with the Python package NeuroSynth ( www.neurosynth.org ).
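The between-subject part of the PEB design — relating subject-level modulatory connectivity parameters to a group mean and a context-dependency covariate — can be caricatured with an ordinary least-squares analogue on synthetic data. The actual PEB scheme is Bayesian (with shrinkage priors, posterior probabilities, and model reduction); this sketch only illustrates the design-matrix logic, and all numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical subject-level DCM modulatory parameters for one connection
# (vAI -> pgACC), generated so they increase with context-dependency.
n_subj = 38
context_dep = rng.normal(size=n_subj)
b_vai_to_pgacc = 0.3 + 1.0 * context_dep + rng.normal(scale=0.5, size=n_subj)

# Second-level design matrix: column 1 = group mean, column 2 = the
# mean-centered context-dependency covariate, as in a PEB-style GLM.
X = np.column_stack([np.ones(n_subj), context_dep - context_dep.mean()])
beta, *_ = np.linalg.lstsq(X, b_vai_to_pgacc, rcond=None)

# beta[0]: group-mean connectivity change; beta[1]: covariate slope,
# which should recover a value near the generating slope of 1.0.
print(beta)
```

A credible positive beta on the covariate column corresponds to the finding that the vAI-to-pgACC modulation scales with individual context-dependency.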

Discussion
This study investigated the neural correlates of individual differences in using contextual information to estimate affective states from other people's faces. As predicted, we found a large variance in participants' pleasantness ratings of ambiguous facial expressions, which was mainly driven by the degree to which the ratings were influenced by the contextual cues. This behavioral trend was also reflected in the functional neuroimaging data acquired during the task. First, at the univariate level, individual differences in context-dependency were associated with the activities of the pgACC and the amygdala at the response phase. Second, at the multivariate level, the IS-RSA results indicated that the distance pattern across dyads of participants' pleasantness ratings was correlated with the distance pattern across the same dyads of participants' neural activities in the ventral AI and the piriform gyrus while watching the videos. In other words, individual differences in the activity patterns of these regions were associated with individual differences in context-dependency. To investigate the causal relationships among the identified brain regions, a DCM-PEB model was built from the neural regions obtained in the parametric GLM analysis and the IS-RSA, with context-dependency as a between-subject covariate. This analysis revealed that the effective connectivity from the vAI to the pgACC at the response phase was positively modulated by context-dependency, implying that people with greater context-dependency showed stronger connectivity in this neural pathway. To summarize, the present findings suggest that participants with higher dependency on contextual cues are characterized by distinctive neural patterns in the vAI, which sends greater modulatory signals to the pgACC while estimating other people's affective states.

The anterior insula as the hub of information integration for empathic responses
The neural mechanism behind empathy involves representing the internal states of others so as to synchronize with their affective states ( Gendron and Barrett, 2018 ). Consistent with this idea, reading other people's emotional states appears to depend on individual differences in interoceptive sensitivity. For example, individuals who were more sensitive to their own heartbeats were more likely to feel greater compassion toward painful pictures ( Grynberg and Pollatos, 2015 ) and were better attuned to recognizing subtle emotional features ( Terasawa et al., 2014 ). Meta-analyses of empathy ( Schurz et al., 2020 ; Fallon et al., 2020 ) and interoception ( Schultz, 2016 ) indicated strong involvement of the AI in both processes. Indeed, the right AI was the only brain region that signaled both interoception and empathy ( Zaki et al., 2012 ), and a conjunction analysis of empathy, interoception, and social cognition revealed the AI as one of the core regions for the integration of interoceptive and social information ( Adolfi et al., 2017 ). In a study of autism spectrum disorder (ASD), abnormal activities of the right AI in ASD males with alexithymia were linked to increased bodily signals (e.g., skin conductance responses) and diminished discriminability of painful images ( Gu et al., 2015 ). Based on these findings, we speculate that the activity pattern of the vAI associated with context-dependency at the video phase reflects enhanced interoceptive information processing necessary for simulating the internal affective states of others.

The amygdala-pgACC communications in ambiguous facial information processing
Here we report that context-dependent individuals are characterized by coactivation of the amygdala and the pgACC when estimating another person's pleasantness level, and by increased modulatory effective connectivity from the pgACC to the amygdala. The amygdala signals the valence of ambiguous facial expressions (e.g., surprised) associated with context ( Kim et al., 2004 ; Vrticka et al., 2013 ), and amygdala activity can predict the valence of surprised faces independently of arousal ( Kim et al., 2017 ). Coactivation of the pgACC and the amygdala was associated with amplification of the valence of emotional faces paired with context compared with faces alone ( Lee and Siegle, 2014 ), supporting the view that the pgACC modulates amygdala responses to resolve emotion conflict ( Etkin et al., 2006 ). In addition, Klumpp et al. (2017) reported a positive relationship between pgACC and amygdala activity associated with decreases in self-reported negative feelings among social anxiety patients during an implicit emotion regulation task. These results suggest that the communication between the pgACC and the amygdala serves a crucial role in context-dependent facial emotion recognition.

The AI-pgACC communications in context-dependent facial emotion processing
The pgACC has been extensively implicated in social functions such as tracking other people's motivations ( Apps et al., 2016 ; Chang et al., 2013 ; Wittmann et al., 2018 ), self-referential processing ( Northoff et al., 2006 ), and self-conscious emotions ( Sturm et al., 2013 ), as well as in empathic processing ( Xu et al., 2009 ; Wittmann et al., 2018 ; Schurz et al., 2020 ; Fallon et al., 2020 ). Damage to this region impairs empathic ability and social awareness ( Seeley, 2008 ). Similarly, pgACC activity correlates positively with trait empathy scores among congenitally pain-insensitive patients, suggesting pgACC involvement in a compensatory mechanism for the lack of autonomic responses to painful pictures ( Danziger et al., 2009 ). The pgACC-AI communication has also been implicated in interoceptive prediction ( Barrett and Simmons, 2015 ). Here, we report a positive relationship between context-dependency and the modulatory effective connectivity from the vAI to the pgACC. These are the two major regions containing von Economo neurons, which are thought to be specialized for rapid, long-range neural communication with the brainstem ( Allman et al., 2010 ; Fischer et al., 2016 ). The pgACC appears to be part of the visceromotor system that generates allostatic predictions in response to incoming sensory information, serving to minimize prediction errors arising from viscerosensory regions such as the AI and to modulate the set points of homeostatic reflexes ( Critchley et al., 2013 ; Sterling, 2014 ; Barrett and Simmons, 2015 ; Stephan et al., 2016 ; Kleckner et al., 2017 ).
Based on these findings, we speculate that individuals who were more context-dependent in estimating other people's pleasantness were those who experienced a greater mismatch between the predictions of other people's internal states generated by contextual information and the sensory information from facial emotion processing, and who resolved the resulting prediction errors by assigning appropriate pleasantness ratings to the emotionless actors.
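This predictive account can be made concrete with a toy precision-weighted cue-integration model, in which context-dependency corresponds to the relative precision assigned to the context-generated prediction versus the facial evidence. The function, rating scale, and precision values below are illustrative assumptions, not a model fitted in the study:

```python
def estimate_pleasantness(context_prior, face_evidence, pi_context, pi_face):
    """Precision-weighted combination of a context-based prediction and
    facial sensory evidence (a toy Bayesian cue-integration sketch).

    A more 'context-dependent' observer corresponds to a higher
    pi_context relative to pi_face.
    """
    w = pi_context / (pi_context + pi_face)
    return w * context_prior + (1 - w) * face_evidence

# Ambiguous (neutral) face rated after a pleasant odor cue:
face = 5.0      # neutral facial evidence on a hypothetical 1-9 scale
context = 7.5   # prediction generated by the rose cue

# Context-dependent observer: rating pulled toward the odor prediction.
print(estimate_pleasantness(context, face, pi_context=3.0, pi_face=1.0))  # 6.875
# Context-independent observer: rating stays near the facial evidence.
print(estimate_pleasantness(context, face, pi_context=0.5, pi_face=3.0))  # ~5.36
```

On this reading, the vAI-to-pgACC modulation reported above would index how heavily the context-generated prediction is weighted when the facial evidence is uninformative.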
The classical notion of basic universal emotions ( Ekman and Cordaro, 2011 ) has been challenged by interoceptive predictive coding, which suggests that emotional responses are not confined within predefined physiological and psychological boundaries; rather, they are recollections of interoceptive prediction signals from previous events, constructed on the basis of language and culture ( Barrett and Simmons, 2015 ; Hoemann and Barrett, 2019 ). This predictive account implies that individuals do not necessarily display similar neural responses to the same affective stimuli ( Azari et al., 2020 ) and that the meaning of an affective stimulus can only be understood from the individual's perspective on it. We believe that our study is one of the first empirical neuroimaging studies to demonstrate the individual variability implied by the predictive accounts of affective states.

Conclusion
The present study investigated individual variability in the contextual dependency of ambiguous facial emotion processing. Consistent with previous studies, people showed large variability in context-dependency when estimating others' ambiguous facial expressions. Changes in the activity of the pgACC and the amygdala correlated with individual context-dependency, and distinct activity patterns of the AI at the video phase reflected the spectrum of context-dependency during facial information processing. Importantly, the greater the contextual dependency, the larger the effective connectivity strength from the vAI to the pgACC. These findings provide key neural signatures of individual variability in context-dependent perception of ambiguous facial expressions. The study suggests that individual differences in context-dependent processing of ambiguous facial information arise from a distinctive orchestration of multiple brain regions that simulates the internal states of others and modulates them to generate appropriate behavioral responses in given contexts.

Author contributions
H.K. designed the experiments. K.K., W.J. and H.K. performed data analysis. K.K., C.W. and H.K. wrote the manuscript. All authors reviewed the manuscript.

Declaration of Competing Interest
The authors declare no competing financial interests.