Stimulus Inversion and Emotional Expressions Independently Affect Face and Body Perception: An ERP Study

Faces and bodies provide critical cues for social interaction and communication. Their structural encoding depends on configural processing, as suggested by the detrimental effect of stimulus inversion for both faces (face inversion effect, FIE) and bodies (body inversion effect, BIE). An occipito-temporal negative event-related potential (ERP) component peaking around 170 ms after stimulus onset (N170) is consistently elicited by human faces and bodies and is affected by the inversion of these stimuli. Although it is known that emotional expressions can boost structural encoding (resulting in larger N170 components for emotional than for neutral faces), little is known about body emotional expressions. Thus, the current study investigated the effects of different emotional expressions on structural encoding in combination with FIE and BIE. Three ERP components (P1, N170, P2) were recorded using a 128-channel electroencephalogram (EEG) while participants were presented with upright and inverted faces and bodies conveying one of four emotions (happiness, sadness, anger, fear) or no emotion (neutral). Results demonstrated that inversion and emotional expressions independently affected Accuracy and the amplitude of all ERP components (P1, N170, P2). In particular, faces showed specific effects of emotional expressions during the structural encoding stage (N170), while P2 amplitude (representing top-down conceptualisation) was modified by emotional body perception. Moreover, the task performed by participants (i.e., implicit vs. explicit processing of emotional information) differently influenced Accuracy and ERP components. These results support integrated theories of visual perception, thus speaking in favour of the functional independence of the two neurocognitive pathways (one for structural encoding and one for emotional expression analysis) involved in social stimuli processing. Results are discussed highlighting the neurocognitive and computational advantages of the independence between the two pathways.


I. INTRODUCTION
FACES and bodies provide extremely important non-verbal cues for social interaction and social communication. In the last decades, affective computing has received an important boost in developing models to gather information (e.g., emotion, age, ethnicity) from these social stimuli [1]. Given the "special" status of faces, neurocognitive research and affective computing have so far mostly focused on face perception [2]. In the last 15 years, however, research has also started to investigate body perception [3]. Faces and bodies share some fundamental social features: both can convey information about identity, emotional state, gender, age and intentions. Faces are symmetric, and all share a common 3D configuration of critical elements (i.e., two eyes above the nose, above the mouth). The same holds for bodies (i.e., a torso, with a head on the top, two arms connected to the higher part and two legs to the lower part). Both categories of stimuli seem to be processed by specific cognitive mechanisms based on specialised neural bases [4], [5].
It is now widely accepted that the structural encoding of faces depends on the detailed analysis of the configuration of facial features, also known as configural processing [6], which is disrupted in clinical conditions such as prosopagnosia [7]. The face inversion effect (FIE) supports (albeit indirectly) the existence of configural face processing [8]: the recognition of faces presented upside down is harder than the recognition of upright faces.
Configural processing is also fundamental to the visual perception of human bodies [9]. Reed and colleagues showed that faces and bodies partly share the same processing pathway, but recognition of bodies depends particularly on a specific level of configural processing: structural information processing. Configural processing of bodies has also been confirmed by studies that investigated the body inversion effect (BIE), analogous to the FIE: recognition of inverted body postures (compared to upright ones) yielded slower reaction times and higher error rates [9], [10].
Neurophysiological studies carried out using electroencephalography (EEG) or magnetoencephalography (MEG) showed that a negative event-related potential (ERP) peaking around 170 ms after stimulus onset (named N170) is consistently elicited by human faces [11]. This component presents the maximum amplitude over occipito-temporal areas. In the FIE, N170 shows longer latency and often larger amplitude for inverted (compared to upright) faces [11], [12], [13]. The N170 component is thought to be generated by the neural processes involved in structural encoding stages, where the representation of face configuration is created for recognition [13]. Moreover, the P100 component, an early ERP component peaking around 90-120 ms after stimulus onset, is sensitive to the processing of human faces, shows an FIE [11], shows an early global response to faces and most probably reflects the perception of a stimulus as a face [14].
An effect of inversion on the N170 component was also observed for body perception: larger amplitude and longer latencies were observed in this component for inverted (compared to upright) bodies [15]. These findings demonstrate that bodies are processed by specialised cortical structures that share some commonalities with face perception (configural processing). The precise pattern of similarities and differences is still to be determined.
Bruce and Young [16] developed a cognitive model suggesting that facial expression analysis occurs after the structural encoding of the face representation, by means of a bottom-up process (Fig. 1A). In contrast, Haxby and colleagues [17] theorised a model in which the early perception of facial features (in inferior occipital areas) occurs before the processing of emotional expressions (together with other changeable aspects of faces in STS), by means of both feed-forward (bottom-up) and feed-back (top-down) connections (Fig. 1B). The first model would thus suggest that structural encoding is completely independent of emotional expressions. The second, in contrast, seems to imply that emotional expressions, albeit processed after the encoding of facial features, can influence structural encoding through top-down information.
By only considering theoretical models, we cannot conclude whether the processing of emotional expression and structural encoding are completely independent [16], interact at a late stage [17], or interact even during the early stages of perception [18]. Several experimental studies investigated the influence of emotional expression on structural encoding by testing whether emotional expression modulated N170, the ERP component related to structural encoding and holistic processing of faces [2]. The first results appeared to be in favour of dual theories (i.e., claiming that structural encoding and facial expression processing are sequential processes) (Fig. 1A). Eimer and Holmes [19] found that fearful faces, compared to neutral ones, did not influence N170 latency or amplitude, while they affected different frontal ERP components: N1, vertex positive potential (VPP) and late frontocentral positivity (starting at 250 ms post-stimulus). This suggested no influence of emotional expression on structural encoding. In a follow-up study, they found the same effect for all six basic emotional expressions [20], showing that these results do not reflect emotion-specific processes, which may occur in separate neural areas at later latencies. All these findings seem to exclude an influence of emotional expression on structural encoding, whereas they identified later psychophysiological correlates of conscious evaluation of emotional content.
Fig. 1. Competing theoretical models of face perception. Dual theories (A) proposed cognitive models in which the facial expression is analyzed after the structural encoding of face/body representation (performed via configural processing), by means of a bottom-up process. Integrated theories (B) proposed parallel, integrated mechanisms in the processing of identity (performed through configural processing) and emotional expressions (exploiting fast saliency processing based on subcortical pathways). In this case, stimulus structural encoding and expression analysis are performed in parallel and interact in both directions through both bottom-up and top-down processes. When social stimuli are presented inverted, we expect (C) according to dual theories, that expression analysis is impossible since the configural processing (leading to structural encoding) is disrupted; (D) according to integrated theories, that emotional expressions influence stimulus structural encoding via top-down processes since fast saliency processing is not affected by inversion.
Conversely, the findings from several other studies supported integrated theories, arguing for parallel, integrated rather than segregated mechanisms in the processing of identity and emotional expressions (Fig. 1B). Batty and Taylor [21] first found global effects of emotion on P1, while latency and amplitude differences among emotional expressions were seen on the N170 component. Positive emotions evoked N170 significantly earlier than negative emotions, and the amplitude of N170 evoked by fearful faces was larger than that evoked by neutral or surprised faces. Further studies reported larger amplitude for various emotional expressions on both P1 and N170 [22] or N170 only [23]. These findings are consistent with an early automatic encoding of facial expressions during the structural encoding stage or even earlier stages [21], [22]. This theory was also supported by a meta-analysis including 57 studies about the N170 sensitivity to facial expressions [24]. Considering this overall finding, integrated models of perception of emotional expressions [18] seem to be reliable for facial expressions.
Little is known, however, about body emotional expressions. Stekelenburg and de Gelder [25] studied ERP correlates of fearful expressions in faces and bodies. As predicted by the models discussed above, results showed larger occipitotemporal N170 and P2 components for fearful faces, together with larger frontocentral N2. For bodies, however, N170 was not influenced by the emotional expression, while larger frontal VPP and sustained fronto-central negativity (around 300-500 ms post-stimulus) were found for fearful bodies (compared to neutral ones). This was confirmed in further studies [26], suggesting that decoding of bodily expression occurs in the early stages of visual processing and does not influence the structural encoding of these stimuli (as pointed out by the absence of N170 modifications). This hypothesis could be in line with the differences in configural processing between faces and bodies. However, few studies were available to assess the effect of emotional expression on the structural encoding of bodies.
Since stimulus inversion was shown to affect the structural encoding of both faces and bodies, the current study aims to investigate the neurophysiological correlates of emotional expressions in combination with FIE and BIE. Studies investigating the interaction between the inversion effect and emotional expressions in bodies [25] and faces [27] did not lead to clear results. According to the literature, inversion disrupts the structural encoding of these stimuli, while emotional expressions enhance it (at least in faces). Studying the interaction between these two manipulations could bring important insights into the cognitive mechanisms involved in these processes and their neural bases, in particular, what class of theories is supported by the results. Moreover, when considering applications in affective computing, knowing whether emotion expression processing depends on stimulus features (i.e., orientation, face vs. body) would be crucial to interpreting and classifying emotion-related neural activity.
Based on the previous literature, we formulated two contrasting hypotheses: on the one hand, if dual theories of face perception were correct, we expected that emotional expressions did not influence structural encoding when impaired by stimulus inversion (Fig. 1C); in other words, we expected that emotional expressions did not affect N170 for inverted stimuli. On the other hand, according to integrated theories, we expected that structural encoding was affected by emotional expressions even when this was influenced by inversion (i.e., N170 was independently affected by emotional expressions and inversion) (Fig. 1D). Given the mixed literature, we also expected that these effects may vary according to (i) social stimulus type (faces vs. bodies), (ii) different ERP components or behavioural performance (Accuracy, P1, N170, P2), (iii) task performed by participants (explicit vs. implicit processing of emotional information; [28]). Results are presented and discussed following the order of these hypotheses.
The current study was composed of two experiments: Experiment 1 aimed to investigate the influence of emotional expressions and inversion on different ERP components elicited by faces and bodies. Experiment 2 acted as a control experiment and investigated psychophysiological FIE and BIE in comparison to the presentation of houses. Data collected in Experiment 2 have already been analysed in the time-frequency domain [29].

A. Participants
Twenty-four healthy Caucasian participants (11 M; mean age: 28.2 ± 5.8 years) were recruited for the experiment among university students and their acquaintances. One participant was excluded from the analyses due to technical problems related to data quality. All participants gave written informed consent before enrolment in this study and were screened for contraindications to EEG: exclusion criteria included the presence of a history of any neurological or psychiatric disease, use of psychoactive drugs, abuse of any drugs (including nicotine and alcohol), as well as any skin condition that could have been worsened by the use of the EEG cap. All participants had normal or corrected-to-normal vision and were right-handed.

B. Stimuli
In Experiment 1, 128 pictures were presented twice (once per experimental block, see Procedure) to each participant (one per trial). 64 pictures of Caucasian faces were extracted from the Radboud Faces Database (RaFD) [30] and 64 pictures of bodies were extracted from the Bodily Expressive Action Stimulus Test (BEAST) [31], based on the following criteria. Half of the pictures of faces and bodies conveyed neutral expressions; half conveyed four different emotional expressions: happiness, sadness, anger and fear (8 stimuli each). The pictures depicted 32 different actors for faces and 32 for bodies (balanced for gender). Half of the pictures were presented upright, and the other half inverted, counterbalanced across participants.
In Experiment 2, 96 pictures were presented to each participant (one per trial). 32 pictures of Caucasian faces were extracted from the Radboud Faces Database (RaFD) [30], 32 pictures of bodies were extracted from the Bodily Expressive Action Stimulus Test (BEAST) [31], and 32 pictures of houses were extracted from the dataset used in a previous EEG experiment [32]. All pictures representing faces and bodies conveyed neutral expressions, and they depicted 32 different actors for faces and 32 for bodies (balanced for gender). Half of the pictures were presented upright, and the other half were inverted, counterbalanced across participants.
All pictures were converted into black and white and cropped to a blank background using Adobe Photoshop CS5 software. They measured 7 × 10.5 cm, subtending a visual angle of 4° × 6° on a 22-inch LCD monitor positioned 100 cm away from participants. To match the low-level visual features of all stimuli, mean luminance was manipulated using MATLAB® R2016a [33] and the SHINE toolbox [34].
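The reported visual angle can be checked from the stimulus size and viewing distance given above. The following is a minimal sketch using the standard formula 2·atan(size / (2·distance)); the function name `visual_angle_deg` is introduced here for illustration only and is not part of the original analysis.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (in degrees) subtended by an object of the given size."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Stimulus size (7 x 10.5 cm) and viewing distance (100 cm) as reported in the text.
width_deg = visual_angle_deg(7.0, 100.0)
height_deg = visual_angle_deg(10.5, 100.0)
print(f"{width_deg:.2f} x {height_deg:.2f} degrees")
```

Both values come out very close to the 4° × 6° stated in the text.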
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

C. Procedure
After signing informed consent, participants wore the EEG cap and were seated in a dimly illuminated, electrically shielded room. There, the cap was connected to the EEG amplifier and participants began the experiments. Both Experiments 1 and 2 were run using E-Prime® 2.0 software.
In Experiment 1, participants were presented with 256 stimuli divided into 4 blocks of 64 stimuli each, which were randomly presented using a permuted block order for each participant: faces and bodies were alternately presented on even (2 and 4) or odd (1 and 3) blocks. Before each block, 5 trials were presented as training, and feedback on participants' responses was given. Each trial consisted of a fixation cross that was shown for 1 s, then the stimulus was displayed for 500 ms, followed by a response screen (max duration: 5 s), during which participants had to respond by pressing one of two buttons on an EGI ® Chronos response box. Participants' task was to detect if the stimulus was male or female in the first (or last) two blocks, or if it was emotional or neutral in the last (or first) two blocks. After the response (or after 5 s of the response screen), a grey screen was presented for one second before the next trial began. Only in the emotions task, after responding "emotional", the participant was asked to choose the one of four emotions (happiness, sadness, anger, fear) that best represented the stimulus s/he had seen, by pressing one of four buttons on the response box. The screen asking for the exact emotion had no time limit and was terminated by the participant's response.
In Experiment 2, participants were presented with 96 stimuli that were divided into 3 blocks of 32 stimuli (+ 5 for practice) and randomly presented using a permuted block order for each participant. The structure of each trial was identical to Experiment 1, except for the participants' task: they were asked to detect if the stimulus was presented upright or inverted by pressing one of two buttons on the response box.

D. EEG Data Recording and Analysis
EEG data were recorded using a high-density 128-channel Hydrocel Geodesic Sensor Net (Electrical Geodesics Inc., EGI, Eugene, OR, USA) referenced to the vertex [35]. The EEG signal was amplified with EGI NetAmps 400, digitised at a 1000 Hz sampling rate, and recorded. No filters were applied during signal recording. Electrode impedances were kept below 50 kΩ during the whole experimental procedure.
EEG data were analysed using in-house MATLAB® R2016a [33] scripts and the EEGLAB [36] and FieldTrip [37] toolboxes. A band-pass filter (0.5-100 Hz) and a notch filter (50 Hz) were applied to limit the signal of interest and remove the power line noise. Data were subsequently segmented into epochs (i.e., trials) of 2000 ms length, starting from the presentation of the fixation cross and ending 500 ms after the presentation of the response screen. Each trial was baseline-corrected by removing the values averaged over a period of 1000 ms (from 1000 to 0 ms before stimulus onset), during which participants were looking at the fixation cross. After visual inspection, trials affected by prominent artifacts (i.e., major muscle movement and electric artifacts) were removed, and bad channels were deleted. Trials in which participants gave wrong responses were also removed. On average, 241 trials per participant in Experiment 1 and 90 in Experiment 2 were included in the analysis. The signal was referenced to the common average of all electrodes [38], and Independent Component Analysis (ICA) was applied to remove remaining artifacts related to muscular and ocular activity. After removing the remaining artifacts using ICA, noisy channels were spatially interpolated. Before calculating ERPs, data were downsampled to 250 Hz and band-pass filtered in the range 1-45 Hz. All trials were then divided into categories, considering stimulus category, inversion, task and emotion in Experiment 1, and only stimulus category and inversion in Experiment 2. For each participant, all trials in each category were averaged to obtain ERPs.
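To make the baseline-correction and averaging steps concrete, here is a minimal sketch in plain Python. It is not the MATLAB/EEGLAB/FieldTrip pipeline used in the study: the function names, the single-channel list representation and the toy sampling rate of 4 Hz are all assumptions chosen to keep the example short.

```python
from statistics import mean


def baseline_correct(epoch, sfreq_hz, baseline_ms):
    """Subtract the mean of the first `baseline_ms` of a single-channel epoch.

    Assumes the epoch starts at fixation onset, so the baseline period
    (fixation cross) occupies the first samples of the epoch.
    """
    n_baseline = int(round(baseline_ms * sfreq_hz / 1000.0))
    offset = mean(epoch[:n_baseline])
    return [value - offset for value in epoch]


def average_erp(epochs):
    """Point-by-point average of equally long epochs from one condition."""
    return [mean(samples) for samples in zip(*epochs)]


# Toy data (hypothetical): two epochs sampled at 4 Hz, so that the
# 1000 ms baseline corresponds to the first 4 samples of each epoch.
epochs = [
    [1.0, 1.0, 1.0, 1.0, 3.0, 5.0, 3.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 4.0, 8.0, 4.0, 2.0],
]
corrected = [baseline_correct(e, sfreq_hz=4.0, baseline_ms=1000.0) for e in epochs]
erp = average_erp(corrected)
print(erp)
```

With real data the same two steps would be applied per channel and per condition (stimulus category, inversion, task, emotion) before peak extraction.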
To define a time window of interest (TOI) for each ERP component, we referred to both existing literature and visual inspection of butterfly plots representing the activity of all upright and inverted conditions (Fig. 2). Three different TOIs were thus defined: 80-135 ms after stimulus onset for P1, 140-200 ms for N170, and 200-350 ms for P2.
Considering our a-priori hypotheses, we defined two occipitotemporal regions of interest (ROIs) by referring to both literature and visual inspection of all-conditions average multiplot and one topographic plot for each TOI (Fig. 3). The two ROIs (left and right clusters) included five occipitotemporal symmetrical channels each (i.e., electrodes 58, 64, 65, 69, 70 for the left cluster; 83, 89, 90, 95, 96 for the right cluster; standard EGI 128-channel montage). Electrophysiological peak amplitude in µV was extracted from these channels for the three TOIs in each participant and then averaged across the five channels of each ROI.
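The peak extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes that "peak" means the most positive value within the TOI for P1 and P2 and the most negative value for N170, that epochs are sampled at 250 Hz, and that stimulus onset falls 1000 ms into each epoch. The helper names and the toy data are hypothetical.

```python
def toi_indices(start_ms, end_ms, sfreq_hz, stim_onset_ms):
    """Convert a post-stimulus time window into sample indices within the epoch."""
    first = int(round((stim_onset_ms + start_ms) * sfreq_hz / 1000.0))
    last = int(round((stim_onset_ms + end_ms) * sfreq_hz / 1000.0))
    return first, last


def roi_peak(erp_by_channel, roi, first, last, polarity):
    """Peak amplitude per channel within [first, last], averaged across the ROI.

    polarity > 0 picks the maximum (P1, P2); otherwise the minimum (N170).
    """
    pick = max if polarity > 0 else min
    peaks = [pick(erp_by_channel[ch][first:last + 1]) for ch in roi]
    return sum(peaks) / len(peaks)


# N170 window (140-200 ms) at 250 Hz, stimulus onset 1000 ms into the epoch.
first, last = toi_indices(140, 200, sfreq_hz=250.0, stim_onset_ms=1000.0)

# Toy ERPs (hypothetical values in microvolts) for two left-cluster channels.
erp_by_channel = {58: [0.0] * 500, 64: [0.0] * 500}
erp_by_channel[58][290] = -4.0
erp_by_channel[64][295] = -6.0

n170_left = roi_peak(erp_by_channel, roi=[58, 64], first=first, last=last, polarity=-1)
print(first, last, n170_left)
```

Extracting the peak per channel before averaging, as the text describes, lets each channel contribute its own peak even when peak latencies differ slightly across the cluster.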

E. Statistical Analyses
EEG data statistical analyses were performed using RStudio software [39]. Mixed-Effect Linear Model analyses were run using the open-source packages "lme4" [40] and "lmerTest" [41]. For the purposes of this study, we employed linear mixed-effects models (LMM) to fit ERP components and a logistic generalised linear mixed-effects model (GLMM) to fit Accuracy data. The general form of the model (in matrix notation) is:
$$y = X\beta + Zu + \epsilon$$
where y is the outcome variable; X is a matrix of the predictor variables; β is a vector of the fixed-effects regression coefficients; Z is the design matrix for the random effects (i.e., the random complement to the fixed X); u is a vector of the random effects (i.e., the random complement to the fixed β); and ϵ is a column vector of the residuals, i.e., the portion of y variance that is not explained by the model Xβ + Zu. The advantage of mixed models lies in the possibility of accounting for individual variability (represented by random effects), which would otherwise be included in error variance.
In the Accuracy GLMM, a logistic model was fit on the binary outcome (i.e., Accuracy = 1 or 0). The model has the same structure but employs the logit link function, i.e., the model is fitted on the following outcome variable:
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$$
where p represents the probability of event 1 (i.e., Accuracy = correct) and 1 − p represents the probability of event 0 (i.e., Accuracy = wrong). Participants' Accuracy data from Experiment 1 were analysed by using a GLMM computed on a binomial distribution, thus allowing Accuracy to be analysed at the single-trial level. This model included as independent variables Stimulus (2 levels, Faces vs. Bodies), Inversion (2 levels, Upright vs. Inverted), Task (2 levels, whether participants were asked to identify Emotion vs. Gender) and Emotion (5 levels, Neutral vs. Happy vs. Sad vs. Angry vs. Fearful). Since the full factorial model could not converge, all main effects, two-way interaction effects, and the three-way Stimulus * Emotion * Task interaction effect were included as fixed effects. Models including any other interaction effects could not reach convergence in the optimization process. In this GLMM, the significance of each effect was estimated by performing Likelihood Ratio Tests (LRTs) with corresponding null models. LRTs can show whether including a parameter in the model significantly increases the variance explained by the model. Participants' Accuracy data from Experiment 2 were not analysed since they reached a ceiling effect (∼ 98%) due to the ease of the task (identifying orientation).
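As a small numerical illustration of the logit link and of a likelihood ratio test, the sketch below uses only the Python standard library. It does not reproduce the lme4 models fitted in the study: the log-likelihood values are hypothetical, and the p-value helper is restricted to a comparison that adds a single parameter (1 degree of freedom), for which the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of the logit link: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def lrt_df1(loglik_null: float, loglik_full: float):
    """Likelihood ratio statistic and p-value for nested models differing by 1 parameter."""
    stat = 2.0 * (loglik_full - loglik_null)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Log-odds corresponding to the ~98% accuracy reported for Experiment 2.
ceiling_log_odds = logit(0.98)

# Hypothetical log-likelihoods of a null and a full model (illustration only).
stat, p_value = lrt_df1(loglik_null=-100.0, loglik_full=-97.0)
print(round(ceiling_log_odds, 3), round(stat, 3), p_value)
```

For effects with more than one degree of freedom (for example the five-level Emotion factor), a general chi-square distribution function would be needed instead of this 1-df shortcut.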
The peak amplitude and latency values were then analysed by using LMMs. Six LMMs were performed in Experiment 1, two for each component (P1, N170, P2; Amplitude and Latency), and two were performed for N170 in Experiment 2 since it was carried out to have confirmatory results on N170. Models used to analyse data from Experiment 1 included the same independent variables as the Accuracy model plus Side (Left vs. Right cluster), in a full factorial design. The model used to analyse data from Experiment 2 included as independent variables Stimulus (3 levels, Faces vs. Bodies vs. Houses), Inversion (2 levels, Upright vs. Inverted) and Side (Left vs. Right cluster), in a full factorial design. The significance of each effect was estimated using the Satterthwaite approximation for degrees of freedom in LMMs.
Since the most appropriate use of these models is still under debate [42], [43], it is important to specify the decisional pipeline we followed to decide which random effects were to be included in the models. First, only random effects that allowed the model to converge were included. Second, we included only random effects that presented a correlation |r| < .80 with other random effects to avoid multicollinearity. Third, whenever we needed to choose a random effect, we performed an LRT with a null model and included the random effect only if the LRT was statistically significant. This last criterion showed that the proportion of explained variance was significantly higher than in the null model. All post-hoc comparisons were performed using Tukey's honest significant difference (Tukey's HSD) p-value correction.

III. RESULTS
Statistically significant fixed effects are summarized in Table I (only significant effects were reported for readability). Post-hoc multiple comparisons of highest-level interactions are presented in tables or in-text when the number of comparisons was 4 or lower. Only results relevant to the research questions presented at the end of the Introduction are shown (see Supplementary Materials for a detailed description of all results and comparisons).

TABLE I. FIXED EFFECTS: SUMMARY OF STATISTICALLY SIGNIFICANT EFFECTS
The main effects (Fig. 4) of Stimulus, Inversion, Emotion and Side were statistically significant in all ERP components' Amplitude (P1, N170, P2, and N170 in Experiment 2), except for Side in P1. In all cases, these effects showed, respectively, increased amplitude for (i) faces vs. bodies, (ii) inverted vs. upright stimuli, (iii) emotional vs. neutral stimuli, and (iv) right vs. left components. All details are in the Supplementary Materials.

A. Main Findings
To test our main hypothesis, we focused on the Inversion * Emotion interaction effect in Accuracy and all ERP Components. This interaction effect was not significant in any analysis, except for Accuracy and P1 Latency (Table I). These results speak in favour of integrated theories (see Discussion).
In Accuracy, the Inversion * Emotion effect showed that stimuli conveying all emotions but happiness presented an inversion effect (Table II). In P1 Latency, the inversion effect (i.e., delayed P1 component for inverted compared to upright stimuli) was present in all emotions except for fear (Table III).

B. Emotions in Faces and Bodies
Differences for specific emotions and social stimuli (i.e., faces vs. bodies) were conveyed by (i) the Emotion main effect in P1 Amplitude: neutral stimuli showed smaller P1 components than any other emotional stimuli (all ts > 7, all ps < .001); (ii) the 3-way interaction effects Stimulus * Task * Emotion on Accuracy (Table IV), N170 Amplitude (Table V) and P2 Amplitude (Table VI). These results are also summarized in Table VII.
For Accuracy (Table IV, Fig. 5), post-hoc comparisons revealed that most differences were present in the emotion recognition task: fearful and sad bodies were recognized more accurately than angry, happy and neutral ones; concerning faces, happy faces were recognized with higher Accuracy than angry, neutral, and sad faces. In the gender recognition task, the only significant difference based on emotions was found between fearful bodies and neutral bodies. Post-hoc tests on the N170 Amplitude (Table V, Fig. 6) showed important differences while participants were performing the gender recognition task: sad bodies showed a larger N170 component than angry, happy and neutral bodies, and fearful bodies showed a larger N170 than neutral bodies as well; regarding faces, faces revealed smaller N170 component than angry, fearful and happy faces. In the emotion recognition task, sad faces and bodies showed significantly larger N170 components than their neutral counterparts.
Post-hoc comparisons on the P2 Amplitude (Table VI, Fig. 7) revealed that for bodies, while performing both tasks, the P2 component was significantly smaller for neutral bodies than for any other emotional bodies. For faces, the most evident differences were found during the gender recognition task: neutral faces showed a smaller P2 component than angry, fearful and sad faces; on the contrary, during the emotion recognition task, only the difference between neutral and sad faces was significant.
Regarding Latency, Stimulus main effects showed delayed N170 and P2 components for bodies compared to faces. The Emotion main effect showed that angry stimuli had a delayed N170 component compared to any other emotions except fear (all ts > 3.3, all ps < .01). Moreover, the Stimulus * Emotion effect on P2 Latency showed that differences among emotions (i.e., fear delayed in comparison with happy, neutral and sad stimuli) could be observed only in bodies (all ts > 2.8, all ps < .04), while the only significant difference in faces was a delayed P2 component for sad vs. neutral faces (t(1645) = 2.98, p = .024).

C. Inversion and Lateralization
The inversion effect was statistically significant in all measures in this study (Table I and Fig. 4b). However, in both P1 and N170 Amplitude, the Stimulus * Inversion effect showed that the inversion effect was stronger for faces (P1: t(32.5) = 8.549, p < .001; N170: t(43.8) = 9.571, p < .001) than for bodies (P1: t(32.4) = 4.885, p < .001; N170: t(43.4) = 5.814, p < .001), even though both were statistically significant. An interesting pattern was highlighted in Latency results, since inverted stimuli evoked delayed P1 and N170 components (in both Experiments 1 and 2), while the opposite was true for the P2 component, with delayed P2 for upright stimuli. Moreover, Stimulus * Inversion effects showed an inversion effect in latency only for faces in N170 (i.e., delayed N170 components for inverted faces in both Experiment 1: t(31.9) = 7.776, p < .001; and Experiment 2: t(220) = 4.997, p < .001), while the opposite inversion effect in P2 (i.e., delayed P2 for upright stimuli) was found only for bodies (t(54.1) = 5.878, p < .001).
Concerning lateralization effects, the main effect of Side (Table I) indicated that the N170 and P2 components were larger in the right hemisphere than the left. The Inversion * Side interaction effect was statistically significant across all components' Amplitude, demonstrating that the inversion effect was greater on the right side for P1 (right: t(32.4) = 7.874, p < .001; left: t(32.4) = 5.562, p < .001) and N170 (right: Regarding Latency, the same interaction effect in P2 Latency showed that the inversion effect we found (i.e., delayed P2 for upright stimuli) was statistically significant only on the right cluster (t(54.4) = 4.468, p < .001) but not on the left one (t(54.4) = 1.853, p = .069). Additionally, in the Stimulus * Side interaction effect, the P1 component showed a higher amplitude for faces than for bodies, with this difference being more pronounced on the right side (t(31.8) = 4.459, p < .001) than on the left (t(31.8) = 2.787, p = .042).

D. Explicit vs. Implicit Processing
All differences concerning the Task variable (i.e., explicit or implicit processing of emotional information) were presented in Section III-B, regarding the 3-way Stimulus * Task * Emotion interaction effects (Tables IV-VII).

IV. DISCUSSION
This study aimed to investigate how different emotional expressions and stimulus inversion interact with the structural encoding of faces and bodies. The structural encoding of a social stimulus [6], [9] represents the configural processing of its features; it allows us to recognize and differentiate between individual faces and bodies by analyzing features such as the eyes, nose, and mouth (or torso and limbs) and their relative positions. Behavioural Accuracy and psychophysiological activity at different processing stages (occipito-temporal ERP components: P1, N170, P2) were analysed while inverted and upright pictures of faces and bodies expressing four emotions (happiness, sadness, anger, fear, compared with emotional neutrality) were presented to participants. Their task was to identify either the stimulus gender (implicit processing of emotional information) or the emotion expressed (explicit processing).

A. Main Findings
Drawing from prior research, we developed two opposing hypotheses. First, in line with dual theories of face perception (Fig. 1A), we predicted that emotional expressions would not impact the structural encoding process when it is hindered by stimulus inversion (Fig. 1C). That is, we expected no effect of emotional expressions on N170 for inverted stimuli and thus, operationally, an Inversion * Emotion interaction effect. Conversely, following integrated theories (Fig. 1B), we anticipated that emotional expressions would influence structural encoding even under stimulus inversion (Fig. 1D). In other words, we expected N170 to be independently affected by both emotional expressions and inversion (i.e., no interaction effect). Beyond its theoretical impact, this result would be crucial in the field of affective computing: knowing whether the structural encoding of a stimulus is necessary to process its emotional expression is essential for interpreting and classifying neural activity. In the previous literature, emotional expressions were shown to enhance the structural encoding of facial stimuli [21], [24], while the limited evidence on body perception [44], [45] suggested that body structural encoding was unaffected by emotional expressions. Inversion was shown to disrupt the structural encoding of both faces [2], [8] and bodies [10].
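The operational difference between the two hypotheses can be illustrated with a difference-of-differences contrast on cell means. The sketch below is a toy example: all amplitudes are hypothetical values chosen for illustration (not data from this study), the helper names are invented here, and it assumes for simplicity that every emotion produces the same boost. Under the integrated account the emotion effect is identical for upright and inverted stimuli, so the contrast is zero; under the dual account the emotion effect vanishes under inversion, so the contrast is non-zero.

```python
# All amplitudes are hypothetical N170 means in microvolts, chosen only to
# illustrate the two predictions; they are not data from the study.
EMOTIONS = ("happy", "sad", "angry", "fearful")


def make_cells(upright_boost, inverted_boost,
               upright_neutral=-4.0, inverted_neutral=-6.0):
    """Build a {(orientation, emotion): mean} table from per-orientation boosts."""
    cells = {("upright", "neutral"): upright_neutral,
             ("inverted", "neutral"): inverted_neutral}
    for emotion in EMOTIONS:
        cells[("upright", emotion)] = upright_neutral + upright_boost
        cells[("inverted", emotion)] = inverted_neutral + inverted_boost
    return cells


def emotion_effect(cells, orientation, emotion):
    """Emotion effect within one orientation: emotional minus neutral amplitude."""
    return cells[(orientation, emotion)] - cells[(orientation, "neutral")]


def interaction_contrasts(cells):
    """Difference of differences: emotion effect when inverted minus when upright."""
    return {emotion: emotion_effect(cells, "inverted", emotion)
                     - emotion_effect(cells, "upright", emotion)
            for emotion in EMOTIONS}


# Integrated theories: same emotional boost (more negative N170) in both orientations.
integrated = make_cells(upright_boost=-1.0, inverted_boost=-1.0)
# Dual theories: the emotional boost is present upright but absent under inversion.
dual = make_cells(upright_boost=-1.0, inverted_boost=0.0)

print("integrated:", interaction_contrasts(integrated))
print("dual:      ", interaction_contrasts(dual))
```

In a real analysis the contrast would be tested statistically rather than read off the means; the toy table only shows which pattern each theory predicts.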
When considering Accuracy results, the Emotion * Inversion interaction effect showed that inverted stimuli were more difficult to process for all emotions except happiness, which revealed no behavioural inversion effect. However, no statistically significant Emotion * Inversion interaction, or higher-level interaction including this effect, was found in the psychophysiological data (except for P1 Latency). The behavioural result can be explained by the so-called happy face advantage [46]: a happy expression boosts the encoding of the stimulus, so that both emotional expression and gender can be easily recognised even in inverted stimuli, in which configural processing is disrupted. This result, together with the effects of emotional expressions on N170, corroborates the interpretation that emotional expression influences the structural encoding of social stimuli.
For P1 Latency, the inversion effect (i.e., a delayed P1 component for inverted compared to upright stimuli) was observed for all emotions except fear. This exception may represent a crucial evolutionary advantage. Fear is a relevant social cue signalling incoming danger. Being able to process this signal rapidly (i.e., at the P1 stage) through a subcortical route, independently of its orientation, may play a key role in an individual's survival.
Even though both inversion and emotional expressions (with different trajectories for faces and bodies) influenced the neural encoding of these stimuli, they did not interact with each other in psychophysiological activation. These results are in favour of integrated theories of perception, i.e., structural encoding is affected by emotional expression even when it is influenced by inversion (Fig. 1D). In other words, emotional expressions can boost structural encoding, independently of whether it relies on configural or part-based processing. The most likely explanation of this finding is that this "emotional boost" may occur through fast processing in the rapid subcortical stream [18], [47], independently of further, slower processing through the visual cortices, which may be configural or analytical (as suggested by inversion effects). The independence of these mechanisms may be due to the different aims of the two streams. On the one hand, the subcortical pathway (on which emotional expressions act) performs a rougher stimulus processing aimed at extracting valence through a fast appraisal of emotional information. On the other hand, the cortical pathway (on which inversion acts) processes stimuli in a more detailed way, performing a precise configural encoding aimed at identifying the social stimulus and its peculiarities (identity, familiarity, etc.). The different targets of these pathways suggest that their functional independence may indeed represent an advantage. From a phylogenetic point of view, the rapid recognition of the emotional expression of another individual may help a person to understand signals of danger, reward, approach or avoidance (in the case of bodies). The recognition of these signals independently of configural processing may be an advantage in situations where the individual can process only a part of the social stimulus (e.g., the expression of the eye region or the mouth region, an arm ready to throw a punch) or in conditions of low visibility, in which the other individual cannot be clearly seen and identified. This skill could potentially play an important role in survival and could explain why this process bypasses the configural processing aimed at identity recognition or higher-level processes.
Moreover, we expected that this effect could vary according to (i) social stimulus type (faces vs. bodies), (ii) ERP component or behavioural performance (Accuracy, P1, N170, P2), and (iii) task performed by participants (explicit vs. implicit processing of emotional information). In this regard, the Stimulus * Task * Emotion interaction effects (summarized in Table VII) are crucial for interpretation.

B. Emotions in Faces and Bodies
In summary, during the emotion recognition task, faces showed higher Accuracy for happy expressions compared to angry, neutral and sad expressions, whereas fearful and sad bodies displayed higher Accuracy than neutral, angry and happy ones. Regarding ERP components, both faces and bodies showed smaller P1 components for neutral stimuli than for any emotional stimuli. On N170, bodies showed specific differences among emotions: sad bodies showed a larger N170 than angry, happy and neutral ones, while fearful bodies presented a larger N170 than neutral ones. Faces, in contrast, showed a smaller N170 Amplitude for neutral faces than for emotional ones. On P2, all emotional bodies showed larger amplitude than neutral ones, in both tasks. Overall differences between face and body processing are also conveyed by delayed N170 and P2 components for bodies compared with faces.
After a first rapid, coarse processing through the subcortical pathway (reflected in P1), an appraisal of the behavioural approach-avoidance tendency [48] appears to be performed at the structural encoding level for bodies. Indeed, avoidance-oriented emotional expressions (fear and sadness) showed larger N170 components than neutral and approach-oriented expressions (happiness and anger). What influences this stage thus seems to be not the emotion itself, but the tendency to approach or avoid the observer. Approach-avoidance tendency is expressed more easily through the body than through the face because approach and avoidance behaviour naturally require a movement of the body. Our visual system thus seems to be tuned to perceive the implied tendency to move as part of the body structure. This interpretation is supported by studies observing the activation of visual areas dedicated to motion processing during the viewing of still images of implied motion [49], [50] and increased bilateral N170 when watching movies of biological motion [51]. Movement (even implied) seems to be a key feature in body encoding. For body processing, the subcortical pathway may play a key role not only at the earliest processing stage (P1) but also at the structural encoding stage: the amygdala has been shown to be a key structure in discriminating approach- and avoidance-oriented behaviour (expression and gaze, in this case) [52], and the same pattern was found on N170 for bodies in our study. Further neuroimaging research focusing on the neural bases of approach and avoidance in faces and bodies is required to confirm this hypothesis.
The proper encoding of emotional expression in bodies seems to occur at the latency of the P2 component. The emotional content of bodies is categorised only at this stage, through a precise appraisal based on top-down conceptual processing and categorisation of the emotion. Therefore, emotional expression discrimination is slower and less immediate than in faces, i.e., it requires higher-level processing and conceptualisation. This interpretation appears to be corroborated by the Latency results, in which differences among specific emotions in the P2 component could be observed only in bodies. On the other hand, emotional expressions of faces are already processed at the structural encoding stage (i.e., N170), suggesting that (for the human visual system) faces are probably the most appropriate stimulus category to convey emotions. This may also be the reason why facial expressions are a universal and "spontaneous" means of conveying emotions [53], while body postures typically require specific training to be fully understood.
Behavioural results for bodies mirrored the results found in N170: avoidance-oriented expressions were categorised better than approach-oriented and neutral expressions. Therefore, participants' Accuracy for bodily expression recognition mostly reflects the structural encoding of these stimuli, corroborating the interpretation linked to the importance of implied motion in body processing. On the other hand, behavioural results for faces reflected a well-known perceptual effect, the happy face advantage [46]: faces showing a happy expression are processed more easily and faster, and are recognised better. This reinforcing effect is most likely what drives the facial expression results in our study. P1 and N170 components did not show this clear advantage for happy expressions, which displayed the same increased amplitude as all other facial expressions. The neural bases of this effect can probably be found at later stages of processing, since the P2 component showed increased amplitude only for facial expressions with negative valence.
Consequently, we can conclude that face and body structural encoding are affected by emotional expressions differently, as suggested by the different stages in which they are processed.

C. Inversion and Lateralization
Concerning the variations across components, two effects deserve special mention, given their relevance to the present literature. In the ERP Amplitude analysis of Experiment 1, the inversion effect was present on all the investigated components: P1, N170 and P2 were all larger for inverted stimuli than for upright ones (see Fig. 4B). In addition, the Stimulus * Inversion interaction effects showed that in P1 and N170 the inversion effect was stronger for faces, even though it was statistically significant for both. Some studies have already reported an inversion effect on P1 [11], [54], [55], [56] and P2 components [57]. These results thus show that inversion can affect stimulus encoding at all stages, from the early coarse encoding (P1) to structural encoding (N170), to refined and integrated holistic processing (P2). The novel result is that the inversion effect seems to affect faces more than bodies on the early components related to stimulus encoding, whereas FIE and BIE were typically described as having comparable effect sizes [10], [44], [58]. This result is probably related to the differences in configural processing involved in face and body perception. If configural body processing relies only on first-order information and structural hierarchy [9], inversion may represent a "less impairing" disruption of body processing, compared to face processing, which requires all levels of configural processing [6]. Thus, bodies may be encoded using a part-based strategy more easily than faces, suggesting that the shape of single body parts is more relevant in body processing than single facial features are in face processing. This interpretation seems to be corroborated by an interesting pattern that emerged in the Latency results regarding the direction of the inversion effect: inverted stimuli caused delayed P1 and N170 components, whereas the P2 component was delayed for upright stimuli. In particular, this inversion effect was specific to faces in the N170 component and to bodies in the P2 component. This is a novel result and needs confirmation by further studies. However, it appears to confirm the reliance on holistic processing for the structural encoding (i.e., N170) of faces, while body processing may rely more on refined processing based on top-down conceptualisation (i.e., P2). Moreover, the results from Experiment 2 showed that N170 for bodies is significantly larger than for objects, suggesting that the structural encoding of these stimuli relies on configural processing and that both faces and bodies are encoded as configural gestalts.
A main effect of Side showed larger N170 and P2 components in the right hemisphere than in the left one. The Inversion * Side interaction effect was statistically significant on all components, showing that the inversion effect was larger on the right side for P1 and N170, while it was statistically significant only on the right side for P2 (in both Amplitude and Latency). Moreover, the P1 component revealed a larger amplitude for faces than for bodies, and this difference was larger on the right side than on the left one. These results replicate previous findings of right-lateralization of face [59] and body processing [3], [60], especially on the N170 component. Lateralization of the P2 component suggests that higher levels of holistic visual processing occur mostly in the right hemisphere. The lateralization of the inversion effect further corroborates this finding: if the configural processing of faces and bodies is preferentially located on the right side, inversion (which disrupts configural processing) should also display larger effects in the right hemisphere. When considering the significant Stimulus * Side interaction effect, the P1 component showed very early increased selectivity for faces on the right side. This selectivity was generalised to both stimuli in later stages of processing (N170 and P2). Facial stimuli seem to be coded as faces and fed into the specialised processing pathway (in the right hemisphere) sooner than bodies. Therefore, the visual system seems to tune more rapidly to face processing than to body processing.

D. Explicit vs. Implicit Processing
Regarding the influence of the task, we expected larger effects of emotional expressions during the emotion discrimination task than during the gender discrimination task, due to explicit vs. implicit processing of emotional information [48]. Behavioural results showed that the emotion discrimination task had an overall lower Accuracy than gender discrimination. The Task * Inversion interaction effect demonstrated that the inversion effect is larger during the gender discrimination task. In the Stimulus * Task * Emotion interaction effect (summarized in Table VII), the comparisons based on emotion revealed that most behavioural differences among emotions were found during the emotion discrimination task.
The main task difference showed that the emotion discrimination task was more difficult than the gender discrimination task: while gender needed only a two-choice response (male vs. female), emotion recognition required a two-choice response (neutral vs. emotional) followed by a four-choice response (discriminating among specific emotions) in the "emotional" case. Therefore, the chance level was much lower in the second case. Moreover, studies investigating the difference between gender and emotion processing have consistently found that participants show lower Accuracy and slower RTs for emotion than for gender recognition [23], [61], indicating that emotion recognition is typically experienced as a more difficult task.
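The chance-level argument can be made explicit with a short calculation. The sketch below assumes uniform random guessing at each response step, independence between the two steps, and (for the trial-weighted figure) that four of the five expression categories are emotional and equally frequent; these are illustrative assumptions rather than details confirmed by the text, and the function names are invented here.

```python
from fractions import Fraction


def chance_gender_task() -> Fraction:
    """Two-choice response (male vs. female) under uniform guessing."""
    return Fraction(1, 2)


def chance_emotion_task(stimulus_is_emotional: bool, n_emotions: int = 4) -> Fraction:
    """Chance of a fully correct response in the two-step emotion task."""
    first_step = Fraction(1, 2)  # neutral vs. emotional
    if not stimulus_is_emotional:
        return first_step  # a neutral stimulus needs only the first step
    return first_step * Fraction(1, n_emotions)  # then one of the specific emotions


def overall_emotion_chance(prop_emotional: Fraction) -> Fraction:
    """Trial-weighted chance level; prop_emotional is an assumed design proportion."""
    return (prop_emotional * chance_emotion_task(True)
            + (1 - prop_emotional) * chance_emotion_task(False))


# Assumption: four of the five expression categories are emotional, equally frequent.
print("gender task:              ", chance_gender_task())
print("emotion task (emotional): ", chance_emotion_task(True))
print("emotion task (overall):   ", overall_emotion_chance(Fraction(4, 5)))
```

Under these assumptions, guessing yields 1/2 in the gender task but only 1/8 for emotional stimuli in the emotion task (1/5 when averaged over neutral and emotional trials), which is why raw Accuracy is not directly comparable across the two tasks.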
The larger behavioural inversion effect found during the gender discrimination task is a further demonstration that emotional expression processing may not require full configural processing (thus in favour of integrated theories), while gender recognition may require it since this task needs to process unchangeable aspects of the face in a full configuration [17].
Most of the differences among emotions were found during the emotion discrimination task, when they were explicitly processed. This effect suggests that, during gender discrimination, the emotional information was probably filtered out by the visual system through a top-down mechanism, so that only invariant information related to gender was processed. On the other hand, the emotional information was explicitly processed during the other task, leading to the differences found in Accuracy.
As far as electrophysiological results are concerned, no effects of the task were found on the P1 component, while the Stimulus * Task * Emotion interaction effect was found to be statistically significant on both N170 and P2 Amplitude (see Table VII). The comparisons based on emotions revealed that most significant differences in N170 (and in P2 for faces) were found during the gender discrimination task (and on both tasks in P2 for bodies).
The importance of the gender discrimination task in eliciting differences among emotions in ERPs seems counterintuitive but may be interpreted in association with the behavioural results. According to integrated theories, the visual system processes emotional information through the subcortical pathway in an automatic and fast way. Indeed, a fundamental aspect of integrated theories [62] is the notion that emotional stimuli, particularly those perceived as threatening, can automatically capture focused attention. They take precedence in processing and prompt behavioural reactions regardless of the individual's ongoing objectives. When participants were requested to process the stimulus gender, a conflict was probably created between the incoming emotional information and the information required by the task. Therefore, a top-down mechanism filtering the automatically processed emotional information is required. N170 and P2 components may display larger differences in amplitude among emotions for this reason: their amplitude may reflect the cognitive effort needed to filter the task-irrelevant emotional information. Consequently, if a larger ERP component is associated with a specific emotion, it could mean that the emotional information interfering with gender discrimination was more salient. This explanation is coherent with the behavioural results, in which almost no differences among emotions were found during the gender discrimination task. This aspect should be investigated in further research.
Future directions may also include studying how ethnicity can affect these processes. Indeed, it was previously shown that manipulating the ethnicity of stimuli or participants can influence how faces are perceived [63], [64].

V. CONCLUSION
This study aimed to investigate the influence of inversion and different emotional expressions on the visual processing of faces and bodies, using both behavioural (Accuracy) and electrophysiological (ERPs) measures. The key findings were in support of integrated theories of visual perception of social stimuli: (i) behavioural and psychophysiological inversion effects for both faces and bodies were found as lower Accuracy and larger ERP amplitude for inverted stimuli; (ii) emotional expressions influenced the visual processing of both faces and bodies, i.e., faces showed specific effects of emotional expressions during the structural encoding stage (N170), while N170 amplitude discriminated approach and avoidance in body perception and specific emotions were encoded only through subsequent top-down conceptualisation (P2); (iii) these two effects did not interact. This result indicates the functional independence of the two neurocognitive pathways involved in social stimuli processing (subcortical and cortical), in accordance with integrated theories [18], [47]. These pathways were shown to act during all stages of visual processing.
For the first time, this study investigated in depth how these two pathways are differently involved in all the diverse stages of face and body visual processing and tested their functional independence, to be corroborated in further neuroimaging studies.

Fig. 4. Grand-averaged ERP activity in Experiment 1, averaged through the 10 occipito-temporal electrodes considered in the two ROIs. ERPs are elicited by (A) faces (black line) vs. bodies (red line), averaged through inversion, emotion and task variables; (B) upright (black line) vs. inverted (red line) stimuli, averaged through stimulus, emotion and task variables; (C) neutral (black line), happy (cyan line), sad (blue line), angry (red line) and fearful (green line) stimuli, averaged through stimulus, inversion and task variables. Examples of stimuli are presented for each category.

Fig. 5. Plot representing the effects of emotional expression in faces (blue line) and bodies (red line) on participants' Accuracy in Experiment 1, divided by the task performed by participants (emotion or gender recognition). In this and the following plots, dots represent the mean value and error bars represent 95% confidence intervals.

Fig. 6. Plot representing the effects of emotion in bodies (red line) and faces (blue line) on N170 in Experiment 1, divided by the task performed by participants (emotion or gender recognition).

Fig. 7. Plot representing the effects of emotion in bodies (red line) and faces (blue line) on P2 in Experiment 1, divided by the task performed by participants (emotion or gender recognition).

Fig. 8. Grand-averaged ERP activity elicited by upright (black line) and inverted (red line) faces (first panel), bodies (second panel) and houses (third panel) in Experiment 2, averaged through the 10 occipito-temporal electrodes considered in the two ROIs. Examples of stimuli are presented for each category.

TABLE IV
EXPERIMENT 1 - ACCURACY - STIMULUS * TASK * EMOTION INTERACTION EFFECT - SIGNIFICANT POST-HOC PAIRWISE COMPARISONS