Joint Modulation of Facial Expression Processing by Contextual Congruency and Task Demands

Faces showing expressions of happiness or anger were presented together with sentences that described happiness-inducing or anger-inducing situations. Two main variables were manipulated: (i) congruency between contexts and expressions (congruent/incongruent) and (ii) the task assigned to the participant, discriminating the emotion shown by the target face (emotion task) or judging whether the expression shown by the face was congruent or not with the context (congruency task). Behavioral and electrophysiological results (event-related potentials (ERP)) showed that processing facial expressions was jointly influenced by congruency and task demands. ERP results revealed task effects at frontal sites, with larger positive amplitudes between 250–450 ms in the congruency task, reflecting the higher cognitive effort required by this task. Effects of congruency appeared at latencies and locations corresponding to the early posterior negativity (EPN) and late positive potential (LPP) components that have previously been found to be sensitive to emotion and affective congruency. The magnitude and spatial distribution of the congruency effects varied depending on the task and the target expression. These results are discussed in terms of the modulatory role of context on facial expression processing and the different mechanisms underlying the processing of expressions of positive and negative emotions.


Introduction
Facial expressions of emotion are highly relevant stimuli in human social interactions. Thus, it is not surprising that the cognitive and neural underpinnings of their processing have been the object of many studies. In our daily life, we rarely perceive facial expressions of our conspecifics isolated from any other stimuli. Instead, facial expressions are typically perceived in the context of specific social interactions that include many other relevant clues about the event itself and the person expressing the emotion. However, in most experimental studies facial expressions have been presented alone in the absence of any contextual information. In consequence, there is still limited evidence regarding the way in which contextual information modulates the processing of facial expressions of emotion.
In an attempt to reproduce the situated, contextual nature of social interactions, some recent studies have explored the impact of different types of contexts on the processing of facial expressions of emotion. In these studies, facial expression targets are presented in the context of intra-subject (e.g., prosody or body posture) or situational information (see [1,2] for reviews). In the present paper, we concentrate on the possible modulatory role of situational contexts on facial expression processing, where situational context refers to the information provided by the life event or social encounter in which the expressive behavior takes place. For example, the smile of a friend who has just told us about a personal misfortune will be processed and interpreted in a different way than that same smile in a casual conversation.
Studies on contextual modulation have used contextual stimuli that can be affectively congruent or incongruent with the emotion expressed by a target face. Those stimuli include pictures with positive or negative valence [3][4][5], or sentences describing situations inducing specific emotions such as anger or joy [6][7][8]. The results of these studies have provided evidence of context influences on the processing of emotional faces that are apparent at both the behavioral and neural activity levels. For example, behavioral studies have reported slower and/or less accurate responses on trials in which the target and the context are emotionally incongruent [4,9]. Moreover, several studies using the event-related potential (ERP) technique have revealed modulations of neural activity that are dependent on the congruency between expression targets and picture or sentence contexts. These modulations have been observed at different post-stimulus onset times, beginning at early stages of perceptual face processing. This is the case for the face-sensitive N170 ERP component, a negative deflection that peaks around 170 ms over parieto-occipital sites and is considered the earliest reliable electrophysiological index of face encoding ([10][11][12]; see [13] for a review). Amplitude modulations of the N170 by the congruency between contexts and expression targets have been observed in several studies [3,5,14]. These context effects, together with the sensitivity of the N170 to emotional expression (see [15] for a meta-analysis and review), suggest that relatively complex, context-dependent affective processing of face stimuli occurs already at early perceptual stages.
Modulations driven by contextual congruency have also been shown on a later, post-perceptual ERP component, the late positive potential (LPP). The LPP is a centro-parietal, sustained positive deflection that typically appears between 300 and 700 ms after stimulus onset and that shows increased amplitude in the presence of emotionally arousing relative to neutral stimuli (e.g., [16,17]). The LPP is thought to reflect facilitated processing and encoding of relevant emotional stimuli, and thus it is not surprising that it has also been shown to be modulated by affective congruency. Enhanced positive amplitudes of the LPP have been observed when a target face shows an emotional expression that is incongruent with the context in which it is presented [6,7] and in trials in which the prime and target stimuli are affectively incongruent in the affective priming paradigm [18][19][20].
A further, relatively early ERP component that is also sensitive to emotion is the EPN (early posterior negativity), a negativity detected over temporo-occipital sites that reaches maximal values around 300 ms after stimulus onset. This component has been found to respond to the emotional intensity of emotional pictures and words and is thought to reflect emotional facilitation of sensory processing (e.g., [21][22][23]). Moreover, in studies with facial expressions of emotion, enhanced amplitudes of the EPN are usually seen in response to angry and fearful faces compared to neutral and happy faces (e.g., [24][25][26]). In a broader sense, both the LPP and the EPN components have been considered to reflect relatively late, post-perceptual processes, aimed at enhancing continued encoding and processing motivationally relevant stimuli. As stated above, sensitivity of LPP to affective congruency has been shown in some studies. However, whether the EPN is also sensitive to affective congruency is still unknown.
Priming and context studies with facial expressions of emotion have usually assigned the participants tasks that do not require explicit attention to the context or prime stimuli, such as emotion categorization or evaluation in terms of valence and arousal. Congruency effects observed under these conditions suggest that the affective valence of contexts and prime stimuli is automatically activated and influences processing of the target expression by means of implicit mechanisms. However, there is evidence that task demands can modify the way in which affective primes and contexts influence processing of target stimuli. This has been shown in the affective priming paradigm in which responses to affective words or pictures are influenced by their affective congruency with Brain Sci. 2019, 9,116 3 of 20 an immediately preceding prime (see [27] for a review). Several studies have shown that the affective priming effect, with slower and/or less accurate responses to the target on affectively incongruent trials, is observed in an evaluative task in which the target has to be categorized in terms of its valence but is significantly reduced or completely abolished when the task requires categorizing the target on the basis of non-affective features (e.g., [28][29][30]). Also relevant are the results of a study [31] that showed that attention to neutral picture contexts was promoted by the instruction to identify the emotion shown by a target face, compared to the instruction to make a general affective evaluation. More specifically, the participants showed better memory of the contexts after the emotion identification task. A direct demonstration of the effects of task demands also came from a recent behavioral study [9]. This study showed that the influence of affective sentence contexts on the response to target expressions varied depending on the specific task assigned to the participant. 
While an evaluative task (whether the face expression is positive or negative) produced only a weak affective congruency effect, an emotion categorization task (being able to discriminate between angry, fearful, and happy expressions) and a congruency categorization task (is the face expression congruent or incongruent with the context) both led to emotional congruency effects in which responses were slower and/or less accurate on (incongruent) trials in which the context and the target represented different specific emotions (e.g., fear and anger). These effects were especially strong in the congruency categorization task, in which the participants had to categorize the target expression as congruent or incongruent with the situation described by the sentence context.
The evidence mentioned above suggests that different task instructions can selectively increase the saliency of different aspects of contextual information, thus modulating its influence on target processing. A main distinction would be between tasks that orient the participant to general affective properties of the target (evaluative tasks) and those that require attention to more specific affective properties (e.g., emotion categorization). An unsettled issue is the processing stage at which task demands exert their influence. While an effect of task demands at a given stage indicates that the underlying cognitive operation is subject to the influence of top-down processes, the absence of such an effect is more compatible with the operation of stimulus-driven processes.
Due to its high temporal resolution, the ERP technique is ideally suited to track precisely the dynamics of brain activity underlying different cognitive tasks. In the study of Diéguez-Risco et al. [6], no congruency effects between contexts and target faces were found on the N170 component during an emotion categorization task. However, in a similar study in which the task required one to explicitly pay attention to the congruency between the context and the facial expression, the N170 was modulated by congruency [7], with larger negative amplitudes to faces showing expressions that were contextually incongruent. Although no firm conclusions can be drawn based on between-experiment comparisons, this discrepancy suggests that top-down processes driven by task demands might play a relevant role in the processing of affective congruency and that this influence could take place at an early perceptual processing stage such as that indexed by the N170.
Using a within-subject design, the current study aimed at directly exploring the influence of the context on the processing of facial expressions of emotion under different task demands. EEG was recorded throughout the two tasks, and reaction times and accuracy were monitored. In each trial, participants were presented with a context, consisting of a sentence describing an affectively positive or negative situation, displayed directly below a neutral face. The same individual face immediately followed this, expressing an emotion (joy or anger) as if reacting to the situation. In the emotion discrimination task, participants were simply asked to identify which emotion was expressed (joy or anger). In the congruency task, participants decided whether the emotion expressed by the face was congruent or incongruent with the situation described by the sentence. An important methodological aspect of this study was the monitoring of participants' eye movements during their reading of each sentence, using an eye tracker. This ensured that, although the context was irrelevant for the emotion discrimination task, it was nonetheless read and thus, presumably, attended to. This aspect is important, given that none of the prior studies in this field ensured that contextual sentences were read.
The comparison between the congruency and the emotion discrimination task allowed us to study the influence of top-down processes on emotional processing. Although in both task conditions efforts were made to ensure that the participants read the context sentence, explicit consideration of the relation between the context and the target face was required only in the congruency task. Thus, it may be assumed that contextual modulation of target processing would be driven by top-down processes only in the congruency task.
As this task requires explicit consideration of the relation between the context and the target expression, congruency should have a maximal effect. Under these conditions, explicit and more effortful processing of the target expression in relation to its context should be revealed in stronger modulation of ERP components that are sensitive to the affective properties of stimuli and that have also been found to be modulated by affective congruency in previous studies. Based on this rationale, the amplitude of three ERP components (the N170, the EPN, and the LPP) was our main dependent variable.

Participants
Fifty-eight (58) undergraduate and graduate students from the University of Waterloo participated in this study for course credit or cash payment ($10/hour). All participants were fluent English speakers who reported normal or corrected-to-normal vision, and had lived in Canada and/or the USA for at least ten years. Participants did not have a history of psychological or neurological disorders, head trauma with loss of consciousness for more than five minutes, or epilepsy or seizures, and were not taking anti-psychotic medications or medications containing cortisone at the time of testing. This study was approved by a Human Research Ethics Board at the University of Waterloo (ORE #20113) and, in accordance with the Declaration of Helsinki, all participants provided informed written consent prior to starting the experiment.
Data for 22 participants were rejected due to eye-tracking difficulties (could not calibrate the eye tracker, 10), completion of only one of the two experimental tasks (2), equipment malfunction (1), too few trials per condition due to extensive artifacts in the EEG recording (5), too few trials per condition due to eye movements during emotional face presentation (1), or too few trials due to errors or no responses (3). This resulted in a final sample of 36 participants (mean age = 21.83 years, SD = 4.02; 15 men, 21 women; and 31 right-handed).

Stimuli
Greyscaled photographs of 10 models (five males, five females; models #: 02F, 07F, 08F, 09F, 10F, 21M, 23M, 25M, 26M, and 29M) from the NimStim face database [32] were used as target stimuli. Each model expressed both open- and closed-mouth expressions of anger and happiness, as well as a neutral expression, resulting in a total of 30 face images. Faces were cropped into ovals and placed on a grey background. These stimuli were the same as those used in Diéguez-Risco et al. [6]. Four additional face identities (each with neutral, happy, and angry expressions) were selected to be used in the practice phase. Images did not differ significantly in root mean square (RMS) contrast (mean RMS = 0.42, SD = 0.008) or mean pixel intensity (PI) (mean PI = 0.5, SD = 0.0004).
Twenty short sentences describing emotion-inducing daily situations were used as contextual sentences. Half of these sentences described joy-inducing situations (positive sentences), and the other half described anger-inducing situations (negative sentences). The sentences used in the current ERP study were selected from a larger set, based on the data of a pilot behavioural study. In this pilot study, 51 participants (mean age = 19.9 years, SD = 3.3; 8 men, 43 women) categorized 30 sentences as joy-inducing or anger-inducing. Participants were also required to indicate how intensely (positive or negative) this emotion would be felt by the person going through the described situation (intensity rating) and how exciting or stressful this emotion would be (arousal rating). The final set of selected sentences (Appendix A) was categorized as happy-inducing or angry-inducing by at least 70% of these participants. The final positive and negative sentences were not significantly different in intensity (t(18) = 0.11, p = 0.91, mean difference = 0.042) or arousal (t(18) = −0.67, p = 0.51, mean difference = −0.291) ratings. Mean intensity (on a scale from 1 = very low to 9 = very high) was 6.91 (standard error of the mean, SEM = 0.31) for positive sentences and 6.87 (SEM = 0.21) for negative sentences. Mean emotional arousal (on a scale from 1 = very low to 9 = very high) was 6.57 (SEM = 0.34) and 6.86 (SEM = 0.25) for negative and positive sentences, respectively. Eight additional sentences were used for the practice phase.

Procedure
Participants sat in a sound- and electrically attenuated Faraday cage, 70 cm from the display computer. All participants completed two different experimental tasks, an emotion categorization task and a congruency categorization task, the order of which was counterbalanced across participants.
In both tasks, a fixation cross was presented first, followed by a neutral face presented in the center of the screen and a contextual sentence (either positive or negative) situated underneath (Figure 1). Participants were told that, in each trial, the person shown had just gone through the situation described in the sentence, and that they should read each sentence carefully. The neutral face and contextual sentence remained on the screen until the participant pressed the spacebar to continue. This triggered the presentation of a central fixation cross, and participants were instructed to remain fixated on this cross in order to trigger the emotional face presentation. Once participants fixated on the cross for 300 ms, an emotional face (of the same individual), centered on the nasion, was presented for 250 ms. If participants did not fixate on the cross within ten seconds, the trial was aborted and a drift correction was performed before continuing to the next trial. Participants were instructed to stay focused on the nasion of the face (i.e., to not make eye movements during the emotional face presentation). A fixation cross was then presented in the centre of the screen for 2000 ms or until the participant made a behavioural response (whichever criterion was met first). In the emotion task, participants indicated whether the target face was happy or angry. In the congruency task, they judged whether the facial expression and sentence were congruent (i.e., whether the person expressed an emotion that would normally be appropriate given the situation described) or incongruent (i.e., the person expressed an emotion that would not normally be appropriate given the situation). Responses were recorded using a standard keyboard, and participants pressed the left and right arrow keys (counterbalanced across participants) with their right hand.
Figure 1. Exemplars of congruent and incongruent trials with angry and happy expressions (trial progression from left to right). In both tasks, a fixation cross was presented for 300 ms or 500 ms (jittered by 0-100 ms), followed by the presentation of a face with a neutral expression and a contextual sentence. After reading the sentence, participants pressed the spacebar, and a gaze-contingent fixation cross appeared in the centre of the screen. Participants were required to fixate on this cross for 300 ms, triggering the presentation of an emotional (angry or happy) face for 250 ms. If participants did not fixate on the cross within 10 seconds, the trial was aborted and a drift correction was performed. Following emotional face presentation, a response screen with a central fixation cross was presented until participants made a response, or for a maximum of 2000 ms. For the emotion task, participants indicated if the face was happy or angry. For the congruency task, participants indicated if the sentence was emotionally congruent or incongruent with the emotional face. Note: the sentence was written on one line only and is shown here on two lines for display purposes; furthermore, the NimStim model displayed here was not used in the study but is one for which we have a publishing agreement.
Each task consisted of 320 trials divided into four blocks. In each block, there were four main conditions: positive sentence-happy expression (happy congruent), negative sentence-angry expression (angry congruent), negative sentence-happy expression (happy incongruent), and positive sentence-angry expression (angry incongruent). Within each task, there were 80 trials across blocks for each of the four conditions. Within each block, each model's facial expression was displayed four times (two congruent trials, and two incongruent trials), and there was a total of 20 trials for each of the main four conditions (10 per gender). The sentence-model pairings were randomized. Prior to each experimental task, participants performed a practice phase that consisted of 16 trials.
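The trial counts described above can be checked with a short sketch. This is only an illustration of the stated design, not the authors' presentation script: the function names, the dictionary keys, and the use of a seeded shuffle for trial order are assumptions, and the random sentence-to-model pairing is left out.

```python
import itertools
import random
from collections import Counter

# Model identifiers as listed in the Stimuli section.
MODELS = ["02F", "07F", "08F", "09F", "10F", "21M", "23M", "25M", "26M", "29M"]
EXPRESSIONS = ["happy", "angry"]
# Sentence valence that matches each expression on congruent trials.
MATCHING_VALENCE = {"happy": "positive", "angry": "negative"}
OPPOSITE = {"positive": "negative", "negative": "positive"}
REPEATS_PER_CELL = 2  # two congruent and two incongruent trials per model and expression
N_BLOCKS = 4


def build_block(rng):
    """Return one shuffled block of 80 trials (20 per condition). Hypothetical helper."""
    trials = []
    for model, expression, congruent in itertools.product(MODELS, EXPRESSIONS, [True, False]):
        valence = MATCHING_VALENCE[expression]
        if not congruent:
            valence = OPPOSITE[valence]
        for _ in range(REPEATS_PER_CELL):
            trials.append({
                "model": model,
                "expression": expression,
                "congruent": congruent,
                "sentence_valence": valence,
            })
    rng.shuffle(trials)  # assumption: trial order is shuffled within a block
    return trials


def build_task(seed=0):
    """Return the four blocks (320 trials in total) of one task."""
    rng = random.Random(seed)
    return [build_block(rng) for _ in range(N_BLOCKS)]


if __name__ == "__main__":
    task = build_task()
    per_block = Counter((t["expression"], t["congruent"]) for t in task[0])
    print(sum(len(block) for block in task), dict(per_block))
```

Each block contains 10 models × 2 expressions × 2 congruency levels × 2 repetitions = 80 trials, which gives 20 trials per condition per block and 80 per condition per task.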

Eye-Tracking and EEG Recordings
A desk-mounted remote SR Research EyeLink 1000 eye-tracker (SR Research, http://sr-research.com) sampling at 1000 Hz was used to monitor eye movements throughout the study. A nine-point automatic calibration was conducted at the beginning of each block, recording each participant's dominant eye (as determined by the Miles test; [33]). The dominant eye was tracked, but viewing was binocular.
EEG recordings were collected continuously at 512 Hz by a Biosemi ActiveTwo system at 72 recording sites: 66 channels in an electrode cap under the extended 10/20 system (the default 64 sites plus PO9/PO10 sites), two pairs of electrodes situated on the outer canthi and infra-orbital ridges (to monitor horizontal and vertical eye movements), and one additional pair situated over the mastoids. A common mode sense (CMS) active electrode and a driven right leg (DRL) passive electrode acted as a ground during recording.

Data Processing
Only correct trials with reaction times within ±2.5 SD of each participant's mean response time were analyzed. Trials in which participants did not read the contextual sentence were rejected. Specifically, trials with fewer than two fixations within a pre-determined 14.65° × 2.45° sentence region of interest (ROI) were automatically detected and visually inspected. If participants did not make at least one fixation towards an affectively-valenced word within the sentence, the trial was rejected. This criterion was used because only a small number of sentences were used in this study; thus, upon continuous repetition, participants were able to determine the nature of the sentence by scanning key affective words. Trials in which it was clear that the sentence had been read, but fixations fell outside of the pre-determined sentence ROI due to eye-recording drift, were kept. Furthermore, during the emotional face presentation, participants were required to maintain fixation on the nasion of the face. Any trials in which eye movements were made outside of a 1.37° diameter ROI centered on the nasion were automatically rejected.
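The two automatic gaze checks described above could be sketched as follows. This is a minimal illustration, assuming that fixations and gaze samples are available as (x, y) positions in degrees of visual angle in a common screen-centred coordinate frame; the function names are hypothetical, and the visual inspection step and the affective-word criterion are not modelled.

```python
import math

SENTENCE_ROI_SIZE_DEG = (14.65, 2.45)  # width and height of the sentence ROI
NASION_ROI_DIAMETER_DEG = 1.37         # diameter of the circular ROI on the nasion


def in_rectangle(point, center, size):
    """True if point lies inside a rectangle of the given size centred on center."""
    half_width, half_height = size[0] / 2, size[1] / 2
    return (abs(point[0] - center[0]) <= half_width
            and abs(point[1] - center[1]) <= half_height)


def flag_unread_sentence(fixations, roi_center, min_fixations=2):
    """True if fewer than min_fixations fall in the sentence ROI (trial flagged for inspection)."""
    n_inside = sum(in_rectangle(f, roi_center, SENTENCE_ROI_SIZE_DEG) for f in fixations)
    return n_inside < min_fixations


def gaze_left_nasion(gaze_samples, nasion):
    """True if any gaze sample during face presentation falls outside the nasion ROI."""
    radius = NASION_ROI_DIAMETER_DEG / 2
    return any(math.hypot(x - nasion[0], y - nasion[1]) > radius for x, y in gaze_samples)
```

A flagged sentence trial is only a candidate for rejection, since the text states that flagged trials were inspected and kept when drift explained the misplaced fixations.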
All EEG data were processed offline using EEGLab version 13_6_5b [34] and ERPLab (http://erpinfo.org/erplab) toolboxes in MATLAB version 2014. Recordings were average-referenced offline and synchronized with the eye-tracking recordings. The data were further band-pass filtered (0.01-30 Hz) and epoched into time segments of −100 ms to 800 ms around the onset of the emotional face. Trials with amplitudes exceeding ±70 µV were automatically detected and removed.
Independent component analysis was also conducted for 24 participants in order to remove blink- and eye-movement-related artifacts, and one participant underwent manual cleaning in order to remove remaining artifacts. The average number of trials per condition after pre-processing was 54.1 (SEM = 11.9) and did not vary significantly between the eight conditions (F(7,245) = 1.25, p = 0.29; Table 1).
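For readers unfamiliar with these steps, the average reference and the amplitude-based rejection can be sketched in a few lines. The study used EEGLab and ERPLab in MATLAB; the Python below is only a conceptual illustration with hypothetical function names, it assumes a simple absolute-voltage criterion for the ±70 µV rule, and it omits filtering and independent component analysis.

```python
def average_reference(channels):
    """Subtract the across-channel mean from every channel at each time point.

    channels: list of per-channel lists of voltages (µV), all of equal length.
    """
    n_channels = len(channels)
    n_samples = len(channels[0])
    means = [sum(ch[i] for ch in channels) / n_channels for i in range(n_samples)]
    return [[ch[i] - means[i] for i in range(n_samples)] for ch in channels]


def epoch_is_clean(epoch, threshold_uv=70.0):
    """True if no sample on any channel of the epoch exceeds ±threshold_uv."""
    return all(abs(v) <= threshold_uv for channel in epoch for v in channel)


def reject_artifacts(epochs, threshold_uv=70.0):
    """Keep only the epochs that pass the amplitude criterion."""
    return [epoch for epoch in epochs if epoch_is_clean(epoch, threshold_uv)]
```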

Data Analysis
For each task, the percentage of correct responses was calculated using the trials in which sentences were read and the emotional face was correctly fixated. A response was deemed correct if the correct button press was made and if the response time (RT) was more than 150 ms (to avoid anticipatory responses) and less than 2000 ms. Mean response times were calculated for these correct responses, with the additional constraint that RTs more than 2.5 SD from each participant's overall mean were rejected.
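The behavioural scoring rules above can be expressed compactly. The sketch below is an illustration rather than the authors' analysis code: the trial dictionaries, the function names, the symmetric ±2.5 SD rule, and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev

MIN_RT_MS = 150   # faster responses are treated as anticipatory
MAX_RT_MS = 2000  # response window


def valid_correct_rts(trials):
    """RTs of trials with the correct key press and an RT inside the accepted window."""
    return [t["rt"] for t in trials
            if t["correct"] and MIN_RT_MS < t["rt"] < MAX_RT_MS]


def percent_correct(trials):
    """Percentage of retained trials that count as correct responses."""
    return 100.0 * len(valid_correct_rts(trials)) / len(trials)


def trim_rts(rts, n_sd=2.5):
    """Drop RTs lying more than n_sd standard deviations from the participant's mean."""
    centre, spread = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - centre) <= n_sd * spread]


def mean_rt(trials):
    """Mean RT over correct trials after the SD-based trimming."""
    return mean(trim_rts(valid_correct_rts(trials)))
```

Here `trials` is meant to hold only the trials of one participant and one condition that already passed the reading and fixation checks.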
Each participant's average ERP waveforms were individually inspected in order to determine the electrode within each hemisphere for which the N170 was maximal for all conditions (see also [25,35–39]). The N170 was maximal at different electrodes across participants (Table 2), but maximal at the same electrodes across conditions. The N170 peak amplitude was measured bilaterally at the electrodes specified in Table 2 between 120 ms and 200 ms post-stimulus onset. The mean N170 amplitude was also extracted within each hemisphere ±10 ms of the participant's average peak latency for all conditions within each hemisphere. For the EPN, mean amplitudes were calculated between 150-250 ms and 250-350 ms at electrodes P9, P10, PO9, and PO10, where it is classically measured (e.g., [23–26,36,37]). This choice was based on the fact that, while most studies have measured the EPN between 220-350 ms, others typically measure the EPN between 150 and 300 ms (e.g., [26]), and our recent studies suggested that the largest emotion effect was actually found before 200 ms although after the N170 [25,36,37]. We thus decided to measure the EPN at two separate time windows to better capture its time-course.
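The peak and mean-amplitude measures described above can be illustrated with a small sketch. It assumes a single-electrode average waveform stored as a list of voltages sampled at 512 Hz over the −100 to 800 ms epoch; the function names and the synthetic waveform are hypothetical and serve only to show the window arithmetic.

```python
import math

SAMPLING_RATE_HZ = 512
EPOCH_START_MS = -100
EPOCH_END_MS = 800


def ms_to_sample(t_ms):
    """Index of the sample closest to latency t_ms (relative to face onset)."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def sample_to_ms(index):
    """Latency in ms (relative to face onset) of a sample index."""
    return index * 1000 / SAMPLING_RATE_HZ + EPOCH_START_MS


def n170_peak(waveform, window_ms=(120, 200)):
    """Most negative value in the search window, returned as (amplitude, latency in ms)."""
    start, stop = ms_to_sample(window_ms[0]), ms_to_sample(window_ms[1])
    peak_index = min(range(start, stop + 1), key=lambda i: waveform[i])
    return waveform[peak_index], sample_to_ms(peak_index)


def mean_amplitude(waveform, start_ms, end_ms):
    """Mean voltage between two latencies (inclusive), e.g. peak latency ± 10 ms."""
    segment = waveform[ms_to_sample(start_ms):ms_to_sample(end_ms) + 1]
    return sum(segment) / len(segment)


def make_synthetic_waveform(peak_ms=170.0, width_ms=20.0, depth_uv=-5.0):
    """Synthetic demo data only: a single negative Gaussian deflection."""
    n_samples = ms_to_sample(EPOCH_END_MS) + 1
    return [depth_uv * math.exp(-((sample_to_ms(i) - peak_ms) / width_ms) ** 2)
            for i in range(n_samples)]


if __name__ == "__main__":
    wave = make_synthetic_waveform()
    amplitude, latency = n170_peak(wave)
    print(amplitude, latency, mean_amplitude(wave, latency - 10, latency + 10))
```

The same `mean_amplitude` helper applies to the fixed EPN windows (150-250 ms and 250-350 ms), where no peak search is involved.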
Finally, mean amplitudes were computed at two cluster locations, based on inspection of the data and on previously published studies focusing on the LPP: a frontal cluster (AF3, AFz, AF4, F1, Fz, and F2 sites) and a centro-parietal cluster (C1, Cz, C2, CP1, CPz, CP2, P1, Pz, and P2 sites). Mean amplitudes were extracted at these electrodes between 250-350 ms and 350-450 ms (given the RT pattern, neural activity beyond 450 ms would certainly be contaminated by motor preparation artifacts). Because the LPP is classically measured between 400 and 600 ms, we prefer to refer here to an LPP-like component due to its similar scalp distribution but different timing of occurrence. The choice of two time windows was driven both by visual inspection of the data and by prior existing literature in other domains suggesting an early and a late LPP occurring at different times (e.g., [40,41]).
Separate repeated measures analyses of variance (ANOVAs) were conducted for the percentage of correct responses, mean RT, as well as for each ERP component individually for each time window. Within-subject factors included (2) task, (2) face emotion, and (2) congruency (with preceding sentence), for all analyses. For N170 and EPN, additional factors were hemisphere (2) and electrodes (2), and for the LPP, an additional factor was clusters (2: frontal, centro-parietal). Finally, for both EPN and LPP, an additional factor was time (2). All ANOVAs used Greenhouse-Geisser adjusted degrees of freedom when Mauchly's test of sphericity was significant. Follow-up ANOVAs were conducted when three- or four-way interactions were found. Pairwise comparisons were Bonferroni-corrected.
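As a reading aid for the effect sizes reported with these ANOVAs, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity below. The notation (SS for sums of squares, df for degrees of freedom) is introduced here for illustration and is not taken from the original text.

```latex
\eta_p^2
  = \frac{SS_{\mathrm{effect}}}{SS_{\mathrm{effect}} + SS_{\mathrm{error}}}
  = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}}
% Worked check with the main effect of task on response times, F(1,32) = 3.91:
% \eta_p^2 = \frac{3.91 \times 1}{3.91 \times 1 + 32} = \frac{3.91}{35.91} \approx 0.109
```

The worked value agrees with the effect size reported for the main effect of task on response times (ηp² = 0.109).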

Behavioural Results
For the behavioural analyses, data from three participants were lost, resulting in a final sample of N = 33.

Percentage of Correct Responses
As can be seen on Figure 2A

Mean Response Times (ms)
As can be seen in Figure 2B, participants tended to respond more slowly in the congruency than in the emotion task (main effect of task, F(1,32) = 3.91, p = 0.057, MSE = 786089.1, and ηp² = 0.109), but were also much more variable in the congruency task, as reflected by the much larger standard errors for this task. Responses were also overall faster for happy than angry faces (main effect of emotion
Figure 2. Percentage of correct responses (a) and mean response times (b) for each face emotion (happy or angry) and congruency of that emotion with the preceding sentence, for each task. Condition means (with standard error (SE) of the mean in parentheses) are reported above each bar. A small effect of congruency was found for angry faces in the congruency task for correct responses, and an effect of congruency was found on response times (RTs) for happy faces in the congruency task.

N170 Component
The N170 could not be clearly identified for one participant, leaving N = 35 for this analysis. For the N170 peak amplitude, only the task by congruency by hemisphere interaction was significant (F(1,34) = 5.71, p = 0.023, MSE = 0.503, and ηp² = 0.144). When each task was analyzed separately, the congruency by hemisphere interaction was significant only for the emotion task (F(1,34) = 4.65, p = 0.038, MSE = 0.925, and ηp² = 0.12), reflecting a trend for an effect of congruency on the left hemisphere. However, no further post-hoc ANOVA or t-tests were significant. Similarly, when the mean N170 amplitude was used (calculated within ±10 ms around the peak of each participant), only the task by congruency by hemisphere interaction was significant (F(1,34) = 5.67, p = 0.023, MSE = 0.637, and ηp² = 0.143) but the analysis of each task separately did not reveal any significant effect.
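The partial eta squared values reported alongside each F test can be recovered from the F statistic and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the three N170 tests reported above; the function name `partial_eta_squared` is a hypothetical helper introduced for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# N170 tests reported in the text, all with F(1,34).
print(round(partial_eta_squared(5.71, 1, 34), 3))  # 0.144
print(round(partial_eta_squared(4.65, 1, 34), 3))  # 0.12
print(round(partial_eta_squared(5.67, 1, 34), 3))  # 0.143
```

The same identity can serve as a quick consistency check on the other effect sizes reported in this section.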

EPN Component
The 2 (Time) × 2 (Task) × 2 (Emotion) × 2 (Congruency) × 2 (Electrode) × 2 (Hemisphere) omnibus ANOVA revealed interactions between time and emotion (F(1,35)
Between 150 and 250 ms, the analysis of the mean amplitudes at posterior sites revealed a main effect of face emotion (F(1,35) = 34.58, p < 0.0001, MSE = 3.33, and ηp² = 0.497), with more negative amplitudes for angry than happy faces, reflecting a classic EPN (Figure 3). A main effect of electrode (F(1,35) = 84.12, p = 0.0001, MSE = 10.23, and ηp² = 0.706) was due to more negative amplitudes at P9/P10 compared to PO9/PO10 electrodes. We also found weak interactions of congruency by electrode
Table 3. Statistical results obtained for the analyses on mean amplitudes calculated over two time windows at two clusters (frontal and centro-parietal clusters). Only the important main effects and interactions are reported for clarity.
A main effect of task was seen between 350 and 450 ms, but the task by cluster interaction was seen from 250 to 450 ms (Table 3a,d). This interaction was due to the task effect being significant only at frontal sites between 250 and 450 ms (and more strongly between 350 and 450 ms), with less negative amplitudes for the congruency than the emotion task (Figure 5).
Importantly, the effect of congruency varied with task and emotion but in a seemingly independent way, as no interaction involving the three factors of task, emotion, and congruency was ever found in any analysis. We report in turn the interactions between congruency and task, and the interactions between congruency and emotion.
The effect of congruency varied depending on the task involved (Table 3c,f,g). Between 250 and 350 ms, the main effect of congruency was due to more positive amplitudes for congruent than incongruent conditions (with reversed-polarity effects at posterior sites, as described above; see Figure 4), and was most pronounced for the congruency task, although statistically significant in both tasks (Table 3e). However, between 350 and 450 ms, this congruency effect interacted with cluster and task, so analyses were run separately for each cluster (Table 3g). At frontal sites, the task by congruency interaction was due to an opposite effect of congruency depending on the task. In the emotion task, the same effect was seen as previously, with more positive amplitudes for congruent than incongruent conditions, while in the congruency task, the opposite was found, with slightly more positive amplitudes for incongruent compared to congruent conditions (Figure 6). This latter effect arose because the congruency effect in the congruency task was seen at centro-parietal sites and was simply in the opposite direction at fronto-polar sites (clearly visible on the topographic map in Figure 6 for the 350-450 ms window). At centro-parietal sites, there was no longer a congruency effect in the emotion task.
Figure 5. Group grand averaged waveforms for each task (averaged across emotion and congruency). The effect of task was mainly seen at frontal sites (exemplified by Fz here), with larger amplitudes for the congruency task compared to the emotion task, between 250 and 450 ms. The topographic map shows the task effect (congruency minus emotion task) between 250 and 450 ms.

Thus, overall, the congruency effect was seen between 250 and 450 ms and was larger for the congruency than the emotion task (Figure 6, topographic maps). This effect seemed frontally distributed in the emotion task. In the congruency task, the congruency effect was initially seen at frontal, central, and parietal sites and then was more parietally distributed after 350 ms. For both tasks, the effect reflected more positive amplitudes for congruent than incongruent conditions.
The congruency effect also varied with the facial expression seen (Figure 7). The emotion by congruency by cluster interaction was significant from 250 to 450 ms (Table 3i), so the analyses were performed for each cluster separately. At frontal sites, a congruency effect was seen for angry expressions. This effect was largest and clearest between 250 and 350 ms, with larger amplitudes for congruent than incongruent conditions (clearly seen on the topographic maps in Figure 7, Table 3i). The congruency effect at frontal sites was not significant for happy expressions, except weakly during 350-450 ms. At centro-parietal sites, however, the opposite was found, with a clear congruency effect for happy faces from 250 to 450 ms, and no congruency effect for angry faces in any time window. Thus, a different topographical distribution of the congruency effect was seen between happy and angry faces.

Discussion
This study investigated the effect of congruency between contexts and target facial expressions of emotion under different task demands. The context was situational and presented as a sentence paired with a neutral face, followed by the same face displaying an expression as if reacting to the context. Importantly, eye tracking ensured that every sentence was read in each trial. The two main manipulations of the present within-subject design were the task condition (discriminating the emotion shown by the face or judging its congruency with the context) and the trial condition (congruent vs. incongruent). Behavioral results showed significantly better performance in the emotion task than in the congruency task in terms of response accuracy. Moreover, task comparisons showed an interaction between emotion and congruency only in the congruency task. In this task, a small but significant difference in accuracy was found for angry faces, with responses tending to be more accurate in incongruent trials. With respect to reaction times, responses to happy faces were significantly faster in congruent than in incongruent trials in that task. These behavioral results are consistent with our prediction that explicit attention to the congruency between contexts and facial expressions increases its impact on the way the expression is processed and responded to.
Significant effects of congruency and task condition were observed in different ERP components. A main effect of task was clearly manifest at frontal sites between 250 and 450 ms, with more positive amplitudes for the congruency than the emotion task (Figure 5). This difference probably reflects the more elaborate computations required by the congruency task and is consistent with the poorer performance observed in this task in terms of accuracy and response speed. Responding under this condition requires activation of conceptual emotional knowledge that refers to the behaviors that are expected in different emotion-relevant contexts. Among these are the facial expressions that are appropriate or more frequently observed in someone who experiences a specific emotional event. In order to judge the congruency of a target expression, the participant first needs to identify the expressed emotion while she keeps contextual information active in working memory and then integrate both pieces of information. Thus, explicit consideration of how the target expression and the context are related is required. In contrast, the emotion discrimination task only requires taking into account the information provided by the target face and identifying it as expressing joy or anger. These effects of task demands are consistent with the results of other studies that have reported sensitivity of the LPP to cognitive effort and task difficulty (e.g., [42,43]). Of special relevance are studies that have revealed specific modulation of the frontal component of the LPP by effortful emotion regulation strategies [44–46]. For example, in the study by Shafir et al. [46], reappraisal compared to distraction strategies was associated with larger frontal LPP amplitudes. This effect was observed in the presence of negative stimuli of high emotional intensity, but not of stimuli of low intensity, that is, in the situation that required superior cognitive effort. Although reappraisal and congruency judgment require different computations, both coincide in requiring cognitive effort and explicit consideration of emotionally relevant information.
Effects of congruency appeared at latencies and locations corresponding to the EPN and LPP components that have previously been found to be sensitive to emotion and affective congruency. However, the absence of such an effect on earlier perceptual components suggests that in the present study early visual processing of target faces was not modulated by the immediately preceding context. Only a trend for a congruency effect restricted to the emotion task and localized in the left hemisphere was found on the N170, a component associated with the early stages of face processing. This is in contrast to previous studies that have revealed congruency effects on this component [3,5,7]. Given the similarities between the procedure employed in the present study and in those by Dieguez-Risco et al. [6,7], one would have expected to find similar results. However, there were also some methodological differences that might have contributed to this discrepancy. One factor of potential importance is the strict control of eye gaze in the present experiment by means of an eye-tracker, which was implemented given the growing literature suggesting modulation of the N170 component with fixation location [35–39,47]. This procedure ensured that differences in gaze position would not modulate the processing of the target faces. Under such circumstances, it appears that early perceptual processes as indexed by the N170 are not influenced by the preceding context, a result that will need to be replicated.
The overall effect of congruency was observed with latency and localization corresponding to the EPN component (250-350 ms at posterior sites), but was seen polarity-reversed at frontal and centro-parietal sites in the form of an LPP-like component (Figure 4). At those sites, larger amplitudes were seen for congruent than incongruent conditions. Importantly, this effect of congruency between contexts and target faces was independently modulated by the task condition and by the emotional expression.
First, the magnitude and spatial distribution of the congruency effect varied depending on task demands. As predicted, the largest effect of congruency was observed in the congruency task, with a predominantly centro-parietal distribution that was seen polarity-reversed at fronto-polar sites (clearly seen in the Figure 6 topographic maps between 350 and 450 ms). In contrast, the congruency effect in the emotion discrimination task was only frontally distributed and much weaker. This difference is consistent with the behavioral results previously discussed, which revealed significant congruency effects only in the congruency task. It is also consistent with our hypothesis that congruency should have a larger effect precisely in the congruency task, which requires explicit consideration of the relation between the expression shown by the target face and the situational context in which it is perceived. The demands of that task lead to the operation of top-down processes that involve deliberate access to and use of conceptual knowledge about specific emotions and the reactions usually associated with them. The fact that congruency effects were also obtained in the emotion task (albeit to a smaller degree) means that situational contexts influence processing of facial expressions, even when contextual information is not relevant for the task at hand. However, the different topographical distributions of these effects suggest different underlying generators and time courses for the congruency effect, and thus the operation of different underlying mechanisms, depending on the task demands. Overall, the LPP-like results revealed a dynamic, complex picture that suggests different processing operations driven by the different demands imposed by the emotion discrimination and congruency tasks. In other words, processing of the target expression was modulated by the context in different ways depending on the specific task at hand.
The timing and spatial distribution of the congruency effects were also different for angry and happy target faces. A congruency effect was seen for angry faces at frontal sites during the LPP-like component. However, the corresponding congruency effect showed a centro-parietal distribution in the case of happy faces. If the frontal LPP-like component is an index of effortful processing operations, then it might be concluded that processing the contextual congruency of an emotional expression is a more demanding task for angry than for happy faces. The different distribution and timing of these congruency effects again suggest different underlying generators depending on the facial expression seen.
The suggestion that context congruency judgments require more cognitive effort in the case of angry, compared to happy, faces is also consistent with the behavioral results that showed a significant effect for angry faces in the congruency task. More specifically, judgment accuracy was higher for angry faces in incongruent trials. Similar results have been reported before [7,9], showing more accurate and/or faster responses to angry faces in happiness than in anger contexts. This result, which might seem counterintuitive at first sight, can be understood in terms of the different specificities of the facial expressions of positive and negative emotions. According to this view, judging whether an angry expression is contextually appropriate would be an especially difficult task because it involves discriminating between a variety of negative emotions and their corresponding situational and expressive characteristics. In contrast, the fact that a smiling face is appropriate in different happy situations would make the corresponding congruency judgment an easier task. Based on this rationale, a double-check hypothesis has been proposed according to which processing facial expressions of emotion involves a sequential check of valence and emotion category [7,9,14]. According to this hypothesis, a valence check would suffice to judge the contextual congruency of a smiling face (a smiling face is incongruent in any negative context but can be congruent in many positive contexts). However, judging the congruency of an angry face would further require an emotion category check (although affectively negative, an angry face can be incongruent with sadness or fear contexts, for example).
Of secondary importance, the results corresponding to the EPN component showed a significant effect of emotion, with larger negative amplitudes for angry than happy face targets. This effect replicates the results of previous studies that have reported specific sensitivity of this component to faces showing angry compared to happy expressions [24,26]. Interestingly, this emotion effect was found only between 150 and 250 ms, with a peak seen before 200 ms (Figure 3), a result consistent with recent studies investigating fearful expressions [25,36,37].
A few limitations of this study must be acknowledged. A fairly small number of sentences and individual faces were used and repeated numerous times over the course of the study, which might have elicited fatigue effects in participants and potentially diminished the contextual effects recorded. The gaze-contingent procedure also introduces variability in the interval between the end of sentence reading and the onset of the facial expression, which, in addition to individual variability in reading speed and variability in sentence length, might also influence the contextual effects. These factors, along with the intensity of the emotional expressions seen, might modulate the contextual effects and should be investigated in future studies.

Conclusions
The results reported in the present study confirm the modulatory role of situational contexts on the processing of target facial expressions of emotion. These effects were especially prominent in the LPP-like component that is specifically modulated by emotional content and has been previously found to be sensitive to affective congruency [7,17–19]. Moreover, our results provide new evidence that this effect of congruency is significantly modulated by task demands. More specifically, larger effects of context-target congruency were observed in an explicit task that required the participants to judge the congruency of a target expression with an immediately preceding context. Smaller congruency effects were also observed in an emotion discrimination task. The different properties of the congruency effects obtained in each of these tasks suggest that they are mediated by different mechanisms and depend on the operation of different neural systems. We propose that while congruency acts via implicit, automatic mechanisms in the case of the emotion task, the effects observed in the congruency task are based on top-down, deliberate processes that rely on conceptual knowledge about the reactions that are expected from a person who goes through different emotional experiences.