Deliberate control over facial expressions in motherhood. Evidence from a Stroop-like task

The deliberate control of facial expressions is an important ability in human interactions, in particular for mothers with prelinguistic infants. Because research on this topic is still scarce, we investigated the control over facial expressions in a Stroop-like paradigm. Mothers of 2- to 6-month-old infants and nulliparous women produced smiles and frowns in response to verbal commands written on distractor faces of adults or infants showing expressions of happiness or anger/distress. Analyses of video recordings with a machine classifier for facial expression revealed pronounced effects of congruency between the expressions required of the participants and those displayed by the face stimuli on the onset latencies of the deliberate facial expressions. With adult distractor faces this Stroop effect was similar whether participants smiled or frowned. With infant distractor faces mothers and non-mothers showed indistinguishable Stroop effects on smile responses; however, for frown responses, the Stroop effect in mothers was smaller than in non-mothers. We suggest that for frown responses in mothers when facing infants, the effect of mimicry or stimulus-response compatibility, leading to the Stroop effect, is offset by a caregiving response or empathy.


Introduction
The integration of emotion and cognitive control can be adaptive by guiding behavior in conflict situations (e.g., Baumeister et al., 2007; Gray, 2004). This integration may help to resolve awkward moments in social situations, for example, in deciding whether to respond to a provocation with a frown, showing offense, or with a polite smile, avoiding conflict. Controlling and adapting facial expressions according to the situation is essential in our daily social life and can be seen as a strategy for emotion regulation. Emotion regulation strategies often aim at controlling bodily emotional responses, such as facial expressions (Koole, 2009). This form of control over facial expressions is especially important when mothers interact with their infants, as they often have to display facial expressions that differ from the infant's expression (e.g., Karreman & Riem, 2020; Needham et al., 2017). In situations where an infant is distressed, the caregiving system is automatically activated, whereby mothers employ a range of behaviours to soothe their infants. These maternal behaviours, which run opposite to the infant's facial and bodily expressions of distress, include smiling, increasing proximity, and responding to the infant's signals (Mizugaki et al., 2015; Pechtel et al., 2013). The present study investigated the control over facial expressions in two groups of women (mothers and non-mothers) who saw pictures of infants and adults showing facial expressions of positive and negative affect. Investigating the differences in facial expressivity between mothers and non-mothers in response to infants and adults should advance our understanding of how mothers regulate their emotional expressions during non-verbal mother-infant interactions, which, in turn, might impact parenting style. Hofmann et al. (2012) identified several cognitive capacities closely related to emotion regulation strategies.
For example, working memory is involved in the capacity to maintain active representations of goals, the top-down (higher-order) control of attention to affective stimuli, and in the downregulation of unwanted affect. Also, behavioral inhibition is involved in suppressing prepotent responses. Top-down control of attention and behavioral inhibition are often investigated with the Stroop task. In the classic version of this task, participants have to name the ink color of written words. When word meaning and ink color are incongruent (e.g., the word green written in blue color), reaction times (RTs) and error rates increase relative to combinations of ink color and letter strings without meaning (e.g., XXXX), words unrelated to the task (e.g., house), or words whose meaning is congruent with the ink color (Stroop, 1935). Although word meaning is irrelevant to this task, it affects performance because word reading and lexico-semantic access are automatic and hard to inhibit (e.g., MacLeod, 1991; Zahedi et al., 2019).

Mimicry as a source of interference for facial expression control
A number of studies investigated the interference created by pictures of emotional facial expressions when the relevant task required posing facial expressions. Lee et al. (2008) presented stimuli showing happy and angry faces, followed by an auditory stimulus indicating whether participants should produce the same expression as, or the opposite expression to, the one shown in the picture. RTs of facial expressions, as measured by electromyography (EMG), were slower when the requested expression was incongruent with the expression of the stimulus face. Similar interference occurred in a Simon-type task when participants responded with facial expressions to the gender of individuals depicted on a screen, expressing emotions that were congruent or incongruent with the one required by the task (Otte, Habel, et al., 2011); a similar effect was observed in a dual-task context (Otte, Jost, et al., 2011).
What is the source of interference observed in the production of facial expressions when incongruent but task-irrelevant facial expressions are seen? In general, the mere observation of an action evokes a tendency to produce the same action, that is, mimicry (Cracco et al., 2018). The interference created by pictures of facial expressions in the production of incongruent expressions may be explained in terms of stimulus-response compatibility (e.g., Otte et al., 2011a) of the required facial response and the mimicry of the facial expression of the stimulus. Facial mimicry may occur reflexively (e.g., Dimberg et al., 2000) and is difficult to inhibit (Korb et al., 2010). Therefore, facial mimicry is a possible source of interference in the Stroop task when facial expressions are used as task-irrelevant distractors.

Impact of valence and affiliative intent on mimicry
Whereas the goal of many actions is a physical transformation of the environment, such as reaching a spatial target, the purpose of speech-related movements and most facial expressions is communication. Hess and Fischer (2013, 2014) argued that facial mimicry takes place at the emotion level rather than the motor level, and that it is strongly modulated by the valence of the observed facial expressions and the social context. Thus, facial expressions of positive affect like smiles are more likely to be mimicked due to their intrinsic affiliative intent, whereas mimicry of negative expressions like anger is context-dependent. In line with this view, in a task with communicative aims, Künecke et al. (2017) found that facial mimicry during face-to-face interactions was most pronounced for perceived smiles, followed by expressions of sadness, but absent for anger expressions.
Several studies indicate differential top-down control of facial mimicry depending on attitudes towards the expresser, including the expresser-observer relationship (Kraaijenvanger et al., 2017), or on whether the depicted individuals are associated with winning or losing money (Sims et al., 2014). Facial mimicry of smiles is less pronounced for disliked than for liked persons (Korb et al., 2019). Also, individuals scoring high on trait empathy tended to show more mimicry of facial expressions of happiness and anger (e.g., Dimberg & Thunberg, 2012; Sonnby-Borgström, 2002). Other variables affecting mimicry are the mood of the observer (Likowski et al., 2011) and the availability of proprioceptive feedback from facial muscles (e.g., Finzi & Rosenthal, 2016). These findings indicate that the cognitive system engages top-down control of facial mimicry depending on the social context, the communicative intention, and dispositional traits.

Empathy and caregiving responses to infants
For caregivers, understanding an infant's non-verbal signals and appropriately regulating their own facial expressions are part of adaptive behavior, because infants depend on their caregivers for survival but their communicative signals are largely limited to basic vocalizations, gestures, and facial expressions (e.g., Parsons et al., 2019). Certain facial and bodily features of infants (e.g., large eyes, plump cheeks), conforming to the Kindchenschema, are believed to automatically release parenting or caregiving behavior (Lorenz, 1943). For example, adults tend to prefer pictures of infants and spend more time looking at them than at pictures of adults (Parsons et al., 2011). Newborns are able to smile and cry at birth (e.g., Messinger, 2002). Both the "cuteness" of infants smiling and the frustration of seeing them in distress impact brain systems involved in emotional and cognitive control, resulting in attentional bias and contingent responsiveness towards infants, triggering empathy and compassion, and enhancing higher-order cognitive functions like appraisal (e.g., Endendijk et al., 2018; Kringelbach et al., 2016).
Hence, both emotional responses and the degree of top-down control over emotion appear to differ when we observe emotional expressions in adults versus infants (e.g., Young et al., 2017). Smiles of adults or infants should induce affiliative motives and approach; in contrast, facial expressions of negative affect (e.g., anger, fear) shown by adults or infants should elicit very different reactions in the observer. Anger observed in other adults would be mimicked if, for example, two adults share a common enemy (e.g., Hess & Fischer, 2013), but in most cases will not engage mimicry (Künecke et al., 2017). Crying and facial expressions of distress by infants are very salient for caregivers, as they signal threats to the infant's well-being (e.g., Hampson et al., 2006), motivating the provision of care and maintaining or restoring the infant's well-being, for example, by smiling, uttering comforting sounds, or nursing. In line with this view, adults increased their efforts in key-pressing tasks after hearing infants cry (Parsons et al., 2012) and after viewing infant faces (Parsons et al., 2011).

Motherhood status
It has been established that maternal brains undergo structural and functional alterations (Kim et al., 2010) and that the parental experience changes neural and emotion regulation processes (Hayashi et al., 2018). Alongside many neuroendocrine and emotional changes that impact the development of parental abilities and modify the brain systems underlying social and parental behavior (Young et al., 2017), fMRI data have indicated enhanced neural responses in mothers in a set of brain regions, known as the "maternal neural network", responsible for attention, emotion and regulation (Bjertrup et al., 2019; Swain et al., 2008). Since some of these neuroendocrine changes are associated with improved emotion perception and enhanced neural responses, a common assumption is that the motherhood experience is associated with increased motivation for caregiving and a stronger attentional bias towards infants' signals, increasing maternal accuracy in identifying infant emotional expressions as an important factor in securing mother-infant attachment (Bernstein et al., 2014).
A mother's responsiveness to her infant's distress would seem to be crucial for the infant's survival and socio-emotional development (Hahn-Holbrook et al., 2011). However, empirical studies have yielded mixed results; some indicate that, compared with nullipara, mothers tend to automatically direct their attention to infants' emotional signals (like infant cry), show greater empathic responses and increased arousal to emotional infant faces (e.g., Proverbio et al., 2006; Stallings et al., 2001), and tend to perceive infant cries as less disturbing and less aversive (Irwin, 2003). In contrast, other studies did not report differences associated with parenthood status in motivation for caregiving (Feldman, 2015), cardiac reactivity to infant cries (Hall & Morsbach, 1989), speed and accuracy of intentional movements to infant cries (Parsons et al., 2012), ratings of distress in infant faces (Irwin, 2003), and sensitivity to emotion recognition in infant or adult faces (Parsons et al., 2019). Further evidence suggests that mothers' increased likelihood of mimicking their own infant's facial expressions during mother-infant interactions could provide important visual feedback for the infant, linked to healthy development (Leerkes et al., 2009), infant facial mimicry (de Klerk et al., 2019) and perceptual-motor couplings for perceptually opaque actions that cannot be directly observed by the infant (Heyes, 2010; Ray & Heyes, 2011). The uniqueness of parenthood may bring a general advantage in terms of emotion regulation that may influence the mimicry of infant facial expressions. More particularly, mothers may have clearer goals and strategies as caregivers when confronted with infants' emotional expressions and, in turn, perform better at downregulating unwanted emotion and at expressive suppression. We are not aware of any study that investigated the possible impact of motherhood status on the cognitive control over facial expressions, the target of the present study.

The present study
We investigated the top-down control over facial expressions and its modulations by three independent variables: 1) the intended deliberate expression (smile vs. frown), 2) activation of caregiver behavior (pictures of adults vs. infants), and 3) motherhood status (mothers vs. non-mothers). If mothers react more quickly to infant emotional signals due to empathy (Proverbio et al., 2006), we postulate that in cases where the infant is displaying happiness or anger (Sonnby-Borgström, 2002), there should be a difference in responses to adult and infant face stimuli, which is more pronounced for mothers. Using a modified version of the Stroop task, two groups of women, either mothers or non-mothers, were required to produce two different facial expressions in response to verbal prompts, while adult and infant faces were shown in the background, displaying facial expressions that were either compatible or incompatible with the required response. Participants' facial expressions were video recorded and coded frame-by-frame with a machine classifier, indicating the presence and intensity of activation for several action units (AUs). AUs refer to the minimal facial units that are anatomically separate and visually distinguishable (Ekman et al., 2002). The accuracy of machine classifiers has been evaluated by comparing their output with datasets of pictures annotated by human coders with labels for facial expressions, with reported performance similar to or even better than that of humans (Baltrusaitis et al., 2018). RTs obtained from a machine classifier showed moderate to high correlations with electromyogram onsets in a study using a motor control task with facial expressions as the response (Beringer et al., 2019).
We addressed the following questions: (1) Can we find congruency effects in a Stroop-like task, where deliberate facial emotional expressions are produced as the response dimension (see Footnote 2) while the task-relevant stimulus dimension (emotional word meaning) and the task-irrelevant stimulus dimension (facial expression of stimulus) both overlap with it? We expected an overall Stroop-like effect consisting of better performance when the required responses are congruent with the emotional expression of the task-irrelevant background face than when they are incongruent. This assumption is supported by previous observations of congruency effects in a Simon-type task, which may be explained as the effect of the automatic activation of the perceived expressions (Otte et al., 2011a). (2) Will the required facial expression modulate the amount of mimicry and, in turn, the Stroop effect? Because, as reviewed by Seibt, Mühlberger, Likowski, & Weyers (2015), mimicry is modulated by a number of factors, for example, affective affiliative intent (Hess & Fischer, 2014), the mood of the observer (Likowski et al., 2011), and proprioceptive feedback from facial muscles (e.g., Finzi & Rosenthal, 2016), it seems plausible that the requirement to smile or frown would modulate mimicry and hence the Stroop effect. Thus, when the required expression is smiling, there might be more affiliative intention and more pronounced mimicry (e.g., Künecke et al., 2017) than for frown responses. Hence, seeing a smiling adult should facilitate deliberate smile responses, whereas frown responses to adult faces might engage less facial mimicry. In other words, for required smile responses we expected a significant Stroop effect in smile response RTs comparing congruent versus incongruent conditions. In contrast, when the required expressions are frowns, there should be little or no affiliative intention and, hence, less mimicry (e.g., Künecke et al., 2017).
Therefore, we expected smaller Stroop effects for adult faces during frown as compared to smile responses. Please note that affiliative intent might also be modulated by the perceived expression, being higher for happy than angry faces; however, because happy and angry facial expressions were presented in both response conditions, stimulus expression effects on affiliative intent should be balanced, leaving only response effects, if present.
(3) Does the caregiver response activated by pictures of infants modulate the congruency effect and does it interact with the required expression and motherhood? We finally expected that the responses to adult faces should be similar for non-mothers and mothers since there is no reason to assume that mimicry, affiliative intentions, or empathy should be different between these groups. However, the situation might be different for infant faces, where a caregiving response might be activated (e.g., Endendijk et al., 2018), changing the motivational entity, especially in mothers. The caregiving response (or empathy) may enhance or counteract mimicry, depending on affiliative intention. The clearest predictions can be made for negative expression responses. In this condition there should be no or only minimal affiliative intent but a dominance of caregiving behavior, which should enhance the top-down control over facial responses and mimicry. Distressed infants in particular should elicit empathy and motivation for caregiving behavior, which might override any mimicry. In this case, smiling to console the infant would be an adequate response, rather than frowning. Therefore, one would expect no facilitation of frowns by negative infant expressions. In addition, frown responses might not suffer interference from infant smiles because the minimal affiliative intent during frowning should diminish mimicry of infant smiles. Hence, we predicted a diminished Stroop effect when frown responses were required while viewing infant faces. This effect should be most pronounced in mothers. The results for smile responses (affiliative intent) to infant pictures should be similar to those for adult faces unless the caregiver response counteracts mimicry of negative expressions.
We expected infant pictures to activate automatic caregiver responses, boosting emotion regulation by increasing the control of attention towards the task-irrelevant pictures and the motor control of facial expression (e.g., Kringelbach et al., 2016), especially in mothers since they are often encouraged to regulate their emotions during mother-child interactions to ensure infant well-being. If this is the case, mothers might have a smaller Stroop effect than non-mothers.
Footnote 2: We refer to the facial expressions that participants were asked to produce in the task as motor responses, to highlight that they were the motor responses requested in the task, and to distinguish them from the facial expressions shown in the stimuli.
In sum, we expected: (1) an overall better performance for congruent than incongruent facial expressions of the participants relative to the expressions of the faces shown as background (Stroop effect); (2) smaller Stroop effects for adult faces during frown as compared to smile responses; and (3) a larger Stroop effect in response to infant faces in the smile condition but a smaller Stroop effect to infant than to adult faces in the frown condition, especially in mothers.

Participants
Because of our interest in motherhood effects, we recruited only women for this experiment, via flyers, various hospitals and clinics, social media channels, and email newsletters. Participants had to be either nullipara (non-mothers) or have a child of two to six months of age (mothers). Additional criteria were an age of 18 to 50 years, currently being in a heterosexual partnership, high proficiency in German, no history of psychological or neurological disorders or intake of psychoactive medication, and normal or corrected-to-normal visual acuity. An a priori power analysis for a mixed-measures analysis of variance (ANOVA) with 2 groups and 4 measures was conducted with G*Power. Intuitively, we aimed at detecting a small effect size f = 0.25 (as in Cohen, 1988), with a power of 0.8 (Erdfelder et al., 1996), because previous EMG studies using a similar task had not reported effect sizes. The correlation between measures was set at 0.5. Power analyses showed a minimal sample size of N = 24 to detect a small within-subject effect, N = 108 for a small between-subjects effect, and N = 176 for the within-between interaction.
Overall, 113 participants enrolled in the experiment. Video data from four participants were missing or disrupted due to technical issues. Eight further data sets were excluded because hit rates or numbers of correct trials were less than the overall mean minus 2 SD. The final sample consisted of 101 women (see Table 1 for details). Forty-one mothers were still exclusively breast-feeding at the time of the experiment, and 11 were exclusively bottle-feeding (mean infant age = 4.7 months, SD = 1.2, range = 2.5-7 months; infant sex = 36% female, 27% male, 36% not reported). Mothers and non-mothers did not significantly differ in terms of educational background or depression scores (all Fs < 1).
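The exclusion rule described above (scores below the overall mean minus 2 SD) can be illustrated with a minimal sketch. The participant labels and hit rates below are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which SD estimator was applied.

```python
from statistics import mean, stdev


def exclusion_cutoff(scores, n_sd=2.0):
    """Lower cutoff: mean minus n_sd sample standard deviations (assumed estimator)."""
    return mean(scores) - n_sd * stdev(scores)


def split_by_cutoff(scores_by_id, n_sd=2.0):
    """Split participants into kept and excluded sets relative to the cutoff."""
    cutoff = exclusion_cutoff(list(scores_by_id.values()), n_sd)
    kept = {pid: s for pid, s in scores_by_id.items() if s >= cutoff}
    excluded = {pid: s for pid, s in scores_by_id.items() if s < cutoff}
    return cutoff, kept, excluded


# Hypothetical hit rates, for illustration only (not data from the study).
hit_rates = {
    "p01": 0.80, "p02": 0.78, "p03": 0.82, "p04": 0.79, "p05": 0.81,
    "p06": 0.77, "p07": 0.83, "p08": 0.30, "p09": 0.80, "p10": 0.79,
}
cutoff, kept, excluded = split_by_cutoff(hit_rates)
print(sorted(excluded))
```

The same function could be applied separately to hit rates and to the number of correct trials, excluding a participant who falls below either cutoff.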
All procedures were approved by the Psychology Department Ethics Review Board at the Institut für Psychologie of the Humboldt-Universität zu Berlin. Participants received a reimbursement of 10 Euro per hour.

Stimuli, apparatus, and procedure
After signing informed consent, the women were seated in a well-lit, noise-attenuated cabin, with a computer screen, placed at a distance of 70 cm. Participants' facial expressions were recorded at 25 frames per second (fps) with a static Sony EVI-D70P camera, located on a table directly below the computer monitor.
The experimental session started with the completion of a short questionnaire about age, handedness, visual acuity, and alcohol consumption, continued with the application of EEG and EMG electrodes, followed by a ca. 10-min rest, the Stroop task reported here, and a further task unrelated to present purposes.
For the Stroop task, stimuli were presented on a computer monitor using Presentation software (version 19.0 build 11.14.16). Stimuli consisted of images of 20 Caucasian adults (10 male and 10 female identities) from the Radboud Faces Database (Langner et al., 2010) showing facial expressions of joy and anger, and of images of 60 Caucasian infants (30 male and 30 female identities) taken from online media sources. Infants showed expressions of happiness or distress (see Fig. 1 for examples). Whereas for adult stimuli, the same individuals displayed joy and anger, this control was not possible for the infant stimuli, where different individuals showed joy and distress. The face images were edited, cropping background and distinguishing features (e.g., hair, ears) by an oval frame. The distance, lighting, contrast, and other physical properties of the face in all images were held constant. The resulting slide images were 7.00 × 9.33 cm on the screen (visual angle: 4.01° × 5.35°).
A sample of 27 volunteers, not taking part in the main study, rated the emotion expressed by the face pictures on a 9-point scale, from 1 = very angry, 5 = neutral, to 9 = very happy. The final selection consisted of 60 adult and 60 infant face pictures. Adult pictures were of 20 identities, each showing a positive, neutral, and negative expression. For infants the three sets of 20 emotion pictures were from 60 different individuals. Table 2 presents the results of the ratings. A two-way repeated-measures ANOVA over rating scores of the expressed emotions was conducted with face age (adult, infant) and facial expression (happy, neutral, angry) as within-subject factors. The main effects of face age and facial expression were significant, F(1, 26) = 5.68, p < .05, ηp² = 0.18, and F(2, 52) = 566.66, p < .001, ηp² = 0.96, respectively. Additionally, the interaction of face age × facial expression was significant, F(2, 52) = 37.61, p < .001, ηp² = 0.59. Post-hoc tests indicated that the emotional expressions of both adults and infants differed significantly from one another (all ps < 0.05); the emotion ratings of adult and infant faces also differed significantly for each of the three types of facial expression (all ps < 0.05). The neutral faces were not used in the present experiment.
The German words for happiness (Freude) and anger (Ärger) written in black, bold, 15-point Calibri font were superimposed on the face images, approximately in the center of the face (see Fig. 1). The words indicated the target expressions that participants should produce. The target expression was either congruent with the face image in the background (e.g., happiness written over a happy face) or incongruent (e.g., anger written on a happy face). The stimulus set consisted of 160 distinct word-face combinations, in which each face identity was paired with the two emotion words an equal number of times. Since each combination was shown three times, the experiment contained 480 trials in total: 120 congruent and 120 incongruent trials of both adult and infant faces, respectively (see Fig. 1).
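The trial counts above can be checked by enumerating the design. The following is a minimal sketch, assuming 20 pictures per combination of face age and expression (as stated for the final picture selection, with neutral faces unused); the variable names and the tuple layout are illustrative and not taken from the original experiment scripts.

```python
from collections import Counter
from itertools import product

AGES = ("adult", "infant")
FACE_EXPRESSIONS = ("happy", "negative")  # negative = anger (adults) or distress (infants)
WORDS = {"Freude": "happy", "Ärger": "negative"}  # target word -> expression it requests
PICTURES_PER_CELL = 20  # pictures per face age x expression (assumption from the text)
REPETITIONS = 3


def build_trials():
    """Enumerate every word-face combination and repeat each one REPETITIONS times."""
    trials = []
    for age, expression, picture, word in product(
        AGES, FACE_EXPRESSIONS, range(PICTURES_PER_CELL), WORDS
    ):
        congruent = WORDS[word] == expression
        trials.extend([(age, expression, picture, word, congruent)] * REPETITIONS)
    return trials


trials = build_trials()
combinations = set(trials)
per_cell = Counter((age, congruent) for age, _, _, _, congruent in trials)

print(len(combinations), len(trials))
print(per_cell[("adult", True)], per_cell[("infant", False)])
```

Under these assumptions the enumeration yields 160 distinct combinations, 480 trials, and 120 trials in each face age by congruency cell, matching the numbers reported in the text.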
Each trial started with a black fixation cross at the center of the monitor, which was shown for 500 ms, followed by a word-face combination presented for 2 s, and a "stop" signal, shown for 1 s (see Fig. 2), as a reminder to relax the face to a neutral expression after the target expression. Participants were instructed to produce the emotional expression designated by the word as quickly and accurately as possible with high intensity until the "stop" signal appeared, while ignoring the face image and its emotional expression. This allowed 2 s for producing the target expression, followed by a 1 s interval to relax the facial muscles before the next trial started. Short breaks were offered after every 160 trials. For practice, four images each of adult and infant faces with different face-word combinations preceded the main task, after which participants received verbal feedback about their performance. During the 30-min experiment proper, the presentation order of all face-word combinations was fully randomized.
Note: Mean (SD), significant difference between mothers and non-mothers (p < .01). BDI = German version of the Beck Depression Inventory (Hautzinger et al., 2009).

Video data processing
Video recordings of participants' facial expressions were coded with the software OpenFace (version 2.0.2; Baltrušaitis et al., 2018). The AU intensity scores provided by the software indicate how strongly a certain AU is activated in each video frame on a scale of 0 to 5, with 0 indicating no activity, 1 indicating weak, and 5 indicating maximal activation. The text files exported from OpenFace were converted to .xlsx format in Microsoft Excel (v. 2016) and then merged with the data stream containing stimulus events, using the timestamp information contained in the output. The datasets containing information from stimulus events were processed with MATLAB R2016a (The MathWorks Inc., Natick, MA) following a procedure similar to that reported by Recio and Sommer (2018). In each trial, a fixed interval of 85 frames (3 s) following stimulus onset was defined as the target epoch to measure facial expressions. This 3-s epoch included the 2-s time interval showing face-word stimuli, where we expected expression onset and apex, and the 1-s interval with the stop signal, where we expected expression offset. A baseline correction was applied for each trial to control for inter-individual and trial-by-trial differences in the neutral (resting) facial expression, subtracting the average AU intensity score over the 5 frames (200 ms) before stimulus onset from each intensity score of this AU in the trial.
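The baseline correction can be sketched as follows. This is an illustrative Python reimplementation under stated assumptions, not the original MATLAB code: the function name, the list-based trace, and the indexing convention (the stimulus-onset frame is the first frame of the epoch) are hypothetical choices.

```python
def baseline_correct(intensities, onset_index, n_baseline=5, epoch_len=85):
    """Return the epoch after stimulus onset with the pre-stimulus mean removed.

    intensities: AU intensity scores of one channel, one value per video frame.
    onset_index: index of the frame at stimulus onset (assumed convention).
    """
    if onset_index < n_baseline:
        raise ValueError("not enough frames before stimulus onset for the baseline")
    baseline_frames = intensities[onset_index - n_baseline:onset_index]
    baseline = sum(baseline_frames) / n_baseline
    epoch = intensities[onset_index:onset_index + epoch_len]
    return [value - baseline for value in epoch]


# Hypothetical trace: a resting level of 0.25 followed by a rising activation.
trace = [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.75, 1.25]
corrected = baseline_correct(trace, onset_index=5)
print(corrected)
```

Applying the correction per trial and per AU channel, as in this sketch, removes resting-level offsets before any threshold is applied.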
Data processing focused on AU4 and AU12, as defined in the OpenFace output, as target channels for "frown" and "smile" responses, respectively. Conversely, AU4 was the distractor channel for smiles and AU12 was the distractor channel for frowns. On each trial, facial expressions were measured using three parameters, namely, the onset, offset, and duration of the expression. For measuring expression onset and offset, a threshold value for AU activation was defined in each target channel (either AU4 or AU12) for each participant, in order to account for inter-individual differences in the scores reflecting AU activation intensity (Beringer et al., 2019). The mean thresholds across participants were 0.25 (range 0.01-0.74) for AU4 and 0.86 (range 0.23-1.76) for AU12.
Based on previous experience (Recio & Sommer, 2018), we considered very brief activations in the data as noise. Only facial expressions with an onset and an offset in the intensity scores above the target channel threshold, and lasting for at least seven consecutive frames (i.e., 210 ms), were accepted as reflecting a distinct facial expression. Trials where activity onsets occurred within the first three frames (120 ms) after stimulus onset were considered fast guesses and excluded from further analyses (1% of all data).
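A minimal sketch of the onset and offset detection described above is given below. It is an illustration under assumptions rather than the authors' implementation: the function names are hypothetical, frame index 0 is taken to be the first frame after stimulus onset, a run that is still above threshold at the end of the epoch is not accepted because it has no offset, and the frame rate is left as a parameter so that latencies in milliseconds follow from whatever rate is supplied.

```python
def find_expression(trace, threshold, min_frames=7):
    """Return (onset, offset) frame indices of the first supra-threshold run
    lasting at least min_frames frames, or None if there is no such run.

    offset is the index of the first frame back at or below threshold.
    """
    start = None
    for i, value in enumerate(trace):
        if value > threshold:
            if start is None:
                start = i  # a new supra-threshold run begins here
        else:
            if start is not None and i - start >= min_frames:
                return start, i
            start = None  # run too brief: treated as noise
    return None


def is_fast_guess(onset, n_frames=3):
    """True if the onset falls within the first n_frames frames of the epoch."""
    return onset < n_frames


def onset_latency_ms(onset, fps=25.0):
    """Convert an onset frame index into a latency in milliseconds."""
    return onset * 1000.0 / fps


# Hypothetical baseline-corrected traces for one channel.
clear_response = [0.0] * 10 + [1.0] * 8 + [0.0] * 5
brief_twitch = [0.0] * 10 + [1.0] * 3 + [0.0] * 10

print(find_expression(clear_response, threshold=0.5))
print(find_expression(brief_twitch, threshold=0.5))
```

In this sketch the per-participant threshold is passed in as an argument; how that threshold is derived is outside the scope of the example.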
Based on the onset, offset and duration scores, each trial was classified as hit, error or omission. A trial was classified as omission (13.2% in total) if there was no activity in the target or distractor channels in the target epoch of 2.9 s, that is, between frame 3 and frame 85 after stimulus onset. We considered a trial an error (0.7% in total) if the distractor channel showed activity in the target epoch of 2.9 s for a minimum of seven consecutive frames after stimulus onset, or if distractor activity preceded activity in the target channel (for example, when the participant incorrectly smiled but then corrected to a frown), or if there was simultaneous activation of both target and distractor channels (4.8% in total). Trials were classified as hits (76.3% in total) when target channel activity was above threshold for at least seven consecutive frames and preceded any activation in the distractor channel. All 480 trials of each participant were classified according to these criteria. The overall good performance and the lack of unclassified trials or too early activations indicate that participants followed the instructions, producing discrete facial expressions and relaxing their face within the 2.9-s target epoch in the vast majority of trials. RTs were calculated only for hit trials.
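The classification rules can be summarized in a short sketch. It assumes that each channel has already been reduced to the onset frame of an accepted expression (at least seven consecutive supra-threshold frames) or to None when no such expression occurred; the function name and labels are hypothetical, and treating identical onset frames as "simultaneous activation" is one possible reading of the text.

```python
def classify_trial(target_onset, distractor_onset, fast_guess_frames=3):
    """Classify one trial from the onset frames of accepted expressions.

    target_onset / distractor_onset: frame index of the accepted expression
    in that channel, or None if the channel showed no accepted expression.
    """
    onsets = [o for o in (target_onset, distractor_onset) if o is not None]
    if not onsets:
        return "omission"  # no activity in either channel
    if min(onsets) < fast_guess_frames:
        return "fast_guess"  # excluded from further analyses
    if target_onset is not None and (
        distractor_onset is None or target_onset < distractor_onset
    ):
        return "hit"  # target expression came first (or alone)
    # Distractor alone, distractor first, or both at once (assumed tie rule).
    return "error"


for onsets in [(None, None), (12, None), (12, 30), (30, 12), (12, 12), (1, None)]:
    print(onsets, classify_trial(*onsets))
```

Only trials labeled "hit" would then contribute an RT, computed from the target-channel onset.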

Statistical analysis
We analyzed RTs and hit rates with two separate mixed-measures ANOVAs with the within-subjects factors required response (smile, frown), congruency (congruent, incongruent), and stimulus age (adult, infant) and the between-subjects factor motherhood status (mother, non-mother). We also conducted exploratory analyses of intensity values of all AUs, aiming to establish whether the variables under study would qualitatively affect the expressions in any condition. Activations of non-target AUs would indicate qualitative alterations in the production of the target facial expressions, for example, showing a blend between the target and the distractor expression (BLINDED).

Results
First, we asked whether the intensity of the required responses was affected by congruency or by any of the other experimental factors. We addressed this question by submitting the intensity scores output by the OpenFace software to the same type of overall ANOVA as the RTs, but separately for smile and frown responses. There was no modulation of the intensities of AU 4 (corrugator supercilii) during frown responses by congruency, either as a main effect or in interaction with any other experimental factor (Fs < 2.0). The same held true for AU 12 (zygomaticus major) during smile responses (Fs < 1.5). Hence, the intensity of the required responses seems to have been sufficiently stable across conditions and participant groups to allow for the measurement of RTs from video scores, which was the main objective of the present study. Fig. 3 shows mean RTs for all conditions (for details, see Table 3). Pearson correlations of RTs in congruent trials showed that participants who smiled faster also tended to frown faster, whether shown images of adults (r = 0.388, p < .001) or infants (r = 0.447, p < .001). As the most basic result, the ANOVA of RTs confirmed the expected faster responses in congruent relative to incongruent trials, F(1, 99) = 123.64, p ≤ .001, ηp² = 0.56, with a main effect of congruency of Mdiff = 57.5 ms.
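As a reading aid, the partial eta squared reported for the congruency effect can be recovered from the F value and its degrees of freedom using the standard identity, partial eta squared = (F × df_effect) / (F × df_effect + df_error). The following sketch applies this identity; the function name `partial_eta_squared` is hypothetical, and the calculation is a plausibility check on the reported values rather than a reproduction of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of congruency on RTs as reported in the text: F(1, 99) = 123.64
eta_congruency = partial_eta_squared(123.64, 1, 99)
print(round(eta_congruency, 2))  # → 0.56
```

The result agrees with the reported effect size of 0.56 after rounding to two decimals.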

Follow-up analyses of RTs
The interaction A × C × R × M was elucidated with separate follow-up analyses for images of adults and infants, estimating how the congruency effect is influenced by response type and motherhood status. The significance level was set at p < .01 to correct for multiple testing.

Adult images as distractors
The ANOVA of RTs confirmed a main effect of congruency, F(1, 99) = 114.40, p ≤ .001, ηp² = 0.54 (Mdiff = 220 ms), but did not show any other main effects or interactions (Fs < 1).

Fig. 2.
Example of a trial sequence with incongruent condition. A fixation cross is followed by a target word superimposed on a face, prompting a facial expression according to the word. The stop signal is a reminder to relax the face to a neutral expression.
Footnote 3. The interaction C × R × M seemed to be a consequence of the interaction A × C × R × M, because the reduction of the congruency effect in mothers when required to frown with infant distractors in the background persists when infant and adult background pictures are averaged. Therefore, the C × R × M interaction does not seem to bear meaning of its own.

Infant images as distractors
The ANOVA of RTs also confirmed the main effect of congruency, F

Adult vs infant images when frown was the requested response
Exploratory follow-up ANOVAs to further clarify the interaction compared the effect of stimulus age in mothers and non-mothers, separately for the congruent and incongruent conditions. Mothers showed slower RTs in the congruent condition for infant relative to adult images, although this difference did not reach significance, F = 3.48, p = .068, ηp² = 0.062; in the incongruent condition, RTs did not differ between adult and infant pictures, F < 1. Conversely, for non-mothers, RTs did not differ between stimulus ages in the congruent condition, F < 1, but were significantly slower for infant relative to adult images in the incongruent condition, F = 7.29, p = .010, ηp² = 0.013. We conclude that the interaction A × C × R × M is mainly due to a reduction of the congruency effect when mothers frown in the presence of infant images as distractors, together with an increase in the congruency effect when non-mothers frowned at images of smiling infants.
In the ANOVA of hit rates (

Intensity analysis of AU6
Although the present study focusses on chronometric questions, as suggested by a reviewer we also assessed the activation intensity of AU6 (orbicularis oculi). Activation of AU6 is considered a sign of genuine (Duchenne) smiles. Therefore, data from OpenFace scores of AU6 intensity were submitted to separate ANOVAs for smiles and frowns. The factor required response is not meaningful because smiles and frowns differ per se in the activation of AU6. For smiles, results revealed a significant effect of stimulus age, F = 4.08, p < .05, reflecting larger AU6 intensity for pictures of infants compared with adults, mean diff = 0.030. No other main effects or interactions were significant (all Fs ≤ 1). For frowns, we observed again greater AU6 intensity for infants than adults, mean diff = 0.067, F = 12.41, p < .001, and also for incongruent relative to congruent trials, mean diff = 0.034, F = 4.81, p < .05. All other main effects or interactions failed significance (all Fs ≤ 2, ps > 0.1).

Fig. 3.
Mean RTs to adult and infant stimuli as a function of required response, congruency, and participant group. Arrows with solid and broken lines mark pairwise comparisons that are significant or a trend, respectively.

Table 3
Mean RTs (in ms), hit rates and standard deviations (SD) as a function of required response, congruency, and participant group. Column groups: RTs (SD); Hit Rates (SD).

Discussion
We investigated cognitive control over deliberate facial expressions in women, challenged in a Stroop-like task by congruent and incongruent expressions of distractor faces. Apart from an overall congruency effect, we were interested in the impact of three variables on facial expression control: first, the required response (smiles vs. frowns), as they may differ in affiliative intent; second, caregiving response, induced by distractor faces of infants rather than adults; and third, motherhood status, as mothers may show enhanced caregiver responses and empathy relative to non-mothers. Results confirmed large and consistent congruency effects, except for a single condition, when mothers were asked to frown in the presence of an infant face in the background.

Overall congruency effects
The overall congruency effect confirmed our prediction of slower RTs (by about 60 ms) for incongruent as compared to congruent word-face pairs. As performance accuracy was diminished in incongruent relative to congruent trials, a speed-accuracy trade-off cannot account for the RT effect. The congruency effect in RTs suggests that task-irrelevant stimuli showing incongruent facial expressions interfere with the production of a deliberate response and, conversely, that congruent facial expressions facilitate responses. As there was no neutral expression in the present study, the relative contributions of interference and facilitation cannot be disentangled. To the best of our knowledge, this is the first study reporting congruency effects in onsets of facial expressions in a Stroop task using video coding with a machine classifier. The present findings are in line with previous Stroop studies for facial expressions requiring manual responses (e.g., Beall & Herbert, 2008; Stenberg et al., 1998) and with EMG studies in a Simon task also requiring facial expressions as responses (Otte et al., 2011a; Otte et al., 2011b). Together, these interference effects may be due to stimulus-response compatibility (e.g., Otte et al., 2011a) or automatic facial mimicry (e.g., Lee et al., 2008). Such response tendencies would activate the incorrect response in incongruent conditions, which has to be inhibited via top-down control processes, inducing incorrect responses when inhibition fails and slowing down correct responses due to the necessary resolution processes. Conversely, these response tendencies would facilitate the required responses in congruent trials.

Effects of the required response
Social context modulates facial mimicry, interacting with the type of expression. Because smiles supposedly signal affiliative intention, they commonly engage more mimicry in the observer than negative expressions like anger (e.g., Hess & Fischer, 2014; Künecke et al., 2017). In the present study we manipulated the type of deliberate expression required of the receiver, that is, our participants were receivers and posers at the same time. If the deliberate expression of a smile induces affiliative intention, whereas frowns do not, one might expect more automatic mimicry and, in turn, larger Stroop effects for deliberate smiles than for frowns. However, we did not observe a two-way interaction of congruency and response type. Hence, the deliberate production of smiles or frowns as implemented in the present study does not seem to differentially modulate the automatic effects of task-irrelevant background adult faces. The lack of a significant modulation of the congruency effect does not support the idea that the type of required response is sufficient to induce a specific affiliative intent that could differentially affect mimicry (e.g., Künecke et al., 2017). However, the effects may be different in social interactions in daily life, where the affiliative context builds up over a longer time and is anchored more deeply. Pictures of faces in an experimental context are only a representation of social stimuli and induce weaker effects than when we are facing real persons (Risko et al., 2012), especially when smile and frown responses are constantly changing.
The present results are also at variance with two studies using the response-priming task, in which facial expressions were cued with congruent or incongruent prime stimuli. These studies reported smaller congruency effects and better control over facial expressions for responses of anger (Recio et al., 2014) and disgust (Recio & Sommer, 2018) relative to smiles. Possibly, being prepared to show a negative facial expression requires more top-down control and emotion regulation than preparing to express positive affect, improving performance in negative affect expression (Katembu et al., 2022; Recio et al., 2014). With a Stroop-like task, the present results did not show better performance for frowns, which we had predicted for pictures of adults. The apparent discrepancy may be due to different sources of the congruency effects in the two tasks. In the response-priming task, the incorrect preparation of a motor response (either using words as primes or face stimuli allowing for imitation) seems to interfere more with the control of smiles than the task-irrelevant facial mimicry generated by the incongruent facial expression in the Stroop task.

Motherhood and caregiver responses
The central question of the present study concerned the consequences of motherhood and caregiving tendencies on the congruency effect. We expected that seeing faces of infants, especially of infants in distress, would engage empathy and caregiving responses, characterized by the motivation to reduce distress and enhanced top-down inhibitory control due to larger recruitment of brain regions involved in cognitive control (e.g., Kringelbach et al., 2008). We predicted that especially mimicry of negative expressions would be reduced under these conditions, diminishing the congruency effect for infant distractor faces relative to adult distractor faces, that is, diminishing both the inhibition of smile responses and the facilitation of frown responses. These effects should be especially pronounced in mothers because they presumably experience greater empathy and emotional arousal towards pictures of infants (e.g., Proverbio et al., 2006). Our results do not support an overall effect of caregiving response since there was no global modulation of the congruency effect due to showing infant rather than adult background images. Only data from AU6 intensity revealed an overall larger activation of AU6 while seeing pictures of infants relative to adults, whether smiles or frowns were the requested expression and regardless of the facial expression of the infants. However, this activation of AU6 neither differed between mothers and non-mothers, nor impacted the congruency effect at all, indicating that the activation of caregiver responses overrides mimicry. Also, there was no evidence for a global modulation of the congruency effect by motherhood. The present data suggest that caregiving and motherhood modulate the congruency effect only when the deliberately requested response is a frown. In this case, the congruency effect for infant distractor faces was reduced in mothers relative to non-mothers.
Indeed, the condition in which mothers frowned at infant faces was the only one where the congruency effect (of 30 ms) failed to reach significance, whereas the Stroop effect was three times this size in nullipara women. By and large, this finding is in line with our expectation, albeit restricted to a specific combination of conditions. We suggested that seeing infant faces would induce automatic caregiver responses, especially in mothers, counteracting or overriding mimicry of facial expressions (apparently present for adult background faces) by empathy with infant distractor faces. Essentially this is what we found, but only for negative deliberate facial expressions. Here, mothers showed a significantly smaller Stroop effect to infant distractor faces than non-mothers.
We offer three alternative explanations for this finding. First, the attenuation of the congruency effect in mothers when required to frown might reflect some form of emotional interference. According to the caregiver account, a mother's reflexive response to an infant in distress is to smile in order to comfort it, but when she is instructed to frown, she cannot follow the smile elicited by her caregiver reflex but has to do the opposite, generating a time-consuming conflict with the tendency to mimic the infant's negative expression. However, the follow-up analyses were not conclusive about this account, because the difference in RTs between mothers and non-mothers did not reach significance.
Second, non-mothers were slower when they frowned at images of smiling infants compared with smiling adults. In contrast, as we predicted, mothers' performance did not differ significantly between adults and infants while frowning. This indicates that, when asked to frown, non-mothers were more susceptible to interference from images of smiling infants than mothers were. This finding refers not only to the downregulation of negative affect but also to positive emotions, as expected when seeing a happy infant. Hence, it is possible that motherhood experience and the hormonal changes associated with it give mothers an advantage in their emotion regulation in terms of motor control when confronted with emotional expressions of infants while they are accomplishing a task. Results suggest that mothers are less emotional, and less susceptible to attentional interference from affective signals from infants. The present finding is in line with reports that observing infants triggers automatic caregiving responses in mothers, boosting emotion regulation by improving key executive functions such as top-down control of attention to affective stimuli, the downregulation of affect and emotional scaffolding (e.g., Kringelbach et al., 2016). Specifying whether the attenuated congruency effect of mothers relative to non-mothers when they frown at infants reflects reduced interference or reduced facilitation would require an experimental condition with a neutral facial expression, which was not available in the present study.
Third, although we hold the caregiver account of the attenuated congruency effect in frowning mothers to be the most plausible, we also want to briefly discuss an alternative explanation. For mothers, the sight of a distressed infant may be less of an emotional situation and more of a task presented to them: What is the need of the infant (food, sleep, contact) and how can it be satisfied? At least in the lab, such a task situation in response to an infant picture may only be triggered when the corrugator is activated by the deliberate frown. But why should this be so? The argument here is two-pronged. Firstly, following Darwin's (1872) facial feedback hypothesis, there is a lot of evidence that facial muscle activity (or the lack thereof) affects emotional states (e.g., Cupchik & Leventhal, 1974; Finzi & Rosenthal, 2016; Strack et al., 1988) and the ability to judge emotional expressions of others (e.g., Storbeck et al., 2019). Secondly, the corrugator supercilii muscle is not only activated during anger but also when people are puzzled (Darwin, 1872), during the exertion of mental effort (Van Boxtel & Jessurun, 1993), and during states of reduced mental fluency (e.g., Topolinski & Strack, 2009). In addition, there seems to be a link between the rostral cingulate zone, which is involved in cognitive control, and upper muscles of the face, such as the corrugator (Shackman et al., 2011). Therefore, it is also possible that the activation of the corrugator muscle as required in deliberate frowns diminishes the congruency effect in the present study because it increases cognitive control and shields the expression production system against mimicry-related influences. However, additional questions would have to be addressed if facial feedback were to account for the present findings: Why is the Stroop effect during frown responses not also attenuated for adult faces, and why not also for non-mothers? Answers to these questions would require further research.

Limitations and perspectives
Despite its considerable sample size and an intriguing finding, the present study has its limitations. The main finding is based on a significant within-between interaction, and the power analyses revealed that the study might be underpowered for such interactions despite a sample of N = 101 participants, which took about one year to collect. Hence, the results involving interactions of all four factors reported here could be false positives or false negatives. Future studies might aim at investigating the reported effects with larger samples.
As mentioned, the lack of a condition with neutral faces as distractors limits our interpretation of the observed congruency effects, as we cannot establish whether they were driven by facilitation or interference. A further, but hard-to-avoid, limitation is the problem of distinguishing different negative emotional expressions in infants (Camras & Shutter, 2010). Interestingly, the fact that for required smiles, the Stroop effect to adults and infants was indistinguishable makes the muscle-specific account for mimicry less plausible because the required response was a frown, whereas the infant expression may not have been clearly categorizable as anger, fear, or pain (note that the emotion ratings for the stimuli allowed only anger as a negative category). Therefore, if there was mimicry of infant anger it may have been based on a general sign of distress, rather than on a specific facial expression of anger.
One could argue that pictures of their own infants would have been more powerful distractor stimuli for mothers, as they are motivationally more relevant than unknown infants. The impact of this and other parenting variables, and of biological markers like hormones, on the Stroop effect could be the topic of future studies. Maternal sensitivity is known to predict better development of emotion regulation in children (Frick et al., 2018). Whether it also predicts better regulation strategies in mothers, and how the use of their facial expressions in particular may impact the development of self-regulation in children, could also be of interest, as would be the cross-validation of the findings reported here, obtained with a machine classifier, against other measures of facial expressions like EMG or manual coding.
Finally, individual face identities could have affected or obscured the observed effects. Since we could not use a standardized database for infant faces, we cannot rule out such effects. This issue deserves further investigation in future studies using item analysis or mixed linear models.

Conclusions
The present study shows that chronometric analyses of video recordings are a suitable instrument for investigating cognitive control over facial expressions, which is in high demand in everyday life. For adult models, no principled difference in the congruency effect was seen for the type of required expression. When infant pictures were used as stimuli, mothers required to frown showed a significant attenuation of their congruency effect. This effect cannot be accounted for by mimicry or stimulus-response compatibility. The most plausible alternative account may be the activation of caregiver responses, counteracting or overruling mimicry of infant expressions.

Data availability statement
Data that support the findings of this article are available at https://osf.io/rtjh9/.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.