Face masks influence emotion judgments of facial expressions: a drift–diffusion model

Face masks slow the spread of SARS-CoV-2, but it has been unknown how masks might reshape social interaction. One important possibility is that masks may influence how individuals communicate emotion through facial expressions. Here, we clarify to what extent, and how, masks influence facial emotion communication, through drift–diffusion modeling (DDM). Over two independent pre-registered studies, conducted three and six months into the COVID-19 pandemic, online participants judged expressions of 6 emotions (anger, disgust, fear, happiness, sadness, surprise) with the lower or upper face “masked” or unmasked. Participants in Study 1 (N = 228) correctly identified expressions above chance with lower face masks. However, they were less likely, and slower, to correctly identify these expressions relative to without masks, and they accumulated evidence for emotion more slowly, as reflected in a decreased drift rate in DDM. This pattern replicated and intensified 3 months later in Study 2 (N = 264). These findings highlight how effectively individuals still communicate with masks, but also explain why they can experience difficulties communicating when masked. By revealing evidence accumulation as the underlying mechanism, this work suggests that time-sensitive situations may risk miscommunication with masks. This research could inform critical interventions to promote continued mask wearing as needed.

Figure 1. Emotion judgment task and modeling. (a) Sample task instructions. On each round, participants rated whether or not faces showed one of six emotions (anger, disgust, fear, happiness, sadness, surprise) by pressing a key with either their left or right hand (key pairing randomly assigned by participant). Emotion rating type was randomized across task rounds. (b) Hypothetical drift-diffusion model (DDM). This example shows how rapidly participants accumulate evidence for disgust, as indicated by drift rates, from disgusted expressions without masks (v_none) versus disgusted expressions with lower masks (v_lower). Here, steeper slopes reflect faster evidence accumulation, via more positive drift rates.
Procedure and design. Experiments were developed using the jsPsych JavaScript library 43 , which measures response times with high precision and reliability comparable to lab-based software 44,45 . Participants completed six rounds of viewing and rating facial expressions of 6 emotions (anger, disgust, fear, happiness, sadness, surprise; within-subjects) with 3 types of "face masks" (lower, upper, none; within-subjects). Participants viewed a different block of faces each round, counterbalanced for the 6 expressions and 3 masks, and rated whether or not faces expressed one of 6 emotions (anger, disgust, fear, happiness, sadness, surprise; within-subjects). Blocks of faces and types of emotion ratings were randomized across task rounds and participants viewed faces in random order within each round. On each trial, participants viewed a centered fixation cross for 1000 ms, followed by a face with the tip of the nose centered to the preceding fixation cross. Participants rated whether or not each face expressed a given emotion as quickly and accurately as possible by pressing a key with either their left or right hand (Fig. 1). Response options were randomly assigned to the left and right hands for each participant. Stimuli. Six blocks of faces were fully counterbalanced for emotion expressions and "face masks" such that participants made all 6 ratings of all 6 expressions with all 3 masks. We selected 36 white female and male actors from the Radboud Faces Database for Study 1 46 and 18 East Asian, Black, Hispanic, and white female and male actors from the Racially Diverse Affective Expression (RADIATE) face stimulus set for Study 2 47 . To create lower and upper masks, faces were split in half at the tip of the nose and either the lower half or the upper half of each face image was blacked-out (Fig. 2). This generated 648 total faces for Study 1 and 324 total faces for Study 2. 
Each block of faces included one expression from each actor, with all three masks, presented in random order for 108 faces per block in Study 1 and 54 faces per block in Study 2. Critically, this design counterbalanced blocks of faces for expressions and masks, as well as for actors, actors' gender, and in Study 2, actors' ethnicity.
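The stimulus counts reported above follow directly from the fully crossed design. As a quick arithmetic check, the following sketch recomputes them; the function and constant names are illustrative and not part of the original materials.

```python
# Counts implied by the fully crossed design described above.
N_EXPRESSIONS = 6  # anger, disgust, fear, happiness, sadness, surprise
N_MASKS = 3        # lower, upper, none
N_BLOCKS = 6       # one block of faces per task round


def stimulus_counts(n_actors):
    """Return (total face images, faces shown per block) for a given actor count."""
    total = n_actors * N_EXPRESSIONS * N_MASKS
    # Each block holds one expression per actor, shown with all three masks.
    per_block = n_actors * N_MASKS
    return total, per_block


print(stimulus_counts(36))  # Study 1: (648, 108)
print(stimulus_counts(18))  # Study 2: (324, 54)
```

In both studies, six blocks of the per-block size account for every face image exactly once.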

Data analysis. Power analysis.
To determine a target total sample size that would generate high statistical power in Study 2, we conducted a power analysis of Study 1 in G*Power 48 . Here, we examined participants' emotion ratings on trials where they could have made false negative errors, that is, trials where they viewed expressions congruent with the emotion they rated. For each expression, we selected the mask vs. no mask contrast with the largest effect (lower: disgust, happiness, sadness, surprise; upper: anger, fear) and we calculated odds ratios by exponentiating the corresponding b values (Table 1). Power analysis revealed that a target final N = 230 would achieve 95% power for detecting five of these six effects in two-tailed binomial logistic regression models (α = 0.05). We pre-registered this target final sample size and a target total N = 300 based on the rate of failed attention checks in Study 1.
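For readers less familiar with logistic regression, the conversion from a coefficient b (a difference in log-odds) to an odds ratio can be sketched as follows. The coefficient used here is a made-up illustration, not a value from Table 1, and the helper names are hypothetical.

```python
import math


def odds_ratio(b):
    """Convert a logistic regression coefficient (log-odds) into an odds ratio."""
    return math.exp(b)


def probability_from_log_odds(log_odds):
    """Map log-odds onto a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficient for a mask vs. no mask contrast (illustration only).
b_example = -0.5
print(round(odds_ratio(b_example), 3))  # 0.607
```

An odds ratio below 1 corresponds to a negative coefficient, meaning lower odds of a correct rating in the masked condition than in the unmasked one.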
Exclusions. For both studies, we pre-registered that we would exclude participants who failed attention checks, resulting in the exclusion of 72 participants from Study 1 (final N = 228) and 26 participants from Study 2 (final N = 264). We also pre-registered that we would exclude trials with response times less than 100 ms 49 . Hierarchical drift-diffusion models required further excluding trials with response times in the top 0.5% of all trials (> 9303 ms in Study 1 and > 5737 ms in Study 2).
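The two trial-level exclusion rules can be sketched as a simple filter. This is a minimal illustration rather than the analysis code used in the studies. It assumes that the slowest 0.5% is computed after removing the fast responses, and that ties at the cutoff are also dropped. The function name and parameters are hypothetical.

```python
def exclude_trials(rts_ms, min_rt_ms=100.0, top_fraction=0.005):
    """Drop trials faster than min_rt_ms, then drop the slowest top_fraction.

    Assumption: the slow cutoff is taken from the trials that survive the
    fast-response filter; the source does not spell out this ordering.
    """
    fast_ok = [rt for rt in rts_ms if rt >= min_rt_ms]
    n_drop = int(len(fast_ok) * top_fraction)
    if n_drop == 0:
        return fast_ok
    # Fastest response time among the n_drop slowest trials.
    cutoff = sorted(fast_ok)[-n_drop]
    return [rt for rt in fast_ok if rt < cutoff]


# Hypothetical response times: two anticipatory responses plus 1000 valid trials.
trials = [50, 90] + list(range(200, 1200))
kept = exclude_trials(trials)
print(len(kept), max(kept))  # 995 1194
```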
Statistics. We pre-registered the following analyses testing whether masks influenced how accurately participants rated expressions, how quickly they rated expressions, and how rapidly they accumulated evidence for rating expressions. Analyses of false negative errors examined trials where participants viewed emotion expressions congruent with the emotion they judged that round-that is, trials where participants either correctly identified expressions (e.g. rating disgust expressions as disgusted) or failed to correctly identify them (e.g. rating disgust expressions incorrectly as not disgusted). We analyzed emotion ratings, response times, and drift rates, for all 6 congruent pairings of expressions and judgments. In contrast, analyses of false positive errors examined trials where participants viewed expressions incongruent with the emotion they judged that round-that is, trials where participants either correctly distinguished between expressions (e.g. rating disgust expressions as not angry) or misidentified them (e.g. rating disgust expressions incorrectly as angry). We analyzed emotion ratings for all 30 incongruent pairings of expressions and judgments. When masks influenced these ratings, we further analyzed response times for that incongruent pairing, and when masks influenced these response times, we further analyzed drift rates for that pairing. Analyses of emotion ratings tested the main effects of masks, collapsing across expressions, and within each type of emotion expression. Emotion ratings were analyzed through two-tailed mixed effect binomial logistic regression models including participant intercepts as random effects. Analyses of response times tested the interaction between masks and rating accuracy, as well as the simple main effects of masks for correct and incorrect ratings, collapsing across expressions and within each expression type. 
Response times were analyzed through two-tailed mixed effect linear regression models including participant intercepts as random effects. Response times were log-transformed to correct for their inherent skewness and flipped (i.e. multiplied by − 1) for plotting purposes. Regression coefficients and 95% confidence intervals were estimated through bootstrapped regression analyses using 1,000 iterations 50 .
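To illustrate the log transformation and the bootstrap procedure, the sketch below computes a percentile bootstrap interval for a difference in mean log response time between two conditions. It is a deliberate simplification: it resamples trials within two independent groups and omits the participant random intercepts used in the reported mixed-effect models. All data are simulated and all names are hypothetical.

```python
import math
import random
import statistics


def log_rts(rts_ms):
    """Natural-log transform response times (ms) to reduce right skew."""
    return [math.log(rt) for rt in rts_ms]


def bootstrap_mean_diff_ci(masked_ms, unmasked_ms, n_iter=1000, seed=0):
    """Percentile 95% CI for mean log RT (masked) minus mean log RT (unmasked)."""
    rng = random.Random(seed)
    log_masked, log_unmasked = log_rts(masked_ms), log_rts(unmasked_ms)
    diffs = []
    for _ in range(n_iter):
        # Resample each condition with replacement, keeping its original size.
        boot_masked = rng.choices(log_masked, k=len(log_masked))
        boot_unmasked = rng.choices(log_unmasked, k=len(log_unmasked))
        diffs.append(statistics.fmean(boot_masked) - statistics.fmean(boot_unmasked))
    diffs.sort()
    lo_idx = int(0.025 * n_iter)
    hi_idx = min(n_iter - 1, int(0.975 * n_iter))
    return diffs[lo_idx], diffs[hi_idx]


# Simulated response times in ms, for illustration only.
rng = random.Random(42)
masked = [rng.lognormvariate(7.0, 0.3) for _ in range(200)]
unmasked = [rng.lognormvariate(6.7, 0.3) for _ in range(200)]
low, high = bootstrap_mean_diff_ci(masked, unmasked)
print(low, high)
```

An interval that excludes 0 indicates a reliable difference in log response time between the two conditions under this simplified scheme.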
Analyses of drift rates tested the main effects of masks, across expressions overall and within each expression type (Fig. 1). Drift rates were estimated through hierarchical Bayesian estimation of drift-diffusion model parameters (HDDM) using Markov chain Monte Carlo (MCMC) sampling methods 14,16,17 . These models estimated posterior distributions of regression coefficients for the effects of masks over drift rate (v) parameters and included participant intercepts as random effects. In addition, these models also estimated parameters for response bias (z), boundary separation (a), non-decision time (t), and inter-trial variability in all parameters (sz, sa, st, sv), as covariates averaging over mask conditions. Parameter coefficients and 95% credible intervals were estimated by drawing 5000 samples from posterior distributions and discarding the initial 200 'burn-in' samples. Bayesian hypothesis testing compared the posterior distributions of drift rate coefficients across mask conditions and estimated the probability that they differed from each other and the probability that they differed from 0. Follow-up analyses additionally tested the effects of masks over boundary separation (a) controlling for mask effects over drift rate (v).
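The posterior comparisons described above reduce to counting posterior draws. The sketch below shows the idea with simulated chains standing in for MCMC output. It is not the HDDM implementation, and the chain values, helper names, and draw-by-draw pairing are illustrative assumptions.

```python
import random


def discard_burn_in(chain, n_burn=200):
    """Drop the initial burn-in draws from a sampled chain."""
    return chain[n_burn:]


def prob_greater(samples_a, samples_b):
    """Fraction of paired posterior draws in which a exceeds b."""
    pairs = list(zip(samples_a, samples_b))
    return sum(a > b for a, b in pairs) / len(pairs)


# Simulated chains (illustration only): drift rate without masks and with lower masks.
rng = random.Random(1)
v_none = discard_burn_in([rng.gauss(1.5, 0.1) for _ in range(5000)])
v_lower = discard_burn_in([rng.gauss(1.0, 0.1) for _ in range(5000)])

print(len(v_none))  # 4800 draws retained after burn-in
print(prob_greater(v_lower, v_none))  # estimate of P(lower mask > none)
print(prob_greater([0.0] * len(v_lower), v_lower))  # estimate of P(lower mask < 0)
```

A small value of P(lower mask > none) means that almost all posterior mass places the masked drift rate below the unmasked drift rate.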

Results
Over two sets of pre-registered analyses, we tested the influence of lower and upper masks over how accurately participants rated expressions, how quickly they made these ratings, how rapidly they accumulated evidence for emotion from expressions, and how much evidence they required to make these judgments. Here, we report analyses of false negative errors (that is, trials where participants viewed expressions congruent with the emotion they judged, and could have failed to correctly identify expressions) for emotion ratings, RTs, drift rates, and boundary separations (see Supplementary Note for analyses of false positive errors).

False negative errors. Emotion ratings.
Analyses of emotion ratings compared how accurately participants identified expressions with lower and upper masks versus chance (50%), versus expressions without masks, and versus one another, as increasingly conservative tests of emotion rating accuracy. Participants in Study 1 correctly identified all 6 types of expressions above chance with lower masks, and expressions of disgust, happiness, sadness, and surprise, but not anger or fear, with upper masks.
However, participants were less accurate identifying expressions with lower and upper masks than without masks, collapsing across emotions (Fig. 3, Table 1). With lower masks, participants were less accurate identifying expressions of disgust, fear, happiness, sadness, and surprise, but not anger, relative to without masks (Fig. 4). With upper masks, they were less accurate identifying expressions of anger, fear, sadness, and surprise, but not disgust or happiness.
Participants did not significantly differ in their accuracy identifying expressions overall with lower versus upper masks (Table 1). Nonetheless, participants were less accurate identifying disgusted, happy, and sad expressions with lower masks versus with upper masks (Fig. 4). By contrast, they were less accurate identifying angry and fearful expressions with upper masks as compared to lower masks.

[...] relative to without masks (Mdn = 838.78 ms, SD = 734.29), as revealed by an interaction between mask conditions and emotion ratings for lower masks only (Fig. 3).
With lower masks, participants were slower to correctly identify disgust, happiness, sadness, and surprise than without masks, and as shown by interactions for disgust, happiness, and surprise (Fig. 5, Table 2). In contrast, they were faster to correctly identify fear with lower masks, versus without masks, and they showed no significant difference for anger. With upper masks, participants were slower to correctly identify anger and surprise, but no other expressions, than without masks, and as revealed by an interaction between masks and emotion ratings for surprise ( Supplementary Fig. S1).
Participants were also slower to correctly identify expressions with lower masks versus with upper masks across emotions overall, and as indicated by an interaction (Supplementary Table S2). In particular, they were slower to correctly identify disgusted, happy, and sad expressions with lower masks, versus with upper masks, and as revealed by an interaction for disgusted expressions (Supplementary Fig. S2). Conversely, they were slower to correctly identify angry and fearful expressions with upper masks in comparison to with lower masks.
Emotion rating drift rates. Analyses of drift rates compared how rapidly participants accumulated evidence for emotion from expressions with lower and upper masks versus null (0) drift rate (indicating evidence accumulation towards correct or incorrect judgments on average), as well as versus expressions without masks, and versus one another, as increasingly conservative tests of evidence accumulation rate. Participants accumulated evidence towards correctly identifying all 6 types of expressions with lower and upper masks, as indicated by greater than null (0) drift rates, all P(lower/upper mask < 0) < 0.0001. However, consistent with their decreased accuracy and speed, participants accumulated evidence more slowly from expressions with lower and upper masks than from expressions without masks, collapsing across emotions (Fig. 3). With lower masks, participants accumulated evidence for disgust, fear, happiness, sadness, and surprise more slowly than without masks, all P(lower mask > none) < 0.0001; but not anger (Fig. 6). With upper masks, they accumulated evidence for anger, fear, sadness, surprise, all P(upper mask > none) < 0.0001; disgust, P(upper mask > none) = 0.0029; and happiness, P(upper mask > none) = 0.029; more slowly than without masks.
Participants also accumulated evidence more slowly from expressions with lower masks, as compared to upper masks, collapsing across emotions, P(lower mask > upper mask) = 0.017 (Fig. 3). Specifically, they accumulated evidence more slowly from expressions of disgust, happiness, sadness, all P(lower mask > upper mask) < 0.0001; and surprise, P(lower mask > upper mask) = 0.0052; with lower masks relative to upper masks (Fig. 6). By contrast, participants accumulated evidence for anger and fear more slowly from expressions with upper masks as compared to lower masks, both P(upper mask > lower mask) < 0.0001.

Study 1 summary. Participants in Study 1 rated expressions above chance overall with both lower and upper masks and they accumulated evidence in favor of identifying masked expressions correctly on average. However, participants were worse at perceiving expressions of disgust, happiness, sadness, and surprise with lower masks relative to without masks, and worse at perceiving expressions of anger and surprise with upper masks as compared to without masks, as indicated by decreased accuracy, speed, and rate of evidence accumulation. Similarly, participants were also worse at judging disgusted, happy, and sad expressions with lower versus upper masks, and worse at judging angry and fearful expressions with upper versus lower masks, via reduced accuracy, speed, and rate of evidence accumulation.

Study 2 summary. Participants in Study 2 closely replicated this pattern 3 months later when judging an ethnically diverse set of facial expressions (see Supplementary Note for analyses of Study 2 false negative errors).
They also rated expressions above chance with lower and upper masks, and accumulated evidence in favor of correctly identifying these expressions. Nonetheless, they too were worse at judging all expressions except anger with lower masks as compared to without masks, and worse at judging expressions of anger and fear with upper masks relative to without masks, as reflected by decreased accuracy, speed, and drift rate. These participants were likewise worse at perceiving disgusted, happy, and sad expressions with lower versus upper masks, and worse at perceiving angry expressions with upper versus lower masks.

False positive errors. Masks also influenced how participants distinguished between expressions (see Supplementary Note for analyses of false positive errors). With lower masks, relative to without masks, participants in Study 1 were more likely to misperceive surprised expressions as angry, sad expressions as surprised, and all expressions except fear as happy, as reflected by decreased accuracy, speed, and rate of evidence accumulation. However, they were also less likely to misperceive fearful expressions as disgusted or sad with lower masks as compared to without masks, via increased accuracy, speed, and drift rate. With upper masks, in comparison to without masks, participants were less likely to misperceive surprised expressions as fearful, fearful expressions as surprised, disgusted expressions as angry, angry expressions as disgusted, and angry and sad expressions as fearful, via increased accuracy, speed, and drift rate. Study 2 replicated many of these effects after 3 months with participants judging an ethnically diverse set of faces. With lower masks, as compared to without masks, participants were more likely to misjudge surprised expressions as angry, angry expressions as surprised, sad expressions as happy, happy expressions as sad, and surprised expressions as sad, as shown by decreased accuracy, speed, and rate of evidence accumulation. However, they were less likely to misjudge angry expressions as sad, and disgusted expressions as surprised, with lower masks relative to without masks, via increased accuracy, speed, and drift rate. With upper masks, in comparison to without masks, participants were less likely to misjudge surprised expressions as fearful, fearful expressions as surprised, angry expressions as disgusted, and sad expressions as surprised or fearful, via increased accuracy, speed, and evidence accumulation rate.

Discussion
Across two pre-registered studies collected three and six months into the COVID-19 pandemic, we tested whether masks influence how individuals perceive facial expressions of emotion by comparing (i) how accurately individuals judge expressions, (ii) how quickly they make these judgments, and (iii) how rapidly they accumulate evidence for emotion, with and without masks. Participants identified expressions above chance with lower "face masks" and they accumulated evidence towards judging these expressions correctly. However, participants were also less likely, and slower, to identify expressions correctly with lower masks, and they accumulated evidence of emotion more slowly, as compared to without masks. This pattern replicated and intensified in a new sample collected 3 months later. In sum, individuals perceive facial expressions relatively accurately with masks, and yet masks influence communication by slowing the rate at which individuals accumulate evidence of emotion. By clarifying how masks impact facial emotion communication, these findings could inform interventions to promote mask wearing.

Face masks slow the spread of the SARS-CoV-2 virus 51 and they continue to play a role in the global response to COVID-19 4 , as they likely will for future pandemics and seasonal viruses. However, many individuals have reported not wearing masks due to experiencing difficulties with communication 31,32 . On the one hand, these data show that masks do influence how individuals perceive facial expressions, helping to explain why individuals experience difficulties at times. On the other hand, these data also show that individuals perceive facial expressions well above chance with masks, even without any real-world context, suggesting that concerns over communicating emotion while masked may be overstated.
These findings could be used in interventions that acknowledge the challenges individuals may experience, but that also highlight how effectively individuals still communicate emotion when masked 52 .
This research examines emotion perception as a process that unfolds over time and points to settings in which individuals may misperceive masked expressions 53 . Participants were not only less accurate, but also slower at identifying masked expressions, because masks slowed the speed with which they accumulated evidence for emotion. Here, by applying drift-diffusion modeling (DDM) 14,16,17 , this work extends recent evidence that individuals judge expressions less accurately with face masks 33–41 , by demonstrating that masks also influence the speed of these judgments, and revealing evidence accumulation as the mechanism underlying both of these effects. This suggests that individuals may struggle to perceive masked expressions under time pressure, such as when they quickly pass other people in the grocery store or other brief interactions. Future studies could examine how face masks may impact consumer behavior in these time-sensitive settings.
By showing that masks influence evidence accumulation, this work further predicts that individuals may communicate more effectively by simply taking their time. Face masks influence how individuals judge expressions by slowing the rate at which they accumulate emotion evidence. However, individuals require varying amounts of evidence to reach these decisions, depending upon whether they prioritize speed or accuracy when judging emotion 54 . When individuals deprioritize speed, they require more evidence (by increasing the boundary separation between evidence thresholds for each response option) and they make fewer errors as a result 13,15 . Here, deprioritizing speed should give individuals time to accumulate more evidence and judge masked expressions more accurately, and thus compensate for masks slowing evidence accumulation. Future investigations could test this model-based prediction by explicitly instructing participants to prioritize either accuracy or speed when judging masked expressions. If successful, this data-driven 'take your time' strategy could also be used in interventions to promote mask wearing.
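The logic of this prediction can be illustrated with a toy simulation of a two-boundary diffusion process. This is a sketch under simplifying assumptions, not the hierarchical model fitted in this work: the starting point is unbiased (midway between boundaries), the noise is fixed at 1, non-decision time is omitted, and all parameter values are hypothetical.

```python
import math
import random


def simulate_ddm(drift, boundary, n_trials=1000, dt=0.005, noise=1.0, seed=0):
    """Simulate an unbiased two-boundary diffusion process.

    Returns (proportion of trials ending at the upper, correct boundary,
    mean decision time in seconds). Non-decision time is not modeled.
    """
    rng = random.Random(seed)
    step_sd = noise * math.sqrt(dt)
    n_correct = 0
    total_time = 0.0
    for _ in range(n_trials):
        evidence = boundary / 2.0  # unbiased starting point
        elapsed = 0.0
        while 0.0 < evidence < boundary:
            # Euler step: deterministic drift plus Gaussian noise.
            evidence += drift * dt + rng.gauss(0.0, step_sd)
            elapsed += dt
        if evidence >= boundary:
            n_correct += 1
        total_time += elapsed
    return n_correct / n_trials, total_time / n_trials


# Hypothetical parameter values, chosen only to show the qualitative pattern.
print("unmasked:               ", simulate_ddm(drift=1.5, boundary=1.0))
print("masked, same boundary:  ", simulate_ddm(drift=0.5, boundary=1.0))
print("masked, wider boundary: ", simulate_ddm(drift=0.5, boundary=2.0))
```

In this sketch, lowering the drift rate reduces accuracy, whereas widening the boundary separation recovers accuracy at the cost of longer decision times, which is the trade-off that a 'take your time' instruction would exploit.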
Masks influenced how individuals perceived facial expressions to a greater extent at six versus three months into the COVID-19 pandemic. Participants were slower to accumulate evidence of emotion from expressions with lower masks, relative to expressions without masks, at 6 months into the pandemic as compared to an independent sample collected 3 months earlier. However, these groups did not significantly differ in how rapidly they accumulated evidence from expressions with upper masks. This suggests that the effects of masks may have intensified as individuals socially distanced and interacted face-to-face less often. This interpretation is consistent with recent evidence that individuals learned to focus relatively more on the eyes when judging emotion over the first 6 months of the pandemic, and that this adaptation was most pronounced in individuals who were exposed to masked faces more often 55 . However, it is also possible this finding reflects subtle differences between the Radboud Faces Database used in Study 1 and the Racially Diverse Affective Expression (RADIATE) face stimulus set used in Study 2. How else might individuals have adapted to seeing fewer faces in general and fewer lower faces in particular? This question may be particularly important for better understanding potential developmental 56 and individual 57 impacts of masks. One possibility is that individuals may have relied more upon context, for example by using more information from individuals' body postures and their surroundings 6,58,59 .
These data generate fundamental and applied insights into how individuals infer emotion from facial expressions. When directly compared, lower masks interfered more with how participants perceived disgusted, happy, sad, and surprised expressions, whereas upper masks interfered more with how they perceived angry and fearful expressions. Here, upper masks provide an important control by clarifying when lower masks influence emotion judgments by hiding the mouth in particular, versus when they simply conceal half of the face. Upper masks also show how concealing the upper face (with sunglasses, for example) may impact emotion communication in daily life. Using both types of masks further reveals which emotion judgments rely more upon information conveyed by either the lower face (such as disgust, happiness, sadness, and surprise) or the upper face (such as anger and fear). These comparisons were made possible by blacking-out either the lower or upper half of each face image. However, while this design choice enhanced experimental control of whether each half of the face was visible 42 , it also limited ecological validity to some extent, for example, by occluding head shape.
In combination with DDM, this design builds on prior research by examining how individuals accumulate evidence for emotion judgments from facial features. Past work found that individuals gaze more towards the eyes of angry, fearful, and sad expressions, versus more towards the mouth of disgusted and happy expressions 25 , and that gazing more towards the eyes increases confusion between anger versus disgust and fear versus surprise, whereas gazing towards the mouth decreases confusion 24,26 . Likewise, we find that individuals accumulate evidence for anger and fear more rapidly from the upper face, whereas they accumulate evidence for disgust, happiness, sadness, and surprise more rapidly from the lower face. These findings suggest that individuals gaze at facial features in ways that generally reflect how they accumulate evidence of emotion from those features. Evidence accumulation further links attention with decision-making as the mechanism by which individuals can make optimal use of facial information when judging emotion 27–29 .

More broadly, this work contributes to theoretical perspectives on communication and social influence by highlighting evidence accumulation as a key mechanism linking social perception and decision-making. Communicative theories propose that individuals produce facial expressions to influence others' inferences and behavior 60,61 . For instance, "target" individuals generate larger and clearer facial expressions when conveying their experiences to other people 62 . Here, we show that "observer" individuals likewise accumulate evidence for emotion more rapidly from increasingly visible expressions, and that they are more likely, and faster, to perceive these expressions correctly. Evidence accumulation thus provides a core mechanism for facilitating clear communication between targets and observers, and it could help explain recently documented impacts of masks over facial mimicry 63 .
Future experiments could explore how evidence accumulation underlies other types of social judgments 64 and helps individuals connect and cooperate across diverse real-world environments 9,11 , ranging from classrooms 12 to doctors' offices 7 and consumer settings 8 .
As individuals around the world began wearing face masks, they also adapted to how masks influence social interaction. Hospitality staff practiced "smizing" (smiling with the eyes) to convey warm feelings while masked 65 . Just as individuals adapt how they produce expressions behind masks, our findings predict they might also adapt how they perceive masked expressions to improve communication from start to finish.