Emotion recognition and mood along the menstrual cycle

Previous studies have demonstrated menstrual cycle dependent changes in the recognition of facial emotional expressions, specifically the expressions of fear, anger, sadness or disgust. While some studies demonstrate an improvement of emotion recognition performance during the peri-ovulatory phase, when estradiol levels peak, other studies demonstrate a deterioration of emotion recognition performance during the mid-luteal phase, when progesterone levels peak. It has been hypothesized that these changes in emotion recognition performance mirror mood changes along the menstrual cycle. In the present study, we investigate whether changes in emotion recognition performance along the menstrual cycle are mediated by mood changes along the menstrual cycle. In a combined cross-sectional and longitudinal study design, two large samples of women completed an emotion recognition task, as well as several mood questionnaires, during their menses, peri-ovulatory or mid-luteal cycle phase. Sixty-five women completed the task three times, once during each cycle phase, with order counterbalanced. In order to control for potential learning effects, a sample of 153 women completed the task only once in one of the three cycle phases. In both samples, results demonstrated no significant changes in emotion recognition performance along the menstrual cycle, irrespective of the performance measure investigated (accuracy, reaction time, frequency of emotion classifications) and irrespective of the emotion displayed. Bayesian statistics provided very strong evidence for the null hypothesis that emotion recognition does not change along the menstrual cycle. There was also no moderation of emotion recognition changes along the menstrual cycle by mood changes along the menstrual cycle. Mood changes along the menstrual cycle followed the expected pattern, with highest positive affect and weakest premenstrual symptoms around ovulation, and lowest positive affect but strongest premenstrual symptoms during menses. Interestingly, premenstrual symptoms were negatively related to estradiol, suggesting a protective effect of estrogen during the luteal cycle phase against mood worsening during the premenstrual phase.

Given that a female advantage in emotion recognition has been repeatedly replicated (Thayer and Johnsen, 2000; Montagne et al., 2005; Hoffmann et al., 2010; Rukavina et al., 2018; Connolly et al., 2019; Olderbak et al., 2019), several studies have investigated the role of ovarian hormones in emotion recognition by tracking emotion recognition performance along the female menstrual cycle (compare Table 1; compare Gamsakhurdashvili et al., 2021 for a review). Ovarian hormone levels are lowest during menses and follow distinct patterns throughout the menstrual cycle. Estradiol peaks shortly before ovulation, drops shortly after ovulation and rises again to medium levels during the mid-luteal cycle phase. Progesterone levels start rising after ovulation and peak during the mid-luteal cycle phase, i.e., one week before the onset of the next menses. In a sample of 52 women, Conway et al. (2007) demonstrated a heightened sensitivity to facial cues signalling nearby threats in women with high progesterone levels. Specifically, women with high progesterone levels indicated a higher intensity for fearful expressions in faces with averted gaze.
Regarding the course of emotion recognition performance, however, results are mixed. Response times did not change along the menstrual cycle in the majority of studies (compare Table 1), and the significant results reported concern overall emotion recognition accuracy or the accuracy in detecting specific emotions. These results have been interpreted along two lines of reasoning. On the one hand, various studies demonstrate a reduction in emotion recognition accuracy across a variety of emotions during the mid-luteal phase of the menstrual cycle, i.e., when progesterone levels peak (Derntl et al., 2008b; Guapo et al., 2009; Derntl et al., 2013; Mikolić, 2016; but see Álvarez et al., 2022). It is possible that a progesterone-dependent increase in the sensitivity towards threatening stimuli, as observed by Conway et al. (2007), contributes to this finding, in the sense that even neutral or other non-threatening expressions are misperceived as fearful, angry or disgusted. It has been discussed whether such an overperception of threatening emotions may help women to avoid potentially harmful situations in preparation for a potential pregnancy (Derntl et al., 2008b). It has further been speculated that this increased salience of threatening stimuli during the mid-luteal cycle phase may contribute to the heightened vulnerability for mood disorders attributed to this time window (Andreano and Cahill, 2009). However, to the best of our knowledge, no study has actually assessed the frequency with which individual emotions were perceived, irrespective of the emotion displayed. Focusing on the frequency of emotional perceptions in addition to recognition accuracy may help to more clearly delineate the changes in facial emotional perceptions related to elevated progesterone levels.
On the other hand, two studies demonstrate an improved recognition of negative emotions, i.e., fear and sadness respectively, during the peri-ovulatory phase of the menstrual cycle (Pearson and Lewis, 2005; Ramos-Loyo and Sanz-Martin, 2017). While the studies discussed above attribute emotion recognition changes along the menstrual cycle to fluctuations in progesterone levels, Pearson and Lewis (2005) attribute their findings to estradiol, as they observed the best fear recognition associated with the highest estradiol levels. Unfortunately, only a few studies included the peri-ovulatory phase in their design, given the methodological difficulties in capturing the peri-ovulatory estradiol peak. Nevertheless, various interpretations surround the finding of a potential increase in the recognition of certain emotions around ovulation. First, ovulatory performance improvements are frequently interpreted as increasing the probability of conception, though it is unclear how the specific ability to recognize fear or sadness in other people's faces might contribute to that. Second, Pearson and Lewis (2005) attributed their finding to the feminizing effects of estradiol on the brain, arguing that the female brain is in its most feminine state when estradiol levels peak. However, this interpretation ignores the role of progesterone and estradiol-progesterone interactions for the female brain.
Finally, it has been proposed that the likelihood of perceiving an emotion in someone else's face is linked to one's own mood (mood-congruity effect, e.g., Schmid and Schmid Mast, 2010). Mood changes along the menstrual cycle have been researched extensively (for reviews see Romans et al., 2012; Sundström Poromaa and Gingnell, 2014; Welz et al., 2016; Paludo et al., 2020). Though not undisputed (Romans et al., 2012), several studies suggest a mood improvement with more positive affect during the peri-ovulatory phase (Rebollar et al., 2017; Kimmig et al., 2021; Hromatko and Mikac, 2023), and a mood worsening throughout the luteal cycle phase (Bäckström et al., 1983; Kuehner and Nayman, 2021). However, to the best of our knowledge, no study has explicitly tested whether menstrual cycle dependent changes in emotion recognition performance are mediated by menstrual cycle dependent changes in mood. Given that mood changes along the menstrual cycle demonstrate strong inter-individual variability, this approach would even account for individual differences in menstrual cycle dependent changes in emotion recognition.
It is also worth noting that the majority of studies reporting positive findings (compare Table 1) use a cross-sectional study design and very small group sizes, which inflate the risk for false positive results, while more subtle changes may remain undetected. The average group size in the cross-sectional studies listed in Table 1 was 15 participants. To detect a significant difference between two groups of 15 participants with 80 % power, the effect size would have to be at least 1.06 standard deviations. It would be impressive if ovarian hormones caused such a strong change in cognitive performance within the same person in a matter of days. It is more likely that cognitive fluctuations are more subtle and effect sizes range from weak to moderate. Furthermore, cross-sectional study designs are subject to confounds, like coincidental differences in age, education or IQ, which may relate to cognitive performance. Longitudinal studies, on the other hand, have the advantage of subjects being their own controls in terms of IQ and demographic criteria. However, repeated cognitive testing results in a training effect, which may mask subtle changes related to ovarian hormones even when the order of cycle phases across sessions is counterbalanced. Ovarian hormones may affect performance on a task encountered for the first time, but participants may be able to adjust for difficulties due to hormonal status with training. If the training effect is strong while hormonal effects are subtle, counterbalancing test sessions may not be enough to capture those subtle influences. Accordingly, menstrual cycle dependent changes in cognitive performance should ideally be replicated by a mixture of well-powered cross-sectional and longitudinal studies. Interestingly, the most recent longitudinal studies included impressive sample sizes, but did not find any differences in emotion recognition performance along the menstrual cycle.
The largest longitudinal study on emotion recognition along the menstrual cycle included 192 participants (Shirazi et al., 2020), which allows for the detection of effects as small as 0.20 standard deviations with 80 % power. However, all but one longitudinal study to date included only 2 cycle phases, lacking either the ovulatory phase or menses phase required to demonstrate a peri-ovulatory improvement in the recognition of negative emotions.
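The detectable effect sizes quoted above can be approximated with a short calculation. The following is only a sketch under stated assumptions, not the procedure used in the cited studies: it assumes a two-sided significance level of 0.05, replaces the noncentral t distribution with a normal approximation, and treats the longitudinal case as a paired comparison of two cycle phases. The function name `min_detectable_d` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n, paired, alpha=0.05, power=0.80):
    """Approximate smallest standardized effect detectable with the given power.

    Normal approximation (assumption): d = (z_{1-alpha/2} + z_{power}) * scale.
    For two independent groups of n each, scale = sqrt(2 / n).
    For n paired observations, scale = 1 / sqrt(n), with d expressed in
    standard deviations of the difference scores.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    scale = 1 / sqrt(n) if paired else sqrt(2 / n)
    return z_sum * scale


# Two cross-sectional groups of 15 participants each
d_between = min_detectable_d(15, paired=False)
# One longitudinal sample of 192 participants, two phases compared within subjects
d_within = min_detectable_d(192, paired=True)
print(round(d_between, 2), round(d_within, 2))
```

Under these assumptions the approximation yields roughly 1.02 standard deviations for two groups of 15, slightly below the t-based value of 1.06 given in the text, and roughly 0.20 standard deviations for 192 paired observations.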
To address the open questions summarized above and the methodological concerns outlined in the previous paragraph, we conducted a large-scale study on emotion recognition performance along the menstrual cycle including the low hormone menses, high estradiol peri-ovulatory, and high progesterone mid-luteal phase. First, we include both a cross-sectional and a longitudinal sample to compare results, taking into account the advantages and potential biases associated with both designs. Second, we based our sample size on a priori power analyses and corrected for repeated testing in order to be able to detect more subtle changes in emotion recognition performance, while simultaneously decreasing the risk for false positive results. Third, we administered a comprehensive battery of mood questionnaires in order to address the research question of whether changes in emotion recognition performance along the menstrual cycle are mediated by changes in mood along the menstrual cycle. Finally, we assessed three parameters of emotion recognition, i.e., speed, accuracy and the frequency with which a specific emotion was reported irrespective of the emotion displayed.
Based on the previous findings listed in Table 1, we hypothesize the following:
1. Emotion recognition speed does not change along the menstrual cycle. We approach this null hypothesis using a combination of frequentist and Bayesian statistics.
2. Emotion recognition accuracy decreases during the luteal phase compared to menses irrespective of the emotion displayed.
3. Emotion recognition accuracy increases during the peri-ovulatory phase compared to menses. We explore whether this association is stronger for negative compared to neutral or positive expressions.
4. The perception frequency of angry, fearful and disgusted expressions increases during the luteal phase of the menstrual cycle irrespective of the actual emotion displayed.
5. Emotion recognition changes along the menstrual cycle are mediated by emotional changes along the menstrual cycle, specifically changes in premenstrual symptoms. Pending the positive evaluation of this hypothesis, we will explore more specifically whether:
a. Changes in the recognition accuracy or frequency for sad expressions are mediated by changes in negative affect.
b. Changes in the recognition accuracy or frequency for fearful expressions are mediated by changes in state anxiety.
c. Changes in the recognition accuracy or frequency for angry expressions are mediated by changes in irritability.

Sample size
Prior to the experiment, power estimations were performed using G*Power 3.1.9.7. We envisioned a combined cross-sectional and longitudinal approach with a larger number of subjects participating in one test session during a certain cycle phase, and a sub-sample returning for follow-up sessions scheduled in the other two cycle phases. Given the well-controlled nature of the longitudinal approach, we aimed to evaluate all hypotheses in the longitudinal sample first and follow up with cross-sectional analyses for those measures that are prone to learning effects. Accordingly, for the longitudinal sample the significance level was corrected for six comparisons (speed, accuracy, frequency, affect, anxiety, PMS), resulting in a threshold of p = 0.008, and for the cross-sectional sample the significance level was corrected for two comparisons (speed, accuracy), resulting in a threshold of p = 0.025. Using these significance levels and assuming moderate effect sizes of f = 0.25, 95 % power for the main effect of cycle phase can be achieved with 60 participants for the longitudinal sample and with 170 participants for the cross-sectional sample. According to sensitivity analyses, these sample sizes still allow for the detection of even smaller effect sizes around f = 0.20 for the main effect of cycle phase with 80 % power. Taking into account a drop-out rate of 25 % in the longitudinal sample and roughly 10 % in the cross-sectional sample, 185 participants were recruited in total and 75 participants were invited and agreed to return for follow-ups.
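The corrected significance levels can be reproduced with a one-line calculation. The sketch below assumes a Bonferroni-type correction of a family-wise level of 0.05, which matches the reported thresholds but is not named explicitly in the text; `bonferroni_alpha` is a hypothetical helper.

```python
def bonferroni_alpha(n_comparisons, family_alpha=0.05):
    """Per-comparison significance level under a Bonferroni-type correction (assumed)."""
    return family_alpha / n_comparisons


# Longitudinal sample: speed, accuracy, frequency, affect, anxiety, PMS
alpha_longitudinal = bonferroni_alpha(6)
# Cross-sectional sample: speed, accuracy
alpha_cross_sectional = bonferroni_alpha(2)
print(round(alpha_longitudinal, 3), round(alpha_cross_sectional, 3))
```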

Inclusion criteria
All participants had been assigned female sex at birth, were aged between 18 and 35 years, right-handed, had no psychological, endocrinological or neurological disorders, did not use any medication and had not used hormonal contraception for the past six months. Participants reported a regular menstrual cycle for the six months prior to study participation, i.e., according to the criteria of Fehring et al. (2006) their cycle length was 21 to 35 days and variation between cycles was <7 days. A minimum of three menstrual periods prior to study participation was recorded to confirm participants' self-reports. Nevertheless, 16 participants (7 in the longitudinal sample) experienced a prolonged cycle of >35 days during the study and were excluded from further analysis. In combination with low progesterone levels, the increased cycle length is suggestive of anovulatory cycles. Test sessions were scheduled either during menses, the peri-ovulatory or the mid-luteal cycle phase according to the procedures described below. Test sessions were included as peri-ovulatory if backwards counting from the onset of next menses confirmed a cycle day between −19 and −12 and, in the longitudinal sample, estradiol levels were higher than during menses. Mid-luteal sessions were accepted if backwards counting confirmed a cycle day between −11 and −3 and, in the longitudinal sample, progesterone levels were higher than during menses and the peri-ovulatory session. Based on these criteria, an additional 16 participants had to be excluded from the cross-sectional sample. In the longitudinal sample, three additional participants lost both their peri-ovulatory and luteal sessions and were excluded. In 13 participants only the peri-ovulatory session was excluded and in five only the luteal session.
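The backward-counting rule for peri-ovulatory and mid-luteal sessions can be illustrated as follows. This is a minimal sketch, not the authors' code: it assumes that cycle day −1 is the day before the onset of the next menses, it omits the additional hormone criteria applied in the longitudinal sample, and the function name and example dates are hypothetical.

```python
from datetime import date

# Cycle days counted backwards from the onset of the next menses
PERI_OVULATORY_DAYS = range(-19, -11)  # days -19 to -12 inclusive
MID_LUTEAL_DAYS = range(-11, -2)       # days -11 to -3 inclusive


def classify_by_backward_count(session_date, next_menses_onset):
    """Return the phase implied by backward counting, or None if outside both windows."""
    # Assumption: the day before onset is cycle day -1
    cycle_day = (session_date - next_menses_onset).days
    if cycle_day in PERI_OVULATORY_DAYS:
        return "peri-ovulatory"
    if cycle_day in MID_LUTEAL_DAYS:
        return "mid-luteal"
    return None


onset = date(2023, 3, 30)  # hypothetical onset of next menses
print(classify_by_backward_count(date(2023, 3, 16), onset))
print(classify_by_backward_count(date(2023, 3, 24), onset))
print(classify_by_backward_count(date(2023, 3, 29), onset))
```

A session that falls outside both windows returns `None` and would be a candidate for exclusion under the criteria described above.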

Final sample
Final analyses were performed on a longitudinal sample of 65 participants (47 with three test sessions, 18 with two) and a cross-sectional sample of 153 participants. In the cross-sectional sample, the final sample size is below our aspired sample size of 170 participants, but still provides 92 % power for the main effect of cycle phase.
In both samples, the average cycle length was 29 days (SD = 2 days), mean age was 24 years (SD = 4 years) and the average IQ was 107 (SD = 10 IQ points) across all 3 cycle phases. In the cross-sectional sample, neither age, nor IQ, nor cycle length differed significantly between cycle phases (all F < 1.81, all p > 0.16). Furthermore, the cross-sectional sample was comparable in education, employment status, gender identity, sexual orientation, relationship status, parity, prior contraceptive use and smoking habits across cycle phases and to the longitudinal sample (all χ² < 4.14, all p > 0.18; compare Table 2). The majority of participants (> 90 %) had passed general qualification for university entrance, identified as women, were heterosexual or bisexual, nulliparous and non-smokers. A little more than half of the participants were unemployed, in a relationship and had previously used hormonal contraceptives. Only one woman had ever received a diagnosis of premenstrual dysphoric disorder.

Procedure
Prior to study participation, participants filled out an online questionnaire to screen for inclusion criteria and determine cycle length, cycle irregularities and current cycle phase. Average cycle length was calculated based on the onset dates of the last 3 menstrual periods. By adding the average cycle length to the onset date of the last menstrual period, the expected onset of next menses was calculated. Since luteal phase length shows little variation across participants and cycles (Fehring et al., 2006), ovulation was assumed 14 days before the onset of next menses.
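The date arithmetic described above can be sketched as follows. This is an illustration under assumptions: the average cycle length is taken as the mean interval between the three consecutive onset dates and rounded to whole days, which the text does not specify; the function name and dates are hypothetical.

```python
from datetime import date, timedelta

LUTEAL_PHASE_DAYS = 14  # ovulation assumed 14 days before next menses


def predict_cycle_dates(last_three_onsets):
    """Return (average cycle length, expected next menses, expected ovulation)."""
    onsets = sorted(last_three_onsets)
    # Intervals between consecutive menstrual onsets, in days
    intervals = [(later - earlier).days for earlier, later in zip(onsets, onsets[1:])]
    # Assumption: mean interval rounded to whole days
    mean_length = round(sum(intervals) / len(intervals))
    next_menses = onsets[-1] + timedelta(days=mean_length)
    ovulation = next_menses - timedelta(days=LUTEAL_PHASE_DAYS)
    return mean_length, next_menses, ovulation


onsets = [date(2023, 1, 2), date(2023, 1, 30), date(2023, 3, 1)]
mean_length, next_menses, ovulation = predict_cycle_dates(onsets)
print(mean_length, next_menses, ovulation)
```

For these hypothetical onset dates the intervals are 28 and 30 days, so the expected next menses falls 29 days after the last onset and the assumed ovulation 14 days before that.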
Menses sessions were scheduled on days 2 to 6 of the current menstrual cycle. To schedule peri-ovulatory sessions, participants received urinary ovulation tests (Pregnafix®) for the 5 days before the expected ovulation. For this purpose, we specifically chose ovulation tests developed for women who wish to conceive, with a low threshold for luteinizing hormone (LH) indicating positive results already 1-2 days prior to ovulation with 99 % accuracy according to the test description. Accordingly, all women did observe a positive LH surge. A research assistant confirmed positive results from images sent by the participants. Peri-ovulatory sessions were scheduled within 2 days of a positive ovulation test result. Luteal sessions were scheduled at the earliest 3 days after the expected ovulation and up to 3 days before the expected onset of next menses. All sessions were confirmed via backwards counting from the onset of the next menstrual period after study participation, as well as salivary hormone analysis (compare hormone analysis section). For the longitudinal sample, three lab sessions were scheduled in counterbalanced order, one during menses, one during the peri-ovulatory phase and one during the mid-luteal cycle phase. Twenty participants had their first test session during menses, 22 during the peri-ovulatory phase and 23 during the mid-luteal cycle phase.
During each test session, participants completed a health screening questionnaire, a series of cognitive tasks, including the emotion recognition task described below, as well as a series of mood questionnaires described below. Three saliva samples were collected in the beginning, middle and end of the experiment in order to assess salivary estradiol and progesterone for confirmation of cycle phases. Participants received either course credits or monetary compensation for their participation.

Emotion recognition task
During each session, participants completed one of three versions of an emotion recognition task, order counterbalanced across phases and sessions. Each version included 60 faces from the FACES database (http://faces.mpib-berlin.mpg.de/), 10 each displaying either a neutral expression or happiness, sadness, anger, fear or disgust (see Fig. 1 for example faces), order randomized. Each face was displayed on a computer screen for 4 s with an inter-stimulus interval of 495 ms using the stimulus presentation program OpenSesame (Mathôt et al., 2012). Participants rated each emotional expression as neutral, happy, sad, angry, fearful or disgusted by pressing the respectively marked keys on a computer keyboard. For each response, the following were recorded: (i) speed in ms, (ii) accuracy, i.e., correspondence between the emotion displayed and participants' ratings, (iii) the specific emotion rated irrespective of the emotion that was displayed. Prior to the first test session, participants received a short training session to familiarize them with the task and keyboard setup.
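The three measures recorded per response can be aggregated per emotion as sketched below. This is an illustrative example, not the authors' scoring script: the trial format, the function name `score_session` and the example trials are hypothetical, and averaging response times over all trials of an emotion (rather than over correct trials only) is an assumption.

```python
from collections import Counter

EMOTIONS = ["neutral", "happy", "sad", "angry", "fearful", "disgusted"]


def score_session(trials):
    """Aggregate trials given as (displayed, rated, rt_ms) tuples per emotion."""
    shown = Counter(displayed for displayed, _, _ in trials)
    correct = Counter(displayed for displayed, rated, _ in trials if displayed == rated)
    # How often each emotion was chosen, irrespective of the emotion displayed
    perceived = Counter(rated for _, rated, _ in trials)
    rt_sum = Counter()
    for displayed, _, rt_ms in trials:
        rt_sum[displayed] += rt_ms
    return {
        emotion: {
            "mean_rt_ms": rt_sum[emotion] / shown[emotion] if shown[emotion] else None,
            "accuracy": correct[emotion] / shown[emotion] if shown[emotion] else None,
            "times_perceived": perceived[emotion],
        }
        for emotion in EMOTIONS
    }


trials = [
    ("sad", "sad", 1500), ("sad", "angry", 2100),
    ("angry", "angry", 1300), ("happy", "happy", 800),
]
scores = score_session(trials)
print(scores["sad"], scores["angry"])
```

In this toy example, sadness is displayed twice but perceived once, whereas anger is displayed once but perceived twice, which is the kind of discrepancy the frequency measure is meant to capture.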

Mood questionnaires
Mood was assessed by a comprehensive battery of questionnaires combining trait and state measures. Trait measures were administered during the first session only, while state measures were administered during each test session.
The disposition towards premenstrual symptoms was assessed using the Premenstrual Symptom Screening Tool (PSST, Steiner et al., 2003), while the current strength of premenstrual symptoms during each test session was assessed using the Daily Rating of Severity of Problems (DRSP, Endicott et al., 2006). The PSST is a 20-item instrument assessing on a 4-point Likert scale how strongly participants perceive that symptoms like depression, anxiety, mood lability, irritability, problems concentrating, sleep problems, as well as a variety of physical symptoms recur monthly in their premenstrual phase (14 items), and how strongly they perceive these symptoms to impact their daily life. The PSST score was calculated as the average over the 14 items assessing symptom strength. Cronbach's alpha for the current sample was 0.80. The DRSP is a daily scoring sheet for patients to track common premenstrual symptoms and their impact on daily life on a 6-point Likert scale. In the current study, the DRSP was adapted to include 20 items, and DRSP scores during each session were calculated as the average over the 9 and 5 items assessing psychological and physiological symptom strength, respectively. Cronbach's alpha for the current sample was 0.87 for psychological symptoms and 0.69 for physiological symptoms.
Depression scores were assessed using the German version of Beck's Depression Inventory (BDI, Hautzinger et al., 2006), while current affect was assessed using the Positive and Negative Affect Schedule (PANAS, Krohne et al., 1996), as well as its Emoji version (EPANAS, Beltz et al., in press). The Emoji version was included for easier accessibility in the younger generation that our study participants were sampled from. The BDI is the most commonly used depression inventory worldwide and assesses the severity of depression. It contains 21 items covering the diagnostic criteria for depression according to the DSM-5 (APA, 2013), each providing participants with a choice between 4 statements expressing varying symptom strength (0-3). For the current study, we used the raw score, calculated as the sum over the 21 item responses. The PANAS is a well-validated 20-item instrument consisting of 10 positive and 10 negative affective adjectives, which has been widely used in menstrual cycle studies. Participants rate how much each adjective applies to their current mood on a 5-point Likert scale. In the current sample, Cronbach's alpha for positive affect was 0.88, while Cronbach's alpha for negative affect was 0.82. The EPANAS is a 16-item instrument consisting of 6 positive and 10 negative emojis. Participants rate how much each emoji expresses their current mood on a 5-point Likert scale. In the current sample, Cronbach's alpha for positive affect was 0.94, while Cronbach's alpha for negative affect was 0.87. PANAS and EPANAS scores were highly correlated (positive affect: r = 0.70, p < 0.001; negative affect: r = 0.80, p < 0.001). Accordingly, a composite score for affect was calculated by averaging PANAS and EPANAS scores.
Table 2. Demographic data of the samples used for analysis. Demographics were comparable between the cross-sectional sample and the longitudinal subsample. Furthermore, demographics did not differ between cycle phases in the cross-sectional sample (all χ² < 4.14, all p > 0.18).
Trait anxiety was assessed using Beck's Anxiety Inventory (BAI, Margraf and Ehlers, 2007), while the current level of anxiety during each test session was assessed using the state version of the German translation of the State Trait Anxiety Inventory (STAI, Laux, 1981). The BAI was designed to assess the severity of anxiety disorders. Participants rate a list of psychological and somatic anxiety symptoms according to their severity on a 4-point Likert scale. For the current study, we used the raw score, calculated as the sum over the 21 item responses. The state scale of the STAI is a well-validated 20-item instrument including adjectives related to the psychological or physical symptoms of anxiety or nervousness or their opposite, i.e., relaxation. Participants rate on a 4-point Likert scale how much each adjective describes their current emotional state. In the current sample, Cronbach's alpha was 0.87.
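The internal consistencies reported for the questionnaires are Cronbach's alpha values. As a reminder of how this coefficient is obtained from item-level responses, the sketch below implements the standard formula alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores). The function name and the toy responses are hypothetical and unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item score lists."""
    k = len(responses[0])  # number of items
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of four participants to a three-item scale
toy_responses = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5]]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))
```

Items that rise and fall together across participants, as in this toy example, produce values close to 1, whereas unrelated items produce values close to 0.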

Hormone analysis
Three saliva samples of 2 ml each were collected via the passive drool method throughout each session and stored at −20 °C. Prior to analysis, samples were thawed and centrifuged twice for 15 and 10 min, respectively, at 3000 revolutions per minute using an Eppendorf 5702 centrifuge in order to remove solid particles. The supernatant of the three samples per session was pooled in order to control for pulsatile hormone release throughout the day. Salivary estradiol and progesterone were assayed using Salimetrics ELISA kits. For the longitudinal sample, the three pooled samples of one participant were always analysed on the same plate. Each sample was analysed twice, and in case the coefficient of variation in a sample exceeded 25 %, analysis of all samples of the respective participant was repeated.
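The repeat-analysis rule for duplicate measurements can be sketched as follows. This is an illustration under assumptions: the coefficient of variation is computed as the sample standard deviation of the two measurements divided by their mean, which the text does not spell out, and the function names and concentration values are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD_PERCENT = 25.0


def coefficient_of_variation(duplicates):
    """Coefficient of variation in percent (sample SD / mean, assumed definition)."""
    return stdev(duplicates) / mean(duplicates) * 100


def needs_reanalysis(participant_samples):
    """True if any duplicate pair of a participant exceeds the CV threshold."""
    return any(
        coefficient_of_variation(pair) > CV_THRESHOLD_PERCENT
        for pair in participant_samples
    )


# Hypothetical duplicate readings for the pooled samples of one participant
samples = [(2.0, 2.2), (1.0, 1.5)]
print([round(coefficient_of_variation(pair), 1) for pair in samples])
print(needs_reanalysis(samples))
```

In this example the second pair differs enough to exceed the threshold, so all samples of that participant would be analysed again.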

Statistical analysis
Statistical analyses were performed in R 4.2.3. To confirm the correct time window of the cycle phases selected, estradiol and progesterone values were compared between cycle phases using the lme function of the nlme package (Pinheiro et al., 2017) for the longitudinal sample, as well as the lm function of the stats package for the cross-sectional sample. Significance of main effects was determined using the anova function of the stats package followed by pairwise Tukey tests using the glht function of the multcomp package (Hothorn et al., 2008). In order to test hypotheses 1-4, the following variables of interest were compared between cycle phases using the lme function of the nlme package in the longitudinal sample: (i) emotion recognition speed, (ii) emotion recognition accuracy, (iii) emotion detection frequency. In order to account for learning effects, analyses for (i) and (ii) were repeated in the cross-sectional sample.
According to hypothesis 5, mediation analyses by mood were planned for those emotion recognition variables for which significant effects of menstrual cycle phase could be confirmed in the longitudinal sample. In order to confirm menstrual cycle dependent changes in mood, the following potential mediators were compared between cycle phases using the lme function of the nlme package in the longitudinal sample: (iv) premenstrual symptoms, (v) affect, (vi) state anxiety.
Specific formulas for each comparison are listed in the results section. For each analysis, continuous dependent and independent variables were scaled, such that b-values represent standardized effect sizes based on standard deviations, comparable to Cohen's d. P-values for the main effect of phase were FDR-corrected for multiple comparisons across models. In case the main effect of phase was non-significant, Bayes factors in support of the null hypothesis were calculated using the lmBF function of the BayesFactor package (Morey et al., 2015). Specifically, we calculated the Bayes factor for a model without the main effect of cycle phase relative to the model including cycle phase (BF01) using 10,000 iterations for Monte Carlo sampling. The prior distribution was set to medium.
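To illustrate what an FDR correction across models does to a set of p-values, the sketch below implements the Benjamini-Hochberg step-up adjustment. Whether the authors used this particular FDR procedure is an assumption, since the text only states that p-values were FDR-corrected; the function name and the p-values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for the main effect of phase in four models
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adjusted_p])
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped so that adjusted values never decrease with increasing raw p-values.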

Emotion recognition speed
Menstrual cycle dependent changes in emotion recognition speed were evaluated by a linear mixed effects model including cycle phase, emotion and, in the longitudinal sample, session as independent variables and participant number as random factor (long: RT ~ phase*emotion + session + 1|PNr; cross: RT ~ phase*emotion + 1|PNr). ANOVA results revealed that in the longitudinal sample speed increased significantly with increasing number of sessions (F(1,973) = 61.63, p < 0.001), i.e., there was a significant learning effect. In both samples, there was a significant main effect of emotion (long: F(5,973) = 61.49, p < 0.001; cross: F(5,755) = 58.87, p < 0.001), but no significant main effect of cycle phase (long: F(2,973) = 1.59, p = 0.21; cross: F(2,151) = 0.12, p = 0.88) and no interaction between cycle phase and emotion (long: F(10,973) = 0.77, p = 0.66; cross: F(10,755) = 1.18, p = 0.30). After removing the phase*emotion interaction (both BF01 > 93.85), comparing the models without the main effect of cycle phase to the respective models including cycle phase yielded Bayes factors of BF01 = 0.69 ± 2.67 % for the longitudinal and BF01 = 26.65 ± 1.23 % for the cross-sectional sample. Thus, Bayes factors provide strong evidence for the null hypothesis that emotion recognition speed does not change along the menstrual cycle in the cross-sectional sample, but allow no decision in favour of either hypothesis for the longitudinal sample (compare Fig. 2A).
Reactions were fastest for neutral and happy faces, but significantly slower for negative emotions as revealed by pairwise comparisons (all b > 0.71, all z > 6.25, all p < 0.001). Among the negative emotions, fear was recognized significantly faster than any other negative emotion (all b > 0.45, all z > 4.00, all p < 0.001), while no significant differences were observed in reaction speed to anger, disgust and sadness (all |b| < 0.21, all |z| < 1.84, all p > 0.43). Neither estradiol nor progesterone or their interaction were related to emotion recognition speed, irrespective of the emotion displayed (all F < 1.88, all p > 0.13).
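The verbal labels attached to the Bayes factors can be made explicit with a small helper. The sketch below assumes the commonly used classification of Lee and Wagenmakers (1-3 anecdotal, 3-10 moderate, 10-30 strong, 30-100 very strong, above 100 extreme); the text does not state which scheme was applied, and the function name is hypothetical.

```python
def describe_bf01(bf01):
    """Return (evidence label, favoured hypothesis) for a Bayes factor BF01."""
    # BF01 > 1 favours the null model; BF01 < 1 favours the alternative
    favoured = "null" if bf01 >= 1 else "alternative"
    strength = bf01 if bf01 >= 1 else 1 / bf01
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return label, favoured


# Bayes factors reported for emotion recognition speed
print(describe_bf01(0.69))   # longitudinal sample
print(describe_bf01(26.65))  # cross-sectional sample
```

Under this assumed scheme, the cross-sectional value falls in the strong range in favour of the null model, whereas the longitudinal value lies in the anecdotal range and therefore supports no decision.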

Emotion recognition accuracy
Menstrual cycle dependent changes in emotion recognition accuracy were evaluated by a linear mixed effects model including cycle phase, emotion and, in the longitudinal sample, session as independent variables and participant number as random factor (long: Accuracy ~ phase*emotion + session + 1|PNr; cross: Accuracy ~ phase*emotion + 1|PNr). ANOVA results revealed that in the longitudinal sample accuracy improved significantly with increasing number of sessions (F(1,970) = 8.15, p = 0.004), i.e., there was a significant learning effect. In both samples, there was a significant main effect of emotion (long: F(5,970) = 53.72, p < 0.001; cross: F(5,755) = 54.24, p < 0.001), but no significant main effect of cycle phase (long: F(2,970) = 0.01, p = 0.99; cross: F(2,151) = 0.16, p = 0.85) and no interaction between cycle phase and emotion (long: F(10,970) = 0.46, p = 0.91; cross: F(10,755) = 0.68, p = 0.74). After removing the phase*emotion interaction (both BF01 > 398.56), comparing the models without the main effect of cycle phase to the respective models including cycle phase yielded Bayes factors of BF01 = 54.64 ± 1.55 % for the longitudinal and BF01 = 38.93 ± 0.93 % for the cross-sectional sample, providing very strong evidence for the null hypothesis that emotion recognition accuracy does not change along the menstrual cycle irrespective of the emotion displayed (compare Fig. 2B).
Accuracy was almost perfect for neutral and happy faces, but significantly worse for negative emotions as revealed by pairwise comparisons (all |b| > 0.40, all |z| > 3.09, all p < 0.05). Among the negative emotions, recognition accuracy followed the order: fear = anger > disgust > sadness (all |b| > 0.47, all |z| > 3.56, all p < 0.005). Neither estradiol nor progesterone or their interaction were related to emotion recognition accuracy, irrespective of the emotion displayed (all F < 0.69, all p > 0.63).
All further analyses were performed in the longitudinal sample only.
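As a reading aid for the Bayes factors reported in this section, a BF01 can be converted into a posterior probability of the null hypothesis once prior odds are fixed. The sketch below assumes equal prior odds for the two models; this assumption and the function name are ours for illustration and are not part of the reported analysis.

```python
def posterior_prob_null(bf01: float, prior_prob_null: float = 0.5) -> float:
    """Posterior probability of H0 given BF01 and a prior probability of H0."""
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bf01 * prior_odds  # Bayes' rule on the odds scale
    return posterior_odds / (1.0 + posterior_odds)


# Bayes factors for the cycle phase main effect on accuracy, as reported above.
reported = {"longitudinal": 54.64, "cross-sectional": 38.93}
for sample, bf01 in reported.items():
    print(f"{sample}: BF01 = {bf01}, P(H0 | data) = {posterior_prob_null(bf01):.3f}")
```

Under equal prior odds, both Bayes factors correspond to a posterior probability above 0.97 for the model without cycle phase.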

Emotion recognition frequency
Finally, we assessed how often a specific emotion was perceived, irrespective of the emotion displayed. Menstrual cycle effects on emotion recognition frequency were evaluated by a linear mixed effects model including cycle phase, emotion and session as independent variables and participant number as random factor (frequency ~ phase*emotion + session + 1|PNr). ANOVA results revealed a significant main effect of emotion (F(5,973) = 28.48, p < 0.001), but no significant main effect of session (F(1,973) = 2.87, p = 0.09) or cycle phase (F(2,973) = 0.14, p = 0.87) and no interaction between cycle phase and emotion (F(10,973) = 0.68, p = 0.74). After removing the phase*emotion interaction (BF01 > 690.05 ± 2.37 %), comparing the model without the main effect of cycle phase to a model including cycle phase yielded a Bayes factor of BF01 = 84.89 ± 1.89 %, providing very strong evidence for the null hypothesis that emotion recognition frequency does not change along the menstrual cycle irrespective of the emotion perceived (compare Fig. 3). Anger and disgust were perceived more often than displayed, while sadness was perceived less often than displayed. Fear, happiness and neutral expressions were perceived as often as displayed. Neither estradiol, progesterone, nor their interaction was related to emotion recognition frequency, irrespective of the emotion displayed (all F < …).

Table 3. Hormones and mood along the menstrual cycle (means ± SD). M = menses, O = peri-ovulatory phase, L = mid-luteal phase.

[Figure: Menstrual cycle related changes in emotion recognition. Panels: Longitudinal, Cross-sectional.]

Mediation/moderation of menstrual cycle related changes in emotion recognition by mood
Given that no menstrual cycle associations to emotion recognition were observed, the mediation hypothesis was discarded. Instead, we performed exploratory analyses to address whether the strength of the association between menstrual cycle and emotion recognition was moderated by mood or premenstrual symptoms, such that only participants with strong mood changes or premenstrual symptoms show changes in emotion recognition speed, accuracy or frequency. For the respective mood measures (affect, anxiety, irritability) we selected performance measures of the corresponding emotional expressions (sadness, fear, anger) (e.g., RT_sadness ~ phase*Affect + session + 1|PNr) and used an FDR correction for multiple comparisons. There was no association between participants' own mood and their recognition or perception of sadness, fear or anger in either sample (all F < 3.37, all p FDR > 0.18). Neither for speed, nor for accuracy or frequency did we observe a significant interaction between phase and mood in either sample (all F < 2.53, all p FDR > 0.24). For moderation of menstrual cycle related effects by PMS symptoms we selected overall emotion recognition performance (e.g., RT ~ phase*emotion*PMS + session + 1|PNr). PMS symptom strength was not related to emotion recognition and there was no significant interaction between menstrual cycle phase and PMS symptoms for behavioral measures in either sample (all F < 1.52, all p > 0.17).
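To illustrate what an FDR correction does to a family of p-values, the sketch below implements the Benjamini-Hochberg step-up procedure. The text above does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, and the p-values in the example are hypothetical and not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, e.g. one per mood measure and performance measure.
example_p = [0.01, 0.04, 0.03, 0.20, 0.50, 0.002]
for raw, adj in zip(example_p, benjamini_hochberg(example_p)):
    print(f"p = {raw:.3f} -> p FDR = {adj:.3f}")
```

Each adjusted value is at least as large as the raw p-value, which is why effects that are nominally significant can become non-significant after correction.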

Menstrual cycle related changes in mood
Given that mood changes along the menstrual cycle were intended as mediators of emotion recognition changes along the menstrual cycle, we evaluated changes in premenstrual symptoms, positive and negative affect, as well as state anxiety along the menstrual cycle in the longitudinal sample.

Discussion
The present study was designed to assess changes in emotion recognition along the menstrual cycle in a well-powered combined cross-sectional and longitudinal design. The aim of the study was (a) to replicate or disentangle previous findings associating menstrual cycle phase with emotion recognition and (b) extend previous findings by analysing whether menstrual cycle changes in emotion recognition were mediated by menstrual cycle changes in mood.
Regarding (a) we hypothesized more specifically that emotion recognition speed would not change along the menstrual cycle, while emotion recognition accuracy would increase during the peri-ovulatory phase, but decrease during the luteal phase due to increased frequency of negative perceptions during the mid-luteal cycle phase. As confirmed by Bayesian statistics providing very strong evidence for the null hypothesis for all but one emotion recognition measure, the main result of the current study was that none of the emotion recognition parameters changed along the menstrual cycle in either the longitudinal sample or the cross-sectional sample. This supports our first hypothesis regarding emotion recognition speed, but no support for any of the hypotheses regarding emotion recognition accuracy could be obtained. While these results are in contrast to previous findings of cross-sectional studies suggesting improved emotion recognition during the peri-ovulatory and decreased emotion recognition during the luteal phase, several recent longitudinal studies with impressive sample sizes also obtained null findings regarding menstrual cycle dependent changes in emotion recognition (Rafiee et al., 2023; Álvarez et al., 2022; Shirazi et al., 2020). Our findings regarding emotion recognition frequency also suggest that women do not over-perceive threatening emotions during the mid-luteal cycle phase, which has been discussed as a potential explanation for emotion recognition changes in the luteal cycle phase (Derntl et al., 2008b).
Of course, this result is restricted to our stimulus material, which displayed only the most basic emotions. It is possible that more subtle changes would become visible using more complex emotion recognition tasks including for example dynamic instead of static facial expressions, different face angles, a subdivision of the basic emotions or a rating of the strength of the emotion expressed. However, recent studies using more complex emotions or non-static stimulus material, i.e., film clips, also did not observe changes in emotion recognition performance along the menstrual cycle (Pahnke et al., 2019; Di Tella et al., 2020; Shirazi et al., 2020). Furthermore, we only used 10 stimuli per category. However, the number of stimuli per category was comparable to or even higher than in studies reporting positive findings (e.g., Pearson and Lewis, 2005; Derntl et al., 2008b, 2013; Mikolić, 2016). Finally, the fact that comparable results were obtained in both our samples suggests that this lack of changes is not attributable to training effects due to repeated testing. Note, however, that our cross-sectional sample, while being the largest cross-sectional sample investigating this research question to date, is still not as well powered as recently recommended for menstrual cycle designs (Gangestad et al., 2016).
Regarding (b) we hypothesized that, in accordance with the mood congruity effect, negative affect would mediate menstrual cycle dependent changes in the recognition of sadness, state anxiety would mediate menstrual cycle related changes in the recognition of fear and irritability would mediate menstrual cycle dependent changes in the recognition of anger. However, neither a mediation nor a moderation of menstrual cycle related changes in emotion recognition by any of the mood measures was observed. Nor were changes in emotion recognition restricted to participants with substantial mood changes along the menstrual cycle, since an exploratory moderator analysis also yielded non-significant results. We conclude that emotion recognition performance remains stable along the menstrual cycle, at least for the task employed in the current study, irrespective of mood changes along the menstrual cycle. While all mood measures showed the expected trajectory along the menstrual cycle, these changes do not appear to be reflected in the way women perceive emotions in other people's faces.
Our findings on mood changes along the menstrual cycle are in accordance with previous studies (e.g., Rebollar et al., 2017; Kimmig et al., 2021; Hromatko & Mikac, 2023; Kuehner and Nayman, 2021). Across questionnaires we observed the worst mood during menses and the best mood during the peri-ovulatory phase. While some previous studies did report the worst mood during the luteal phase, they often focused on the late luteal phase (e.g., Bäckström et al., 1983), which was not included in the present study. It has also been repeatedly reported that symptoms can persist throughout the first few days of the menstrual cycle. A recent study tracking mood over 3 cycles in 60 participants found no differences in affect between the luteal and early follicular phase (Hromatko & Mikac, 2023). A few observations regarding these mood changes are interesting.
First, the fact that participants with high scores on the PSST showed the strongest changes in mood and premenstrual symptoms further validates this instrument, though a diagnosis of PMS can only be obtained after prospective tracking of symptoms over two menstrual cycles (Yonkers et al., 2018). To the best of our knowledge, no study has so far validated the trait assessment of premenstrual symptoms with the PSST (Steiner et al., 2003) by longitudinal approaches. Interestingly, only one woman in the present study had ever received a diagnosis of premenstrual syndrome/premenstrual dysphoric disorder (PMS/PMDD), although prevalence rates are estimated to be up to 50 % for PMS (Direkvand Moghadam et al., 2013) and 15 % for PMDD (Halbreich et al., 2003). Given that many other women in the sample had similar PSST and DRSP scores to this participant, this pattern suggests that PMS and PMDD remain underdiagnosed in the German-speaking population from which the present sample was recruited.
Second, it is worth noting that mood changes are mostly attributable to a reduction in positive affect rather than an increase in negative affect. These results are in accordance with a multisite longitudinal study across two menstrual cycles, suggesting no changes in negative affect with fluctuating levels of ovarian hormones (Hengartner et al., 2017). Unfortunately, this study did not assess the associations between sex hormone levels and positive affect. A reduction in positive affect along the menstrual cycle was also reported by Alonso et al. (2004) and Hromatko & Mikac (2023).
Third, we observed some interesting hormonal associations with regard to premenstrual symptoms, and by trend also anxiety symptoms. Premenstrual symptoms were negatively related to estradiol levels, suggesting that high estradiol in the luteal cycle phase protects against a worsening of mood, e.g., by upholding positive affect during this phase. Given that previous studies have linked premenstrual symptoms mostly to an excess or deficiency of progesterone or heightened progesterone sensitivity (Bäckström et al., 1983; Redei and Freeman, 1995; Sundström-Poromaa et al., 2020), this finding offers an additional avenue. A similar report of protective effects of luteal estrogen against premenstrual symptoms was provided by Yen et al. (2019).
Finally, we would like to point out a few limitations of the current study. First, while the samples included in the current study were definitely larger than those of many previous studies investigating this research question, they may still not have been large enough to detect subtle interactions between mood and menstrual cycle phase as assessed in the exploratory moderation analysis (compare Gangestad et al., 2016). Second, hormone levels were assessed via salivary immunoassays, for which validity has recently been questioned (Arslan et al., 2023). While we did use hormone levels in reference to menses values and combine our hormone assessments with various other methods of menstrual cycle staging, liquid chromatography-mass spectrometry on blood samples may have been able to provide more accurate measures. These considerations should be taken into account when planning future menstrual cycle studies. For instance, in order to definitively conclude that mood changes along the menstrual cycle do not affect emotion recognition, more studies using complex non-static stimuli might be informative.
In summary, the main finding of our study is the very strong evidence that the recognition of basic emotions does not change along the menstrual cycle, even though mood does. Mood changes were attributable to a flattening of positive affect rather than an increase in negative affect, and stronger premenstrual symptoms were related to lower estradiol levels.

Ethics statement
The study was approved by the University of Salzburg's ethics committee and conforms to the Code of Ethics of the World Medical Association (Declaration of Helsinki). All participants gave their signed written consent to participate in the study.

Funding sources
This study was in part funded by the ERC Starting Grant BECONTRA (850953).

Data availability
Data and scripts are openly available at https://osf.io/dmhrw/.