Neuroimaging the menstrual cycle: A multimodal systematic review

Increasing evidence indicates that ovarian hormones affect the brain structure, chemistry and function of women of reproductive age, potentially shaping their behavior and mental health. Throughout the reproductive years, estrogen and progesterone levels fluctuate across the menstrual cycle and can modulate neural circuits involved in affective and cognitive processes. Here, we review seventy-seven neuroimaging studies and provide a comprehensive and data-driven evaluation of the accumulating evidence on brain plasticity associated with endogenous ovarian hormone fluctuations in naturally cycling women (n = 1304). The results particularly suggest modulatory effects of ovarian hormone fluctuations on the reactivity and structure of cortico-limbic brain regions. These findings highlight the importance of performing multimodal neuroimaging studies on neural correlates of systematic ovarian hormone fluctuations in naturally cycling women based on careful menstrual cycle staging.


Introduction
Women of reproductive age represent approximately 49.7% of the worldwide female population and 24.6% of the total population (United Nations, 2019b). Among these women, about 58% are naturally cycling (United Nations, 2019a) and undergo the physiological estradiol (E2) and progesterone (P4) fluctuations that define the menstrual cycle (Roos et al., 2015). A typical menstrual cycle is 28-32 days long, and starts with a follicular phase (FP, 12-14 days) characterized by increasing E2 concentration reaching a pre-ovulatory peak and low P4 levels. The subsequent luteal phase (LP, 12-14 days) is characterized by a progressive increase of P4 concentrations and a lower secondary E2 peak followed by a decrease of both hormone levels in the last days of the menstrual cycle (Abraham et al., 1972). Through their widespread classical nuclear E2 α and β, P4 A and B receptors, and membrane-associated E2 and P4 receptors (Brinton et al., 2008; Osterlund and Hurd, 2001), these hormones hold the potential to modulate brain structure, chemistry and function (Catenaccio et al., 2016; Rehbein et al., 2020), and shape the behavior and mental health of women of reproductive age (Zsido et al., 2017). The highest concentrations of E2 and P4 receptors are found, as mainly demonstrated by animal studies, in regions often construed as being part of the limbic system such as the amygdala, hippocampus, thalamus, and hypothalamus, although they are also expressed in the cerebral cortex to a lesser extent (Brinton et al., 2008; Osterlund and Hurd, 2001). Considering that ovarian hormones can influence glutamatergic, GABAergic, dopaminergic, and serotoninergic systems, this broad expression of E2 and P4 receptors in the brain may be of particular relevance to affective and cognitive processes (Zsido et al., 2017).
Thus, it is likely that, in healthy naturally cycling women, neuroadaptive mechanisms arise monthly to modulate brain structure and function in response to the hormonal fluctuations across the menstrual cycle.
Current neuroimaging techniques constitute useful tools to evaluate in vivo brain structural (MRI and DTI), functional (fMRI, resting-state fMRI) and molecular (PET, SPECT) changes associated with menstrual cycle-related hormonal fluctuations. Hence, the effects of E2 and P4 on the brain have increasingly been explored in neuroimaging studies, and associations have been reported between these hormones and behavioral correlates of affective and cognitive processes. Remarkable attempts have been made to summarize the accumulated evidence for menstrual cycle effects on the brain, by either focusing on structural MR studies (Catenaccio et al., 2016; Rehbein et al., 2020), functional MR studies (Toffoletto et al., 2014), both structural and functional recent MR studies on cognitive processes (Beltz and Moser, 2020), or results obtained from different imaging modalities in healthy women and women with premenstrual dysphoric disorder (PMDD) (Comasco and Sundstrom-Poromaa, 2015; Dubol, 2020). However, our understanding of the neurobiological mechanisms underlying ovarian hormones' influence on the brain throughout the reproductive life (Beltz and Moser, 2020; Catenaccio et al., 2016; Comasco and Sundstrom-Poromaa, 2015; Rehbein et al., 2020; Toffoletto et al., 2014), as well as in the presence of mental illness (Comasco and Sundstrom-Poromaa, 2015; Moses-Kolko et al., 2014; Stickel et al., 2019), remains limited.
Characterizing the impact of physiological variations in ovarian hormone concentrations on the human brain via a systematic review has important implications, as a potential bias can be introduced in studies including women in different menstrual cycle phases (Fehring et al., 2006). To date, studies of the menstrual cycle diverge in terms of the methodology applied for hormonal assessment, menstrual cycle phase comparisons, neuroimaging techniques and analyses. Therefore, it is not clear whether and how specific brain regions are affected by the menstrual cycle in healthy naturally cycling women in terms of brain structure, functional networks and chemistry. Thus, there is a need for an integrative view of the findings accumulated across imaging modalities in order to provide a clearer and more consistent picture of the neuroplastic changes associated with hormonal changes throughout the menstrual cycle. The present systematic literature review aimed to provide an up-to-date, comprehensive and integrative summary of the multimodal neuroimaging findings on structural, functional and molecular changes related to hormonal fluctuations throughout the menstrual cycle in healthy naturally cycling women, provided that menstrual cycle phase confirmation through biological assays was ensured. Furthermore, we highlight the relevance of whole-brain coordinate-based findings as particularly significant evidence, and provide a systematic quality assessment of the reviewed studies. The present review will provide the rationale for future multimodal neuroimaging studies of the menstrual cycle, inform about adequate study design, and serve as a basis for future meta-analyses.

Methods
According to PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) guidelines (Moher et al., 2009), we performed a PubMed/MEDLINE search using the following terms: "menstrual cycle", "sex hormones", "estrogen", "progesterone", "neuroimaging", "magnetic resonance imaging", "diffusion tensor imaging", "positron emission tomography", "single photon emission computed tomography", "MR spectroscopy", and relevant abbreviations or variations. For functional neuroimaging studies, we used the additional keywords "emotion", "cognition" and "reward". Additionally, references cited in the retrieved articles were screened in order to find relevant studies that were missed during the database search. The literature screening included studies published until July 2020 and is presented as a flowchart in the supplementary material (Fig. S1). Following title and abstract screening, we excluded papers upon full-text review if they failed to meet the following criteria: (1) neuroimaging study; (2) cross-sectional, prospective, retrospective, case-control or randomized controlled trial study design; (3) confirmation of menstrual cycle phase through the analysis of blood, salivary or urinary hormone levels; (4) healthy naturally cycling women included in the study; (5) English language.
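The five full-text inclusion criteria listed above can be read as a simple checklist applied to each candidate paper. The following is a minimal sketch of that reading, not the authors' actual screening procedure; the `Record` fields and the `failed_criteria` helper are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass

# Study designs named under criterion (2) in the text.
ELIGIBLE_DESIGNS = {
    "cross-sectional", "prospective", "retrospective",
    "case-control", "randomized controlled trial",
}

@dataclass
class Record:
    """Hypothetical summary of one paper after full-text review."""
    is_neuroimaging: bool            # criterion (1)
    design: str                      # criterion (2)
    phase_confirmed_by_assay: bool   # criterion (3): blood, saliva or urine
    healthy_naturally_cycling: bool  # criterion (4)
    language: str                    # criterion (5)

def failed_criteria(record):
    """Return the numbers (1-5) of the inclusion criteria the record fails."""
    checks = [
        record.is_neuroimaging,
        record.design in ELIGIBLE_DESIGNS,
        record.phase_confirmed_by_assay,
        record.healthy_naturally_cycling,
        record.language == "English",
    ]
    return [number for number, passed in enumerate(checks, start=1) if not passed]

# A paper that relied on self-reported cycle phase only fails criterion (3).
example = Record(True, "cross-sectional", False, True, "English")
print(failed_criteria(example))
```

A record is retained only when the returned list is empty; returning the failed criterion numbers rather than a single boolean keeps the exclusion reasons available for a flowchart such as Fig. S1.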
For each study we extracted the following information when available: sample size, mean age, menstrual cycle phase, type of hormonal assay, scanning modality, functional task for fMRI studies or brain imaging technique for molecular imaging studies, brain imaging analysis (i.e. whole brain and/or regions of interest (ROIs)), peak loci of brain differences/changes during the menstrual cycle and correlations between brain imaging and hormonal levels (Tables 1-5). It is worth noting that seventeen fMRI studies investigating E2 and P4 effects on emotional and cognitive processes have been previously reviewed by the team (Toffoletto et al., 2014), and twenty-three additional fMRI publications were included in the present review. We reported brain regions significantly different between menstrual cycle phases, or correlated with ovarian hormone levels, as identified by the original statistical analyses, without any restrictions regarding statistical thresholds or correction for multiple comparisons. Nonetheless, we indicate whether or not the results were corrected for multiple comparisons.
In order to provide the precise localization of the most consistent results reporting menstrual cycle effects in the brain while excluding bias related to the use of ROIs, we conducted a dedicated review of the voxel-based findings obtained from hypothesis-free neuroimaging analyses. When available, we extracted the peak voxel coordinates of the brain regions that were consistently reported across brain imaging modalities (supplementary Tables 1-4). To increase the comparability between studies, coordinates given in Talairach space were converted into Montreal Neurological Institute (MNI) space using the conversion tool implemented in GingerALE 3.0.2 (Laird et al., 2010) (http://www.brainmap.org/). The anatomical brain regions corresponding to the extracted MNI coordinates were localized using Anatomical Automatic Labeling (AAL) in WFU PickAtlas Toolbox in Statistical Parametric Mapping (Tzourio-Mazoyer et al., 2002), and reported in the Supplementary Tables 1-4. In case AAL labeling was not applicable, coordinates were excluded.
To estimate the quality of the studies behind the reviewed findings, we followed the Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) criteria (Brozek et al., 2009), including study design, study limitations (risk of bias), inconsistency of results, indirectness of evidence, and imprecision. Initial level of confidence was defined according to the study design, observational studies being associated with a low confidence and interventional studies (e.g. randomized controlled studies) being associated with a high confidence. Factors raising confidence included the exclusion of any brain-related disorder through clinical ratings, confirmation of menstrual cycle phase through direct E2 and P4 assays, randomized menstrual cycle phase testing to prevent order effects, correction for multiple testing, multimodal neuroimaging analyses, whole brain analysis, and correlation analyses between imaging and behavioral measures. Factors lowering confidence included failure to control for brain-related disorders, the order of menstrual cycle phase assessments and confounding variables (e.g. age, total brain volume), the use of a ROI approach alone, absence of correction for multiple testing, no direct E2 and/or P4 assay (allopregnanolone, ALLO, and/or luteinizing hormone, LH, assays), and sample size (small, n < 100 and very small, n < 25 subjects). In addition, we considered the test-retest reproducibility of brain imaging measures as factors raising or lowering confidence, based on previous evidence showing good reliability of structural MRI, resting-state fMRI, MR spectroscopy and PET/SPECT measures and poor reliability of task-based fMRI measures (Alakurtti et al., 2015; Elliott et al., 2020; Hirvonen et al., 2009, 2007; Kim et al., 2006; Lundberg et al., 2006; Shungu et al., 2016; Staley et al., 2005; Terpstra et al., 2016).
To account for gradual effects, we attributed "very low", "low", "moderate", and "high" estimates (−2, −1, 0, and 1, respectively) to the factors raising or lowering confidence, depending on their influence on the level of confidence. The final level of confidence was based on the initial level of confidence and the number and value of factors raising or lowering confidence for each study included in the review. In cases where the difference between the number of factors raising and lowering confidence was 2 or higher, the level of confidence was raised or lowered by one confidence category, accordingly. In cases where the difference between the number of factors raising and lowering confidence was 4 or higher, the level of confidence was raised or lowered by two confidence categories. The studies were evaluated by four coders. The final quality estimates of the reviewed studies following agreement between the coders are illustrated in Fig. 2, and a detailed summary is provided in Supplementary Table 5.
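The adjustment rule described above can be sketched in a few lines. This is a simplified reading under stated assumptions, not the coders' actual procedure: it counts factors only (ignoring the graded −2 to 1 weights), it clamps the result to the four GRADE categories, and the function name `final_confidence` is hypothetical.

```python
# Ordered GRADE confidence categories, lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Initial confidence by study design, as stated in the text.
INITIAL = {"observational": "low", "interventional": "high"}

def final_confidence(design, n_raising, n_lowering):
    """Shift the initial level by 0, 1 or 2 categories based on factor counts."""
    start = LEVELS.index(INITIAL[design])
    diff = n_raising - n_lowering
    magnitude = abs(diff)
    if magnitude >= 4:
        shift = 2          # difference of 4 or more: two categories
    elif magnitude >= 2:
        shift = 1          # difference of 2 or 3: one category
    else:
        shift = 0          # difference below 2: level unchanged
    if diff < 0:
        shift = -shift     # more lowering than raising factors
    # Assumption: the result cannot leave the four-category scale.
    index = min(max(start + shift, 0), len(LEVELS) - 1)
    return LEVELS[index]

print(final_confidence("observational", n_raising=5, n_lowering=1))
print(final_confidence("observational", n_raising=1, n_lowering=3))
```

Under this reading, an observational study needs at least two more raising than lowering factors to reach a moderate rating, which is in keeping with the predominance of low and very low ratings reported for the reviewed studies.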

Descriptive characteristics
Following literature screening, a total of 1795 citations were identified and reviewed. The selection process yielded seventy-seven relevant publications (Fig. S1), gathering a total of 1304 naturally cycling women (age range 16-49 years; sample size range 1-90). Average sample size was twenty women per study (excluding four studies with repeated measurements on one individual). In eight instances, samples overlap across several publications (Franke et al., 2015; Gingnell et al., 2014, 2013, 2012; Hagemann et al., 2011; Petersen et al., 2018, 2019, 2014, 2015; Pritschet et al., 2020; Taylor et al., 2020; Thimm et al., 2014; van Wingen et al., 2007, 2008; Weis et al., 2011, 2008, 2017). However, because different methodological approaches were used, the results of all these articles were included in the systematic review. Regarding task-related fMRI, thirty-two different tasks have been employed, with no more than three studies assessing the same one; about half were affective and half were cognitive tasks. A summary of the functional tasks included in the review is presented in supplementary table 6. Although affect and cognition are closely interlinked functional domains that involve common brain regions (Pessoa, 2008), we present the results in terms of affective and cognitive processing separately for clarity, based on the definitions provided by the American Psychological Association, the type of stimulus presented and the comparisons made in each study.
Descriptive characteristics of the reviewed studies are illustrated in Fig. 1 and described in detail for each study in Tables 1-5, along with information on the methodology applied for hormonal assessment, menstrual cycle phase comparisons, neuroimaging techniques and analyses, and summaries of results. The quality estimates of the studies included in the review according to the GRADE criteria are illustrated in detail in the supplementary table 5 and summarized in Fig. 2. In sum, most of the studies were attributed with a low (48.0%) and very low (31.2%) level of confidence, 11.7% with a moderate level of confidence, and a minority of studies (9.1%) with a high level of confidence. However, it is important to note that an initial low confidence level was assigned to observational studies, which represent 94.8% of the studies included in the review. In addition, all reviewed studies included small (81.8%) and very small (18.2%) samples of women, further lowering the level of confidence. Moreover, quality estimates were negatively impacted by the absence of randomized timing assessments across the menstrual cycle (18.2%), confirmation of menstrual cycle phase through indirect hormonal measures (ALLO and/or LH assays, 7.8%), no exclusion of brain-related disorders (18.2%), low test-retest reliability of brain measurements (54.5%), the use of a ROI approach only (45.4%), and the absence of correction for confounding variables (84.4%) and multiple testing (28.6%).
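As a quick arithmetic check, the reported confidence-level percentages can be converted back into study counts out of the seventy-seven reviewed publications. The counts below are inferred from the percentages and are not stated in the text; the helper name `implied_counts` is hypothetical.

```python
TOTAL_STUDIES = 77

# Percentages of studies per final confidence level, as reported in the text.
REPORTED_PERCENT = {"low": 48.0, "very low": 31.2, "moderate": 11.7, "high": 9.1}

def implied_counts(percentages, total):
    """Nearest whole number of studies implied by each reported percentage."""
    return {level: round(pct * total / 100) for level, pct in percentages.items()}

counts = implied_counts(REPORTED_PERCENT, TOTAL_STUDIES)

# Largest gap between a reported percentage and the one implied by its count.
max_gap = max(
    abs(100 * counts[level] / TOTAL_STUDIES - pct)
    for level, pct in REPORTED_PERCENT.items()
)

print(counts)
print(sum(counts.values()) == TOTAL_STUDIES)
print(max_gap < 0.1)
```

The inferred counts add up to the full set of reviewed studies, and each reported percentage lies within 0.1 percentage points of the value implied by its count, so the four categories are consistent with a partition of the seventy-seven publications.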

Coordinate-based findings
In order to summarize the reviewed findings with precise localization in the brain, we provide here a summary of the most consistent results reporting menstrual cycle effects by means of the brain coordinates in MNI space reported in the supplementary tables 1-4.
Brain structure. Across structural studies providing coordinates, variations related to the menstrual cycle were reported primarily in the hippocampus, insula, and cerebellum (supplementary table 2). Thus, studies consistently reported an increased grey matter volume in the hippocampus during the late FP compared to the early FP and the mid-LP (Lisofsky et al., 2015b), and a positive correlation between E2 levels and both hippocampal grey matter volume and fractional anisotropy measures (Barth et al., 2016). Grey matter volume in the insula seems to follow the same pattern, a larger insula being associated with the late FP (De Bondt et al., 2016), and positively correlated with E2 levels (De Bondt et al., 2013a). Conversely, a reduction of grey matter volume in the cerebellum was reported from the late FP to the mid-LP, along with a positive correlation with E2 levels (Lisofsky et al., 2015b), and a negative association with P4 levels (De Bondt et al., 2016). Of note, coordinate-based findings include variations in grey matter volumes in the ACC, fusiform gyrus and inferior parietal lobule across several studies as well, although the direction of effect appears less consistent (supplementary table 2).
Functional activation. At the functional level, task-based fMRI studies provided the highest number of coordinate-based findings and primarily revealed menstrual cycle effects on the brain reactivity of the hippocampus, ACC, and prefrontal regions (supplementary table 2). The most consistent result relates to an enhanced brain reactivity during affective processing in the hippocampus during the late FP and the mid-LP compared to the early FP and the late LP, reported across five studies (Albert et al., 2015; Andreano and Cahill, 2010; Bayer et al., 2014; Frank et al., 2010; Goldstein et al., 2005). In line with these findings, positive correlations were reported between the hippocampus BOLD response during an affective task and the concentrations of E2 during the mid-FP and P4 during the mid-LP (Dreher et al., 2007). Furthermore, the coordinate-based findings from a hormonal suppression study (van Wingen et al., 2008) follow the same pattern, showing greater brain reactivity in the hippocampus during affective processing after E2 and P4 add-back compared to placebo. In prefrontal regions (inferior, middle and superior frontal gyri), menstrual cycle effects on brain reactivity were reported through coordinates across eighteen studies (supplementary table 2). Across the inferior, middle and superior frontal gyri (IFG, MFG, SFG), coordinate-based results point to an increased BOLD response during cognitive processing in the early FP (Bayer et al., 2013; Pletzer et al., 2013; Thimm et al., 2014; Weis et al., 2011, 2008). Furthermore, the MFG and SFG showed an increased BOLD response during affective processing in the mid-LP (Abler et al., 2013; Amin et al., 2006; Dreher et al., 2007). In the IFG, brain reactivity during affective processing appeared elevated in the mid-FP (Protopopescu et al., 2005), and positively correlated with E2 levels (Dreher et al., 2007; Henningsson et al., 2015).
Variations in the ACC BOLD response to affective and cognitive processing across the menstrual cycle were reported by eight studies providing coordinates (supplementary table 2). Among these studies, the most consistent finding relates to an increase of the brain reactivity of the ACC during the mid-LP compared to the early and mid FP, reported by four studies (Amin et al., 2006; Diekhof and Ratnayake, 2016; Schoning et al., 2007; Thimm et al., 2014). In addition, coordinate-based findings include variations in the BOLD response of the amygdala, middle cingulate cortex (MCC), fusiform gyrus, inferior and middle temporal gyrus (ITG, MTG), postcentral gyrus, inferior parietal lobule, insula and basal ganglia across several studies as well, although the direction of effect appears less consistent (supplementary table 2).

Fig. 1.
Sunburst charts illustrate the distribution of imaging modalities and type of brain measure across the reviewed studies (left) and type of biological assay carried out to confirm menstrual cycle phases through the measurement of hormonal concentrations (top right). Pie charts illustrate the distribution of the reviewed studies in terms of study design, assessment timing, and menstrual cycle phase comparisons (bottom, from left to right), as well as neuroimaging analysis (top, center). Additive percentages > 100 indicate overlap between the categories. "Mixture model" refers to a modified mixture model cluster approach applied to compute grey matter volumes. Abbreviations: Ach, acetylcholine; AHSH, automatic segmentation of hippocampal subfields; ALFF, amplitude of low-frequency fluctuations; ALLO, allopregnanolone; BOLD, blood oxygen level dependent; DA, dopamine; DTI, diffusion tensor imaging; DWI, diffusion weighted imaging; EC, eigenvector centrality mapping; E2, estradiol; FC, functional connectivity; FP, follicular phase; fMRI, task-based functional MRI; ICA, independent component analysis; LH, luteinizing hormone; LP, luteal phase; MRS, magnetic resonance spectroscopy; OVU, peri-ovulatory phase; PET, positron emission tomography; P4, progesterone; ROI, region of interest analysis; rs-fMRI, resting-state fMRI; SPECT, single photon emission computed tomography; VBM, voxel-based morphometry; SBM, surface-based morphometry; WBA, whole-brain analysis; 5HT, serotonin.

Fig. 2.
Quality estimate of the reviewed findings according to the GRADE criteria. The Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) criteria include study design, study limitations (risk of bias), inconsistency of results, indirectness of evidence, and imprecision. Initial level of confidence was defined according to the study design, observational studies being associated with a low confidence and interventional studies being associated with a high confidence. Factors raising confidence are highlighted in green and illustrated by upwards pointing arrows, while factors lowering confidence are highlighted in red and illustrated by downwards pointing arrows. A white background indicates non-applicable criteria. Factors increasing confidence included the exclusion of any brain-related disorder through clinical ratings, confirmation of menstrual cycle phase through direct E2 and P4 assays, randomized menstrual cycle phase testing to prevent order effects, adjustment for confounding variables, correction for multiple testing, multimodal neuroimaging analyses, a good test-retest reproducibility of brain imaging measures, exploratory whole brain analysis, and association with behavioral measures. Factors lowering confidence included failure to control for brain-related disorders, the order of menstrual cycle phase assessments and confounding variables, the use of a ROI approach alone, poor test-retest reproducibility of brain imaging measures, absence of correction for multiple testing, no direct E2 and/or P4 assay, and sample size (small, n < 100 and very small, n < 25 subjects). The final level of confidence was based on the initial level of confidence and the number of factors raising or lowering confidence for each study included in the review. Very low, low, moderate, and high confidence levels are highlighted in dark red, light red, yellow, and light green, respectively. A detailed summary of quality estimates is provided in Supplementary Table 5.

M. Dubol, et al. Frontiers in Neuroendocrinology 60 (2021) 100878
Resting-state functional connectivity. Regarding resting-state studies, the coordinate-based findings gathered across three studies point to a greater functional connectivity of the middle frontal gyrus (MFG) with the ACC, the amygdala and the fronto-parietal network during the mid-LP (Engman et al., 2018; Pletzer et al., 2016), along with a positive correlation between E2 levels and the functional connectivity between the MFG and the default mode network (DMN) (De Bondt et al., 2015b). Four studies reported menstrual cycle effects on the functional connectivity of the IPL as well, albeit not reaching a consistent pattern of results (supplementary table 2). Fig. 3 illustrates the main structural and functional variations reported across the menstrual cycle by the reviewed studies, while a thorough description of the results by neuroimaging modality is presented as supplementary material. Findings on key regions (i.e. hippocampus, amygdala, anterior cingulate cortex, insula, inferior parietal lobule, and prefrontal cortex) are here critically reviewed across neuroimaging modalities.

Hippocampus
In line with the coordinate-based findings, variations in the structure, function and connectivity of the hippocampus have been reported among the reviewed studies, and associated with E2 and P4 fluctuations throughout the menstrual cycle. Particularly, the FP seems to be associated with an increased hippocampal volume (Lisofsky et al., 2015b; Pletzer et al., 2018) and a greater hippocampal BOLD response during affective (Albert et al., 2015; Frank et al., 2010; Goldstein et al., 2005) and cognitive (Pletzer et al., 2019) processing. In line with these observations, positive correlations between E2 concentrations and the hippocampus grey matter volume, white matter integrity, and activity during affective and visuospatial processing have been repeatedly reported (Albert et al., 2015; Barth et al., 2016; De Bondt et al., 2013b; Dreher et al., 2007; Lisofsky et al., 2015b; Pletzer et al., 2019). While two of these studies were assigned a very low confidence estimate (De Bondt et al., 2013b; Frank et al., 2010), it is noteworthy that this multimodal observation was based on findings rated with moderate (Lisofsky et al., 2015b) and high (Pletzer et al., 2018) confidence as well. Interestingly, elevated hippocampal activations in the mid-LP have been detected during the processing of images of negative valence (Andreano and Cahill, 2010; Bayer et al., 2014), along with a higher functional connectivity at rest with the whole brain (Hidalgo-Lopez et al., 2020). These observations are consistent with reports of associations between P4 levels and both the hippocampal BOLD signal during a facial recognition task (van Wingen et al., 2007) and a reward task (Dreher et al., 2007) and the functional connectivity between the hippocampus and the DLPFC (Arelin et al., 2015). Overall, reports of elevated hippocampal reactivity in the mid-LP were associated with a moderate confidence level (supplementary table 5, Fig. 2).

Amygdala
Similar to the hippocampus, the amygdala displayed structural and functional variations throughout the menstrual cycle, as well as associations between brain features and ovarian hormone levels. An elevated grey matter volume of the amygdala was shown during the late LP, compared to the late FP (Ossewaarde et al., 2013). Functional reports of menstrual cycle phase effects in the amygdala appear more heterogeneous, possibly due to the low reliability of task-based fMRI findings (supplementary table 5, Fig. 2). The most reliable findings emerged from two pharmacological studies rated with a high confidence, showing that short-term exposure to P4 was associated with a greater reactivity of the amygdala during the processing of facial expressions (van Wingen et al., 2008), and a reduced reactivity during memory encoding (van Wingen et al., 2007). During the emotion recognition task, exposure to P4 was associated with an increased functional connectivity between the amygdala and the ACC, and a decreased connectivity between the amygdala and the fusiform gyrus (van Wingen et al., 2008). Studies with a low to moderate confidence estimate point to variations in the functional connectivity of the amygdala throughout the menstrual cycle. Thus, during the late LP, the functional connectivity at rest between the amygdala and the posterior and middle cingulate cortex, angular gyrus and middle temporal cortex was lower than in the mid-FP (Petersen et al., 2019). This phase was also associated with negative correlations between ovarian hormone levels and the functional connectivity of the amygdala with the orbitofrontal cortex (OFC) during amusement (Dan et al., 2018). During the mid-LP, an increased BOLD response to negative pictures was found in the amygdala (Andreano and Cahill, 2010; Bayer et al., 2014), along with a stronger functional connectivity at rest between the amygdala and both frontal and cerebellar regions (Engman et al., 2018).

Table 2
Diffusion Weighted Imaging studies of menstrual cycle effect on white matter integrity in healthy naturally cycling women.

ACC
According to the quality estimates of the reviewed findings, the most significant result suggesting menstrual cycle effects on the anterior cingulate cortex (ACC) is the increased reactivity to negative facial expression following hormonal suppression through GnRHa treatment. In line with this, a greater ACC BOLD response to negatively valenced pictures, loss anticipation, and response inhibition was reported in the early FP, characterized by very low levels of E2 and P4, compared to the mid-LP (Bayer et al., 2013, 2014; Thimm et al., 2014). During this phase, a stronger functional connectivity between the ACC and the executive control network was found, in comparison with the mid-LP (Petersen et al., 2014). Conversely, the ACC showed an increased BOLD response to affective (Abler et al., 2013; Arnoni-Bauer et al., 2017; Diekhof and Ratnayake, 2016; Dreher et al., 2007) and cognitive (Amin et al., 2006; Schoning et al., 2007) stimuli during the mid-LP compared to the FP. In the mid-LP compared to the early FP, a stronger functional connectivity at rest was found between the ACC and both the middle frontal gyrus and the superior temporal gyrus (Engman et al., 2018). Of note, while the findings showing higher ACC reactivity and connectivity during the early FP and following hormonal suppression were assigned a moderate confidence on average, opposite observations suggesting higher reactivity and connectivity of the ACC during the mid-LP were rated with a low confidence. At the structural level, while one study reported a reduced grey matter volume in the ACC during the mid-LP (De Bondt et al., 2013a), a positive association between the volume of the ACC and P4 concentrations and a negative association between cortical thickness of the ACC and E2 levels were both reported in the early FP (De Bondt et al., 2016; Petersen et al., 2015). However, these findings are associated with a low confidence rating.

Insula
In line with the coordinate-based results, an increased insular grey matter volume has been observed in the late FP compared to both the early FP and the mid-LP (De Bondt et al., 2016; Lisofsky et al., 2015b). These findings were assigned a moderate confidence level, while one study reporting a higher grey matter volume in the insula during the early FP compared to the mid-LP along with a positive correlation with E2 concentrations (De Bondt et al., 2013a) was only rated with a low confidence level. As for the ACC, the most significant result suggesting menstrual cycle effects on the insula relates to an increased reactivity to negative facial expression following hormonal suppression through GnRHa treatment. Although assigned a low confidence level, findings from cognitive fMRI studies point to an increase of insular reactivity across the FP and an association with E2 (Joseph et al., 2012; Protopopescu et al., 2005; Thimm et al., 2014). Conversely, in the mid-LP compared to the mid-FP, elevated insular activations during reward anticipation and a positive association with P4 levels have been reported (Arnoni-Bauer et al., 2017; Dreher et al., 2007), as well as an elevated glucose metabolism in this region (Reiman et al., 1996). However, these findings were attributed a low confidence as well.

PFC
Although the prefrontal cortex (PFC) gathered the highest number of findings related to menstrual cycle effects across the reviewed studies, the variety of prefrontal sub-regions defined in the different reports and the discrepancy of the results considerably reduce the chance of detecting specific patterns of hormonal influence on these regions. Nevertheless, two pharmacological fMRI findings stand out from the reviewed studies, as they were assigned a high confidence level (van Wingen et al., 2007). These results suggest a positive association between the reduction in E2 levels and the reduced ventrolateral PFC reactivity to emotional facial expressions following ovarian hormone suppression, and a reduced reactivity of the IFG during face recognition after a single P4 administration during the early FP (van Wingen et al., 2007). Overall, fMRI studies investigating affective and cognitive processing were rated with a low confidence level, and reported greater prefrontal reactivity in both menstrual cycle phases. Among these studies, the most convergent finding relates to an enhanced prefrontal BOLD response to cognitive processing in the mid-LP, associated with both E2 and P4 levels (Amin et al., 2006; Fernandez et al., 2003; Konrad et al., 2008; Pletzer et al., 2019; Protopopescu et al., 2005; Schoning et al., 2007). In line with this, studies rated with a high confidence reported positive correlations between P4 levels and the prefrontal functional connectivity at rest with temporal regions, mainly involving the dorsolateral PFC (Arelin et al., 2015; Syan et al., 2017). Similarly, the functional connectivity of the medial and orbital PFC within the frontoparietal network and the default mode network correlated positively with allopregnanolone levels (Syan et al., 2017). Resting-state fMRI studies suggest an influence of E2 on the functional connectivity of prefrontal regions as well, although the findings appear less consistent (De Bondt et al., 2015b; Pletzer et al., 2016; Weis et al., 2017).
In contrast, structural findings point to a neurotrophic influence of E2 on prefrontal regions, as positive correlations were found between E2 levels and both prefrontal grey matter volume and cortical thickness (De Bondt et al., 2013a; Lisofsky et al., 2015b; Petersen et al., 2015).

Figure legend: Upright and inverted triangles indicate increases and decreases of functional or structural brain measures, respectively. Functional variations are denoted by the letter "f" and characterized by changes in BOLD signal intensity. Structural variations are denoted by the letter "s" and characterized by changes in grey matter volume. Blue, light green and dark green triangles describe variations observed in comparisons to the early, middle and late follicular phases, respectively. The structural findings were based on eight studies reporting effects of the menstrual cycle on grey matter. The functional findings were based on the thirty-seven task-related functional MRI studies of affective and cognitive processing reporting effects of the menstrual cycle on BOLD signal. Abbreviations: ACC = anterior cingulate cortex, Amg = amygdala, Cb = cerebellum, E2 = 17β-estradiol, FP = follicular phase, FSH = follicle-stimulating hormone, FuG = fusiform gyrus, HC = hippocampus, IFG = inferior frontal gyrus, Ins = insula, IPL = inferior parietal lobe, LH = luteinizing hormone, MFG = middle frontal gyrus, P4 = progesterone, PCL = paracentral lobule, SFG = superior frontal gyrus.

IPL
The most robust findings implicating the inferior parietal lobule (IPL) in menstrual cycle effects emerged from structural and functional connectivity analyses, and were assigned a moderate confidence level on average. Thus, grey matter volume in this region was positively correlated with E2 levels in the early FP (De Bondt et al., 2016). Likewise, E2 concentrations correlated positively with the intrinsic functional connectivity of the IPL (Hidalgo-Lopez et al., 2020), the functional connectivity between the IPL and the executive control network (ECN) (De Bondt et al., 2015b), and the functional connectivity between the IPL and the IFG during sadness induction (Dan et al., 2018). In line with this, greater functional connectivity was found in the FP compared to the mid-LP, within the IPL and between the IPL and the default mode network (DMN) (Hidalgo-Lopez et al., 2020; Petersen et al., 2015). Additional evidence from task-based fMRI studies rated with a very low to low confidence level appears particularly consistent with the aforementioned findings, suggesting an increased reactivity of the IPL during cognitive processing in the FP compared to the LP (Pletzer et al., 2013; Schoning et al., 2007; Weis et al., 2011), along with a positive correlation with E2 levels (Schoning et al., 2007). In addition, a stronger functional connectivity between the IPL and the MFG during response inhibition was reported in the early FP compared to the mid-LP (Thimm et al., 2014).
Remarkably, negative findings suggesting no menstrual cycle effects on brain function and chemistry were reported in eleven out of the seventy-seven reviewed studies. While the majority of these reports are molecular imaging studies and were assigned low or very low confidence estimates (supplementary table 5, Fig. 2), two PET studies associated with either a moderate or high confidence level reported negative results. Thus, glucose metabolism over the whole brain did not differ between the mid-FP and the late LP (Rapkin et al., 2011), and no significant influence of ovarian hormone suppression on the availability of serotonin transporters (SERT) was found. The latter finding is consistent with two other PET studies reporting no menstrual cycle effect on SERT availability (Best et al., 2005; Jovanovic et al., 2009), although a negative association between SERT binding and allopregnanolone (ALLO) levels was found as well, in prefrontal regions, basal ganglia, insula, hippocampus and posterior cingulate cortex. In addition, one resting-state fMRI (rs-fMRI) study assigned a moderate confidence level did not detect any menstrual cycle effects on the functional connectivity of fronto-parietal networks (Hjelmervik et al., 2014).

Relation to behavior
Interestingly, while most studies did not investigate the relationship between brain changes and behavioral changes occurring during the menstrual cycle, some reports indicate that menstrual cycle-related structural and functional brain plasticity relates to behavioral measures and symptom ratings. For instance, during the mid-LP, grey matter volume was found to be negatively correlated with psychosocial premenstrual symptoms in the MFG and cerebellum, and positively correlated with somatic premenstrual symptoms in the precentral gyrus (De Bondt et al., 2016). Similarly, a positive relationship was found between increased amygdala grey matter volume in the late LP and increased ratings of stress-induced negative affect during this phase of the menstrual cycle (Ossewaarde et al., 2013). At the functional level, the increase in ventral striatum activity from the late FP to the late LP during a monetary reward task correlated positively with the change in Menstrual Distress Questionnaire scores, including the "negative affect", "behavioral changes", and "control" subscales (Ossewaarde et al., 2011). In addition, subjective distress scores were inversely associated with bilateral hippocampal activity during psychosocial stress and with E2 levels during the FP (Albert et al., 2015). Severity of depressive symptoms after GnRHa treatment has also been positively associated with neocortical 5-HTT availability and with the BOLD response to emotional facial expressions in the amygdala, insula and ACC. Likewise, an interaction between amygdala reactivity to emotional facial expressions, P4 treatment, and mood ratings has been reported, showing a positive relationship between BOLD signal in the amygdala and positive mood items such as alertness and contentedness in the placebo group only (van Wingen et al., 2008).
After P4 treatment, positive correlations have also been reported between the decreased activity of the fusiform gyrus, IFG and amygdala during memory encoding and recognition of faces, and the decrease in memory performance (van Wingen et al., 2007). Furthermore, women who experienced greater subjective arousal in response to the perception of emotionally salient pictures displayed higher BOLD responses in the hippocampus during memory encoding and retrieval, irrespective of the menstrual cycle phase (Bayer et al., 2014). From the early to the late FP, changes in working memory performance were associated with changes in IFG, MFG and cerebellum BOLD response (Joseph et al., 2012), while mental rotation performance was associated with changes in SFG and SPL BOLD response, which correlated with the change in E2 levels (Zhu et al., 2015). Moreover, across the menstrual cycle, right and left hippocampus activations have been positively associated with navigation and verbal fluency performance, respectively, while left DLPFC activation correlated positively with verbal fluency performance and negatively with navigation performance (Pletzer et al., 2019). Similarly, BOLD signal in the caudate nucleus correlated negatively with navigation scores (Pletzer et al., 2019). Finally, sensitivity to punishment was positively correlated with the ACC BOLD response to negative feedback during a probabilistic learning task in the mid-LP, while in the FP, a greater mOFC BOLD response to positive feedback was associated with a higher sensitivity to rewards (Diekhof and Ratnayake, 2016). Thus, these neuroimaging findings suggest that hormonal fluctuations during the menstrual cycle influence brain structure, chemistry and function, including affective and cognitive processing, which in turn impact behavior.

Discussion
Comprehensiveness is the major strength of the present review as, for the first time, findings of seventy-seven studies across neuroimaging modalities on menstrual cycle neural correlates are systematically summarized, thus providing an integrated overview of how endogenous ovarian hormone fluctuations modulate the reactivity and structure of corticolimbic regions in the brain of naturally cycling women. The field is rapidly expanding, with twenty-nine neuroimaging studies published in the last five years, thus contributing to the novelty of the reviewed findings. Compared with previous overviews (Lisofsky et al., 2015a; Toffoletto et al., 2014), this review not only draws on data from various neuroimaging modalities but also includes a finer categorization of the findings based on E2 and P4 fluctuations (Tables 1-5), a summary of whole-brain coordinate-based findings (supplementary tables 1-4), and quality estimates of the reviewed studies (Fig. 2 and supplementary table 5). Overall, brain differences between and/or changes throughout the menstrual cycle phases and associations with ovarian hormones were observed in sixty-six out of the seventy-seven reviewed neuroimaging studies (Fig. 3).

Integrated discussion
As brain regions of interest, the hippocampus, amygdala, ACC, insula, PFC and IPL are the most consistently reported in both whole-brain and ROI analyses (Tables 1-5). The key findings regarding these brain regions, although they relate to subtle effects with small effect sizes, can be summarized as follows: i) associations between E2 fluctuations and variations in hippocampal grey matter volume and activation related to affective and cognitive processing; ii) an increase of grey matter volume and of activation related to affective processing in the amygdala during the mid- and late LP, and their association with negative affect during the premenstrual period; iii) greater ACC activation in the mid-LP during cognitive processes, reward processing and negative feedback, along with a significant association between ACC activation and negative affect; iv) higher late FP activations in the insula during cognitive processing, associated with E2 fluctuations and accompanied by higher insular volumes; v) enhanced prefrontal reactivity associated with E2 and P4 levels during cognitive processing in the mid-LP; vi) positive associations between E2 fluctuations and structural measures of the PFC; and vii) positive associations between E2 concentrations and the volume, functional connectivity and reactivity to cognitive processing of the IPL.
Although E2 and P4 receptors are widely distributed within the brain (Brinton et al., 2008), translational studies have mainly investigated the effect of the oestrus cycle on the anatomy of the hippocampus. Studies conducted in rodents evidenced variations in hippocampal spine density (Woolley et al., 1990) and hippocampal volume (Qiu et al., 2013) that paralleled E2 and P4 fluctuations. While a positive association between E2 and hippocampal synaptogenesis has been consistently reported (Lee and McEwen, 2001; McEwen et al., 2015), an opposite action of P4 on synapse regulation has been observed (Woolley and McEwen, 1993). In humans, corroborating findings are presented for the hippocampus. Indeed, it seems that brain structure in this region can be rapidly affected by endogenous ovarian hormone fluctuations, as illustrated by the increased hippocampal grey matter volume found in the late FP (Lisofsky et al., 2015b; Pletzer et al., 2018; Protopopescu et al., 2008), and by the positive correlation of E2 concentrations with hippocampal grey matter volume as well as white matter integrity (Barth et al., 2016; Lisofsky et al., 2015b; Pletzer et al., 2018). These findings are corroborated by menopausal hormone replacement therapy (HRT) studies, indicating larger hippocampal volumes and enhanced hippocampal function in HRT users. In line with this, higher E2 concentrations were associated with a greater white matter microstructural integrity in the fornix, the major output tract of the hippocampus (De Bondt et al., 2013b).
Similarly, increased insular and cerebellar grey matter volumes were shown in the late FP as well (De Bondt et al., 2016; Lisofsky et al., 2015b), suggesting that endogenous ovarian hormone fluctuations can affect brain structure beyond the hippocampus. Furthermore, in line with preclinical findings (Lee and McEwen, 2001; McEwen et al., 2015; Woolley and McEwen, 1993), grey matter volume correlated positively with E2 levels, and negatively with P4 levels, in the insula and the cerebellum, respectively (De Bondt et al., 2013a; Lisofsky et al., 2015b). Additionally, measures of cortical thickness in the OFC correlated positively with E2 levels during the mid-LP of the menstrual cycle (Petersen et al., 2015), further supporting the hypothesis of a neurotrophic effect of E2 on brain structure. However, opposite patterns of hormonal influence on brain structure were shown for the amygdala, basal ganglia, cingulate gyrus, fusiform gyrus and PFC. Thus, positive associations between P4 concentrations and grey matter volume have been reported in the basal ganglia (Pletzer et al., 2018), the ACC, the fusiform gyrus and prefrontal regions (De Bondt et al., 2013a), although these relationships sometimes differ depending on the menstrual cycle phase. In addition, negative associations between E2 and structural brain measures were shown in the fusiform gyrus (De Bondt et al., 2013a), the ACC and the PFC (De Bondt et al., 2013a; Petersen et al., 2015). In line with these findings, a reduction in grey matter volumes was reported in the amygdala and the basal ganglia from the late FP to the mid- and late LP (Ossewaarde et al., 2013; Pletzer et al., 2018). Hence, while both animal and human studies suggest a neurotrophic effect of E2 and an opposite effect of P4 on the hippocampus, the reviewed findings indicate that the influence of ovarian hormone fluctuations is brain region- and menstrual cycle phase-specific.
These observations are consistent with the results of a systematic review reporting prominent effects of ovarian sex hormones on grey matter volumes in many regions of the limbic system as well as in the basal ganglia, cerebellum and prefrontal areas in the context of the natural menstrual cycle, use of hormonal contraceptives, pregnancy, and menopause (Catenaccio et al., 2016; Rehbein et al., 2020). Interestingly, the menstrual cycle-related structural variations reported in the amygdala, cerebellum and prefrontal regions were associated with the severity of premenstrual symptoms, including negative affect. This is in line with studies of rodents demonstrating an effect of E2 on depressive-like and anxiety-like behavior (Hiroi et al., 2016; Lovick, 2012; Mueller et al., 2014).
Whether such structural variations exert a functional role during brain activation remains unknown, as no multimodal neuroimaging analysis has been conducted so far on this topic. However, indirect evidence supports anatomically driven variations in the reactivity of the hippocampus, amygdala and ACC to affective and cognitive stimuli, as illustrated by the results of functional and anatomical MRI studies. Focusing on the majority of the studies, functional correlates of menstrual cycle hormonal fluctuations include: i) increased hippocampus reactivity to emotionally salient stimuli and cognitive processes in the late FP, associated with higher E2 concentrations and a better cognitive performance; ii) increased activations related to affective processing in the amygdala during the mid- and late LP, and associations with ovarian hormone levels; and iii) reduced ACC activations during the processing of negative valence stimuli and elevated activations during cognitive processes and reward in the mid-LP. In addition, the reduced cerebellar volume reported during the late FP seems to be associated with reductions in cerebellar activity during affective processing and working memory, and with a better memory performance. It is likely that the regions highlighted by the functional neuroimaging studies conducted with affective fMRI paradigms are driving mood changes across the menstrual cycle. These changes could include the higher well-being reported in the late FP as well as the premenstrual symptoms in women with PMS and PMDD (Halbreich et al., 2003; Sanders et al., 1983).
Importantly, assuming that the metabolic expenditure of neurons or glia during and immediately after a task, as indexed by the blood oxygen level dependent (BOLD) signal and cerebral blood flow, reflects neural activity, differential brain activation during task performance could reflect a difference in how efficiently neuronal resources are used, indicate an altered use of these resources, or imply a compensatory response of a distinct brain region (Henderson and Greicius, 2010). Thus, joint analyses of fMRI and behavioral data can help explain the underlying mechanisms. Considering the widespread distribution of E2 and P4 receptors in the brain (Brinton et al., 2008), and the interactions between these hormones and serotoninergic, dopaminergic, GABAergic and glutamatergic systems, the functional variations highlighted by fMRI studies throughout the menstrual cycle most likely result from hormonal modulatory actions on these neurotransmitter systems. Unfortunately, molecular imaging studies of the menstrual cycle remain limited in number, and the results appear rather inconsistent. Combined functional MRI and molecular imaging measures are needed to determine whether a specific neurotransmission system is involved in the functional variations observed throughout the menstrual cycle. Nevertheless, our review includes resting-state and task-related fMRI studies reporting functional connectivity changes throughout the menstrual cycle (Tables 3 and 5), based on different analysis methods (i.e. independent component analysis (ICA), eigenvector centrality mapping, seed-based analysis).
These functional connectivity findings go beyond the results involving differential activations in individual brain regions, as they arise from more reliable measures (Elliott et al., 2020) and involve functional brain networks defined by interconnected brain regions, further suggesting a widespread influence of ovarian hormones on functional brain organization.

Methodological considerations
When interpreting the findings discussed in this review, attention should be drawn to several methodological issues. These methodological considerations include the divergent neuroimaging techniques, menstrual cycle assessment timing and its confirmation through hormonal measurements, and behavioral assessments. It is critical for future research that these matters are meticulously considered, as inconsistencies in these respects substantially reduce the comparability between studies and could jeopardize the results.

Neuroimaging techniques
Current neuroimaging research faces critical limitations such as low statistical power tied to small sample sizes, as well as flexibility in data analysis, both resulting in poor replicability of results (Poldrack et al., 2017). Thus, the use of different statistical analysis approaches (e.g. whole brain versus ROI), stereotaxic spaces (i.e. MNI versus Talairach), and labelling systems (e.g. AAL, Brodmann) represents an important limitation for the comparison of findings. For example, ROI studies might introduce a bias towards the reported regions and leave possibly interesting results in other brain areas undetected. In addition, the statistical significance thresholds used to visualize the neuroimaging results vary from one study to another, and correction for multiple comparisons is not systematically applied. Furthermore, differences in imaging parameters across multiple MR scanners and scanner upgrades would likely affect the comparability among the reviewed studies as well, as they affect both structural and functional measurements (Fortin et al., 2018; Noble et al., 2017; Panman et al., 2019; Takao et al., 2013). Recent evidence suggests that the test-retest reliability of task-based fMRI measures is overall poor, while structural MRI and functional connectivity measures demonstrate high reliability (Elliott et al., 2020). Importantly, habituation effects and rapid fluctuations have a critical impact on test-retest reliability estimates (Elliott et al., 2020), which is of high importance for studies of the menstrual cycle. Another possible source of discrepancy relates to the methodology used to define functional networks of interest when analyzing resting-state fMRI data.
Indeed, among the studies included in the present review, network definition was achieved by applying either seed-based analysis, eigenvector centrality mapping analysis, or ICA, the latter either alone or in combination with statistical comparison to the Intrinsic Connectivity Networks (ICNs) described by Laird et al. (2011). Thus, even when the same functional networks were investigated, the regions included in these networks could differ among the studies. Moreover, relatively few neuroimaging studies have investigated the menstrual cycle, and these systematically included a small number of women (< 100).
Hence, due to the broad range of study designs and the limited sample sizes, no quantitative meta-analysis was performed on the reviewed studies, which did not meet the requirements ensuring robust results from an Activation Likelihood Estimation coordinate-based meta-analysis (Eickhoff et al., 2016; Muller et al., 2018). Another limitation arises from the publication bias induced by the under-representation of null findings, which therefore cannot be accounted for in meta-analyses. As the next step, voxel-based meta-analysis conducted on first-level processed contrast images from various studies, together with estimated pooled effect size analysis, represents a promising prospect for bringing together the findings from the neuroimaging studies of the menstrual cycle.
Suggestions for future research include the implementation of multivoxel pattern analysis, machine learning models, and multimodal MR analyses (Elliott et al., 2020) to identify objective, hypothesis-free markers of neural changes through the menstrual cycle. Indeed, the use of high-resolution multimodal neuroimaging techniques with a consistent analytical approach would advance the field and lead to a more comprehensive overview of brain changes associated with the menstrual cycle. Furthermore, while MR provides indirect brain measurements, its combination with other techniques, such as well-designed behavioral paradigms, quantitative receptor ligand imaging, pharmacological interventions (Henningsson et al., 2015) and animal models, would likely provide substantial insight into the complex interactions between ovarian hormone fluctuations and brain functional anatomy. In addition, neuroimaging studies should provide coordinate data from a standardized stereotaxic space, which could be further used to conduct coordinate-based meta-analyses of comparable studies.
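As a rough illustration of the pooled effect size analysis mentioned above, the following sketch computes a fixed-effect, inverse-variance weighted estimate from per-study effect sizes. It is a generic textbook formulation and not a method applied in this review or in the reviewed studies; the function name `pooled_effect` and the numbers in the usage lines are hypothetical placeholders, not data from any study.

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooling of per-study effect sizes.

    effects   -- one effect size per study (e.g. a standardized mean difference)
    variances -- the sampling variance of each effect size
    Returns (pooled estimate, standard error of the pooled estimate).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and of equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical placeholder values for two studies (not taken from the review).
estimate, standard_error = pooled_effect([0.2, 0.4], [0.01, 0.01])
```

Given the heterogeneity in study design discussed in this section, a random-effects model might be the more appropriate choice in practice; the fixed-effect form is shown only because it is the simplest case.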

Menstrual cycle assessment timing
The major hurdle of reviewing neuroimaging studies investigating the effects of the menstrual cycle on the brain relates to the lack of consistency in assessment timing. In the present review, no less than six different assessment timing points have been described, including early, mid- and late FP, ovulation, and mid- and late LP. In addition, several studies used an assessment timing covering the whole FP (Epperson et al., 2002; Jovanovic et al., 2006; Sundstrom Poromaa et al., 2018), although substantial changes in E2 levels occur during this period of time (Abraham et al., 1972). Besides, while variations in brain features observed throughout the FP may be attributed to E2 level fluctuations as P4 levels remain very low, the specific influences of E2 and P4 on brain changes occurring during the LP cannot be disentangled. Greater consistency in assessment timing would increase the comparability between studies; this could be achieved by systematically using the early and late FP as reference timing points for low and high E2 status, and the mid- and late LP as reference timing points for high E2/P4 status and E2/P4 decline, respectively. Besides, heterogeneity in terms of study design led to a multiplicity of comparison terms between menstrual cycle phases across the literature (i.e. comparisons of early FP, mid-FP, late FP, mid-LP, late LP), further undermining the comparability among studies.
Confirmation of assessment timing within the menstrual cycle is challenging (Hampson, 2020), but crucial to a rigorous assessment of ovarian hormone effects on the brain. Thus, a number of studies that did not include any kind of hormonal assessment to confirm menstrual cycle phase could not be included in the present review. While most of the studies included in the review relied on menstrual cycle mapping, urinary LH and blood hormonal measurements as recommended, some only used urinary LH measures to confirm menstrual cycle phase, possibly introducing variability in hormonal states between individuals and assessment times. Additionally, as inter-individual variations in hormonal fluctuations along the menstrual cycle and in the effects of these hormonal fluctuations on the brain may introduce additional biases (Fehring et al., 2006), it is crucial that future investigations reach a consensus regarding menstrual cycle phase confirmation. In line with this, a three-step method including menstrual cycle mapping, urinary ovulation test and serum hormone measurement for verification of the menstrual cycle phase has been proposed (Schaumberg et al., 2017), to ensure the presence of a regular menstrual cycle, reduce the discrepancy between studies and obtain more robust results.
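The logic of the three-step confirmation described above can be sketched as a simple conjunction of checks. This is only an illustration of the idea that all three sources of evidence must agree; it is not the published protocol. The names `CycleCheck` and `luteal_phase_confirmed` are hypothetical, and the rule that luteal phase confirmation requires serum P4 at or above a laboratory-defined threshold is an assumption made for the example. The threshold is deliberately a required argument, and the value used in the usage lines is an arbitrary placeholder, not a clinical cut-off.

```python
from dataclasses import dataclass


@dataclass
class CycleCheck:
    """Evidence collected for one participant at one test session (hypothetical structure)."""
    cycle_mapped: bool       # step 1: regular cycle documented through cycle mapping
    lh_surge_detected: bool  # step 2: positive urinary ovulation (LH) test
    serum_p4: float          # step 3: serum progesterone, in the laboratory's own units


def luteal_phase_confirmed(check: CycleCheck, p4_threshold: float) -> bool:
    """Return True only when all three steps support a luteal phase session.

    Assumption for illustration: step 3 is satisfied when serum P4 reaches a
    threshold supplied by the caller (no default is provided on purpose).
    """
    return bool(
        check.cycle_mapped
        and check.lh_surge_detected
        and check.serum_p4 >= p4_threshold
    )


# Arbitrary placeholder values, for illustration only.
session = CycleCheck(cycle_mapped=True, lh_surge_detected=True, serum_p4=12.0)
confirmed = luteal_phase_confirmed(session, p4_threshold=10.0)
```

Requiring agreement across all three steps mirrors the concern raised above that relying on urinary LH measures alone may leave variability in hormonal state undetected.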

Behavioral assessment
In contrast to neuroimaging studies of psychiatric disorders such as PMDD and peripartum depression, phenotyping of task-related behavior and premenstrual mood symptoms is generally missing in neuroimaging studies of healthy, naturally cycling women. In total, among the forty studies that included behavioral assessments, twenty investigated interactions with the neuroimaging data, and thirteen related their neuroimaging findings to behavioral changes. Of functional relevance, these behavioral associations strengthen the findings of structural and functional variations throughout the menstrual cycle, involving brain regions of limbic and cognitive networks. Nevertheless, the behavioral significance of the menstrual cycle-related brain variations remains to be investigated, as there is a substantial discrepancy between the relative consistency among neuroimaging findings and the inconsistencies among behavioral studies (Sundstrom Poromaa and . To this end, it is imperative that targeted behavioral paradigms be developed to overcome the current challenges of the field, which include low test-retest reliability (Elliott et al., 2020), repeated measurements (i.e. habituation effects), subclinical populations, and subtle effect sizes.

Critical reflections
The fact that the male brain is still the default in neurosciences (Bale and Epperson, 2017), and that female hormones are regarded as a problem in preclinical research (Shansky, 2019), is currently debated. The present overview of findings on menstrual cycle influences on neural correlates further strengthens the relevance of going beyond these biases and supports the calls for inclusion of female individuals in research studies. Indeed, menstrual cycle staging is possible with adequate methodological control (Barth et al., 2016), even if only one test session is included (Bale and Epperson, 2017). However, it is undoubtedly a factor that requires extra resources to be considered. While for each woman included in a neuroimaging study, and likely in behavioral and clinical studies as well, menstrual cycle phase may influence the results, such an effect may be comparable to other potential confounding factors that could be taken into account (e.g. age, substance use, personality traits). A better understanding of these influences on the brain is therefore much needed. Increased standardization of experimental methods is also needed to investigate the impact of ovarian steroid hormones on women's brains, with a neuroimaging design: i) considering sex as a biological variable and being powered for equal male/female distribution; ii) including behavioral phenotyping and psychological profiling. Potential confounding effects of hormonal cycling fluctuations and lifetime exposure to exogenous hormones could be considered as well, in relevant studies. Moreover, while hormonal manipulation in humans is restricted for ethical reasons, oral contraceptives and hormone replacement therapy can be used as models to study the effect of ovarian hormone levels on brain structure, function and chemistry under different conditions (Beltz and Moser, 2020; Comasco et al., 2014; Toffoletto et al., 2014).

Conclusions
This systematic review provides integrated, data-driven evidence, though disparate, for brain differences associated with endogenous ovarian hormone fluctuations, derived from multimodal brain imaging studies. Taken together, the reviewed findings suggest that hormonal fluctuations during the menstrual cycle modulate brain structure, chemistry and function, thus influencing negative affect and cognition. Despite heterogeneity in terms of study design, consistent effects on the reactivity and the structure of corticolimbic brain regions emerged. Specifically, the anterior cingulate cortex, hippocampus, amygdala, insula, prefrontal cortex and inferior parietal lobule appear as the regions with the highest sensitivity to ovarian hormone fluctuations. However, while the majority of accumulated evidence relates to fluctuations in brain reactivity, the underpinnings of such variations and their association with brain structure and chemistry throughout the menstrual cycle remain to be unraveled. In this regard, the use of high-resolution multimodal neuroimaging techniques in naturally cycling women based on careful menstrual cycle staging, together with behavioral paradigms, quantitative receptor ligand imaging, pharmacological interventions (Henningsson et al., 2015), and animal models, would likely advance the field and lead to a more comprehensive overview of brain changes associated with the menstrual cycle.