Child maltreatment and hypothalamic-pituitary-adrenal axis functioning: A systematic review and meta-analysis

Alterations in the hypothalamic-pituitary-adrenal (HPA) axis and its effector hormone cortisol have been proposed as one possible mechanism linking child maltreatment experiences to health disparities. In this series of meta-analyses, we aimed to quantify the existing evidence on the effect of child maltreatment on various measures of HPA axis activity. The systematic literature search yielded 1,858 records, of which 87 studies (k = 132) were included. Using random-effects models, we found evidence for blunted cortisol stress reactivity in individuals exposed to child maltreatment. In contrast, no overall differences were found in any of the other HPA axis activity measures (including measures of daily activity, cortisol assessed in the context of pharmacological challenges, and cumulative measures of cortisol secretion). The impact of several moderators (e.g., sex, psychopathology, study quality), the role of methodological shortcomings of existing studies, as well as potential directions for future research are discussed.


Introduction
Child maltreatment is a widespread phenomenon that affects the lives of millions of children worldwide (Stoltenborgh et al., 2015; Witt et al., 2017). Despite extensive research on the consequences of child maltreatment, surprising heterogeneity exists across studies with respect to its operational definition (Cicchetti and Toth, 2005; Leeb et al., 2008; Manly, 2005). Researchers, however, generally agree that child maltreatment involves both acts of commission (including physical, sexual, and emotional [psychological] abuse) and acts of omission (i.e., any form of neglect) by a parent or other caregiver that result in harm, potential for harm, or threat of harm to a child (usually interpreted as up to 18 years of age; Gilbert et al., 2009; Leeb et al., 2008). Typically, maltreated children experience multi-type maltreatment; that is, the different forms of maltreatment often co-occur (Herrenkohl and Herrenkohl, 2009; Vachon et al., 2015). While inconsistencies exist between studies in terms of its definition, extensive research, including findings from prospective and retrospective cohort studies (e.g., Clark et al., 2010; Danese et al., 2009; Dube et al., 2001) as well as twin studies (Kendler et al., 2000; Nelson et al., 2002), has shown that the experience of child maltreatment represents a profound, nonspecific risk factor for the development of a broad variety of mental (e.g., Bonoldi et al., 2013; Carr et al., 2013; Dube et al., 2001; Infurna et al., 2016; Liu et al., 2018; Norman et al., 2012; Porter et al., 2020; Teicher and Samson, 2013) as well as physical disorders (e.g., Clemens et al., 2018; Hemmingsson et al., 2014).
Importantly, most publications indicate a dose-dependent relationship between the experience of child maltreatment and the risk for health impairments, with those reporting more severe experiences or an increasing number of different types of child maltreatment showing stronger associations (Clemens et al., 2018; Dube et al., 2001; Hemmingsson et al., 2014; Kendler et al., 2000; Norman et al., 2012).
One proposed mechanism by which child maltreatment might affect later disease risk involves epigenetic programming, a mechanism known to cause long-lasting changes in gene expression, especially in combination with specific risk alleles, thereby inducing long-lasting changes in biological functioning (i.e., biological embedding; Heim et al., 2019; Jacoby et al., 2016; Smith and Mill, 2011). It is assumed that the expression of several genes relevant to stress regulation in particular might be affected, ultimately leading to the development of a phenotype with core dysfunctions in brain circuits involved in the processing of stress and emotion regulation, and related changes in core outflow stress response systems, including the hypothalamic-pituitary-adrenal (HPA) axis, which is the focus of the current meta-analysis (Chen and Baram, 2016; Heim et al., 2019; Jacoby et al., 2016; Koss and Gunnar, 2018; Strüber et al., 2014). In turn, these core dysfunctions might then increase the lifelong risk for the development of a wide range of adverse outcomes (e.g., Heim et al., 2019).
As an important stress response system, HPA axis activity and associated cortisol secretion (i.e., cortisol stress reactivity) serve various functions, depending on the system under investigation (e.g., cardiovascular system, immune system), which together are important for survival and restoring homeostasis (adapting to a homeostatic challenge; Nicolaides et al., 2015). Importantly, structures centrally involved in controlling stress-induced HPA axis activity (particularly anticipatory responses to social challenges) include limbic brain regions such as the hippocampus, the amygdala, and the prefrontal cortex (for a detailed review, see Herman et al., 2003), structures also well known to be affected by chronic stress such as the experience of child maltreatment (e.g., McEwen et al., 2015, 2016; Shirazi et al., 2015; Teicher et al., 2016). Interestingly, an aberrant cortisol stress response has been observed in patients with various mental health problems including borderline personality disorder (BPD), anxiety disorders, depression, and schizophrenia (e.g., Drews et al., 2019; Zorn et al., 2017).
Aside from its central role as a stress response system, HPA axis activity and associated cortisol secretion are triggered not only by stressors but also by other regulatory control factors, including circadian signals (for a comprehensive overview, see Spencer and Deak, 2017). Accordingly, in addition to the well-known cortisol stress response, several other measures of HPA axis activity can be distinguished. These include the cortisol awakening response (CAR; Clow et al., 2010; Fries et al., 2009; Spencer et al., 2018), diurnal cortisol (DC; Adam and Kumari, 2009; Segerstrom et al., 2014; Spiga et al., 2014), cortisol assessed following pharmacological challenge tests (i.e., the dexamethasone suppression test (DST), the combined dexamethasone-corticotropin-releasing hormone test (Dex-CRH), and the corticotropin-releasing hormone test (CRH); Carroll, 1981; Gold et al., 1986; Watson et al., 2006), as well as cumulative measures of cortisol secretion including 24-hour urinary free cortisol (24-hour UFC; Deutschbein et al., 2011; Moore et al., 1985) and hair cortisol concentrations (HCC; for an overview, see Meyer and Novak, 2012; Stalder and Kirschbaum, 2012). Interestingly, similar to findings related to the cortisol stress response, alterations in these other HPA axis activity measures have been associated with various mental and physical health problems as well, although substantial inconsistencies exist among findings (e.g., Adam et al., 2017; Berger et al., 2016; Chida and Steptoe, 2009; Leistner and Menke, 2018; Stalder and Kirschbaum, 2012).
To summarize, a growing number of studies indicate associations between the experience of child maltreatment and epigenetic changes in key genes involved in stress regulation. Interestingly, epigenetic changes have been found in genes important for the regulation of HPA axis activity, such as the GR gene, the FKBP5 gene, and the CRH gene (Hoffmann and Spengler, 2012; Klengel et al., 2013; Palma-Gudiel et al., 2015; Turecki and Meaney, 2016). Together with findings of altered brain morphology, functioning, and connectivity, especially in brain regions involved in the regulation of HPA axis activity (e.g., McCrory et al., 2010; Teicher et al., 2016), these results suggest long-lasting changes in the regulation of this system in individuals exposed to child maltreatment. If, additionally, the manifold biological effects of cortisol are considered (e.g., Sapolsky et al., 2000), a dysregulation in the release of this hormone might increase the susceptibility to a wide variety of negative health outcomes, particularly in combination with other stressful experiences later in life. As discussed above, alterations in several measures of HPA axis activity have indeed been linked to various health conditions including psychopathology. Accordingly, the association between child maltreatment and alterations in cortisol secretion has been widely investigated, including various measures of HPA axis activity. However, findings are inconclusive, with inconsistencies reported (similar to findings related to psychopathology), specifically regarding the direction of alteration (hyper- versus hyposecretion). To date, three meta-analyses have investigated the association between adverse childhood experiences and aberrant cortisol secretion (Bernard et al., 2017; Bunea et al., 2017; Fogelman and Canli, 2018), whereby two of them focused on early life adversity (ELA) in general and one specifically on child maltreatment.
With respect to cortisol stress reactivity, the meta-analysis conducted by Bunea and colleagues (2017) revealed a blunted cortisol stress response to social stressors in those with a history of ELA. In contrast to these findings, the two other meta-analyses failed to show alterations in circadian rhythms, including measures such as the diurnal slope (DSL) as well as the CAR (Bernard et al., 2017; Fogelman and Canli, 2018).
When examining the relationship between child maltreatment and HPA axis activity, careful control of several potential moderators may be of particular importance. As mentioned previously, various mental disorders have been linked to alterations in HPA axis functioning (e.g., Chida and Steptoe, 2009; Stalder et al., 2017; Zorn et al., 2017). The meta-analysis conducted by Zorn et al. (2017), for instance, showed that women with major depressive disorder (MDD), women with an anxiety disorder, and patients with schizophrenia show a blunted cortisol stress response to psychosocial stressors compared to healthy control participants. The meta-analysis conducted by Drews et al. (2019) similarly found evidence for an overall attenuated cortisol stress response in patients with borderline personality disorder (BPD). Due to the close association between the experience of child maltreatment and psychopathology, these studies, however, cannot rule out the possibility that the observed endocrinological alterations were actually caused by childhood adversity and may have been present even before the development of the respective mental health conditions. Moreover, beyond the epigenetic changes found in genes important for the regulation of HPA axis activity that are likely caused by the experience of chronic stress, such as the experience of child maltreatment, specific polymorphisms of the respective genes, which are associated with differences in cortisol secretion independent of child maltreatment, have also been found to be related to an increased risk of developing mental disorders (Fan et al., 2021; Ising et al., 2008; Mahon et al., 2013). Accordingly, psychopathology may interfere with or moderate the relationship between child maltreatment and cortisol, and should therefore be considered carefully.
In addition, several lines of evidence suggest that women are generally more prone to mental disorders, particularly stress-related disorders, and that sex hormones might account for some of these findings (Li and Graham, 2017). Interestingly, marked sex differences have also been reported with respect to HPA axis functioning (Foley and Kirschbaum, 2010; Stalder et al., 2016; Stalder and Kirschbaum, 2012; Zänkert et al., 2019), indicating that in order to accurately capture the relationship between child maltreatment and cortisol, the influence of sex should be considered as well. Furthermore, there is some evidence that hormonal activity is elevated at stressor onset (e.g., at the time of child maltreatment) and decreases with passing time (e.g., with increasing age; Miller et al., 2007), implying that the age of the population studied may likewise be important. Finally, ethnic differences in HPA axis functioning have been found (Boileau et al., 2019), keeping in mind, of course, that minorities are generally exposed to more adversity (e.g., O'Connor et al., 2020), which in turn may explain this association.
A number of variables related to the measurement of child maltreatment may also be important when investigating the association between child maltreatment and HPA axis functioning. These include the assessment modality employed (e.g., informant versus self-reports), the various approaches to defining the presence of child maltreatment (e.g., records, cut-offs, specifications), the age at maltreatment onset, as well as the chronicity of the maltreatment experiences. For instance, a nationally representative birth cohort study, the Environmental Risk (E-Risk) Longitudinal Twin Study, demonstrated that retrospective self-report data of child maltreatment were more strongly associated with adult psychopathology than prospective informant reports (Newbury et al., 2018). Chronic maltreatment starting early in life is generally associated with poorer neurocognitive functioning and worse psychopathology (Cowell et al., 2015; Jaffee and Maikovich-Fong, 2011; Kaplow and Widom, 2007). In addition, findings suggest that chronic exposure to stress hormones can impact brain structures differently, depending on the timing and duration of the exposure (Lupien et al., 2009). Thus, effects on HPA axis functioning might vary depending on the age of first child maltreatment experience and/or the chronicity of these experiences.
Furthermore, studies vary widely regarding the assessment of cortisol. Depending on the HPA axis activity measure of interest, some of these variations may account for additional variability in the child maltreatment-cortisol relationship. Variables that have been associated with different cortisol findings include, for instance, sample type (blood versus saliva; Spencer and Deak, 2017), slope type, and whether morning samples were collected at awakening for diurnal cortisol (Adam et al., 2017; Ryan et al., 2016), as well as the type of stressor in the context of cortisol stress reactivity (social-evaluative versus other; e.g., Dickerson and Kemeny, 2004; Zänkert et al., 2019). Differences might also emerge depending on how well a task can elicit a cortisol response. In addition, the variation in dosage of dexamethasone in pharmacological stimulation tests (0.5 mg versus 1.0 mg) might account for variability as well (Leistner and Menke, 2018).
Finally, several aspects of methodological quality need to be considered when attempting to quantify the relationship between the experience of child maltreatment and HPA axis functioning. Besides matching the child maltreatment and the control group with respect to age, sex, and psychopathology, and ensuring that no one from the control group was exposed to child maltreatment experiences, these include instructions for sampling and collection, the day and timing of sampling, as well as controlling for specific state covariates such as being sick or experiencing any current stress at the time of testing, with these methodological variables differing to some extent between the various HPA axis activity measures. For a comprehensive overview and corresponding references, see Appendix B Tables B1-B7. In addition, there are certain disease states (e.g., addictions, endocrine diseases), various drugs (e.g., glucocorticoids, psychoactive medications), and sex hormone-dependent variables (e.g., menstrual cycle, oral contraceptive intake, pregnancy) that can strongly influence cortisol levels (e.g., Foley and Kirschbaum, 2010; Granger et al., 2009; Kudielka et al., 2012; Kudielka and Wüst, 2010; Locatelli et al., 2009; Stalder et al., 2016; Zänkert et al., 2019). Considering that participants with experiences of child maltreatment are more likely to suffer from medical conditions, are at increased risk for substance abuse, and more often experience unintended teenage pregnancy (e.g., Hughes et al., 2017), all of which are factors related to altered cortisol secretion (e.g., Foley and Kirschbaum, 2010; Stalder et al., 2016; Stalder and Kirschbaum, 2012), careful assessment and matching between groups (i.e., maltreated versus control group) on these variables is of major importance.
Thus, in this comprehensive systematic review and meta-analysis, we aimed to quantify existing evidence on the effect of child maltreatment on cortisol metabolism, including all of the previously mentioned measures of HPA axis activity. In contrast to existing meta-analyses, we were particularly interested in the potential influence of psychopathology in interfering with or moderating the effect of child maltreatment on changes in cortisol secretion. Accordingly, psychopathology, and especially the matching of the groups (psychopathology versus no psychopathology), was recorded thoroughly. Furthermore, we were interested in a range of other factors likely to moderate the effect of child maltreatment on cortisol regulation, such as age at the time of study participation, sex, ethnicity, child maltreatment assessment method (informant versus self-report), child maltreatment grouping method (i.e., cut-off scores, child protective services (CPS) records, other), age at the time of child maltreatment onset, chronicity of the child maltreatment experiences, as well as variables related to the assessment of the corresponding HPA axis activity measure. We considered different indices for each HPA axis activity measure, including at least one index of total cortisol production and one index reflecting change in cortisol over time (sensitivity of the system; Pruessner et al., 2003). In contrast to the existing meta-analyses, the present investigation sought to determine whether aberrant cortisol secretion patterns following child maltreatment can be observed in both of these largely independent components of HPA axis activity. Finally, a comprehensive quality assessment based on expert guidelines was developed for each activity measure, and several elements of methodological quality and their potential moderating role were investigated.

Systematic literature search
Articles were identified by searching the electronic databases PubMed, PsycINFO, Web of Science (WOS), and the Cumulative Index to Nursing and Allied Health Literature (CINAHL) from their inception to June 2018. The search covered titles and abstracts and used the following search string: ("maltreatment" OR "neglect" OR "emotional abuse" OR "sexual abuse" OR "physical abuse" OR "childhood trauma") AND ("hypothalamic-pituitary-adrenal" OR "HPA" OR "cortisol" OR "adrenocorticotropic hormone" OR "ACTH"). Moreover, the reference lists of the prior meta-analyses on HPA axis functioning and child maltreatment (or ELA) were checked, as were studies proposed by authors who were contacted in the context of data collection.

Selection criteria
Studies were included if they: (1) involved human participants of all ages; (2) reported on at least one measure of child maltreatment (e.g., self-report [questionnaire, interview], report by an outside source [CPS record, parental report] or self-identification), whereby child maltreatment was defined according to the definition provided by the Centers for Disease Control and Prevention ("any act or series of acts of commission or omission by a parent or other caregiver that results in harm, potential for harm, or threat of harm to a child"; Leeb et al., 2008, p. 11) including the subtypes emotional, physical, sexual abuse, as well as neglect; (3) reported measuring cortisol levels, either as an indicator of daily activity (DC, CAR), in response to a stressor (cortisol stress reactivity) or pharmacological challenges (DST, Dex-CRH, CRH), or as a cumulative index (24-hour UFC or HCC). All assessment methods, i.e., saliva, blood (serum, plasma), urine, and hair, were eligible. Additionally, several preconditions were formulated for the various measures of HPA axis activity: With respect to DC, at least two sampling time points, one cortisol assessment in the morning (ideally with reference to awakening) and one in late afternoon/evening, had to be available (Segerstrom et al., 2014). In the case of the CAR, only studies that collected at least two samples, with the first sample anchored to awakening and a second sample between +30 and +45 min post-awakening, were included (Stalder et al., 2016). With regard to cortisol stress reactivity, studies needed to collect at least one cortisol baseline measure (before being introduced to a stressor) as well as a sample between +20 and +40 min post-stressor onset to capture the peak of the cortisol stress response (Dickerson and Kemeny, 2004). UFC had to be collected over a period of 24 h (Moore et al., 1985) and studies assessing HCC needed to focus on the first 3 cm hair segment (Meyer and Novak, 2012).
No restrictions were applied to studies involving pharmacological stimulation tests. Articles were excluded if they: (1) evaluated a non-human sample; (2) were not written in English or German; (3) did not contain primary data (e.g., systematic reviews, meta-analyses, book chapters); or (4) were not peer reviewed (e.g., dissertations, master's theses, conference abstracts). Additionally, studies that included (5) participants with substance abuse (e.g., alcohol, cocaine); (6) individuals who suffered from a medical condition (e.g., endocrine disorder, chronic fatigue, chronic pain); or (7) pregnant women or women in the postpartum period (up to 6 months) were excluded, given the well-known effects of these factors on HPA axis activity (e.g., Stalder et al., 2016; Zänkert et al., 2019). Studies were first screened based on their title and abstract and then, if suitable, further examined in full text.
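The sampling preconditions for the CAR and for cortisol stress reactivity can be expressed as simple timing checks. The following is a minimal Python sketch, not the procedure actually used for screening; the function names and the coding of time points (0 = awakening for the CAR, values at or below 0 = pre-stressor baseline for stress reactivity) are assumptions made for illustration.

```python
from typing import Sequence


def car_sampling_ok(minutes_post_awakening: Sequence[float]) -> bool:
    """CAR precondition: one sample at awakening and one between +30 and +45 min.

    Assumption: the awakening sample is coded as minute 0.
    """
    has_waking = any(m == 0 for m in minutes_post_awakening)
    has_peak = any(30 <= m <= 45 for m in minutes_post_awakening)
    return has_waking and has_peak


def reactivity_sampling_ok(minutes_from_stressor_onset: Sequence[float]) -> bool:
    """Stress reactivity precondition: a baseline plus a sample at +20 to +40 min.

    Assumption: baseline samples are coded with times at or below 0.
    """
    has_baseline = any(m <= 0 for m in minutes_from_stressor_onset)
    has_peak = any(20 <= m <= 40 for m in minutes_from_stressor_onset)
    return has_baseline and has_peak


if __name__ == "__main__":
    print(car_sampling_ok([0, 30, 60]))            # waking and +30 min present
    print(reactivity_sampling_ok([-10, 10, 55]))   # no sample in the peak window
```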

Data extraction
As mentioned earlier, for each of the HPA axis activity measures (exception: 24-hour UFC and HCC) several outcome indices were defined, including at least one measure for total cortisol production and one measure reflecting changes in cortisol over time. For DC, these included waking (morning) and bedtime levels (for total cortisol) as well as the delta between bedtime and waking cortisol samples (reflecting change over time; DSL). For the CAR, the following assessment time points or indices were extracted: waking cortisol, peak cortisol (expected between +30 and +45 min post-awakening), end cortisol (assessed +60 min post-awakening), peak reactivity (delta between the peak and the waking cortisol sample) as well as the area under the curve with respect to the ground (AUCg; reflecting total cortisol production) and the area under the curve with respect to increase (AUCi; reflecting changes over time; for more details regarding these two formulas, see Pruessner et al., 2003). Similarly, the following assessment time points or indices were extracted for cortisol stress reactivity, the CRH test, and the Dex-CRH test: baseline cortisol (before being introduced to a stressor; before CRH injection), peak levels, recovery levels (last sample assessed), peak reactivity (defined as delta between peak and baseline levels), AUCg and AUCi. In some of the studies, the timing of the peak differed between the child maltreatment and the control group and the values at the individual peak times were extracted for each group. Whenever a peak occurred in only one of the groups, the value at that peak time was also extracted for the other group. There were also studies in which neither group showed a cortisol response following exposure to a stressor. In this case, the values were extracted at the time a response would have been expected (around +30 min post-stressor onset).
For these few studies, however, no peak reactivity values were extracted or included in this meta-analysis. For the DST, cortisol assessed in the morning prior to administration of dexamethasone (which was normally administered at 11 pm) and cortisol measured the next day, as well as the delta between these two measurement time points, were extracted. As we were interested in obtaining all of the defined outcome indices from each study, the authors of all studies containing missing information were contacted with a data request. The data request consisted of an Excel file containing all the variables of interest. Since we assumed that the requested additional calculations were relatively time-consuming, authors were also given the option of sending us the raw data, from which we calculated the desired indices. If the data could not be provided, whenever possible, means and standard deviations were extracted from tables or text. If those data were not available, data were extracted from figures using a web-based digitizer (Rohatgi, 2012). Two independent reviewers extracted the data from the corresponding figures and the mean value of both extractions was calculated. If not clearly stated in the text or in the figure captions, we assumed that figures represented means and standard errors. Standard deviations were calculated from standard errors or from confidence intervals using the RevMan Calculator provided by the Cochrane group (Drahota and Beller, n.d.). If none of these data sources were available, the study was excluded. In case of multiple publications based on the same cohort, the study with the largest sample size or the one that provided extractable data was included. Whenever possible, non-transformed (raw) cortisol data were extracted.
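To make the extracted indices concrete, the following minimal Python sketch computes peak reactivity and the two area-under-the-curve indices using the trapezoid formulas described by Pruessner et al. (2003). It is an illustration only, not the code used for this meta-analysis; the function names and the example values are hypothetical.

```python
from typing import Sequence


def auc_ground(values: Sequence[float], times: Sequence[float]) -> float:
    """AUCg: trapezoidal area between the cortisol curve and zero."""
    if len(values) != len(times) or len(values) < 2:
        raise ValueError("need at least two samples with matching time points")
    total = 0.0
    for i in range(len(values) - 1):
        dt = times[i + 1] - times[i]
        total += (values[i] + values[i + 1]) / 2.0 * dt
    return total


def auc_increase(values: Sequence[float], times: Sequence[float]) -> float:
    """AUCi: AUCg minus the area under the first (baseline) value."""
    return auc_ground(values, times) - values[0] * (times[-1] - times[0])


def peak_reactivity(values: Sequence[float]) -> float:
    """Delta between the highest post-baseline sample and the first sample."""
    return max(values[1:]) - values[0]


if __name__ == "__main__":
    cortisol = [10.0, 20.0, 15.0]   # hypothetical values, e.g. nmol/l
    minutes = [0.0, 30.0, 60.0]     # hypothetical sampling times
    print(auc_ground(cortisol, minutes))    # 975.0
    print(auc_increase(cortisol, minutes))  # 375.0
    print(peak_reactivity(cortisol))        # 10.0
```

Note that AUCi can become negative when cortisol falls below the first sample, which is one reason the two indices are treated as capturing different components of HPA axis activity.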
In the case of more than two groups described in the paper, we extracted the data from those two groups that best matched in terms of psychopathology, or if the data of two groups could be combined, weighted means and standard deviations were calculated using the StatsToDo software (https://www.statstodo.com/index.php). If a study included both a clinical and a healthy control group and appropriate measures of child maltreatment were taken in both groups, data were requested or extracted separately for these two subgroups. If the experience of child maltreatment was assessed but grouping was based on other criteria, the authors were asked to (re)group participants based on the presence or absence of child maltreatment or, in the case of cut-off scores, into high versus low child maltreatment groups (see Appendix A for details on the respective (re)grouping method of the studies in question). Finally, if a measure assessed not only child maltreatment but other traumatic experiences as well, as is the case for the Early Trauma Inventory (ETI; Bremner et al., 2000), authors were requested to group participants using only the subscales that assessed child maltreatment. However, this was not always possible, which is why some of the included studies did not focus on child maltreatment only, but on ELA in general. The few studies to which this applies are marked accordingly. We always asked authors to provide the data including only those participants without missing cortisol or child maltreatment assessments. Therefore, the data presented in this meta-analysis might not completely correspond to the data displayed in the original studies.
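The conversions described in this section can be sketched as follows. This Python sketch uses the standard textbook formulas (as given, for example, in the Cochrane Handbook); whether the RevMan Calculator and StatsToDo implement exactly these formulas is an assumption, and the confidence-interval conversion shown here uses the large-sample normal approximation (z = 1.96) rather than a t-distribution. All function names are hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Standard deviation from the standard error of a mean."""
    return se * math.sqrt(n)


def sd_from_ci95(lower: float, upper: float, n: int) -> float:
    """Standard deviation from a 95% confidence interval of a mean.

    Assumption: large-sample normal approximation (z = 1.96).
    """
    return math.sqrt(n) * (upper - lower) / (2 * 1.96)


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Pool two subgroups into one group (combined n, mean, and SD)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + n1 * n2 / n * (m1 - m2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(var)


if __name__ == "__main__":
    print(sd_from_se(2.0, 25))  # 10.0
    # Subgroups [1, 3] and [5, 7] combine to [1, 3, 5, 7]
    print(combine_groups(2, 2.0, math.sqrt(2), 2, 6.0, math.sqrt(2)))
```

The combined standard deviation includes a between-group term, so it is larger than a simple weighted average of the two subgroup standard deviations whenever the subgroup means differ.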

Coding of study characteristics for moderator analyses
The following details were extracted from each study: (1) identifying features (i.e., authors, year of publication, journal), (2) participant characteristics (i.e., sample size, age range, average age, sex ratio, ethnicity [or race: percentage of Caucasians, non-Caucasians], assessment of psychopathology [clinical sample, healthy controls, mixed, not assessed, as well as the percentage of participants meeting criteria for a current mental disorder]), (3) trauma-related information (i.e., measurement method [self-report, informant report, mixed], instrument used, grouping method [cut-off scores, record, other], type of child maltreatment [emotional, physical, sexual abuse, neglect], average age of first child maltreatment report and average duration of child maltreatment), (4) cortisol-related information (i.e., type of sample [blood, saliva, urine, hair], measurement unit [e.g., nmol/l, μg/dl], time points of sampling, number of samples, reliability of measure [sampling over one day, two days, more], minutes to peak, duration of stressor, type of stressor [social-evaluative, other], whether a cortisol response was observed [yes, both groups, only in one group, no], dose of the respective stimulant in case of the pharmacological stimulation tests) and (5) data-related information (source of data [paper, provided] and whether the data were (re)grouped or not). Some of these variables were viewed as potential moderators that might account for variability in the child maltreatment-cortisol relationship (see moderator analyses).

Risk of bias in individual studies
In order to quantify the risk of bias for each individual study and to examine the potential moderating role of several elements of methodological quality, a quality assessment tool was developed. This quality assessment tool covered the following three key domains: (1) variables associated with the selection of participants (including the measurement of child maltreatment and the matching of the two groups with respect to age, gender, and psychopathology), (2) variables associated with the measurement of HPA axis activity, and, relatedly, (3) the assessment of important confounders. These quality criteria, particularly those related to the assessment of cortisol and associated confounders, were developed based on expert guidelines and differ to some degree between the various HPA axis activity measures (see Appendix B for corresponding references). The risk of bias assessment for each HPA axis activity measure was conducted by two independent reviewers, with disagreements being resolved through discussion. In case of (re)grouped data or missing statistics (e.g., t-test or Fisher's exact test), corresponding group comparisons were conducted based on available means and standard deviations using QuickCalcs from GraphPad (https://www.graphpad.com/quickcalcs/). As the data of some articles were (re)grouped, the information with respect to some quality items was no longer available at the group level. In this case, a conservative approach was followed and the corresponding point was not awarded (marked accordingly in the corresponding tables of Appendix B). In certain cases, the assessment of a quality item was not meaningful, e.g., scoring the matching between the child maltreatment and the control group with respect to oral contraceptive intake in an all-male sample, and in those cases the corresponding items were coded as NA (not applicable).
For each of the three quality domains (selection of participants, appropriate assessment of the corresponding HPA axis activity measure, appropriate control for confounders), a score derived from the mean of all associated items (excluding the NA items) multiplied by 100 was calculated. In addition to these domain-specific scores, we also calculated an overall total score. These scores were then used in corresponding meta-regression analyses.
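The domain-score rule described above (mean of all applicable items, multiplied by 100) can be illustrated with a short Python sketch. The coding of items as 1 (point awarded), 0 (not awarded), and None (NA) and the function name are assumptions for illustration; how the overall total score was aggregated is not specified here and is therefore not modelled.

```python
from typing import Optional, Sequence


def domain_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the applicable items times 100; NA items (None) are excluded.

    Assumption: each item is coded 1 (awarded), 0 (not awarded), or None (NA).
    Returns None if every item in the domain is NA.
    """
    applicable = [item for item in items if item is not None]
    if not applicable:
        return None
    return 100.0 * sum(applicable) / len(applicable)


if __name__ == "__main__":
    # Hypothetical study: one item is NA (e.g., oral contraceptive matching
    # in an all-male sample) and is dropped from the denominator.
    print(domain_score([1, 0, None, 1]))
    print(domain_score([1, 1, 0, 0]))  # 50.0
```

Excluding NA items from the denominator means that a study is not penalized for items that could not apply to its sample, whereas items lost through (re)grouping count as 0 under the conservative approach described above.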

Statistical analyses
All analyses were run using R and RStudio (R version 3.6.2, 2019-12-12; packages: meta, metafor, dmetar) and were guided by the online book "A Hands-on Guide" from Harrer et al. (2019). Effect sizes for the primary studies were estimated using the Hedges' g coefficient corrected for small sample sizes (Hedges, 1982). In order to calculate the overall effect, random-effects models for the different HPA axis activity measures and the various outcome indices were performed, applying the Restricted Maximum-Likelihood (REML) method to estimate the variance of the distribution of the true effect sizes (τ²; Veroniki et al., 2016). Between-study heterogeneity was evaluated focusing on Cochran's Q statistic (with p < 0.05 indicating the presence of statistical heterogeneity), Higgins and Thompson's I² measure (with I² of 25% = low heterogeneity, 50% = moderate heterogeneity, 75% = high heterogeneity), and the prediction interval (Higgins, 2003; Higgins and Thompson, 2002; IntHout et al., 2016). By means of the find.outliers (meta package) and inf.analysis functions (Leave-One-Out method; dmetar package), studies with extreme effect sizes (outlier studies) and studies exerting a high impact on the overall result (potential influential studies) were identified and excluded in the context of corresponding sensitivity analyses (Harrer et al., 2019; Viechtbauer and Cheung, 2010). Additionally, meta-regression and subgroup analyses (mixed-/fixed-effects model) were conducted to examine the influence of several predefined moderator variables. For some studies, cortisol data were available for a lower number of participants than reported in the original paper, with information on the various moderator variables only available for the original sample. Despite this, these original values were included in corresponding moderator analyses. To the best of our knowledge, we marked this in the tables describing the characteristics of the included studies.
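As an illustration of the quantities named above, the following Python sketch computes Hedges' g with its small-sample correction and pools effect sizes in a random-effects model, reporting Cochran's Q and I². It is a simplified sketch, not the R code used for the analyses: in particular, it swaps in the closed-form DerSimonian-Laird estimator for τ² instead of REML (which requires iterative estimation), and it omits the prediction interval. Function names and example values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def hedges_g(m1, sd1, n1, m2, sd2, n2) -> Tuple[float, float]:
    """Hedges' g (small-sample corrected standardized mean difference) and its variance."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool_random_effects(effects: Sequence[float],
                        variances: Sequence[float]) -> Dict[str, float]:
    """Random-effects pooling with the DerSimonian-Laird tau^2 (not REML)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q, "I2": i2}


if __name__ == "__main__":
    g, var_g = hedges_g(10.0, 2.0, 10, 8.0, 2.0, 10)  # hypothetical groups
    print(round(g, 4), round(var_g, 4))
    print(pool_random_effects([0.2, 0.8], [0.04, 0.04]))
```

With many studies the two τ² estimators often give similar results, but they can differ noticeably when the number of studies is small, so this sketch should not be expected to reproduce the reported estimates.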
In case of substantial between-study heterogeneity (I² > 50%), meta-regression and subgroup analyses were based on the sensitivity model excluding outlier studies. Finally, to evaluate the presence of publication bias, funnel plots were visually inspected and Egger's test for funnel plot asymmetry was performed (Egger et al., 1997; Peters et al., 2008).
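The logic of Egger's test can be sketched as an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (one divided by the standard error), where an intercept far from zero points to funnel plot asymmetry. The Python sketch below follows this classical formulation; it is an illustration only, the function name `egger_test` is hypothetical, and the software used for the analyses may implement a weighted variant.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: standardized effect regressed on precision.

    Returns (intercept, se_intercept, t_statistic). The p-value would be
    taken from a t distribution with k - 2 degrees of freedom, which is
    not computed here.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / se for se in std_errors]                    # precision
    y = [g / se for g, se in zip(effects, std_errors)]   # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(sse / (k - 2) * (1 / k + x_bar**2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t_stat

# Hypothetical data in which less precise studies report larger effects.
ses = [0.1, 0.15, 0.2, 0.25, 0.3, 0.4]
effects = [0.31, 0.39, 0.51, 0.59, 0.71, 0.89]
intercept, se_int, t_stat = egger_test(effects, ses)
```

In the hypothetical data, effects grow with the standard error, so the intercept is clearly positive; in a symmetric funnel it would be close to zero.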

Search results
The literature search yielded a total of N = 1,858 records, of which n = 575 duplicates were removed. Screening of reference lists of existing meta-analyses on HPA axis functioning and child maltreatment (or ELA), as well as studies proposed by authors who were contacted in the context of data collection, yielded an additional n = 9 studies. After title and abstract screening, n = 1,025 articles were discarded because they did not meet inclusion criteria. The remaining n = 267 studies were assessed in full-text. Of these, another n = 120 publications were excluded for the following reasons: (1) no appropriate HPA axis measure (n = 52), (2) all participants experienced child maltreatment (n = 8), (3) unusual measure of child maltreatment (n = 6), (4) intervention study with no baseline assessment (n = 3), and (5) samples used in multiple studies (n = 51). Additionally, n = 60 articles had to be excluded due to missing relevant statistics, leaving a total of n = 87 independent studies included in this series of meta-analyses (for full process of study selection, see Fig. 1). Of the n = 87 studies, n = 14 studies included two subgroups, one study contained three subgroups, and n = 18 articles collected data on more than one HPA axis activity measure (with DC and CAR most frequently jointly assessed), leaving a total of k = 132 group comparisons. Since some studies collected data on various HPA axis activity measures, it was possible that an effect size (e.g., cortisol measured at awakening) was included in two different random-effects models relating to two different outcome indices (e.g., morning cortisol in the context of DC and awakening cortisol in the context of the CAR). With respect to the various HPA axis activity measures, n = 23 studies reported on DC (k = 26), n = 22 on the CAR (k = 27), n = 35 on cortisol stress reactivity (k = 39), and n = 19 studies assessed cortisol following pharmacological challenges (DST: n = 11 (k = 17); Dex-CRH test: n = 8 (k = 10)). 
Only two studies examined cortisol after the CRH test, which is why these two studies were combined with the data reported for the Dex-CRH test. With respect to the cumulative measures, n = 8 studies reported on HCC (k = 9) and n = 4 studies on 24-hour UFC. Overall, data from n = 41 studies were provided by the respective authors, of which n = 23 data sets (k = 29) were (re)grouped for the purpose of this meta-analysis (see Appendix A). For three studies with large samples (Hibel et al., 2019; Lovallo et al., 2019; Vreeburg et al., 2009) from which we obtained data, the publications that best described the respective samples, rather than those that appeared in the initial literature search, were chosen as references.
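The record counts reported in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated above and rests on two assumptions: that the nine additional studies entered the pool before title and abstract screening, and that each of the four UFC studies contributed exactly one comparison (the text reports n = 4 but no k for UFC).

```python
# Record counts as reported in the text.
identified = 1858
duplicates = 575
additional = 9  # reference lists and author suggestions (assumed pre-screening)
excluded_at_screening = 1025
full_text_exclusions = {
    "no appropriate HPA axis measure": 52,
    "all participants experienced child maltreatment": 8,
    "unusual measure of child maltreatment": 6,
    "intervention study with no baseline assessment": 3,
    "samples used in multiple studies": 51,
}
missing_statistics = 60

screened = identified - duplicates + additional
full_text = screened - excluded_at_screening
included = full_text - sum(full_text_exclusions.values()) - missing_statistics

# Comparisons (k) per HPA axis measure; the UFC value is an assumption.
comparisons = {
    "DC": 26, "CAR": 27, "stress reactivity": 39,
    "DST": 17, "Dex-CRH": 10, "HCC": 9, "UFC (assumed)": 4,
}
total_k = sum(comparisons.values())
```

Under these assumptions, the intermediate totals reproduce the 267 full-text assessments, the 87 included studies, and the 132 group comparisons given in the text.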

Diurnal cortisol
3.2.1.1. Included studies. In total, our systematic search strategy identified n = 40 studies that assessed waking (or morning) and evening cortisol to measure some aspects of the circadian rhythm of cortisol secretion. Of these, n = 23 studies, including k = 26 comparisons involving a total of n = 5,248 participants were retained for quantitative synthesis. Owing to a lack of statistical information, data from the remaining n = 17 studies that were eligible for inclusion could not be considered. The mean age of the total sample was 26.89 (SD = 14.99) years, the majority of studies included predominantly female subjects (with the percentage of females ranging between 33.1 and 100.0%, M = 65.4%, SD = 24.0%; k = 5 comparisons with a purely female sample), and the percentage of Non-Caucasians ranged between 0.0 and 81.7% (M = 30.0%, SD = 30.7%; k = 14 not reporting on ethnicity). Most studies (k = 16) were conducted with adults only, with fewer studies involving children or adolescents (k = 10). With respect to psychopathology, k = 8 comparisons included healthy participants, k = 5 involved participants all fulfilling diagnostic criteria for a mental disorder, k = 6 comparisons comprised participants with at least some fulfilling the diagnostic criteria for a mental disorder (range: 17.5-96.0%; with k = 2 not matched in terms of psychopathology), and k = 7 did not report on psychopathology at all. The majority of studies used self-reports to assess the presence of child maltreatment (k = 17) and k = 9 comparisons relied on informant reports. The assessment of child maltreatment and the grouping of participants into a child maltreatment and a control group varied across the studies. This refers both to the instruments used as well as to the grouping procedure applied (e.g., cut-off scores, specific definitions, presence of records). 
Three studies (k = 4) did not focus exclusively on child maltreatment but also included participants with other types of ELA (Carrion et al., 2002; Faravelli et al., 2010; Faravelli et al., 2017). In terms of reliability, only a few comparisons (k = 5) assessed cortisol over more than two days. In total, we received data from 13 studies (k = 16); for six of these (k = 8), the respective authors grouped or regrouped their data based on the available assessment of child maltreatment (or, where raw data were provided, we performed the (re)grouping ourselves). For further details on the characteristics of the included studies, see Table 1.

Risk of bias assessment.
Studies received an average total score of 51.9/100.0 (SD = 12.7, range: 27.6-76.7). With respect to the selection of participants (M = 61.0/100.0, SD = 15.5), the majority of studies used an established instrument to assess the experience of child maltreatment and matched their participants with respect to age, sex, and psychopathology (assessed with a gold-standard diagnostic tool). However, fewer than half of the comparisons (k < 13) ensured that all participants in the child maltreatment group were exposed to child maltreatment while none of the participants in the control group were, and only four studies (k = 5) used two different sources to establish the presence of child maltreatment. Regarding the appropriate assessment of DC (M = 53.5/100.0, SD = 17.4), most studies reported clear sampling instructions (including prohibitions of certain behaviors before sampling as well as clear information about how to collect, where to place, and how to return samples) and provided details on their test protocol as well as on missing data and/or handling of outliers. However, only a few studies provided clear instructions regarding the day of sampling (k = 9), ensured that the time of awakening did not differ between the groups (k = 7), assessed sampling time adherence (k = 7), rescheduled sampling if participants were sick (k = 5), reported on batch analysis (k = 8), or ensured that participants were not under any current extraordinary stress (k = 2). Finally, as shown by the relatively low scores related to the control of confounding variables (M = 40.4/100.0, SD = 24.6), fewer than half of the comparisons (k ≤ 13) excluded participants with a medical condition or participants working night shifts, or assessed smoking, menstrual cycle, oral contraceptive, and medication use (especially medications affecting the central nervous system (CNS)) and thus ensured that participants did not differ in these respects, and only k = 2 comparisons ensured that participants did not differ with respect to other ELA or adult adversity. It should be noted, however, that several studies would have assessed some of the variables of interest, but since the data of six studies (k = 8) were (re)grouped, the corresponding information at the group level was no longer available for all of these studies. For details on individual scoring results of the primary studies as well as a summary of the average risk of bias scores, see Appendix B Table B8 or Table B1 for individual quality items.

Notes to Table 1: Rel. = reliability, whereby the following definitions have been used: 1 = cortisol assessed over only one day, 2 = cortisol assessed over two days, 3 = cortisol assessed over more than two days; all = morning, evening, delta (evening minus morning value). a The two groups CPS-involved, stayed with birth parents and CPS-involved, placed in foster care were combined into one group. b This article did not appear in the initial search but was suggested by the respective author as best suited for citation in this meta-analysis. c The two groups early physical/sexual abuse and maltreated without early abuse were combined into one group. d Comparison between patient groups with and without early trauma. e The two groups Multidimensional Treatment Foster Care for Preschoolers (MTFC) and Regular Foster Care (RFC) were combined into one group. f This article did not appear in the initial search but was suggested by the respective author as best suited for citation in this meta-analysis. g Authors also administered ETI, but grouping was based on CTQ. h Percentage of non-Caucasians refers to total sample (N = 127). i Deprived adoptees were combined into one group and were compared to the non-deprived UK adoptees. j Comparison between PTSD patients with childhood sexual or physical abuse and those reporting no history of childhood sexual or physical abuse. k Comparison between MDD patients with early life stress and those without corresponding experiences. l This article did not appear in the initial search but was suggested by the respective author as best suited for citation in this meta-analysis. m Exclusion of repressed memory group; CM sample: recovered and continuous memory group, control sample: control group. n The data were regrouped including only the traumatized control subjects and the non-traumatized control subjects (remark: some in the control sample may have experienced other traumatic events). o The two groups some neglect and severe neglect were combined into one group. p The data are from the Netherlands Study of Depression and Anxiety, a large cohort study; this article did not appear in the initial search but was suggested by the respective author as best suited for citation in this meta-analysis. q Patient group: dysthymia, MDD, social phobia, panic with/without agoraphobia, generalized anxiety disorder. r None of the participants fulfilled the diagnostic criteria for dysthymia, MDD, social phobia, panic with/without agoraphobia, or generalized anxiety disorder in the past 6 months. + It is not clearly stated in the text or in the subheading of the figure whether means and standard deviations or means and standard errors were presented; we assumed standard errors.

Meta-analysis.
The results of the meta-analyses for the three indices of circadian activity (morning, evening, DSL) revealed no overall differences for morning (Hedges' g = −0.02, 95% CI [−0.11; 0.06], p = 0.586) and DSL cortisol (Hedges' g = 0.00, 95% CI [−0.11; 0.11], p = 0.987) between the child maltreatment and the control sample (this also held true for the corresponding sensitivity analyses; for further details, see Table 2). In contrast, participants in the child maltreatment group had slightly elevated evening cortisol levels compared to their respective control group (

Meta-regression and subgroup analyses.
We conducted a number of pre-defined meta-regression and subgroup analyses. The summary results for each outcome index and each moderator examined are shown in Appendix D Table D1. In the following section, results for moderators found to significantly influence the main effects are outlined. Despite low heterogeneity in the effect size estimates between studies reporting on morning cortisol (I² = 29.1%), the following two continuous moderators influenced the main effect: (1) age at the time of study participation and (2) the sub-domain "appropriate measure of cortisol in the context of DC" of the quality assessment. With respect to the mean age of study participants, studies with older samples tended to report higher morning cortisol in the child maltreatment group relative to the control group than studies with younger samples (β = 0.006, 95% CI [0.001; 0.010], p = 0.014, R² = 73.04%). Concerning the assessment of cortisol, higher quality scores were associated with lower morning cortisol (β = −0.005, 95% CI [−0.010; −0.001], p = 0.012, R² = 85.49%) in the child maltreatment group compared to the control group. In addition, forming subgroups of studies using informant reports and those relying on self-report data to assess the presence of child maltreatment revealed overall reduced morning cortisol in studies using informant reports (Hedges' g = −0.114, 95% CI [−0.204; −0.024]), whereas a tendency for increased morning cortisol in the child maltreatment compared to the control group was observed in studies relying on self-report information (Hedges' g = 0.060, 95% CI [−0.026; 0.146]; Q₁ = 7.48, p = 0.006). Since the majority of studies relying on informant reports used the presence of records to group participants, the corresponding subgroup comparison of the different grouping methods applied (records, cut-off scores, other grouping approaches (mainly specifications)) also reached significance (Q₂ = 8.32, p = 0.016). 
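The between-subgroup Q statistic for the informant-report versus self-report comparison can be approximated from the two subgroup estimates and their confidence intervals given above. The Python sketch below assumes that the reported intervals are symmetric 95% intervals based on the normal distribution and that the subgroup estimates are independent; both are assumptions, so the result approximates the published analysis and does not reproduce it. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% interval."""
    return (upper - lower) / (2 * Z_95)

def subgroup_q(g1, ci1, g2, ci2):
    """Between-subgroup Q for two subgroups (chi-square with df = 1)."""
    var_sum = se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2
    q = (g1 - g2) ** 2 / var_sum
    p = math.erfc(math.sqrt(q / 2))  # upper tail of chi-square, df = 1
    return q, p

# Morning cortisol: informant reports vs. self-reports (values from the text).
q, p = subgroup_q(-0.114, (-0.204, -0.024), 0.060, (-0.026, 0.146))
```

With these inputs, Q comes out at roughly 7.5 with p of about 0.006, close to the reported values; the small discrepancy is consistent with rounding of the published estimates.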
Finally, the subgroup comparison between studies where original data were extracted and those that (re)grouped their data for this meta-analysis also explained some of the between-study heterogeneity, with studies where original data could be extracted implying overall reduced morning cortisol (Hedges' g = −0.096, 95% CI [−0.177; −0.016]) and those with (re)grouped data pointing to slightly increased morning cortisol levels in the child maltreatment group (Hedges' g = 0.087, 95% CI [−0.014; 0.189]; Q₁ = 7.72, p = 0.005). However, since the majority of studies (k = 8 of k = 9) that relied on informant reports to group participants also belonged to the original data subgroup, these findings should be interpreted with caution. With respect to evening cortisol (I² = 2.0%), studies focusing on other types of ELA showed larger positive effect size estimates than studies focusing on child maltreatment only (Q₁ = 12.24, p < 0.001). Further, studies including original data showed larger positive effect size estimates than studies which provided (re)grouped data (Q₁ = 6.47, p = 0.011). It should be noted that when the three studies (k = 4) that did not focus exclusively on child maltreatment (but also included participants with loss experiences) were excluded, the initial model on evening cortisol was no longer significant (p = 0.098). Finally, with respect to DSL cortisol, no moderator was identified that significantly influenced between-study heterogeneity.

Concerning psychopathology, k = 9 comparisons included healthy participants, k = 6 involved participants all fulfilling diagnostic criteria for a mental disorder, k = 6 comparisons comprised participants with at least some fulfilling the diagnostic criteria for a mental disorder (with k = 4 not matched in terms of psychopathology), and k = 6 did not report on psychopathology at all. 
The majority of studies (n = 18, k = 23) employed self-reports to assess the presence of child maltreatment and only k = 4 relied on informant reports. Along with the use of different instruments, the grouping of participants into a child maltreatment and a control group varied, however, with the majority of studies using specific cut-off scores (n = 13, k = 17). Only one of the included studies (Klaus et al., 2018) not only focused on child maltreatment but also included participants with other types of ELA, including death of a close friend or relative, parental separation or divorce, major illnesses or injuries or other traumatic experiences. Three studies (k = 4) assessed the CAR over more than two days, and there were several studies (n = 8, k = 9) with cortisol sampled at only two time points (i.e., awakening and +30 min post awakening or +45 min post awakening). In n = 4 studies peak cortisol values were not observed at the same assessment time points for both groups. Finally, data from 12 studies (k = 15) were provided by the respective authors, of which k = 10 comparisons contained (re)grouped data. For further details on the characteristics of the included studies, see Table 3.

Risk of bias assessment.
Studies which assessed cortisol in response to awakening received an average total score of 52.8/100.0 (SD = 11.7, range: 35.7-76.7). With respect to the selection of participants (M = 65.7/100.0, SD = 14.9), the majority of comparisons (k > 13) ensured that all participants in the child maltreatment group were exposed to maltreatment, while none of the participants in the control group was, used an established instrument to assess the experience of child maltreatment and matched their participants with respect to age, sex, and psychopathology (assessed with a gold-standard diagnostic tool; k = 12 in case of self-reports). However, only one study used two different sources to establish the presence of child maltreatment. Concerning the appropriate assessment of cortisol in the context of the CAR (M = 56.8/100.0, SD = 12.0), most studies reported on clear sampling (k = 22) and collection (k = 27) instructions, provided information on the day of sampling (k = 14), collected at least three samples (with one sample between +30 and +45 min post awakening, k = 18) over at least two days (k = 14), provided information about how samples were collected, stored or analyzed (k = 27), and reported on outliers or missing data (k = 22). However, less than half of the comparisons (k < 14) assessed the time of awakening (thus ensuring that the two groups did not differ in this respect; k = 13), assessed sampling time adherence (k = 8), reported whether sampling was rescheduled if participants were sick (k = 7), reported on batch analyses (k = 9), and only k = 3 comparisons ensured that participants were not under any current extraordinary stress or whether sampling was rescheduled if participants experienced any stressor during the day of collection. Finally, many of the studies failed to control for several important confounding variables (M = 38.0/100.0, SD = 22.5). 
For instance, less than half of the comparisons (k < 14) reported whether participants were excluded if pregnant or working night shifts, assessed smoking, menstrual cycle, oral contraceptive, and medication use (especially medications affecting the CNS), thus ensuring that participants did not differ in these respects, and only k = 2 comparisons ensured that participants did not differ with respect to other ELA or adult adversity. Again, several studies would have assessed some of the variables of interest, but since the data of eight studies (k = 10) were (re)grouped, the corresponding information at the group level was no longer available for some of these studies. For details on individual scoring results of the primary studies as well as a summary of the average risk of bias scores, see Appendix B Table B9 or Table B2 for individual quality items.

Meta-analysis.
The pooled effect estimates for the different CAR indices (including corresponding sensitivity analyses) are displayed in Table 4. As shown in the corresponding lines, none of the examined indices suggested a difference (p > 0.05) in cortisol assessed in response to awakening when comparing the child maltreatment and the control group. For corresponding forest plots, see Appendix C Figs. 2.1-2.6. Between-study heterogeneity was in the moderate to high range for some of the outcome indices (I² = 41.8-73.1%), exceeding the level of significance (Q-statistics all p < 0.05) for peak, delta, AUCg, and AUCi cortisol, suggesting that other variables differing between the included studies might be of importance as well. Visual inspection of traditional and contour-enhanced funnel plots as well as Egger's regression test of funnel plot asymmetry implied absence of small-study bias for all outcome indices examined (all p > 0.05; for funnel plots, see Appendix C Figs. 2.1-2.6).

Meta-regression and subgroup analyses.
The summary results for the pre-defined meta-regression and subgroup analyses for each outcome index and each moderator examined are shown in Appendix D Table D2. In the following, the results for moderators found to significantly influence the main effects are outlined. The subgroup comparison between studies where original data were extracted and those that (re)grouped their data for this meta-analysis explained some of the between-study heterogeneity for awakening cortisol and AUCg cortisol, with studies where original data could be extracted demonstrating overall reduced morning cortisol (Hedges' g = −0.169, 95% CI [− 0.323; .164], respectively; Q₁ = 6.02, p = 0.014 and Q₁ = 23.80, p < 0.001, respectively). Since, as noted before, there is a relatively large overlap between studies reporting on morning cortisol assessed in the context of DC as well as on cortisol assessed in response to awakening (k = 14), this finding was to be expected. Furthermore, with respect to awakening cortisol, age seems to explain some of the between-study variance (R² = 70.27%), but in contrast to morning cortisol (DC), does not represent a significant moderator (p = 0.338). For delta cortisol, we identified the proportion of women in the sample as a significant continuous moderator (for peak cortisol: p = 0.071, for +60 min post awakening cortisol: p = 0.096, and for AUCg cortisol: p = 0.058), with an increase in the proportion of females being associated with lower cortisol when comparing the child maltreatment and the control sample (β = −0.005, 95% CI [−0.010; −0.000], p = 0.040, R² = 0.0%). 
Finally, with respect to AUCg cortisol, the sub-domain "appropriate measure of confounders" of the quality assessment explained some of the variance in the effect estimates, with higher quality scores being associated with lower AUCg cortisol (β = −0.012, 95% CI [−0.019; −0.005], p < 0.001, R² = 95.17%) in the child maltreatment group compared to the control group (for awakening cortisol: p = 0.086, R² = 64.43%).

Cortisol stress reactivity
3.2.3.1. Included studies. In total, our systematic search strategy identified n = 73 studies that measured cortisol in the context of a stressor. Of these, n = 35 publications (k = 39 comparisons) were included.
Owing to a lack of statistical information, the data of n = 22 studies that were eligible for inclusion could not be considered. The total sample of the k = 39 comparisons consisted of n = 4,284 (range: 17-699) participants with a mean age of 25.57 (SD = 12.33) years and an average of 66.1% females (SD = 28.5%, range: 0.0-100.0%; k = 2 studies contained a purely male sample and k = 13 a purely female sample). K = 10 comparisons involved samples consisting of children and/or adolescents only, k = 26 comprised exclusively adult participants, and k = 3 studies included both adolescent and adult subjects. Eleven studies (k = 12) did not report on percentages of Non-Caucasians, while the percentage of Non-Caucasians in the remaining studies ranged between 0.0 and 88.7% (M = 35.8%, SD = 27.9%). With respect to psychopathology, k = 10 comparisons included healthy participants, k = 6 involved participants all fulfilling diagnostic criteria for a mental disorder, and k = 10 comparisons comprised participants with at least some fulfilling the diagnostic criteria for a mental disorder (with k = 6 not matched in terms of psychopathology; however, in two of these studies, the authors were able to show that the presence of the specific mental disorder did not affect the cortisol data). Finally, k = 13 comparisons did not report on psychopathology at all. Various instruments to assess child maltreatment were applied, with n = 7 studies relying on informant reports, n = 25 (k = 29) on self-report data, and three studies using both information sources. The most frequently used self-report was the Childhood Trauma Questionnaire (CTQ; n = 15, k = 17), and accordingly, cut-off scores were mostly used to group study participants in these studies. Nevertheless, several other instruments were also employed, resulting again in various grouping approaches. It should be noted that five studies did not focus on child maltreatment only (Hengesch et al., 2018; Ivanov et al., 2011; Kaiser et al., 2018; Otte et al., 2005; Ouellet-Morin et al., 2011) but also included participants with other ELA experiences. By far the most frequently applied stress task was the Trier Social Stress Test (TSST) or the TSST-C (n = 18, k = 21), and the majority of studies contained some social-evaluative aspects (k = 29; for an overview of the different tasks applied in the various studies, see Appendix E). The average duration of the stressors used was about 19.02 (SD = 16.94) min (k = 27 between 10 and 20 min). In n = 3 studies no cortisol response following the onset of the corresponding stressor was observed. Interestingly, these studies all applied stressors that did not contain any social-evaluative challenges. In k = 29 comparisons a cortisol response was observed in both groups (with different peak times found for k = 5 comparisons), and finally in k = 7 comparisons, the response was observed only in one but not in the other group (k = 6 only in the control sample, k = 1 only in the child maltreatment sample). On average, the time between the onset of a stressor and peak cortisol levels being reached was 29.84 (SD = 15.98) min, with the majority of studies reporting that the peak was reached between +20 and +40 min post stressor onset (k = 26). The vast majority of studies used saliva samples to assess cortisol. Baseline, peak and recovery data were reported by most publications, with considerably fewer studies reporting on AUCi or AUCg indices. Finally, the data of n = 19 (k = 20) studies were provided by corresponding authors, with the data of k = 13 comparisons being (re)grouped. For further details on the characteristics of the included studies, see Table 5.

Notes to Table 3: [...] = specification (authors applied a specific definition of CM); SA = sexual abuse; PA = physical abuse; EA = emotional abuse; N = neglect; ◊ = The grouping of participants was not just based on CM experiences but also included other traumatic experiences; awak. = awakening; Rel. = reliability, whereby the following definitions have been used: 1 = cortisol assessed over only one day, 2 = cortisol assessed over two days, 3 = cortisol assessed over more than two days; AUCg = area under the curve with respect to ground; delta = peak minus awakening levels; AUCi = area under the curve with respect to increase. a Time points cortisol was sampled. b Authors also administered ETI, but grouping was based on CTQ. c ELS AA/AG and ELS GG were combined into one group and compared to no ELS AA/AG and no ELS GG group. d Percentage of non-Caucasians refers to total sample (N = 127). e Deprived adoptees were combined into one group and were compared to the non-deprived UK adoptees. f Raw data were provided; participants with sampling adherence of +/-5 min were included. g Patients were matched with respect to MDD but not with respect to PTSD. h Exclusion of repressed memory group; CM sample: recovered and continuous memory group, control sample: control group. i The data were regrouped including only the traumatized control subjects and the non-traumatized control subjects (remark: some in the control sample may have experienced other traumatic events). j The data are from the Netherlands Study of Depression and Anxiety, a large cohort study. The article did not appear in the initial search but was suggested by the respective author as best suited for citation in this meta-analysis. k Patient group: dysthymia, MDD, social phobia, panic with/without agoraphobia, generalized anxiety disorder. l None of the participants fulfilled the diagnostic criteria for dysthymia, MDD, social phobia, panic with/without agoraphobia, or generalized anxiety disorder in the past 6 months. + It is not clearly stated in the text or in the subheading of the figure whether means and standard deviations or means and standard errors were presented; we assumed standard errors.

Risk of bias assessment.
Studies assessing cortisol in the context of a stressor received an average total score of 58.1/100.0 (SD = 12.4, range: 32.1-78.6). With respect to the selection of participants (M = 66.7/100.0, SD = 15.8), less than half of the included studies ensured that all participants in the child maltreatment group were exposed to child maltreatment, while none of the participants in the control group were (k = 18), used at least two different sources of information to establish the presence of child maltreatment (k = 8), and ensured that participants were matched with respect to psychopathology assessed with corresponding self-report questionnaires (k = 19). Most studies, however, employed an established measure to assess child maltreatment and matched their participants with respect to age, sex, and psychopathology (assessed with a gold-standard diagnostic tool). Regarding the appropriate measurement of cortisol in the context of a stressor (M = 61.7/100.0, SD = 13.4), less than half of the included studies reported on whether sampling was rescheduled if participants were sick (k = 9), ensured that all women were tested during a specific period of their menstrual cycle (k = 13), reported on whether samples were analyzed in one batch (k = 7), and only k = 5 comparisons included measures attempting to ensure that none of the participants were under any current stress at the time of testing or if testing was rescheduled if participants experienced any stressor during the respective day. 
Finally, concerning the appropriate control of potential confounders (M = 44.3/ 100.0, SD = 24.0), less than half of the included studies made efforts to exclude participants with any medical condition (k = 17) known to influence HPA axis functioning, ensured that the groups did not differ with respect to smoking (k = 14), clearly stated whether pregnant women were excluded (k = 14), ensured that participants did not differ with respect to the intake of medications known to influence the CNS (k = 10), and finally, only k = 5 comparisons took measures to ensure the two groups did not differ with respect to other types of ELA or adult adversity. The detailed quality ratings for the individual studies as well as the detailed description of the individual quality items can be found in Appendix B Table B10 or Table B3.

Meta-analysis.
The pooled effect estimates for the different indices are displayed in Table 6. The results of the sensitivity analyses (where appropriate) are also presented. As shown in the corresponding lines of Table 6, the results of the meta-analyses on baseline cortisol showed no significant overall differences in cortisol levels assessed prior to the onset of the respective stress task between the child maltreatment and the control sample (holding true for the sensitivity analyses). In contrast, the release of cortisol following the perception of a stressor, expressed as peak, recovery, delta, and AUCi cortisol, was lower in the child maltreatment group compared to the control sample (with all pooled effect estimates being in the small range), indicating a blunted cortisol stress reactivity (see Appendix C Figs. 3.1-3.6 for corresponding forest plots). For AUCg, the pooled effect estimate was not statistically significant (p = 0.081). However, when excluding one outlier study (Ivanov et al., 2011), significance was also reached for this outcome index (p = 0.021). Between-study heterogeneity was in the moderate to high range for some of the outcome indices (I² > 50%), exceeding the level of significance (Q-statistics all p < 0.05) for all but AUCi cortisol. Visual inspection of traditional and contour-enhanced funnel plots as well as Egger's regression test of funnel plot asymmetry revealed the absence of small-study bias for baseline, peak, AUCg, and AUCi cortisol levels (all p > 0.05). However, Egger's regression test of funnel plot asymmetry reached significance for delta as well as recovery cortisol levels (p < 0.05), suggesting the presence of small-study bias (see Appendix C Figs. 3.1-3.6 for corresponding funnel plots).

Meta-regression and subgroup analyses.
We conducted a number of pre-defined meta-regression and subgroup analyses focusing on peak, delta, recovery, AUC g, and AUC i cortisol. The summary results for each outcome index and each moderator examined are shown in Appendix D Table D3. In the following section, the results for moderators found to significantly influence the main effects are outlined. For delta, recovery, and AUC i cortisol, we identified the proportion of women in the sample as a continuous moderator (for AUC i: p = 0.052), with an increase in the proportion of females being associated with lower cortisol secretion following the perception of a stressor when comparing the child maltreatment and the control sample. Additionally, for delta cortisol, the proportion of participants fulfilling diagnostic criteria for a mental disorder significantly moderated the summary effect, with an increase in this proportion being associated with a stronger blunting of the cortisol stress response (β = −0.006, 95% CI [−0.010; −0.002], p = 0.007, R² = 99.20%). This finding, however, should be interpreted with caution, as only two of the studies that included a purely clinical sample (Schalinski et al., 2015; Suzuki et al., 2014) reported on delta cortisol. In addition, and in contrast to the other studies involving a clinical sample (with the exception of Rao & Morris, 2015), these two studies reported relatively strong negative effects. Nevertheless, despite considerable heterogeneity between the studies, all outcome indices showed stronger effects for studies including purely clinical samples and markedly weaker effects for those studies that involved healthy subjects only (see the results of the subgroup analyses).
Furthermore, stronger effects were found for studies that observed a cortisol response in just one of the groups (holding true for all outcome indices) compared to studies that found a response in both groups and those that found no response in either of the groups, with the subgroup comparison reaching significance for delta (Q(2) = 4.53, p = 0.033) and AUC i (Q(2) = 12.33, p = 0.002) cortisol. Comparing studies focusing on child maltreatment experiences only to those involving participants with other types of ELA as well showed that the few studies that also considered other types of ELA overall yielded greater negative effect estimates for all outcome indices, although the difference was significant for delta cortisol only (Q(1) = 3.95, p = 0.047). However, it should be noted that heterogeneity within these studies varied substantially between the different outcome indices and thus depended highly on the included studies. Finally, again depending on the outcome index investigated (and thus on the studies included), the different subdomains of the quality assessment appeared to explain part of the variance in the effect estimates between studies, although this effect was only significant for AUC i cortisol (and only for the subdomain selection of participants: β = 0.009, 95% CI [0.002; 0.016], p = 0.011, R² = 77.45%). In general, higher study quality tended to be associated with a smaller negative difference in cortisol secretion between the child maltreatment and the control group. As an additional note, although subgroup comparisons of studies with (re)grouped data and those with original data could not explain significant heterogeneity between studies for any of the outcome indices, the studies with (re)grouped data still showed considerably less pronounced effects.

3.2.4.1.1. Included studies. Eleven articles, containing k = 17 comparisons and involving a total of n = 2,222 participants (range: 16-1,112), assessed cortisol in the context of the DST.
Of these, k = 16 reported on baseline cortisol levels (cortisol assessed before dexamethasone administration; pre-DST), k = 17 on cortisol assessed following the administration of dexamethasone (post-DST), and k = 9 contained information on delta values (post-DST cortisol minus pre-DST cortisol). The included studies mainly consisted of adults, with only one study involving adolescents. The average age was 33.32 (SD = 8.32) years and studies ranged from 45.6 to 100.0% (M = 72.6%, SD = 18.6%) in terms of the proportion of women (k = 3 studies with a purely female sample). Five studies (k = 8) did not report on the percentage of Non-Caucasians, while the percentage of Non-Caucasians in the remaining studies ranged between 0.0 and 100.0% (M = 53.4%, SD = 41.2%). Three out of the k = 17 comparisons involved healthy participants and k = 14 included participants in whom the proportion of people suffering from a mental illness ranged from 13.2 to 100.0% (k = 9 studies involved purely clinical samples and k = 3 involved participants where the child maltreatment and the control sample were not matched in terms of psychopathology). Various instruments to assess child maltreatment were applied, all relying on self-report information. The most common self-report used was the CTQ (n = 5, k = 9). It should be noted that the child maltreatment sample of the study from Faravelli et al. (2010) did not only consist of participants with child maltreatment experiences, but also included several participants with loss experiences. Approximately half of the studies used established cut-off values to group participants in the corresponding child maltreatment and control groups, with the others mostly applying specific definitions. All but three studies (k = 4) used 0.5 mg of dexamethasone and the data of four studies (k = 8) were re-grouped for this meta-analysis (see Table 7 for more details).
3.2.4.1.2. Risk of bias assessment. Studies assessing cortisol in the context of the DST received an average score of 72.1/100.0 (SD = 19.0) for selection of participants, 61.3/100.0 (SD = 12.1) for appropriate assessment of cortisol, and 44.7/100.0 (SD = 20.3) for adequate controlling for confounders, resulting in an average overall score of 58.1/100.0 (SD = 13.3, range: 40.0-76.0). The detailed quality ratings for the individual studies as well as the detailed description of the individual quality items can be found in Appendix B Table B11 or Table B4. None of the included studies used two different sources to establish the presence of child maltreatment or reported whether cortisol was analyzed in one batch or whether participants were excluded when working night shifts, and only one study assessed whether exposure and control groups differed in relation to the experience of other traumatic events during childhood or adulthood. Moreover, less than half of the comparisons reported whether sampling was postponed when participants were sick (k = 7), whether dexamethasone intake was checked (k = 7), and whether participants differed in smoking (k = 6), intake of oral contraceptives (k = 7), or use of medication (with CNS effect; k = 4).

1-4.1.3). Corresponding sensitivity analyses excluding studies with extreme effect sizes and influential studies did not change the overall results (see Table 8 for related statistics).

Meta-regression and subgroup analyses.
We conducted a number of pre-defined meta-regression and subgroup analyses focusing on post-DST cortisol only. The summary results for each moderator examined are shown in Appendix D Table D4. The only moderator that significantly influenced the main effect was the proportion of women in the respective samples (β = −0.014, 95% CI [−0.022; −0.005], p = 0.003, R² = 95.16%), with an increase in this proportion being associated with lower cortisol levels following the administration of dexamethasone (increased cortisol suppression) when comparing the child maltreatment sample with participants from the control sample. Since only one study was included that focused on ELA in general, the significant subgroup result of different pooled effect estimates for studies focusing on child maltreatment only and the study also including other childhood adversities has to be interpreted with caution. None of the methodological quality criteria significantly influenced the pooled effect estimate.

Table 4
Summary statistics for random-effects models of included studies that reported on cortisol assessed in the context of the cortisol awakening response (CAR), displayed separately for the different cortisol outcome indices. Note. AUC g = area under the curve with respect to ground; AUC i = area under the curve with respect to increase; 95% CI = 95% confidence interval; Pred. int. = prediction interval.

Task; arithmet. = arithmetic; self-discl. = self-disclosure; rel.-build. = relationship-building; PST = Psychosocial Stress Test; perform. = performance; interper. = interpersonal; self-eval. = self-evaluative; ident. = identical; adj. = adjusted; bl = baseline (if possible, value just prior stressor onset was extracted); delta = peak values minus baseline value; AUC g = Area under the curve with respect to ground; AUC i = Area under the curve with respect to increase; • no cortisol peak was observed in both groups, therefore the time point closest to +25 min post stressor onset was extracted. a Community-dwelling older adult sample. b Community-dwelling younger adult sample. c Percentage of non-Caucasians refers to total sample (N = 82). d Percentage of non-Caucasians refers to total sample (N = 88). e CM sample and control sample were matched with respect to MDD but not with respect to PTSD. f Female and male participants with early life adversity were combined into one group and were compared to female and male control participants. g Percentage of non-Caucasians refers to total sample (N = 42). h Percentage of non-Caucasians refers to total sample (N = 127). i This article did not appear in the initial search but was suggested by the respective author as best suited for citation in this meta-analysis. j Demographic data for total sample (age, percentage of females, percentage of Non-Caucasians) refers to N = 76, cortisol data available for N = 67. k Demographic data for total sample and CM and control sample refers to N = 44, cortisol data available for N = 35. l Comparison between participants with MDD and CM and those with MDD without CM. m The data were regrouped, taking into account only the patient group with stress-related disorders. n Female and male maltreated participants were combined into one group and were compared to female and male control participants. o Authors also administered CTQ, but grouping was based on ETI. + It is not clearly stated in the text or in the subheading of the figure whether means and standard deviations or means and standard errors were presented; we assumed standard errors.

Combined dexamethasone-corticotropin releasing hormone test
3.2.4.2.1. Included studies.
In total, our search strategy identified n = 21 studies that measured the responsivity of the pituitary to CRH. Of these, n = 6 studies consisting of k = 8 comparisons reporting on cortisol in the context of the Dex-CRH test and n = 2 studies reporting on cortisol in the context of the CRH test were included (k = 10). Of the included studies, k = 4 comparisons reported on baseline cortisol (cortisol assessed after the administration of dexamethasone, before CRH injection), k = 6 on peak (after the CRH injection) and delta cortisol (peak minus baseline), k = 9 on AUC g, and k = 6 on AUC i cortisol. There was only one study with available recovery data (in all other studies, the peak value corresponded to the last measurement time point). The total sample of the k = 10 comparisons consisted of n = 561 participants (range: 21-230) with a mean age of 31.19 (SD = 12.70) years and an average of 60.0% females (SD = 39.6%, range: 0.0-100.0%; k = 2 comparisons contained a purely male sample and k = 4 a purely female sample). Studies reporting on Dex-CRH cortisol consisted of adult samples only, whereas the two studies focusing on the CRH test were conducted with children or adolescents. Four studies (k = 5) did not report on ethnicity with the other articles ranging between 34.6 and 57.1% in terms of percentage of Non-Caucasians (M = 45.9%, SD = 9.3%). With respect to psychopathology, k = 3 comparisons included healthy participants, k = 3 involved participants all fulfilling diagnostic criteria for MDD, k = 1 comparison included participants with mixed diagnoses, and k = 3 involved participants where the child maltreatment and the control sample were not matched for psychopathology. The authors of the three comparisons in which the subjects were not matched for psychopathology, however, showed that the presence of the specific mental disorder did not affect the cortisol results. 
Again, various instruments to assess child maltreatment were used, with the majority of studies relying on self-report data (k = 8). The grouping of participants also differed, including the use of cut-off scores, thirds, the presence of a CPS record and the utilization of specific definitions. The data of n = 2 studies were provided by corresponding authors, with the data of one study being regrouped. See Table 9 for more details.

Risk of bias assessment.
Studies assessing the responsivity of the pituitary to CRH received an average total score of 69.8/100.0 (SD = 8.0, range: 58.3-83.3). All or nearly all comparisons (at least k = 9 out of 10) used an established instrument to assess the experience of child maltreatment, matched their participants with respect to age and sex, and made efforts to assess psychopathology (selection of participants: M = 76.3/100.0, SD = 12.4). Additionally, most comparisons rescheduled the sampling when participants were sick (k = 8) and provided details on their test protocol (k = 10) and on how cortisol was collected, stored, and analyzed (k = 10; appropriate measurement of cortisol in the context of the Dex-CRH: M = 62.3/100.0, SD = 14.8). Finally, as shown by the relatively high scores related to the control of confounding variables (M = 69.0/100.0, SD = 10.3), the majority of studies assessed or controlled for a wide variety of potentially influential factors. Overall, however, only a few comparisons used two different sources of information to establish the presence of child maltreatment (k = 2), verified the ingestion of dexamethasone (k = 2), reported whether samples were assessed in one batch (k = 1), or reported whether participants were excluded in case of working night shifts (k = 2), and only k = 2 comparisons made any effort to ensure that the exposure and control group did not differ with respect to other types of ELA or adult adversity. For details on the individual scoring results of the primary studies as well as a summary of the average risk of bias scores, see Appendix B Table B12; for the individual quality items, see Table B5.

Meta-analysis.
The pooled effect estimates for the different indices are displayed in Table 10. For each outcome index, the analysis was repeated excluding the studies focusing on the CRH test only. Results of the sensitivity analyses (where appropriate) are also presented. None of the pooled effect estimates were significant, indicating that there is no overall difference in cortisol assessed either before or after the administration of CRH, holding true for all outcome indices (all p > 0.05). Between-study heterogeneity, however, was in the moderate range (I² > 50%), exceeding the level of significance (Q-statistics all p < 0.05) for peak, delta, AUC g, and AUC i cortisol. Due to the limited number of studies (k < 10), no subgroup analyses and meta-regressions were performed. Visual inspection of traditional and contour-enhanced funnel plots as well as Egger's regression test of funnel plot asymmetry suggested the absence of small-study bias (all p > 0.05).

Carpenter et al., 2007, Ivanov et al., 2011, and Suzuki et al., 2014a; * exclusion of influential study Carpenter et al., 2007. Peak cortisol: • exclusion of outlier studies Carpenter et al., 2007, Heim et al., 2000b, Ivanov et al., 2011, and Suzuki et al., 2014b; * exclusion of influential study Carpenter et al., 2007. Delta cortisol: • * exclusion of outlier and influential study Suzuki et al., 2014b. Recovery cortisol: • exclusion of outlier studies Ali & Pruessner, 2012, Carpenter et al., 2007, Ivanov et al., 2011, and Suzuki et al., 2014a; * exclusion of influential study Carpenter et al., 2007. AUC g cortisol: • exclusion of outlier study Ivanov et al., 2011. See Appendix C Figs. 3.1-3.6 for corresponding forest and funnel plots.

f Comparison between PTSD patients with childhood sexual or physical abuse and those reporting no history of childhood sexual or physical abuse. g The sample represents a highly traumatized, low-income cohort; data provided separately for those patients all fulfilling diagnostic criteria for PTSD and those patients without PTSD. h The data are from the Netherlands Study of Depression and Anxiety, a large cohort study; this article did not appear in the initial search but was suggested by the respective author as best suited for citation in this meta-analysis. i Patient group: dysthymia, MDD, social phobia, panic with/without agoraphobia, generalized anxiety disorder. j None of the participants fulfilled the diagnostic criteria for dysthymia, MDD, social phobia, panic with/without agoraphobia, or generalized anxiety disorder in the past 6 months.

3.2.5.1.1. Included studies. A total of n = 8 independent studies, comprising k = 9 comparisons, with an overall sample size of n = 978 participants, reported on HCC. The sample size ranged from n = 22 to n = 537 participants and the majority of studies included mainly female subjects (with the percentage of females ranging between 50.7 and 100.0%, M = 84.4%, SD = 19.2%; k = 3 contained a purely female sample). Four comparisons included children and/or adolescents and k = 5 included adult participants only. The average age was 28.13 (SD = 14.88) years, with the youngest participant being about 3 years and the oldest around 79 years. With respect to ethnicity, most studies included samples composed mainly of an ethnic majority group, with the percentage of Non-Caucasians ranging between 0.0 and 87.2% (M = 27.0%, SD = 33.8%). Three studies did not report on ethnicity. Concerning psychopathology, k = 4 comparisons included only healthy participants, k = 2 consisted of a primarily clinical sample (with 96.0-100.0% meeting diagnostic criteria for a mental disorder), and k = 3 comparisons did not report on psychopathology. Only one study used information about child maltreatment from an informant source, with the others all using self-report data. The assessment of child maltreatment and the grouping of participants into a child maltreatment and a control group varied between the studies. This applies both to the instruments used and to the grouping procedure, which included the use of cut-off scores, percentiles, clustering methods, and specific definitions.
Of the six studies (k = 7) that provided data on request, the respective authors of four studies (k = 5) regrouped or grouped their data based on the available assessment of child maltreatment (or, in case of raw data, the (re)grouping was performed by us). For further details on the characteristics of the included studies, see Table 11.
3.2.5.1.2. Risk of bias assessment. Table B13 in Appendix B provides an overview of the individual scoring results of the primary studies as well as a summary of the average risk of bias scores. The detailed description of the quality items can be found in Table B6. On average, the studies received a total score of 58.9/100.0 (SD = 16.6, range: 24.0-76.0). Regarding participant selection (M = 75.0/100.0, SD = 10.8), the majority of studies assessed child maltreatment with an established instrument and ensured that all the participants in the child maltreatment group did experience child maltreatment, while none of the participants in the control group did. Most studies also matched participants with respect to age, sex, and psychopathology. However, only one study used two different sources of information to establish the presence of child maltreatment. With respect to the appropriate measure of HCC (M = 55.6/100.0, SD = 15.1), most studies obtained hair samples from the posterior vertex of the head and reported on a clear sampling analysis protocol and information about outlying or missing data. However, none of the studies assessed the experience of any ongoing life stressor (k = 0), and only a few reported whether HCC samples were assessed in one batch (k = 2). With respect to the appropriate control of confounding variables, the quality of the different studies varied quite strongly (M = 48.3/100.0, SD = 33.3, range: 0.0-88.9). It should be noted, however, that several studies would have assessed the variables of interest, but since the data of four studies (k = 5) were regrouped, the information at the group level was no longer available (this is marked accordingly in the table). No studies reported whether participants were excluded if they worked night shifts (k = 0) and few reported whether participants in the two groups differed with respect to other traumatic experiences in childhood or adulthood (k = 3).
Moreover, less than half of the comparisons reported whether participants with any type of addiction (k = 4) or pregnant women (k = 4) were excluded as well as whether participants were comparable with respect to medication use (medications with CNS effects, k = 4).

Meta-analysis.
Pooling the results of the k = 9 comparisons (n = 978), we found no significant effect (Hedges' g = −0.05, 95% CI [−0.33; 0.24], p = 0.749), suggesting no overall difference in HCC in the child maltreatment sample compared to the control sample (the corresponding forest plot is shown in Appendix C Fig. 5.1). There was significant, moderate heterogeneity in the effect size estimates between studies (Q(8) = 17.14, p = 0.029, I² = 53.3%). The between-study heterogeneity was not caused by extreme effect sizes, as no outlier study was detected. However, one study exerting a high influence on the overall effect estimate was identified (do Prado et al., 2017). Traditional and contour-enhanced funnel plots are shown in Appendix C Fig. 5.1; visual inspection suggested the absence of small-study bias, as did Egger's regression test of funnel plot asymmetry (intercept = 0.970, p = 0.285; note, however, that k < 10). When the study by do Prado et al. (2017) was excluded as part of the sensitivity analysis, heterogeneity decreased from I² = 53.3% to I² = 10.2% (Q(7) = 7.80, p = 0.351), and the pooled estimate became a small negative effect that reached significance (k = 8, n = 921, Hedges' g = −0.20, 95% CI [−0.34; −0.06], p = 0.004). Despite varying effects of the primary studies, the result of the sensitivity analysis suggests an overall reduction of HCC in the child maltreatment sample compared to the control sample, with the prediction interval (−0.37; −0.03) pointing in the same direction. However, three of the five studies indicating reduced HCC in the child maltreatment compared to the control group had not used a gold-standard diagnostic tool to assess psychopathology, so the adequacy of matching in this respect cannot be properly judged.
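The heterogeneity figures reported for HCC can be cross-checked from Q and its degrees of freedom. The sketch below assumes the standard Higgins-Thompson definition, I² = max(0, (Q − df)/Q) × 100%, which the text does not spell out; the helper name `i_squared` is illustrative.

```python
def i_squared(q, df):
    """Return I^2 in percent from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values reported for HCC: Q = 17.14 on 8 degrees of freedom for all
# k = 9 comparisons, and Q = 7.80 on 7 degrees of freedom after excluding
# the influential study.
full = i_squared(17.14, 8)
reduced = i_squared(7.80, 7)
print(round(full, 1), round(reduced, 1))
```

The first value reproduces the reported 53.3%. The second comes out at roughly 10%, in line with the reported 10.2% once rounding of the published Q statistic is taken into account.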

24-hour urinary free cortisol
3.2.5.2.1. Included studies. Eleven studies assessing cortisol in urine were identified through the systematic search. Of these, only n = 4 studies including a total of n = 110 participants (n = 108 with valid cortisol data) could finally be included. Participants were on average 22.17 (SD = 12.94) years old and three out of the four studies comprised female participants only (M = 85.1%, SD = 29.8%). The majority of the participants were Caucasian, with the percentage of Non-Caucasians ranging between 12.0 and 42.3% (M = 27.6%, SD = 15.2%). With respect to psychopathology, most of the subjects in the child maltreatment group met the criteria for a mental disorder, while the subjects in the control group were mainly healthy controls (n = 3; n = 1 did not report on psychopathology). However, in two of the studies included, the authors were able to demonstrate that the presence of the specific mental disorder did not affect the 24-hour UFC data. Three of the four studies focused exclusively on sexual abuse experiences without collecting information about other types of child maltreatment, and two studies recruited participants solely on the basis of self-identification without using any established measurement to assess child maltreatment. All data were extracted from the respective articles. See Table 12 for further details.

Table 8
Summary statistics for random-effects models of included studies that reported on cortisol assessed in the context of the dexamethasone suppression test (DST), displayed separately for the different cortisol outcome indices. Note. 95% CI = 95% confidence interval; Pred. int. = prediction interval. a Intercept and (p values) displayed.

Table 9
Summary characteristics of included studies that reported on cortisol assessed in the context of the combined dexamethasone-corticotropin releasing hormone (Dex-CRH) test. Note. N = sample size; M = mean; SD = standard deviation; The sex ratio is indicated as percentage of female participants; Ethn. = ethnicity; The ethnicity ratio is indicated as percentage of non-Caucasians; NA = not assessed; Psychopath. = psychopathology, whereby the following definitions have been used: yes = at least some of the participants met diagnostic criteria for a psychiatric disorder, no = none of the participants met diagnostic criteria for any psychiatric disorder, * = groups are not matched with respect to psychopathology; MDD = Major Depressive Disorder; PD = Personality Disorder; CM = child maltreatment; n = sample size; quest. = questionnaire; CTQ = Childhood Trauma Questionnaire; ETI = Early Trauma Inventory; CPS = Child Protective Services; PSS = Psychosocial Schedule for School Aged Children; spec. = specification (authors applied a specific definition of CM); EA = emotional abuse; PA = physical abuse; SA = sexual abuse; N = neglect; bl = baseline (if possible, sample just prior CRH injection was extracted); AUC g = area under the curve with respect to ground; AUC i = area under the curve with respect to increase. Dex-dose = dose of dexamethasone administered to participants; CRH-dose = dose of corticotropin-releasing hormone administered to participants. a All extracted values are adjusted for age, gender, and effects of four other maltreatment subtypes. b Authors also administered ETI, but grouping was based on CTQ. c The low CTQ PD group and the normal control group were combined and compared to the high CTQ PD group. d Sample assessed at −150 min prior CRH injection was not included in analyses (therefore 7 instead of 8 samples; baseline sample at 4 pm). e Delta defined as difference between maximum value after CRH injection and the mean of the three baseline measures. f The data were regrouped, taking into account only the patient groups. g No dexamethasone was administered (CRH test only). h Comparison between the patient groups with and without abuse. i 54% were subjected to ongoing EA. j Baseline defined as the mean of the three pre CRH infusion samples. + It is not clearly stated in the text or in the subheading of the figure whether means and standard deviations or means and standard errors were presented; we assumed standard errors.

Risk of bias assessment.
The risk of bias assessment of the primary studies as well as the average risk of bias scores can be found in Appendix B Table B14. The detailed description of the quality items can be found in Table B7. In all n = 4 studies, participants were matched with respect to age and gender. In addition, all four articles provide detailed information on how UFC samples were collected, stored, and analyzed, and all studies give a relatively good overview of participants' medication use. The n = 3 studies that focused on sexual abuse did not provide information about other maltreatment experiences, reducing the quality of the grouping into a child maltreatment group and a clear control group without any type of child maltreatment experience. Only one of the four studies ensured that the participants did not experience any ongoing significant life stressors, assessed UFC over at least three days, and provided batch analysis information, and none of the studies ensured that the participants in the two groups did not differ with respect to other traumatic experiences in childhood or adulthood.

Meta-analysis.
Pooling the results of the n = 4 studies (n = 108), the aggregate effect size was Hedges' g = 0.07, 95% CI [−0.83; 0.98], p = 0.874, suggesting no overall difference in 24-hour UFC in the child maltreatment sample compared to the control sample (the corresponding forest plot is shown in Appendix C Fig. 5.2). There was significant, high heterogeneity in the effect size estimates (Q(3) = 14.79, p = 0.002, I² = 79.7%), indicating high inconsistencies between studies. No outlier study was detected, but the study by Lemieux et al. (2008) had a high influence on the overall result. Traditional and contour-enhanced funnel plots are also shown in Appendix C Fig. 5.2; visual inspection suggested the absence of small-study bias, as did Egger's regression test of funnel plot asymmetry (intercept = −6.898, p = 0.428; note, however, that k < 10). The sensitivity analysis substantially reduced heterogeneity (from I² = 79.7% to I² = 8.5%) and yielded a significant, medium-sized overall effect (n = 83, Hedges' g = 0.56, 95% CI [0.11; 1.00], p = 0.014), with participants in the child maltreatment group showing higher 24-hour UFC concentrations compared to the control sample. However, considering the small sample size and the large prediction interval (−2.33; 3.45), it is unclear what the results of future studies will show. In addition, it should be noted that the study excluded in the context of the sensitivity analysis (Lemieux et al., 2008) received the highest average quality score.
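As a plausibility check on pooled estimates such as these, the standard error and two-sided p value can be approximately recovered from a reported 95% confidence interval. The sketch assumes a normal (Wald-type) interval, which the text does not confirm; if a t-based or otherwise adjusted interval was used, the recovered values will differ somewhat. The function name `p_from_ci` is illustrative.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate SE, z, and two-sided p from a 95% CI (normal assumption)."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided p value under the standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Sensitivity analysis for 24-hour UFC as reported in the text:
# Hedges' g = 0.56, 95% CI [0.11; 1.00], reported p = 0.014.
se, z, p = p_from_ci(0.56, 0.11, 1.00)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions, the recovered p value rounds to the reported 0.014. Small discrepancies for other estimates are expected, because the published effect sizes and interval bounds are rounded to two decimals.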

Discussion
This series of meta-analyses, based on a systematic review of the literature, examined the existing evidence on the association between child maltreatment and cortisol metabolism including various measures of HPA axis activity. Measures of interest ranged from cortisol assessed in the context of the circadian rhythm (DC) to cortisol assessed in response to awakening (CAR), in response to the perception of a stressor (cortisol stress reactivity) and pharmacological challenges (DST, Dex-CRH test, CRH test), to cumulative measures of cortisol secretion, namely 24-hour UFC and HCC.

Main findings
Consistent with the findings of two previous meta-analyses (Bernard et al., 2017; Fogelman and Canli, 2018), we did not find overall differences in any of the indices related to cortisol secretion in the context of circadian activity (with the exception of evening cortisol) or in response to awakening (CAR) when comparing individuals with child maltreatment to those without corresponding experiences. The finding of slightly increased evening cortisol in individuals exposed to child maltreatment was mainly driven by a few studies in which the child maltreatment group also included individuals with loss experiences and thus should be interpreted with caution in the context of this meta-analysis. Individuals with a history of child maltreatment, however, appear to show a blunted cortisol stress response. Though not yet evident before exposure to the stressor (baseline cortisol), blunting was seen in indices reflecting total cortisol production (peak, recovery cortisol) as well as in indices expressing changes in cortisol over time (delta, AUC i cortisol) following the perception of a stressor. These findings are consistent with the results of a previous meta-analysis examining the effect of ELA on cortisol response to social stress (Bunea et al., 2017), albeit with somewhat smaller effects observed in our meta-analysis. Interestingly, this blunting was not observed in studies where CRH injections (Dex-CRH test) were used to initiate the secretion of cortisol. However, the number of studies on the Dex-CRH test was much smaller than the number of studies assessing cortisol in response to a stressor. In addition, no difference in the negative feedback mechanism of the HPA axis (at least at the level of the pituitary gland), measured by oral administration of dexamethasone, was found between the two groups. Finally, with respect to the few studies reporting on cumulative measures of cortisol secretion including 24-hour UFC and HCC, no differences were observed in either of these measures between those exposed to child maltreatment and those without corresponding adversity. Respective sensitivity analyses excluding influential studies, on the other hand, suggest increased 24-hour UFC and slightly reduced HCC in maltreated individuals. However, especially the finding of increased 24-hour UFC should be interpreted with caution, since the overall sample size was small and the large prediction interval of the pooled effect estimate suggests a high degree of uncertainty regarding the results of upcoming studies (for a summary overview of all main findings, see Table 13).

b Three outliers in hair cortisol were removed; however, it is unclear to which group this applied.
c Comparison between women with abuse/neglect without recent interpersonal violence exposure and non-trauma controls.
d The data were regrouped, taking into account only the patient group with stress-related disorders.
e Grouping was based on a k-means clustering method using the ETI sum score excluding the general trauma subscale (remark: the sample represents a refugee sample, all of whom had experienced some type of trauma during their life).
f The data were regrouped, taking into account only the traumatized control subjects and the non-traumatized control subjects (remark: some in the control sample may have experienced other traumatic events); data provided for a total sample of N = 58 instead of N = 53 as presented in the paper.
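The role of the prediction interval mentioned in the main findings can be illustrated with a minimal sketch. It assumes a DerSimonian-Laird estimate of the between-study variance and, for simplicity, a normal quantile in place of the t quantile with k - 2 degrees of freedom that is conventionally used for prediction intervals. The function name and the effect sizes are hypothetical and are not taken from the included studies.

```python
import math
from statistics import NormalDist


def random_effects_summary(effects, variances, level=0.95):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns the pooled effect, tau^2, a confidence interval for the mean
    effect, and an approximate prediction interval for a new study.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q and the method-of-moments estimate of tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Assumption: normal quantile used as an approximation
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (pooled - z * se, pooled + z * se)
    half = z * math.sqrt(tau2 + se ** 2)  # adds between-study spread
    pi = (pooled - half, pooled + half)
    return {"pooled": pooled, "tau2": tau2, "ci": ci, "pi": pi}


# Hypothetical standardized mean differences and sampling variances
summary = random_effects_summary([-0.8, 0.1, -0.4, 0.3], [0.02, 0.02, 0.02, 0.02])
print(summary)
```

With few, heterogeneous studies the prediction interval is much wider than the confidence interval, which is why a pooled estimate based on a small number of studies says little about what an upcoming study is likely to find.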

Between-study heterogeneity and the influence of moderators
Although no overall differences in cortisol secretion were found for the majority of the HPA axis activity measures except for cortisol assessed in response to a stressor across studies, we generally observed a significant degree of variability in the effect estimates between studies (especially for indices reflecting cortisol secretion in the context of awakening, after a stressor and following the Dex-CRH test), suggesting the likely influence of additional variables in moderating the effect of child maltreatment on cortisol regulation. Before discussing some of the moderators that systematically accounted for between-study heterogeneity, holding true for various of the HPA axis activity measures, it should be kept in mind that the majority of studies were conducted with predominantly young, female adults who belonged to an ethnic majority group and in whom child maltreatment experiences were assessed mainly through self-reports. In addition, a considerable number of studies did not report on psychopathology, and studies involving clinical samples were fairly heterogeneous in terms of the predominant mental disorder (e.g., MDD versus posttraumatic stress disorder (PTSD)). Accordingly, our ability to find important moderators or relevant subgroup differences might have been limited.

Influence of participant related characteristics
One of the moderators that explained some of the between-study heterogeneity in effect sizes (in line with findings from Bunea et al., 2017; and Zorn et al., 2017) was the proportion of females in the respective sample, with a higher proportion being associated with a stronger blunting of cortisol secretion (CAR, cortisol stress reactivity, and DST). Corresponding sex differences, particularly with respect to stress reactivity, have been repeatedly reported, with men showing higher cortisol responses to psychosocial stress than women (Liu et al., 2017). Factors that influence corticosteroid binding globulin (CBG) levels and thus the level of free cortisol, including the use of oral contraceptives and the production of sex steroids throughout the menstrual cycle, appear to account for some of these gender effects (e.g., Foley and Kirschbaum, 2010). Neither the intake of oral contraceptives nor menstrual cycle phase was adequately evaluated in many of the included studies, and therefore matching of the two groups in these respects was not properly controlled. Apart from the proportion of women in the respective sample, there was little evidence that the remaining participant related characteristics such as age, ethnicity, and participant diagnosis accounted for variability in the effect estimates among primary studies. Interestingly, even though psychopathology did not account for heterogeneity in the relationship between child maltreatment and cortisol (at least for the majority of outcome indices), a tendency for stronger blunting in clinical samples compared to healthy controls was observed for indices related to cortisol stress reactivity. Nevertheless, an attenuation of the cortisol stress response was observed in participants with child maltreatment experiences who at the time of measurement did not report a mental disorder, suggesting that alterations in HPA axis activity may be present prior to the development of mental health issues or independent of psychopathology.

Table 12 Summary characteristics of included studies that reported on 24-hour urinary free cortisol (24-hour UFC).
Age refers to total sample (N = 28) including PTSD group.
e Comparison between women with a history of childhood sexual abuse without PTSD and controls; one cortisol specimen was lost due to technical error; however, it is unclear to which group this applied.
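A moderator analysis of the kind described above, with the proportion of females as a continuous moderator, can be sketched as a weighted regression of study effect sizes on the moderator. This is a simplified illustration under stated assumptions: the residual between-study variance `tau2` is supplied by the user rather than estimated, and the function name and all numbers are hypothetical.

```python
import math


def weighted_meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares meta-regression with a single moderator.

    Weights are 1 / (sampling variance + tau2). Returns the intercept,
    the slope, and the model-based standard error of the slope.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, moderator)) / sw  # weighted means
    my = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - mx) * (yi - my)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = my - slope * mx
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Hypothetical data: proportion of females and standardized mean differences
prop_female = [0.2, 0.5, 0.7, 1.0]
smd = [-0.05, -0.20, -0.35, -0.45]
var = [0.03, 0.05, 0.02, 0.04]
b0, b1, se_b1 = weighted_meta_regression(smd, var, prop_female, tau2=0.01)
print(f"intercept={b0:.3f}, slope={b1:.3f}, SE(slope)={se_b1:.3f}")
```

A negative slope in such a model would correspond to stronger blunting (more negative effects) in samples with a higher proportion of females.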

Influence of trauma related information
Another fairly consistent moderator accounting for some of the between-study heterogeneity (DC and CAR, tendency for cortisol stress reactivity) was whether or not data were (re)grouped for the purpose of this meta-analysis, with (re)grouped data showing a tendency towards smaller effects. In some of the studies that provided (re)grouped data, grouping of participants into a child maltreatment and a control group was based on relatively low severity thresholds (particularly for cortisol assessed in the context of awakening and circadian activity), which might account for this finding. Indeed, and in agreement with the observed dose-dependent relationship between child maltreatment and health impairments (e.g., Clemens et al., 2018; Norman et al., 2012), the severity of child maltreatment experiences, though difficult to assess (Jackson et al., 2019), might actually be of particular importance in explaining variability in the association between child maltreatment and HPA axis functioning. Interestingly, several studies, especially those that assessed cortisol in response to a stressor, found stronger blunting of cortisol secretion following the perception of a stressor with increasing severity of child maltreatment (Lovallo et al., 2019; Ouellet-Morin et al., 2018; Trickett et al., 2014; Voellmin et al., 2015). Unfortunately, our group comparison approach did not allow us to investigate this association systematically. In line with the difficulties in defining child maltreatment and the possibility of relying on various assessment modalities (Cicchetti and Toth, 2005; Manly, 2005), studies generally differed widely in their child maltreatment assessment and grouping approaches.
For instance, studies that relied on established instruments such as the CTQ or the Childhood Experience of Care and Abuse interview (CECA) grouped their participants based on validated cut-off scores, while other studies applied specific definitions (e.g., Heim et al., 2000: "repeated abuse, once a month or more for at least 1 year"), sometimes based on self-developed assessment tools (e.g., Groër et al., 2016; Martinson et al., 2016; Smeets et al., 2007), and still others relied on the presence or absence of a specific record, such as a CPS record (e.g., Bernard et al., 2010; Cicchetti et al., 2010; Hibel et al., 2019). However, neither the assessment (self-report, informant report, mixed) nor the grouping method (cut-offs, other, record) explained variance in the effect estimates of any of the HPA axis activity outcome indices, with the exception of larger effect sizes found in studies focusing on informant reports compared to self-reports for waking/morning cortisol (which is consistent with findings from Bernard et al., 2017). Since there were far fewer studies using informant reports than self-reports (a pattern also found among studies on the prevalence of child maltreatment; Stoltenborgh et al., 2015), comparison between these approaches might have been inappropriate and the chances of detecting differences accordingly low. Importantly, some of the included studies failed to sufficiently ensure that control participants were not subjected to any type of child maltreatment. The studies that performed poorly in this respect were those that relied on specific records, such as CPS records, and studies focusing on one particular type of child maltreatment. In these studies, apart from the absence of a record or of the corresponding type of maltreatment, no other measures were applied to ensure that control participants had not experienced any child maltreatment. 
This may have influenced our results, since child maltreatment experiences are rather common in the general population, as demonstrated by epidemiological studies (Witt et al., 2017). Neither the role of age at maltreatment onset nor the chronicity of the maltreatment experiences on HPA axis activity could be investigated, as very few studies applied measures that assessed these two factors in the first place -a finding in line with a recent review summarizing research on the operationalization of child maltreatment over the last 10 years (Jackson et al., 2019).

Influence of cortisol related information
In our meta-analysis, neither sample type (blood, saliva), type of stressor (social-evaluative, other), slope type (wake-to-bed, other), whether cortisol was assessed in reference to awakening or not, nor dose of dexamethasone (0.5 mg, 1.0 mg) significantly explained variability in the various effect estimates. The only moderator related to the assessment of cortisol that explained between-study heterogeneity was whether a cortisol response was observed in both, in only one, or in neither of the groups following the perception of a stressor, with stronger effects found in those studies that observed a cortisol response in just one of the two groups. This finding might be attributed to the fact that six of the seven comparisons that observed a response in only one of the two groups reported an increase in cortisol in the control group only. Overall, there was no evidence to suggest that the two components of cortisol secretion (total cortisol production and change in cortisol over time), which appear to capture different aspects of HPA axis activity (Khoury et al., 2015), are affected differently by child maltreatment. A blunting in cortisol secretion following the perception of a stressor was found in indices reflecting both total cortisol production and change in cortisol over time. However, as expected, studies generally differed widely with respect to the index or indices reported (e.g., many more studies reported on peak cortisol than on AUC i cortisol), making comparisons between the different outcome indices difficult. In addition, depending on the studies included, different moderators emerged, which in turn complicated the interpretation of the corresponding findings.
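The distinction between total cortisol production and change in cortisol over time can be made concrete with the commonly used trapezoid formulas for the area under the curve with respect to ground (AUC g) and with respect to increase (AUC i). The following is a minimal sketch; the function names, sampling times, and cortisol values are hypothetical, and the primary studies may have computed their indices differently.

```python
def auc_ground(times, values):
    """Area under the curve with respect to ground (total output)."""
    return sum(
        (values[i] + values[i + 1]) / 2.0 * (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    )


def auc_increase(times, values):
    """Area under the curve with respect to the first (baseline) sample."""
    baseline_area = values[0] * (times[-1] - times[0])
    return auc_ground(times, values) - baseline_area


def delta(values):
    """Peak minus baseline, a simple change index."""
    return max(values) - values[0]


# Hypothetical sampling times (minutes) and cortisol values (nmol/L)
times = [0, 15, 30, 60]
control = [10.0, 14.0, 12.0, 10.0]
blunted = [10.0, 11.0, 10.5, 10.0]

for label, profile in (("control", control), ("blunted", blunted)):
    print(label, auc_ground(times, profile), auc_increase(times, profile), delta(profile))
```

In this made-up example both profiles start from the same baseline, so the blunted profile differs in AUC g, AUC i, and delta alike; with unequal baselines the total and change indices can diverge, which is one reason they are reported separately.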

Influence of several components of methodological quality
Finally, we did not observe a consistent association between study quality as assessed by our self-developed quality assessment tool (which was based on existing recommendations and guidelines) and reported effect sizes. However, at least for some outcome indices, the quality of the individual studies seemed to explain some of the between-study heterogeneity. For instance, a tendency towards smaller negative differences in cortisol secretion following the perception of a stressor was observed in studies with a higher study quality and thus a lower risk of bias. Importantly, in studies that (re)grouped participants for the purpose of this meta-analysis, several aspects of methodological quality could no longer be assessed after (re)grouping, even though the majority of these studies used an established instrument (an established source of information) to assess child maltreatment experiences. Therefore, our conservative approach of not awarding any points in such cases may have induced biases. In addition, two studies could achieve the same total score while scoring on completely different quality items. Since we were not able to weight the importance of the various quality items, our ability to find associations with the different effect estimates might thus have been limited. Nevertheless, it should be mentioned that several quality-related aspects were generally well implemented in the majority of the studies (holding true for all HPA axis activity measures), whereas others were insufficiently addressed and controlled in most of the included publications. With respect to the selection of participants, for instance, the two comparison groups were generally matched in terms of sex and age, and most studies used an established measurement to assess child maltreatment. 
While exposure and the control groups were generally balanced in terms of psychopathology in those studies that reported on psychopathology, a substantial number of studies did not evaluate the presence of mental disorders at all, and therefore information about matching in this respect was unavailable for several studies. This is particularly surprising as, on the one hand, child maltreatment experiences are much more common among individuals with a mental disorder and, on the other hand, psychopathology itself has been repeatedly associated with changes in various HPA axis activity measures, with sometimes opposing findings for different mental disorders (e.g., Adam et al., 2017;Chida and Steptoe, 2009;Leistner and Menke, 2018;Stalder et al., 2017;Zorn et al., 2017). Accordingly, in these studies, the presence of a possible confounding effect of psychopathology cannot be ruled out. Interestingly, a recent study examining the effects of comorbidity and adversity on HPA axis functioning in depressed patients was able to show that rather than the diagnostic groups per se, the timing of adversity appears to influence HPA axis functioning in adulthood, putting the importance of psychopathology and especially the role of diagnostic groups somewhat into perspective. In this study, an attenuated HPA axis stress response was only found in those patients with comorbid PTSD from childhood. By contrast, no alterations were seen in those with depression only, or those with depression with comorbid PTSD resulting from adult trauma (Mayer et al., 2020). Thus, these results, consistent with the findings of the present meta-analysis and those of the meta-analysis by Bunea et al. (2017), suggest that adverse experiences during childhood indeed appear to be of particular importance in influencing the HPA axis stress response in adulthood. 
Related to participant selection, and as already indicated in the context of the assessment of child maltreatment, several studies inadequately ensured that none of the control participants were exposed to any type of child maltreatment, and only a handful of studies used two different sources to evaluate the presence of child maltreatment. Regarding appropriate assessment of cortisol in the context of the corresponding HPA axis activity measure, most studies reported on clear sampling (prohibitions) and collection instructions (i.e., how they collected, stored, and analyzed samples), provided details on their test protocol, and generally reported on missing and/or outlier data. By contrast, very few studies provided information on whether sampling was rescheduled if participants were sick, on batch analysis or ensured that participants were not under any current stress at the time of testing (sampling), factors known to influence cortisol results (e.g., Adam and Kumari, 2009). Furthermore, only a few studies that assessed cortisol in the context of daily activity (DC and CAR) ensured that exposure and control groups did not differ in the time of awakening as well as sampling time adherence. This is particularly surprising as the validity of the CAR measurement critically depends on the sampling schedule, with inaccurate sampling strongly biasing CAR (including morning/waking) estimates (Stalder et al., 2016). Moreover, a study investigating the variability and reliability of DC indicated that a 10-day sampling procedure would be required to obtain stable estimates of between-person differences in DSL cortisol (Segerstrom et al., 2014). Similarly, up to six assessment days might be necessary to obtain reliable CAR trait measures (Hellhammer et al., 2007). However, only a few studies included in this systematic review assessed cortisol over more than two days. 
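The relation between the number of sampling days and the reliability of an aggregated cortisol index can be illustrated with the classical Spearman-Brown formula. This is only a rough sketch: the cited studies derived their recommendations with their own methods, and the single-day reliability used here is a hypothetical value, not a figure from those studies.

```python
import math


def aggregated_reliability(single_day_r, n_days):
    """Spearman-Brown reliability of the mean across n_days sampling days."""
    return n_days * single_day_r / (1.0 + (n_days - 1) * single_day_r)


def days_needed(single_day_r, target_r):
    """Smallest number of days whose mean reaches the target reliability."""
    if not (0.0 < single_day_r < 1.0 and 0.0 < target_r < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = target_r * (1.0 - single_day_r) / (single_day_r * (1.0 - target_r))
    # Small tolerance guards against floating-point overshoot at exact integers
    return max(1, math.ceil(n - 1e-9))


# Hypothetical single-day reliability of 0.30 and a target of 0.80
print(days_needed(0.30, 0.80))
print(round(aggregated_reliability(0.30, 2), 3))
```

Under these assumed values, a two-day protocol falls well short of the target, which illustrates why assessments over only one or two days may yield unstable between-person estimates.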
Lastly, a substantial number of studies failed to adequately assess and thus control for important confounding variables. As mentioned in the context of sex differences, the matching of participants in terms of oral contraceptive use and menstrual cycle timing was insufficient in many studies. Other confounding variables that were generally poorly assessed and controlled for were smoking and the intake of medication with known CNS effects; clear statements about whether pregnant women and participants working night shifts were excluded were also often missing. These factors are likewise known to account for variability in cortisol results (e.g., Kudielka et al., 2012; Locatelli et al., 2009; Stalder et al., 2016; Zänkert et al., 2019). Finally, only a few of the included studies took measures to ensure that participants from the control group were not subjected to any type of ELA other than child maltreatment. According to the National Scientific Council on the Developing Child (e.g., Shonkoff et al., 2012), three types of stressors can be differentiated according to their potential to cause enduring physiological disruptions: positive, tolerable, and toxic stressors. "Tolerable" stress experiences include those that present a great magnitude of adversity or threat, such as the death of a family member, or a serious illness or injury. However, when buffered by a supportive adult, the risk that corresponding circumstances will cause long-term consequences for health is suggested to be greatly reduced. In contrast, toxic stress experiences include those that are experienced in the absence of a supportive adult relationship and may cause strong, frequent, or prolonged activation of the body's stress response system. Since child maltreatment experiences typically occur in the absence of the buffering protection of stable adult support, these experiences are suggested to be particularly toxic and thus show a great potential to induce long-lasting biological changes. 
In line with this, child maltreatment experiences do show high associations with later disease risk (e.g., Dube et al., 2001). Nevertheless, ELA and especially the experience of multiple adverse childhood experiences have been related to various health conditions later in life as well (e.g., Clark et al., 2010; Danese et al., 2009; Hughes et al., 2017). In addition, the meta-analysis conducted by Bunea et al. (2017), although slightly smaller effects were observed compared to studies focusing on child maltreatment only, showed that ELA was similarly associated with a blunted cortisol stress response. Thus, considering that experiencing adversity during childhood is the rule rather than the exception (Merrick et al., 2019), for the vast majority of studies, it cannot be ruled out that control participants have experienced other forms of ELA, which in turn may have influenced the results of this systematic review. Considering these methodological shortcomings, including the limitations associated with this series of meta-analyses, our ability to establish a consistent link between the experience of child maltreatment and HPA axis functioning may indeed be compromised. 

Note. DC = diurnal cortisol; DSL = diurnal slope cortisol; CAR = cortisol awakening response; DST = dexamethasone suppression test; Dex-CRH test = combined dexamethasone-corticotropin releasing hormone test; HCC = hair cortisol concentrations; 24-hour UFC = 24-hour urinary free cortisol; CM = child maltreatment; AUC g = area under the curve with respect to ground; AUC i = area under the curve with respect to increase. • Positive effect = overall increased cortisol levels in child maltreatment group compared to their respective control group. * Negative effect = overall reduced cortisol levels in child maltreatment group compared to their respective control group. - No significant moderators and/or subgroup comparisons were identified.
Finally, it should be noted that we decided to focus on peer-reviewed papers only to allow for a transparent and replicable search of the literature. Appropriate statistical methods (e.g., funnel plots and Egger's regression tests) were applied to evaluate and control for publication bias. Nevertheless, the inclusion of grey literature could have counteracted the problem of including data that are not fully representative of the evidence as a whole.
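Egger's regression test, mentioned above as one of the methods used to evaluate publication bias, can be sketched as an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (one divided by the standard error). An intercept that deviates from zero points to funnel plot asymmetry. This is a minimal illustration with hypothetical data and a hypothetical function name; it returns the intercept and its t statistic but omits the p-value, which would require the t distribution with n - 2 degrees of freedom.

```python
import math


def egger_regression(effects, std_errors):
    """Egger's test: regress effect/SE on 1/SE and inspect the intercept."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are required")
    x = [1.0 / s for s in std_errors]                   # precision
    y = [e / s for e, s in zip(effects, std_errors)]    # standardized effect
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual variance and standard error of the intercept
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t_stat}


# Hypothetical effect sizes and standard errors
result = egger_regression([-0.30, -0.10, -0.45, 0.05, -0.60],
                          [0.10, 0.15, 0.20, 0.25, 0.30])
print(result)
```

Such a test can only detect asymmetry among the studies that were included; it cannot recover unpublished results, which is why the inclusion of grey literature remains a separate consideration.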

Interpretation of the findings in the context of developmental programming of the HPA axis
Nevertheless, taking the above constraints into account, we found evidence of an altered cortisol stress response in individuals exposed to child maltreatment as compared to control participants. The null findings with regard to the other HPA axis activity measures (keeping in mind the various methodological shortcomings as one potential explanation) could also indicate that alterations causing aberrant cortisol secretion are less apparent at the level of the pituitary or adrenal glands, but are rather expressed in brain regions involved in stress processing (e.g., limbic brain areas including the hippocampus, the amygdala, and the prefrontal cortex) and in the connectivity of these brain regions to the hypothalamus (Herman et al., 2003). In line with this idea, a review summarizing findings on the neuronal control of chronic stress adaptation suggests that changes in HPA axis regulation following severe stress exposure might be traced back to long-term changes in the limbic input to neurons controlling stress responsiveness (Herman, 2013). Additionally, it is well known that limbic brain areas including the hippocampus, the amygdala, and the prefrontal cortex widely express GRs, and therefore it is not surprising that acute and chronic stress appear to significantly affect synaptic physiology and connectivity in these regions (e.g., Myers et al., 2014). In contrast, structures involved primarily in the regulation of cortisol release in the context of circadian signals or following awakening (i.e., the suprachiasmatic nucleus; Spiga et al., 2014) might be less affected. Thus, corresponding alterations in HPA axis activity measures that are not primarily activated by stress perception (e.g., DC, CAR, HCC, UFC) might only become apparent when cortisol is measured during periods of high life stress, when stress processing actually becomes relevant for these activity measures as well. 
Interestingly, findings of a longitudinal study evaluating stress exposure across the lifespan on HPA axis functioning at age 37 provide some support for this assumption (Young et al., 2021). In this study, in accordance with the theory of developmental programming of biological systems -the biological embedding model (e.g., Heim et al., 2019;Heindel et al., 2015) -individuals with adversities experienced during early or middle childhood showed a blunted cortisol response to a modified version of the TSST. This blunting of cortisol secretion following the perception of this stressor was independent of whether or not participants were experiencing current life stress. Additionally, similar cortisol stress response patterns were seen in participants with high and low cumulative stress, if these cumulative stress exposures did not involve early life stress (Young et al., 2021). These findings thus support the notion that when attempting to explain differences in the cortisol stress reactivity, it is not so much stress in general, but early childhood stress in particular that seems to be critical (supporting the biological embedding model). In contrast, flatter DSL profiles were only observed in those individuals who experienced ELA and were currently subjected to high levels of stress (Young et al., 2019). While these DSL results remain consistent with the biological embedding model, they also provide support for the assumption that alterations causing aberrant cortisol secretion likely relate to circuits of the brain involved in the processing of stress, and accordingly, meaningful differences in HPA axis activity measures that in terms of their activation do not per se require the experience of stress, are only to be expected when stress processing actually is involved (i.e., under high current life stress; see also Kuhlman et al., 2016).

Conclusion and future directions
Several difficulties became apparent in the context of this series of meta-analyses: the unbalanced recruitment of study participants in the primary studies (e.g., predominantly young, female adults who belonged to an ethnic majority group and in whom child maltreatment experiences were assessed mainly through self-reports), the considerable number of studies that did not report on psychopathology, the limitations related to the assessment of child maltreatment (i.e., the use of various definitions and our inability to investigate the role of age at onset and the chronicity of the maltreatment experiences), and the various constraints related to the assessment of the HPA axis activity measures (i.e., the inadequate control of state factors and confounding variables and limitations related to the reliability of the cortisol outcome measures). Taking all of these into account, a comprehensive conclusion about the functioning of the HPA axis in individuals who have been exposed to child maltreatment cannot be drawn at this point, and our ability to find important moderators or relevant subgroup differences might have been limited. Nevertheless, child maltreatment appears to be associated with a blunted rather than an exaggerated activity when considering cortisol secretion following the perception of a stressor (while a tendency was also shown for HCC), and several moderators including the proportion of females in the sample, psychopathology, and the study quality (to name a few) have been identified to account for some of the observed between-study heterogeneity. 
Considering that cortisol, when secreted in excess (e.g., during prolonged stress exposure, as is the case for child maltreatment), can have a variety of deleterious effects (Feelders et al., 2012), particularly in the brain (Sapolsky, 1999; Sapolsky et al., 2000), a corresponding downregulation may indeed serve an adaptive function protecting the body from these various adverse effects, an idea that has been subsumed under the so-called "attenuation hypothesis" (e.g., Kaess et al., 2018; Trickett et al., 2010). While probably adaptive in the first place, there is growing evidence linking not only an exaggerated but increasingly also a blunted cortisol stress response to various adverse behavioral and health outcomes (Carroll et al., 2017; de Rooij, 2013; Turner et al., 2020). Cortisol has various important anti-inflammatory and immune suppressive functions (Sapolsky et al., 2000), and several studies have shown that an attenuated cortisol stress reactivity (irrespective of cause) is associated with a stronger proinflammatory immune response (Buske-Kirschbaum et al., 2010; Janusek et al., 2017; Schwaiger et al., 2016). Interestingly, a growing number of studies suggest that inflammatory processes may precede the onset of, or be involved in the development of, various types of mental disorders (Kivimäki et al., 2014; Melhem et al., 2017; Slavich et al., 2020). Accordingly, future studies should pay more attention to the potential moderating influence of current life stress, especially if interested in HPA axis activity measures that are not primarily regulated by stress perception alone. Moreover, studies interested in the consequences arising from altered HPA axis activity should specifically examine how altered cortisol secretion might be related to dysfunctions in other biological systems. 
In addition, by investigating the potentially moderating role of genes and epigenetic changes, knowledge of which individuals are most susceptible to the long-term consequences of child maltreatment (or ELA in general) may be further enhanced (e.g., Heim et al., 2019). However, reliable and reproducible results are only to be obtained if future studies more consistently rely on measurement tools that capture the assessment of various types of ELA, their onset, their chronicity and, in particular, these tools should permit the assessment of the perceived severity of the corresponding experiences. Related to a growing number of studies showing different neurobiological consequences of deprivation and threat experiences (e.g., Colich et al., 2020), a more fine-grained analysis of child maltreatment or adversity in general could further improve our understanding of the functioning of the HPA axis in individuals exposed to corresponding experiences. However, in order to obtain reliable and valid HPA axis activity measures, future studies must focus more consistently on cortisol assessment guidelines, which provide important information regarding various state and confounding variables, as well as information on the reliability of the corresponding outcome activity measures (e.g., Adam and Kumari, 2009;Allen et al., 2017;Foley and Kirschbaum, 2010;Kudielka et al., 2012;Stalder et al., 2016;Stalder and Kirschbaum, 2012;Zänkert et al., 2019).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.