Efficacy on sleep parameters and tolerability of melatonin in individuals with sleep or mental disorders: A systematic review and meta-analysis

We conducted the first systematic review and series of meta-analyses to assess the efficacy and tolerability of melatonin in children/adolescents or adults with sleep or mental health disorders, using the same set of criteria across disorders and ages. Based on a pre-registered protocol (PROSPERO: CRD42021289827), we searched a broad range of electronic databases up to 02.02.2021 for randomized controlled trials (RCTs) of melatonin. We assessed study quality using the Risk of Bias tool, v2. We included a total of 34 RCTs (21 in children/adolescents: N = 984; 13 in adults: N = 1014). We found evidence that melatonin significantly improved sleep onset latency and total sleep time, but not nocturnal awakenings, in children and adolescents with a variety of neurodevelopmental disorders, and sleep onset latency (measured by diary) as well as total sleep time (measured with polysomnography) in adults with delayed sleep phase disorder. No evidence of significant differences between melatonin and placebo was found in terms of tolerability. We discuss clinical and research implications of our findings.


Melatonin-background
Melatonin is a hormone produced mainly by the pineal gland, located just above the thalamus. Its main function is to aid in the regulation of the sleep-wake cycle (Kennaway and Wright, 2002), but other properties, including hypnotic, antioxidant, anti-inflammatory, and free radical scavenging functions, have also been identified (Bruni et al., 2015). The secretion of this hormone is mostly dependent on the level of luminance throughout the day. When light hits the retina, the production of melatonin is blocked via a chain of communication, starting from photoreceptor cells in the retina, passing through the suprachiasmatic nucleus (SCN), and ending in the pineal gland. When there is darkness, the SCN sends excitatory signals to the pineal gland to initiate the secretion of melatonin (Reiter, 1993). During this time, body temperature and respiration are lowered to prepare for sleep (Reiter, 2003). Overall, a growing body of evidence suggests a link between abnormal sleep-wake cycles and abnormal melatonin secretion (Gringras et al., 2017). Previous research has shown that children and adolescents with sleep-wake disorders tend to have low levels of melatonin secretion around a normal bedtime (Braam et al., 2008a, 2008b). Furthermore, melatonin production declines throughout adulthood (Poza et al., 2020), which could increase the likelihood of developing a sleep disorder.

Current clinical recommendations
The use of melatonin in children and adults with sleep and mental health conditions is supported by a number of guidelines. Based on a narrative overview of the literature, an international consensus established a series of recommendations on the use of melatonin in children/adolescents (Bruni et al., 2015). This consensus conference suggested that, when considered as a sleep inductor, 1-3 mg of melatonin should be administered 30 min before bedtime, and slowly increased up to 5 mg/nocte if there is no effect. As a chronobiotic, 0.2-0.5 mg should be administered 2-3 h before the dim light melatonin onset (DLMO), a marker of circadian phase, and slowly increased if there is no effect. More recently, Geoffroy et al. (2019) provided recommendations on the use of melatonin for adults. They highlighted the need for specific recommendations for each disorder. For instance, they suggested that both immediate- and prolonged-release melatonin improve insomnia symptoms in adults with depression, whilst a low dose (0.1 mg) of melatonin provides phase-shifting benefits for individuals with seasonal affective disorders.

Evidence base on the efficacy and/or tolerability of melatonin
The above-mentioned guidelines are grounded in an increasing body of evidence on the effects of melatonin in humans. Indeed, the efficacy and/or tolerability of melatonin to improve sleep parameters in individuals with sleep disorders or mental health conditions have been assessed in a number of randomized controlled trials (RCTs). To date, to our knowledge, twelve meta-analyses have pooled data from available RCTs. The key results of these meta-analyses are summarised in Table 1. Of these, four were on children, three on adults, and five on both children and adults. Among the meta-analyses exclusively in children, one was on insomnia, one on autism, and two on neurodevelopmental disorders in general. All three meta-analyses in adults were on sleep disorders. Among the meta-analyses in both children and adults, one was on insomnia, one on delayed sleep phase disorder (DSPD), one on intellectual disabilities (ID), and two on sleep disorders in general. Each of these previous meta-analyses addressed either children or adults, and individuals with either sleep or mental disorders, and results across meta-analyses are not consistent. As such, evidence so far on melatonin efficacy/tolerability is fragmented, with several meta-analyses (each with their own inclusion criteria) focused on specific populations and/or specific disorders. Therefore, to better characterize the effects of melatonin, there is a need to conduct an updated systematic review and series of meta-analyses of RCTs using the same criteria for inclusion of studies in children, adolescents or adults and across sleep and/or mental disorders. The present systematic review and meta-analysis aimed to fill this gap. The data extracted from relevant RCTs were used to analyse the following: (a) whether melatonin is more efficacious than placebo in children, adolescents, or adults with sleep or mental disorders; (b) how melatonin compares to placebo in terms of tolerability.

Search
Based on a pre-registered protocol in PROSPERO (CRD42021289827), we searched PubMed, OVID databases (PsycINFO, Embase + Embase Classic, and Medline), and Web of Knowledge databases [Web of Science (science citation index expanded), Biological

Inclusion criteria
We searched for RCTs, regardless of the level of blinding, reporting the efficacy and/or tolerability of melatonin (immediate or prolonged release formulations) on any available subjective or objective sleep parameters, in children (pre-schoolers > 3 years or school-aged, as the diagnosis of the mental conditions listed below would be challenging in younger children/toddlers), adolescents, or adults with any sleep disorder (based on the International Classification of Diseases (ICD), Diagnostic and Statistical Manual of Mental Disorders (DSM) or International Classification of Sleep Disorders criteria, or defined based on scores above cut-off in sleep questionnaires) or mental health condition (listed in Appendix B) diagnosed based on formal criteria (ICD or DSM, any version). RCTs with participants with a primary diagnosis of sleep or mental disorders and secondary comorbid mental disorders were eligible. Studies including 100% of participants with a comorbid disorder were included but considered for a possible subgroup analysis. We included RCTs where melatonin was the only intervention. RCTs that used melatonin agonists or melatonin in conjunction with another medicine/treatment were excluded.

Outcomes
We included as an outcome any subjective (i.e., based on questionnaire) or objective (i.e., based on actigraphy and/or polysomnography) variable reported in at least two RCTs. Tolerability (i.e., dropout due to side effects) of melatonin compared to placebo was also extracted. DLMO, which is derived from melatonin levels measured in plasma or in saliva collected on a swab, was also included as an eligible outcome to assess whether melatonin onset occurred earlier in the night after treatment with melatonin compared to placebo, across disorders.

Screening process and data extraction
Retrieved references were independently and blindly screened and coded for eligibility by two study authors. Any disagreement was resolved by the senior author. Two authors independently extracted data from the retained studies. All relevant data from each study were entered into an Excel sheet and categorized according to their disorder. The corresponding authors were contacted if there were any issues in retrieving potentially useful data. One RCT was only reported in the qualitative synthesis, due to missing data and the author not responding in time (Mohammadi et al., 2012).

Risk of bias
Study quality was assessed independently by a pair of researchers using the Cochrane Risk of Bias (RoB) 2 tool (Higgins et al., 2011). The domains that were assessed were: a) risk of bias arising from the randomization process; b) risk of bias due to deviations from the intended interventions; c) missing outcome data; d) risk of bias in the measurement of the outcome; and e) risk of bias in the selection of the reported result. Each domain contained multiple questions and was rated as low risk, some concerns, or high risk. These judgments were combined to generate an overall risk of bias for each RCT.

Data analysis
First, a qualitative, narrative synthesis was conducted to provide a description of the study characteristics within each disorder.
Outcome variables were combined by conducting a pairwise meta-analysis, per disorder. For variables measured with the same metric, the mean difference (MD) was calculated, with a 95% confidence interval (CI), estimated by a random effects model. Otherwise, the standardized mean difference (SMD) with a 95% CI was calculated. Tolerability was assessed by pooling odds ratios (ORs). Log odds ratios were calculated for all disorders and ages with the following variables: a) melatonin dropouts due to adverse events (n); b) melatonin total sample size (n); c) placebo dropouts due to adverse events (n); and d) placebo total sample size (n). Once the log odds ratios were calculated, the data were entered into a random effects model and a meta-analysis was conducted. We considered data from intention-to-treat analyses.
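The tolerability calculation described above can be illustrated with a minimal sketch. It is not the authors' code (the analyses were run in R), and several details are assumptions made for illustration only: the DerSimonian-Laird estimator for the between-study variance, the 0.5 continuity correction for tables containing a zero cell, and the trial counts, which are hypothetical.

```python
import math


def log_odds_ratio(events_mel, n_mel, events_pla, n_pla, correction=0.5):
    """Log odds ratio of dropout (melatonin vs placebo) and its sampling variance.

    Assumption: if any cell of the 2x2 table is zero, `correction` is added
    to all four cells so that the log odds ratio is defined.
    """
    a, b = events_mel, n_mel - events_mel  # melatonin: dropouts, non-dropouts
    c, d = events_pla, n_pla - events_pla  # placebo: dropouts, non-dropouts
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_random_effects(effects, variances):
    """Random-effects pooled estimate with a 95% CI.

    Assumption: tau^2 is estimated with the DerSimonian-Laird method.
    Returns (estimate, ci_low, ci_high, tau2).
    """
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se, tau2


# Hypothetical trials: (melatonin dropouts, melatonin n, placebo dropouts, placebo n)
trials = [(1, 60, 2, 58), (0, 40, 1, 42), (3, 120, 2, 118)]
pairs = [log_odds_ratio(*t) for t in trials]
est, low, high, tau2 = pool_random_effects([p[0] for p in pairs], [p[1] for p in pairs])
# Back-transform from the log scale to report a pooled odds ratio.
print(f"Pooled OR = {math.exp(est):.2f} (95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

Pooling is done on the log scale because the sampling distribution of the log odds ratio is closer to normal; the result is exponentiated only for reporting.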
We used the Q (indicating the presence, but not the degree, of heterogeneity) and the I 2 index (percentage of variance due to true heterogeneity) to estimate indices of heterogeneity of effect size (Higgins and Thompson, 2002). Publication bias was assessed visually by funnel plots (Sterne and Egger, 2001). Funnel plots were only used for meta-analyses that contained ten or more studies, as a study number lower than ten would not be able to distinguish real from chance asymmetry due to low power (Higgins et al., 2019). Publication bias was assessed quantitatively using the rank correlation test for funnel plot asymmetry and the regression test for funnel plot asymmetry.
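As a minimal sketch of the two heterogeneity indices named above, the following computes Cochran's Q and the I² index from study effect sizes and their sampling variances. The function name and the input values are hypothetical; a p-value for Q would additionally require the chi-square distribution with k − 1 degrees of freedom, which is omitted here to keep the example dependency-free.

```python
def heterogeneity(effects, variances):
    """Return (Q, df, I2) for a set of study effect sizes.

    Q is the inverse-variance weighted sum of squared deviations from the
    fixed-effect pooled estimate; I2 = max(0, (Q - df) / Q) * 100.
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical mean differences in sleep onset latency (minutes) and their variances
q, df, i2 = heterogeneity([-18.0, -25.0, -9.0, -30.0], [16.0, 25.0, 9.0, 36.0])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

Q only indicates whether heterogeneity is present, and its power depends on the number of studies, whereas I² expresses the share of total variability attributed to true between-study differences.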
All analyses were conducted in R 4.1.0 (R Core Team), with the packages metafor, meta, and dmetar.

Additional analyses
Per protocol, the following sensitivity analyses were planned: a) adequacy of blinding (excluding all non-blinded trials), and b) excluding RCTs at high risk of bias. We also planned to conduct a meta-regression to test the impact of the dose of melatonin. We planned to conduct the main analyses separately by age (children/adolescents and adults), and then to also pool data from RCTs across ages.

Search and screening results
From an initial pool of 6000 references to screen, we eventually included 34 RCTs. Appendix C reports a summary of references excluded after reading the full text, with reason for exclusion. The screening process is summarized in Fig. 1. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2021) recommendations were followed.

Study characteristics in children and adolescents
As shown in Table 2, a total of 21 studies between the years 1998 and 2017 were included in the qualitative synthesis. Twelve studies were parallel RCTs and the other nine were randomized cross-over trials, and of these, all except one had a washout period of a week. All were double-blinded. In the study without washout, it was considered that the risk of carry-over effect was mitigated by the short half-life of the melatonin supplement used in the study (Wright et al., 2011). The total number of children and adolescents (as defined by the study authors) involved in these studies was N = 974 (633 males, 329 females, and 20 with sex not stated).
Each study presented data on at least one outcome variable. Data from Mohammadi et al. (2012) were not included in the quantitative synthesis due to a missing standard deviation (we contacted the author but they were not able to provide it).

Study characteristics in adults
As reported in Table 3, a total of 13 studies between the years 1989 and 2018 were included. Five studies were parallel RCTs and the other eight were randomized crossover trials, with all of them except one having a washout period of at least 5 days. The total number of adults included in these RCTs was N = 1012 (657 males, 343 females, and 12 with sex not stated).

Doses and duration of melatonin
As shown in Appendix D (children/adolescents) and Appendix E (adults), there was no sign of consistency among studies when it came to selecting a given dose of melatonin. Some studies started low and increased the dose over the period of the trial while some studies had a fixed dose for the whole trial. Even among fixed doses, the amount was different. Similarly, administration time was found to be reported in multiple formats which were difficult to quantify in the same way. The general trend across studies was for melatonin to be administered around an hour or two before bedtime. The most frequent duration of treatment across all disorders was 4 weeks.

Tolerability
Appendix F (children/adolescents) and Appendix G (adults) report information regarding the tolerability of melatonin for all studies included in this review. Dropout due to adverse events data were extracted from all 34 RCTs. Only one event came from the melatonin group and two came from the placebo group in RCTs in children and adolescents. In RCTs in adults, three dropouts came from the melatonin group and two from the placebo group. The most frequent side effects included headaches and migraines. Furthermore, the total mild side effects (as defined by the studies) that were reported in each study were roughly equal between conditions. The majority of the RCTs reported no side effects in either arm.

Study quality
Appendix H reports information regarding the RoB 2.0 overall judgements for each included RCT. Most of the children and adolescent RCTs were rated as some concerns, while six RCTs were judged as low risk and none as high risk. All adult RCTs were rated as some concerns, because at least one of the domains raised some concerns about bias. As there were no high-risk studies, no sensitivity analyses removing these studies were applied. For more information on the individual scoring of each RCT, see Appendix I.

Children and adolescents
Table 4 summarizes the main analyses of each outcome variable within and across disorders for which meta-analyses were performed. The MD effect size was considered for sleep onset latency (SOL) on diary, SOL actigraphy, total sleep time (TST) diary, TST actigraphy, and DLMO. The SMD was considered for nocturnal awakenings (NA).

Attention deficit/hyperactivity disorder with insomnia (i.e., 100% of the sample with ADHD + insomnia).
The total number of studies included in this analysis was k = 2 (N = 124) and only one outcome variable qualified for analysis. The MD effect size was calculated for SOL actigraphy, and the results indicated a significant difference in favor of melatonin over placebo. Furthermore, the test for heterogeneity showed no significant heterogeneity. Overall, this result indicates that melatonin reduces SOL by on average 17.73 min in children and adolescents with ADHD and comorbid insomnia. See Appendix J for the forest plot.

Autism spectrum disorder (ASD).
The total number of studies included in these analyses was k = 4. All outcome variables except DLMO and TST actigraphy qualified for analysis. Melatonin was related to significantly lower scores in SOL diary and SOL actigraphy compared to placebo. Furthermore, melatonin was related to significantly increased scores in TST diary in comparison to placebo. However, there was no significant difference between melatonin and placebo for NA scores. There was evidence of significant heterogeneity only for NA. See Appendix K for a detailed description and forest plot on each outcome variable analyzed for this disorder.

ASD and comorbid insomnia (i.e., 100% of the sample with ASD and insomnia).
The total number of studies included in these analyses was k = 6. All outcome variables except DLMO qualified for analysis. Melatonin was related to significantly lower scores in SOL diary and SOL actigraphy compared to placebo. Furthermore, melatonin was related to significantly increased scores in TST diary and TST actigraphy in comparison to placebo. However, there was no significant difference between melatonin and placebo for NA scores. There was evidence of significant heterogeneity only for SOL actigraphy and NA. See Appendix L for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Delayed sleep phase disorder.
The total number of studies included in these analyses was k = 2. All outcome variables except DLMO and NA qualified for analysis. Melatonin was related to significantly lower scores in SOL diary compared to placebo. However, there was no significant difference between melatonin and placebo for SOL actigraphy, TST diary, and TST actigraphy scores. There was evidence of significant heterogeneity only for SOL actigraphy. See Appendix M for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Insomnia.
The total number of studies included in these analyses was k = 5. The outcome variables that qualified for analysis were SOL diary, TST diary, and DLMO. Melatonin was related to significantly lower scores in SOL diary compared to placebo. Furthermore, melatonin was related to significantly increased scores in TST diary in comparison to placebo. However, there was no significant difference between melatonin and placebo for DLMO scores. There was no evidence of heterogeneity. See Appendix N for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Neurodevelopmental disorders with comorbid insomnia.
The total number of studies included in these analyses was k = 6. All outcome variables qualified for analysis apart from DLMO. Melatonin was related to significantly lower scores in SOL diary and SOL actigraphy compared to placebo. However, there was no significant difference between melatonin and placebo for both TST variables and DLMO scores and for these outcomes there was evidence of significant heterogeneity. See Appendix O for a detailed description and forest plot on each outcome variable analyzed for this disorder.

ID with comorbid insomnia.
The total number of studies included in these analyses was k = 2. The outcome variables that qualified for analysis were SOL diary, TST diary, and NA. Melatonin was related to significantly lower scores in SOL diary compared to placebo. However, there was no significant difference between melatonin and placebo in TST diary and NA scores, for which there was also evidence of heterogeneity. See Appendix P for a detailed description and forest plot on each outcome variable analyzed for this disorder.

ID & ID with comorbid insomnia.
The total number of studies included in these analyses was k = 3. The outcome variables that qualified for analysis were SOL diary, TST diary, and NA. Melatonin was related to significantly lower scores in SOL diary compared to placebo. However, there was no significant difference in TST diary and NA scores (for which there was also evidence of significant heterogeneity) in the melatonin group compared to placebo. See Appendix Q for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Neurodevelopmental disorder (not better specified).
The total number of studies included in these analyses was k = 2. The outcome variables that qualified for analysis were SOL diary and TST diary. Melatonin was related to significantly lower scores in SOL diary compared to placebo. However, there was no significant difference between melatonin and placebo for TST diary scores. No evidence of significant heterogeneity was found for either of the two outcomes. See Appendix R for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All neurodevelopmental disorders.
The total number of studies included in these analyses was k = 13. All outcome variables qualified for analysis. Melatonin was related to significantly lower scores in SOL diary, SOL actigraphy, and NA compared to placebo. Furthermore, melatonin was related to significantly increased scores in TST diary and TST actigraphy in comparison to placebo. However, there was no significant difference between melatonin and placebo for DLMO scores. Nevertheless, there was evidence of significant heterogeneity for SOL actigraphy, TST diary, TST actigraphy, NA and DLMO. See Appendix S for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All sleep disorders.
The total number of studies included in these analyses was k = 7. All outcome variables qualified for analysis apart from NA and DLMO. Melatonin was related to significantly lower scores in SOL diary and SOL actigraphy, compared to placebo. Furthermore, melatonin was related to significantly increased scores of TST diary compared to placebo. However, there was no significant difference between melatonin and placebo in the outcome variable TST actigraphy. There was evidence of significant heterogeneity for SOL actigraphy and TST actigraphy. See Appendix T for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All sleep disorders & neurodevelopmental disorders with insomnia.
The total number of studies included in these analyses was k = 13. All outcome variables qualified for analysis. Each outcome variable was analyzed with a random effects model. Melatonin was related to significantly lower scores in SOL diary and SOL actigraphy, compared to placebo. Furthermore, melatonin was associated with significantly increased scores of TST diary compared to placebo. However, there was no significant difference between melatonin and placebo in the outcome variables TST actigraphy, NA, and DLMO. Furthermore, there was evidence of heterogeneity for all the outcomes in this analysis. See Appendix U for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All disorders combined.
The total number of RCTs included in these analyses was k = 20. All outcome variables qualified for analysis except DLMO. Melatonin was related to significantly lower scores in SOL diary and SOL actigraphy compared to placebo. Furthermore, melatonin was related to significantly increased scores in TST diary and TST actigraphy in comparison to placebo. However, there was no significant difference between melatonin and placebo for NA scores. Furthermore, there was evidence of significant heterogeneity for all outcomes. See Appendix V for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Adults
Table 5 reports a summary of the main analyses of each outcome variable within and across disorders in adults for which meta-analyses were performed. The MD effect size was considered for SOL diary, SOL actigraphy, TST diary, TST actigraphy, and DLMO. On a few occasions, SMD was calculated for SOL diary and TST diary as one of the RCTs included in the analysis reported mean change rather than endpoint data. The SMD was considered for NA.
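The two continuous effect sizes used here can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the SMD is computed as Hedges' g (Cohen's d with a small-sample correction), its variance uses a common large-sample approximation, and all numbers are hypothetical.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Raw mean difference (group 1 minus group 2) and its sampling variance."""
    return m1 - m2, sd1 ** 2 / n1 + sd2 ** 2 / n2


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Hedges' g) and an approximate variance.

    Assumption: pooled-SD standardization with the small-sample correction
    J = 1 - 3 / (4 * df - 1), where df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    g = (1 - 3 / (4 * df - 1)) * d
    variance = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, variance


# Hypothetical endpoint SOL (minutes): melatonin 32 (SD 15, n = 40) vs placebo 45 (SD 18, n = 38)
md, md_var = mean_difference(32, 15, 40, 45, 18, 38)
g, g_var = hedges_g(32, 15, 40, 45, 18, 38)
print(f"MD = {md:.1f} min (variance {md_var:.2f}); g = {g:.2f} (variance {g_var:.3f})")
```

The MD keeps the original unit (minutes), which is why it is preferred when all trials report the outcome on the same scale; standardizing trades that interpretability for comparability across differently scaled data.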

Delayed sleep phase disorder.
The total number of studies included in this group was k = 5. All outcome variables except SOL diary, NA, and DLMO qualified for this analysis. Melatonin was related to significantly higher scores in TST polysomnography compared to placebo. However, there was no significant difference between melatonin and placebo in the other outcome variables (TST diary). There was evidence of significant heterogeneity for SOL polysomnography. See Appendix W for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Insomnia.
The total number of studies included in this group was k = 5. Three outcome variables qualified for analysis. The MD effect size was calculated for SOL polysomnography, SMD for SOL and TST diary. The results indicated no significant difference between melatonin and placebo for all outcome variables. There was evidence of significant heterogeneity for TST diary. See Appendix X for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All sleep disorders.
The total number of studies included in this group was k = 11. Only SOL diary and TST diary qualified for this analysis. Melatonin was related to significantly lower scores in SOL diary compared to placebo. However, there was no significant difference between melatonin and placebo in the outcome variable TST diary. Importantly, there was evidence of significant heterogeneity for all the outcomes, except SOL diary. See Appendix Y for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All disorders combined.
The total number of studies included in this group was k = 12. SOL diary, SOL polysomnography, TST diary, TST polysomnography and NA qualified for this analysis. Melatonin was related to significantly lower scores in SOL diary and SOL polysomnography. Additionally, melatonin was related to higher TST polysomnography compared to placebo. Importantly, there was evidence of significant heterogeneity for all the outcomes, except SOL diary. See Appendix Z for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Additional analyses.
Sensitivity analyses were carried out on the SOL and TST diary analyses that included Wade et al. (2011). This was the only study that reported mean change, and it contained a large number of participants, which could skew the results. After its removal, the all sleep disorders group yielded non-significant results. This is not in line with the SOL diary result prior to removal. However, the non-significant TST diary result remained in the same direction. In the all disorders combined group, both SOL and TST diary outcomes were also in the same direction as before removal: SOL diary yielded significant results whilst TST diary did not. See Appendix AA for the sensitivity analyses, and Appendix AB for a detailed description and forest plot for each of these analyses.

All Ages
See Table 6 for a summary of the main analyses of each outcome variable within and across disorders which involves all ages. The MD effect size was considered for SOL diary, SOL actigraphy, TST diary, TST actigraphy, and DLMO. On a few occasions, SMD was calculated for SOL diary and TST diary as one of the RCTs included in the analysis reported mean change rather than endpoint data. The SMD was considered for NA.

Delayed Sleep Phase Disorder.
The total number of studies included in this group was k = 7. SOL diary, SOL actigraphy, TST diary, and TST actigraphy qualified for this analysis. Melatonin was related to significantly lower scores in SOL diary and actigraphy. Furthermore, it was related to significantly higher scores in TST actigraphy. However, TST diary was non-significant. Heterogeneity was significant for SOL actigraphy and TST diary. See Appendix AC for a detailed description.

Insomnia.
The total number of studies included in this group was k = 9. Two outcome variables qualified for analysis. The SMD was calculated for SOL and TST diary. The results indicated that melatonin was related to significantly lower scores in SOL diary compared to placebo, whilst no significant difference between melatonin and placebo was shown for TST diary. For both outcomes, a significant heterogeneity was found. See Appendix AD for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All sleep disorders.
The total number of studies included in this group was k = 18. Five variables qualified for this analysis. Melatonin was related to significantly lower scores in SOL diary compared to placebo. However, there was no significant difference between melatonin and placebo in the outcome variable TST diary. See Appendix AE for a detailed description and forest plot on each outcome variable analyzed for this disorder.

All sleep disorders and neurodevelopmental disorders with insomnia.
The total number of studies included in these analyses was k = 24. Six variables qualified for this analysis. Melatonin was related to significantly lower scores in SOL diary, SOL actigraphy, and DLMO compared to placebo. Furthermore, melatonin was related to significantly increased scores in TST diary and actigraphy in comparison to placebo. However, there was no significant difference between melatonin and placebo for NA scores. See Appendix AF for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Table 5. Results of the main meta-analysis of each outcome variable within and across disorders for adults. Significant results are bolded. Note. SOL: sleep onset latency; TST: total sleep time; MD: mean difference; NA: nocturnal awakenings; SMD: standardized mean difference.

All disorders combined.
The total number of RCTs included in these analyses was k = 33. Five variables qualified for this analysis, which were analyzed with a random effects model. Melatonin was related to significantly lower scores in SOL diary, SOL actigraphy, and NA compared to placebo. Furthermore, melatonin was related to significantly increased scores in TST diary and actigraphy in comparison to placebo. See Appendix AG for a detailed description and forest plot on each outcome variable analyzed for this disorder.

Sensitivity analyses.
Sensitivity analyses were carried out on the SOL and TST diary analyses that included Wade et al. (2011). This was the only study that reported mean change, and it contained a large number of participants, which could skew the results. After its removal, the insomnia group yielded a significant result in favor of melatonin over placebo for SOL diary, whilst TST diary was non-significant; these results follow the same direction as the results prior to removal. For the insomnia & comorbid insomnia, the all sleep disorders & neurodevelopmental disorders with insomnia, and the all disorders combined groups, both variables were significant, which follows the same direction as the results prior to removal. The all sleep disorders group yielded a significant result for SOL diary and a non-significant result for TST diary. This is in line with the SOL diary result prior to removal. See Appendix AH for the detailed results of the sensitivity analyses and Appendix AI for a detailed description and forest plot for each of these analyses.

Publication bias
Funnel plots were generated for four outcome variables in the all disorders combined group for all ages, as each contained more than 10 studies. These four outcome variables were SOL diary, SOL actigraphy, TST diary, and NA.
For SOL diary, visual inspection of the funnel plot showed some signs of asymmetry, with a gap at the bottom of the plot where studies with low precision and positive or negative effect size estimates would have been. Therefore, there may be some form of publication bias for this outcome (Appendix AJ). Two tests of publication bias were conducted to quantify this visual impression. The first was the rank correlation test for funnel plot asymmetry. This test was not significant (τ = −0.15, p = 0.41), providing no evidence of publication bias. Secondly, a regression test for funnel plot asymmetry was conducted to assess whether there was an association between standard error and effect size. This test was significant (z = −3.58, p = 0.0003), suggesting that there may be some evidence of publication bias. In summary, publication bias is possible for SOL diary.
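The logic of a regression test for funnel plot asymmetry can be sketched as below. This is a simplified variant written for illustration, not a reproduction of the analysis reported here: it assumes a fixed-effect weighted regression of effect size on standard error with inverse-variance weights and a z-test on the slope, and the input data are hypothetical.

```python
import math


def regression_test(effects, std_errors):
    """Weighted regression of effect size on standard error.

    Assumption: inverse-variance weights with known sampling variances, so
    the slope variance is taken from (X' W X)^-1. Returns (slope, z, p).
    """
    w = [1 / se ** 2 for se in std_errors]
    s_w = sum(w)
    s_wx = sum(wi * x for wi, x in zip(w, std_errors))
    s_wy = sum(wi * y for wi, y in zip(w, effects))
    s_wxx = sum(wi * x * x for wi, x in zip(w, std_errors))
    s_wxy = sum(wi * x * y for wi, x, y in zip(w, std_errors, effects))
    det = s_w * s_wxx - s_wx ** 2
    if det <= 0:
        raise ValueError("standard errors must differ across studies")
    slope = (s_w * s_wxy - s_wx * s_wy) / det
    z = slope / math.sqrt(s_w / det)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return slope, z, p


# Hypothetical data in which less precise studies report larger reductions in SOL (minutes)
effects = [-10.0, -14.0, -22.0, -35.0, -41.0]
std_errors = [2.0, 3.0, 5.0, 8.0, 10.0]
slope, z, p = regression_test(effects, std_errors)
print(f"slope = {slope:.2f}, z = {z:.2f}, p = {p:.4f}")
```

A slope that differs significantly from zero means effect sizes vary systematically with study precision, which is compatible with publication bias but can also arise from genuine differences between small and large studies.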
For SOL actigraphy, the funnel plot showed some signs of asymmetry, with a large gap in the bottom right of the plot (Appendix AK). The rank correlation test for funnel plot asymmetry was not significant (τ = −0.06, p = 0.88), so the null hypothesis of no association could not be rejected and there was no evidence of publication bias. A regression test for funnel plot asymmetry was also non-significant (z = −0.8, p = 0.43), again giving no evidence of publication bias. In summary, despite some visual asymmetry, the tests gave no evidence of publication bias. Therefore, the results presented for this outcome variable may be quite reliable in terms of publication bias.
For TST diary, the funnel plot showed some signs of asymmetry, with large gaps across the plot (Appendix AL). The rank correlation test for funnel plot asymmetry was not significant (τ = 0.25, p = 0.13), so the null hypothesis of no association could not be rejected and there was no evidence of publication bias. A regression test for funnel plot asymmetry was also non-significant (z = 1.27, p = 0.2), again giving no evidence of publication bias. In summary, despite some visual asymmetry, the tests gave no evidence of publication bias.
For NA, the funnel plot showed some signs of asymmetry (Appendix AM). The rank correlation test for funnel plot asymmetry was not significant (τ = −0.31, p = 0.22), so the null hypothesis of no association could not be rejected and there was no evidence of publication bias. A regression test for funnel plot asymmetry was also non-significant (z = −0.87, p = 0.39), again giving no evidence of publication bias.
Table 6. Results of the main meta-analyses across ages (i.e., combining children/adolescents + adults). Note. SOL: sleep onset latency; TST: total sleep time; DLMO: dim light melatonin onset; MD: mean difference; SMD: standardized mean difference.
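The regression test for funnel plot asymmetry can be sketched as follows. This minimal Python example assumes a fixed-effect meta-regression of effect size on standard error, with inverse-variance weights and a Wald z-test on the slope; the software and exact model behind the reported values are not stated in this section and may differ. The helper names and input values are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def regression_asymmetry_test(effects, std_errors):
    """Weighted regression of effect size on standard error (weights 1/SE^2).

    Returns the slope, its z statistic, and the two-sided p-value.
    A slope far from zero suggests funnel plot asymmetry.
    Assumes the standard errors are not all identical.
    """
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    swx = sum(wi * se for wi, se in zip(w, std_errors))
    swxx = sum(wi * se ** 2 for wi, se in zip(w, std_errors))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxy = sum(wi * se * y for wi, se, y in zip(w, std_errors, effects))
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    # Sampling variances are treated as known, so no residual scaling is applied.
    slope_se = math.sqrt(sw / det)
    z = slope / slope_se
    return slope, z, two_sided_p(z)

# Hypothetical effect sizes that grow with the standard error (asymmetric funnel).
slope, z, p = regression_asymmetry_test([2.0, 4.0, 8.0], [1.0, 2.0, 4.0])
print(f"slope = {slope:.2f}, z = {z:.2f}, p = {p:.3f}")

# Converting the z statistic reported for SOL diary into a two-sided p-value.
print(f"p for z = -3.58: {two_sided_p(-3.58):.4f}")
```

The same `two_sided_p` helper can be used to confirm that the z statistics and p-values reported for each outcome agree up to rounding.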

Tolerability of melatonin
Out of 33 studies, only 4 RCTs reported dropouts due to adverse events in either condition. The meta-analysis yielded a non-significant difference in tolerability between melatonin and placebo (log odds ratio [LOR] = 0.06, 95% CI [−1.24, 1.35], z = 0.09, p = 0.93). See Appendix AN for a forest plot depicting this analysis.
We finally note that, due to a lack of comprehensive/clear data across studies, the planned meta-regression using the dose of melatonin as a regressor was not possible.
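As a worked check of the tolerability estimate, the sketch below recovers the standard error, z statistic, and p-value from the reported log odds ratio and its 95% confidence interval. It assumes a symmetric Wald interval on the log-odds scale with the usual normal critical value of about 1.96; the function name is illustrative.

```python
import math

def wald_from_ci(estimate, lower, upper, crit=1.959964):
    """Back-calculate SE, z, and two-sided p from an estimate and its Wald CI."""
    se = (upper - lower) / (2.0 * crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values reported for dropouts due to adverse events (log odds ratio scale).
lor, ci_low, ci_high = 0.06, -1.24, 1.35
se, z, p = wald_from_ci(lor, ci_low, ci_high)
print(f"SE = {se:.2f}, z = {z:.2f}, p = {p:.2f}")

# Exponentiating gives the odds ratio and its interval on the natural scale.
print(f"OR = {math.exp(lor):.2f} "
      f"[{math.exp(ci_low):.2f}, {math.exp(ci_high):.2f}]")
```

Under these assumptions the recovered z and p match the reported values (z = 0.09, p = 0.93), and the wide interval on the odds ratio scale reflects the small number of trials contributing to this analysis.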

Discussion
To our knowledge, this is the most comprehensive evidence synthesis on the efficacy (in terms of sleep parameters) of melatonin and its tolerability across a range of sleep and neurodevelopmental conditions in children/adolescents and adults. Whilst previous meta-analyses (Table 1) have assessed the efficacy of melatonin in specific disorders or age ranges, the present meta-analysis, using a common set of criteria across disorders, allows insight into the comparative efficacy of melatonin for sleep parameters across sleep/neurodevelopmental disorders.
When considering significant meta-analytic results with no evidence of significant across-studies heterogeneity, we found that, in children/adolescents, melatonin was significantly better than placebo in improving: sleep onset latency measured with sleep diary, in autism spectrum disorder (ASD), ASD with comorbid insomnia, delayed sleep phase disorder, insomnia, neurodevelopmental disorders with comorbid insomnia, intellectual disability (with insomnia), and all neurodevelopmental disorders or all sleep disorders pooled together; sleep onset latency measured with actigraphy, in ADHD, ASD, and neurodevelopmental disorders with comorbid insomnia; total sleep time measured with diary, in ASD, ASD with comorbid insomnia, insomnia, and all sleep disorders; and total sleep time measured with actigraphy, in ASD with comorbid insomnia. In adults, melatonin was significantly better in improving sleep onset latency, measured with diary, when pooling RCTs across sleep disorders and across all disorders. It was also significantly better in improving total sleep time, measured with polysomnography, in adults with delayed sleep phase disorder. By contrast, there was no evidence that melatonin improved night awakenings or shifted the DLMO in any of the analyses that we performed. We found additional significant results in other parameters or clinical groups, but the significant across-studies heterogeneity limits confidence in these findings; because of the limited number of RCTs included in the majority of our analyses, we could not further explore and explain this heterogeneity via subgroup or meta-regression analyses. In addition, melatonin appeared to be more efficacious in children than in adults across individual disorders. Indeed, for adults, only one meta-analysis on one sleep parameter within an individual disorder (TST measured with polysomnography in delayed sleep phase disorder) was significant.
However, it should be acknowledged that the number of RCTs, in adults in particular but also in children, was limited overall. Therefore, additional RCTs in children and adults will allow a fairer comparison between the efficacy of melatonin in children and in adults.
In terms of tolerability, only a small number of RCTs addressed it, and these showed no significant differences overall between melatonin and placebo, i.e., no signal of worse tolerability of melatonin. However, this finding should be interpreted cautiously given the limited number of RCTs (n = 24) that collected data on tolerability, and the non-systematic (i.e., not based on a structured tool) exploration of side effects in these trials.
Our findings have important implications for clinical practice. First, they show that, overall, there is more extensive evidence of efficacy, across a broader number of outcomes (mainly sleep onset latency, but also total sleep time) and disorders (ASD, ADHD, intellectual disability, neurodevelopmental disorders in general, insomnia, delayed sleep phase disorder), in children than in adults, for whom the best evidence was for sleep onset latency in all sleep disorders and total sleep time (measured with PSG) in delayed sleep phase disorder. Interestingly, in some countries, such as the UK, melatonin is approved mainly for adults, with only one specific type of prolonged-release melatonin approved for children, and only for those with ASD or Smith-Magenis syndrome (https://bnf.nice.org.uk/drug/melatonin.html). Committees involved in the licensing of melatonin should take this evidence into account. However, it should be noted that the available evidence reflects the availability of RCTs and the type of outcomes used in these studies. Arguably, sleep onset latency measured via diaries is easier to implement than sleep onset latency measured with actigraphy or polysomnography, hence the larger number of RCTs using sleep onset latency measured with diaries compared to objective measures.
Second, whilst we found evidence that melatonin is efficacious for individuals with sleep disorders and/or neurodevelopmental disorders, we could not find evidence that it is efficacious for those with other types of mental disorders, such as depression or anxiety. Once again, this is not equivalent to saying that there is evidence that melatonin is not effective for individuals with these disorders. This is not in keeping with studies pointing to anxiolytic properties of melatonin (e.g., Singla et al., 2021; Abbasivash et al., 2019) and with anecdotal evidence (we are not aware of any empirical data quantifying the extent of this practice) of clinicians prescribing melatonin to patients with mood/anxiety disorders.
Third, we could not find any evidence that melatonin improves night awakenings. Again, this is in contrast to anecdotal reports of a number of clinicians recommending a second dose of melatonin when the patient wakes up in the night. Indeed, this practice is not in line with the mechanism of action of melatonin, which is expected to further delay the sleep cycle, rather than facilitate sleep, when taken later in the evening (Arns et al., 2021).
While our series of meta-analyses provides a quantitative overview of the evidence supporting the efficacy and, possibly, good tolerability of melatonin, it is not suited to inform other aspects related to the prescription of melatonin in clinical practice. First, an important question concerns the most efficacious/effective dose of melatonin. There was considerable heterogeneity in the dose and in the dosing strategy across studies. A few studies based the dosage on the participant's body weight (McArthur and Budden, 1998; Van der Heijden et al., 2007). Some studies included a phase of adjustment to melatonin, in which participants, parents, or carers observed over the course of a few days how effective the dose of melatonin was, and the dose was increased until the participant reached "good" sleep (Wright et al., 2011). Furthermore, many studies used multiple doses and did not state the effect of each dose on each outcome variable, which made it impossible to separate the data correctly. We initially planned to assess the relationship between dose and efficacy via a meta-regression. However, due to the lack of suitable data (some studies used incremental dose increases until effective or doses based on weight, and most studies used 5 mg/night), we could not run this analysis with confidence. Current expert-based guidance (Bruni et al., 2015) suggests a maximum of 3 mg/nocte in children and 5 mg/nocte in adolescents, but some studies suggest that higher doses may be beneficial. For instance, a non-randomized study showed that increasing the dose above 6 mg/nocte (up to 10 mg/nocte) added a further benefit in 9% of the sample. Therefore, further evidence from RCTs and observational studies is needed to support recommendations on the maximum dose. Second, another clinically relevant question is how immediate-release formulations compare with prolonged-release formulations.
Unfortunately, we could not address this question because formulations were inconsistent across studies. However, it is worth noting that Bruni et al. (2017), in a follow-up commentary to their 2015 guidance, concluded that "solid empirical evidence suggesting that the prolonged-release melatonin is superior to immediate-release melatonin is lacking". Five years on, this statement still appears to hold. That said, the evidence summarized in this meta-analysis suggests that prolonged-release melatonin might improve sleep continuity, as this is the main evidence reported by the majority of RCTs in this review. Third, other relevant questions are "how long is melatonin effective for?", "does it lose its efficacy/effectiveness over time?", and "are there long-term side effects?". We could not address these issues as the included RCTs were generally short-term (a few weeks). We note that there are data from a follow-up of a short-term RCT showing the persistence of effectiveness in about 88% of treated individuals up to 3.7 years (average) (Hoebert et al., 2009).
Overall, this meta-analysis provides evidence that may inform stakeholders and policymakers, showing that melatonin is an efficacious treatment for prolonged sleep onset latency. This is of particular relevance for countries, such as the UK, where melatonin is not available over the counter and is not officially licensed for children (other than a specific type of prolonged-release melatonin for children with autism or Smith-Magenis syndrome). However, melatonin should always be considered within the framework of a multimodal treatment, including good sleep hygiene practices.
Therefore, in addition to providing the most comprehensive evidence synthesis from RCTs on the efficacy and tolerability of melatonin, our work also points to important lines of research for the future, namely the need to: 1) Assess the extent to which melatonin is efficacious, effective and tolerated in disorders other than neurodevelopmental disorders or primary sleep disorders, as well as the neurobiological underpinnings of melatonin dysfunctions in these disorders; 2) Systematically assess tolerability using structured forms to comprehensively explore adverse events during treatment with melatonin; 3) Gain insight into the most effective/best tolerated dose. A larger body of RCTs will allow dose-response meta-analyses to be conducted (e.g., Fahrat et al., 2022). Availability of individual participant data from each trial will allow a more in-depth assessment of the effects of different doses. Furthermore, we suggest that future studies should be grounded on more homogeneous approaches to the timing of melatonin intake, given the substantial variability across currently available studies; 4) Compare head-to-head immediate-release vs. prolonged-release formulations; 5) Assess longer-term efficacy via withdrawal (discontinuation) trials (e.g., Wong et al., 2019) and long-term follow-up.

Data Availability
Data will be made available on request.