Interventions to improve executive functions in children and adolescents with acquired brain injury: a systematic review and multilevel meta-analysis

ABSTRACT To investigate the effectiveness of interventions aiming to improve hot and cold executive functions (EFs) in children and adolescents with acquired brain injury (ABI), and to examine whether characteristics of the intervention, participants, etiology of ABI (traumatic brain injury [TBI] or non-TBI), time of assessment, or study quality moderate intervention effects. Whereas cold EFs refer to purely cognitive skills, hot EFs refer to the affective aspects of these cognitive skills. A total of 970 participants from 23 randomized controlled trial (RCT) studies (112 effect sizes [ES]) were included. A three-level random effects approach (studies, ES, individual participants) was used. Moderation analyses were conducted through meta-regressions. The three-level random effects model showed a better fit than the two-level model. Almost all individual studies showed non-significant ES across outcomes, but in combination the interventions were effective (Cohen's d = 0.38, CI 0.16 ~ 0.61). Lower methodological quality, inclusion of participants with non-TBI, and parental participation predicted larger ES. Participants' age, time of assessment, number of sessions, and focus on hot or cold EFs were not related to ES. We found no evidence of publication bias. Interventions are effective, with small to medium ES according to conventional criteria. Intervention effects do not seem to fade with time. Parent participation in the intervention is important for improving EFs. The efficacy of interventions seems larger when non-TBI is part of the etiology of ABI. Variation between studies is relevant for tracing effective intervention characteristics. Most studies were conducted in adolescence; studies in early childhood are needed.

Acquired brain injury (ABI) affects an estimated 240/100,000 children and adolescents per year (van Tol et al., 2011). TBI in youth is often caused by traffic accidents, falls, and sports injuries, whereas non-TBI comprises injuries that are not caused by an external physical force to the head (e.g., tumors, arteriovenous malformations, infections; Emanuelson et al., 2003). TBI is the main mechanism responsible for diffuse injuries (Emanuelson et al., 2003), whereas focal injuries are seen in most types of non-TBI (tumors, strokes; Anderson et al., 2014). Regardless of etiology, poor executive functions (EFs) are a common outcome in pediatric ABI populations (McKinlay et al., 2016; Spencer-Smith et al., 2011). EFs are a group of cognitive skills required for purposeful, goal-directed activity and can be divided into cold and hot domains (De Luca & Leventer, 2008; Kerr & Zelazo, 2004). Hot EFs include emotion regulation, affective decision-making, and social skills (De Luca & Leventer, 2008; Kerr & Zelazo, 2004). Working memory, strategic planning, organization, goal setting, behavior monitoring, problem solving, and cognitive flexibility are classified as cold EFs (De Luca & Leventer, 2008; Kerr & Zelazo, 2004). Treatments for pediatric ABI populations commonly aim to improve hot and cold EFs. A multilevel meta-analytic integration of available studies is necessary to (1) summarize possible effects of these interventions, including multiple outcomes per study (e.g., attention, processing speed, inhibition, externalizing behavior, internalizing behavior); (2) test whether the variability between studies needs to be taken into account to obtain a more accurate estimate of effectiveness; and (3) test whether treatment effectiveness is moderated by characteristics of the intervention, participants, etiology of ABI (traumatic brain injury [TBI] or non-TBI), time of assessment, or study quality.
Children with ABI present greater risk for mental health problems, low academic attainment, and reduced quality of life over time (Brent & Max, 2017; Chavez-Arana, Catroppa, Yáñez-Téllez, Prieto-Corona, de León, et al., 2019; Musiol et al., 2019; Prasad et al., 2017; Sariaslan et al., 2016). Hot and cold EFs are crucial for these outcomes; both contribute to adaptive functioning and are necessary to learn new skills, apply knowledge to daily life, and establish autonomy (Zelazo & Carlson, 2012; Zelazo & Müller, 2002; Zelazo et al., 2010). There is ample empirical evidence that EFs mediate the effect of the brain injury on social adjustment, adaptive functioning, and academic achievement (Brent & Max, 2017; Chavez-Arana, Catroppa, Yáñez-Téllez, Prieto-Corona, de León, et al., 2019; Fulton et al., 2012; Gracey et al., 2014; Ryan et al., 2019). Younger age at injury onset (<3 years), severe injuries, and adverse environmental factors are associated with worse EFs over time in both etiologies (TBI and non-TBI; Brown et al., 1981). Early injury onset may derail development, potentially resulting in abnormal connectivity and/or disruption to the establishment of mature functional networks (Gracey et al., 2014; Anderson et al., 2014). Although the type of injury seems to be an important determinant of outcomes (Forsyth & Basu, 2015; Pastore et al., 2013), most studies investigating interventions to improve EFs include participants with TBI only (Sood et al., 2018; Wade & Kurowski, 2017) or combine participants with TBI and non-TBI (Chan & Fong, 2011; Chavez Arana et al., 2020; Brown et al., 2014; Hooft et al., 2005). Even though researchers often recruit participants with different types of brain injuries to increase sample size, most studies in psychological research are underpowered (Stanley et al., 2018). As a consequence, comparisons of treatment outcomes across different etiologies of ABI are not available (Chavez-Arana et al., 2018). Due to the strong influence that EFs have on everyday activities and long-term outcomes, multiple interventions aiming to improve hot and cold EFs have been developed. To date, the effectiveness of these interventions has been investigated by systematic yet narrative reviews and by meta-analyses that disentangled EFs into domains and types of measurement and did not take into account the variability between studies. Therefore, the overall estimate of the effectiveness of these interventions is still unknown.

Evidence from systematic reviews
Evidence from reviews suggests that interventions are effective in improving EFs following pediatric ABI (Backeljauw & Kurowski, 2014; Chavez-Arana et al., 2018; Dvorak & van Heugten, 2018; Brown et al., 2013). Chavez-Arana et al. (2018) reported that effectiveness varies across EF domains. For example, improvements were reported in 55.5% of the studies targeting attention and in 75% of the studies targeting emotion regulation (Chavez-Arana et al., 2018). However, the conclusions from these previous reviews are limited. First, they diverge widely in inclusion criteria and coding, leading to equivocal conclusions. For example, one review included only studies from a single research group (Brown et al., 2013), another focused on one intervention (Dvorak & van Heugten, 2018), and still another focused on attention problems only (Backeljauw & Kurowski, 2014). Second, the studies reported in these reviews contained multiple outcomes, and not all outcomes improved after the intervention. Hence, synthesizing the effectiveness per intervention study via narrative review is complicated, leading to inconclusive evidence. Lastly, these reviews consistently report methodological concerns (e.g., lack of control groups) or heterogeneity in the studies, hindering the formulation of qualified conclusions (Chavez-Arana et al., 2018; Dvorak & van Heugten, 2018; Resch et al., 2018). Health-care professionals usually rely on narrative reviews to recommend treatments (Kolaski et al., 2021), but these reviews are difficult to replicate and prone to bias and error (Prinzie et al., 2009; van IJzendoorn, 2020). Compared to narrative reviews, meta-analyses are more replicable and produce more valid results, because effect sizes within and across studies are synthesized taking sample size into account (Prinzie et al., 2009; van IJzendoorn, 2020).

Evidence from meta-analyses
Three meta-analyses investigated interventions aiming to improve EFs following pediatric ABI (Corti et al., 2019; Linden et al., 2016; Robinson et al., 2014). In a first meta-analysis, Robinson et al. (2014) investigated interventions aiming to improve EFs (attention, working memory, inhibition, and metacognition) in children with ABI and neurodevelopmental disorders. They conducted a hierarchical meta-analysis per EF domain and type of assessment (tasks or behavior rating scales; Robinson et al., 2014). The mean effect was small to medium for attention tasks and attention behavior rating scales, large for working memory tasks, small for working memory behavior rating scales, and not significant for inhibition (Robinson et al., 2014). Due to the small number of studies included, only participant diagnosis (ABI or neurodevelopmental disorders) was tested as a moderator, suggesting that effect sizes for participants with ABI were larger than for participants with developmental disorders (Robinson et al., 2014). In a meta-analysis focusing on RCTs, Linden et al. (2016) investigated technological aids for rehabilitation and conducted independent analyses per EF domain. For overall EFs (various measures), their results indicated that technological aids were effective at post assessment with a small to medium effect size, whereas a non-significant estimate was reported for emotion regulation (Linden et al., 2016). More recently, Corti et al. (2019) meta-analyzed technology-based interventions addressing cognitive and behavioral outcomes following pediatric ABI. Corti et al. (2019) performed separate analyses for cognitive and behavioral outcomes. They aggregated the means of all reported outcomes within a study and found a non-significant estimate for cognitive outcomes and a small to moderate effect for behavioral outcomes (Corti et al., 2019).
Evidence from these three previous meta-analyses is inconclusive and has several limitations. First, in their meta-analyses, Linden et al. (2016) and Corti et al. (2019) used techniques that assume that effect sizes within studies are independent (Moeyaert et al., 2017). This assumption is not realistic because studies often measure related variables within the same participants (Moeyaert et al., 2017). Even if independent samples are assessed, results within studies are more likely to be similar because they may use the same instruments or apply similar treatments (Moeyaert et al., 2017). Thus, dependency needs to be taken into account to reduce an increased risk of Type-2 error (Moeyaert et al., 2017). When averaging several outcomes within one study, there is an increased risk of Type-2 error because the average sampling variance is treated as if it were the individual sampling variance (Moeyaert et al., 2017). Second, not taking into account the multiple outcomes reported within one study makes it impossible to investigate the importance of assessment characteristics (e.g., post or follow-up assessment) in a meta-analytic model. In their meta-analysis, Robinson et al. (2014) reported separate analyses per EF domain. Therefore, a small number of studies per EF domain was included, limiting the number of moderators that could be tested (Robinson et al., 2014). By taking into account the focus of the intervention (hot or cold EFs), type of population (TBI or non-TBI), and time of assessment (post or follow-up) in the current study, we can investigate whether interventions are more effective for hot or cold EFs, for TBI or non-TBI samples, and at post or follow-up assessment. Third, regarding participant characteristics, similar benefits have been suggested for participants with a variety of brain injuries. To the best of our knowledge, whether effect sizes are similar for participants with TBI or non-TBI has not been systematically evaluated (Backeljauw & Kurowski, 2014). Lastly, in their meta-analyses, Robinson et al. (2014) and Corti et al. (2019) included nonrandomized studies, which are subject to biases that are not present in RCTs (Shea et al., 2017). The inclusion of non-RCTs has been associated with inflated effect sizes (Bakermans-Kranenburg et al., 2003). In general, RCTs are considered the gold standard to assess treatment effectiveness (Jones & Podolsky, 2015).
Multilevel meta-analyses are especially advantageous when studies report multiple effect sizes for multiple outcomes (Harrer et al., 2021). Meta-analyses naturally have a multilevel structure, including data from individual participants (level one) and effect sizes reported in studies (level two; Harrer et al., 2021). Statistical independence is one of the core assumptions when pooling effect sizes in a meta-analysis (Harrer et al., 2021). To fulfil this assumption, previous meta-analyses included only one effect size per study. In a multilevel (or three-level) meta-analysis, multiple effect sizes per study can be analyzed by adding a third level (studies) without violating the assumption of statistical independence (Harrer et al., 2021). Expanding meta-analysis to three levels can better capture the mechanisms that generated the data (Harrer et al., 2021). Studies investigating the effectiveness of interventions in children and adolescents with ABI always measure multiple variables. A multilevel meta-analysis therefore allows us to examine more adequately several outcomes clustered within the same study. To date, the available evidence leaves several questions unanswered. To improve the quality of care provided to the pediatric population with ABI, we need to understand whether certain characteristics of the intervention, the participants, or the study are associated with treatment effectiveness. A multilevel meta-analysis including only RCTs has the potential to advance knowledge in the field.

Objectives and hypotheses
The primary objective of this study was to conduct a multilevel meta-analysis investigating the effectiveness of interventions aiming to improve hot or cold executive functions (EFs) in children and adolescents with acquired brain injury (ABI). As a secondary objective, we tested whether intervention characteristics (focus on hot or cold EFs, number of sessions, parent participation), participant characteristics (age, etiology of TBI or non-TBI), time of assessment, and quality of the study moderated the effectiveness of interventions. Based on a previous narrative review (Chavez-Arana et al., 2018), we hypothesized that (1) interventions would be more effective in improving hot than cold EFs. Regarding participant characteristics, and based on evidence supporting the relevance of the level of brain maturity at the time of injury (Anderson et al., 2014; Anderson, 2005), we hypothesized that (2) younger participants would benefit more. Given the inconclusive evidence, no hypotheses were formulated regarding length of the intervention, parents' participation, or etiology (TBI or non-TBI). To the best of our knowledge, this is the first multilevel meta-analysis that includes in one model interventions aiming to improve hot and cold EFs in pediatric ABI populations. The inclusion of multiple outcomes per study (e.g., post and follow-up assessment) enabled us to test whether characteristics of the participants (age, etiology of TBI or non-TBI), the intervention (number of sessions, parent participation, focus on hot or cold EFs), time of assessment (post or follow-up), or the study quality moderate treatment effectiveness.

Method
The current meta-analysis was conducted following the PRISMA (Moher et al., 2010) and AMSTAR-2 guidelines (Shea et al., 2017). The review methods, including the research question, search strategy, inclusion/exclusion criteria, and risk-of-bias assessment, were established prior to conducting the literature search. The following electronic databases were searched on 13 January 2022 with relevant filters when available: CINAHL (filters used: human, age all child), PsycINFO (filters used: clinical trial, human, age all child), and Lens (no filters applied; this search engine includes databases such as PubMed, PubMed Central, Microsoft Academic, and CrossRef). The exact string of keywords used for the search is available in the Supplemental materials stored on the OSF website. SPSS-26 was used to calculate the intercoder agreement of moderator variables. WebPlotDigitizer was used to extract data from graphs (Rohatgi, 2021). Analyses were performed using the esc (Lüdecke, 2018), irr (Gamer et al., 2012), dmetar (Harrer et al., 2019), meta (Schwarzer et al., 2015), and metafor (Viechtbauer, 2010) packages in R (version 2021.09.1).

Requirements for inclusion
Titles and abstracts were reviewed independently by two authors (C.C-A. and C.S-J.) to determine eligibility for inclusion. Inclusion criteria were: (1) children and adolescents with ABI, or families of children or adolescents with ABI; (2) participants aged 0-18 years (for papers that included children, adolescents, and adults, the majority of the participants had to be younger than 18 years at the time of the study); (3) any intervention or combination of interventions to improve EFs and/or social skills and/or reduce disinhibited behavior or behavior problems (i.e., internalizing or externalizing); (4) the components of the intervention were described; and (5) an RCT design. Exclusion criteria were: (1) review articles; (2) predominance of participants with a diagnosis other than ABI (e.g., neurodevelopmental disorders, abusive head trauma, leukemia); and (3) outcomes other than EFs (e.g., parental stress). Abstracts were categorized as eligible, likely eligible, or ineligible. No language limits were used in the search. Reviewers agreed on 98% of the abstracts (kappa = .76). When consensus was not initially achieved, full papers underwent review using a standard data collection form, and a consensus decision was then made.
The electronic search generated 2,691 papers. A total of 2,308 papers were screened, of which 31 met the inclusion criteria. After reading the full papers, 7 were excluded: one because it compared two interventions without a control group (Brandt et al., 2021); two because the main focus of the intervention was not to improve EFs (Piovesana et al., 2017; Thomas-Stonell et al., 1994); two because they included participants with abusive head trauma (Kurowski et al., 2020; Wade et al., 2019); and two because their outcomes were already reported in other included papers (Kurowski et al., 2013; Wade et al., 2014). Finally, 23 independent studies (described in 23 articles and one thesis dissertation; Sood, 2021) were included in the systematic review. Figure 1 illustrates the selection process and results of the search.

Data extraction
For each paper, we extracted data to calculate effect sizes (Fisher Z), standard errors, and variances. Multiple effect sizes within studies were synthesized using a multilevel analysis. For each paper, the first author, measure used, EF targeted, time of assessment (post or follow-up), and informant (parent, teacher, child) were extracted and entered into the code sheet by one reviewer (C.C-A). Once the basics of the code sheet were completed, the following data were extracted if reported or applicable and entered into the code sheet: moderator variables (e.g., participants' age, etiology, number of sessions, parent participation) and methodological quality (Downs & Black, 1998). The sources of funding for individual studies are available in the Supplemental materials stored on the OSF website. The number of months/weeks in which the intervention was delivered was not always clearly reported; therefore, we extracted the total number of sessions. Only primary outcomes were included in the analysis, because secondary outcomes were commonly measured to investigate whether treatment effects transfer to other scenarios or domains. Lastly, the data extracted for the calculation of the effect sizes were examined by a third reviewer (P.P).
A total of 113 effect sizes were extracted, of which 81% (k = 92) were pooled from the means, standard deviations, and n of the intervention and control group at pre- and post- or follow-up assessment. The code used to pool the effect sizes is available in the Supplemental materials stored on the OSF website. For 14% (k = 16) of the effect sizes, this information was available at post or follow-up assessment only. Three effect sizes were extracted from the means, SEs, and n of the intervention and control group at post or follow-up assessment (Kurowski et al., 2014; Tlustos et al., 2016). Two effect sizes were calculated from T-scores, p-values, and n of the intervention and control group at follow-up assessment (Wade et al., 2015). The study by Aguilar et al. (2019) included two intervention groups and one control group. Similarly, the study by Wade et al. (2018) included two intervention groups (TOPS-Family and TOPS Teens only) and one control group. To calculate effect sizes in these studies, the number of participants in the control group was divided by two, and effect sizes were calculated for each intervention group. We recoded the effect sizes when necessary (after pooling them in RStudio) to ensure that positive effect sizes always indicated better EFs.
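The pooling from pre- and post-test means, SDs, and group sizes was done with the esc package in R. The sketch below is a hypothetical Python illustration, not the authors' code: it uses the pre-post-control formulation of Morris (2008) for d and a standard d-to-Fisher-Z conversion, which may differ in detail from the exact esc routine.

```python
import math

def d_ppc(m_pre_t, m_post_t, m_pre_c, m_post_c,
          sd_pre_t, sd_pre_c, n_t, n_c):
    """Pre-post-control Cohen's d (Morris, 2008): difference of the
    change scores, scaled by the pooled pre-test SD."""
    sd_pooled = math.sqrt(((n_t - 1) * sd_pre_t ** 2 +
                           (n_c - 1) * sd_pre_c ** 2) / (n_t + n_c - 2))
    return ((m_post_t - m_pre_t) - (m_post_c - m_pre_c)) / sd_pooled

def d_to_fisher_z(d, n_t, n_c):
    """Convert d to r (with a correction for unequal group sizes),
    then apply the Fisher Z (arctanh) transform."""
    a = (n_t + n_c) ** 2 / (n_t * n_c)  # a = 4 when groups are equal
    r = d / math.sqrt(d ** 2 + a)
    return math.atanh(r)
```

With equal groups of 20, a 5-point gain in the treated group (and none in controls) against a pre-test SD of 5 yields d = 1.0 and a Fisher Z of about 0.48.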

Intercoder agreement
The intraclass correlation coefficient (ICC; single rater, absolute agreement) was calculated for continuous variables, and kappa for categorical variables. The mean of the ICC and kappa values was .91, indicating excellent agreement among raters. The ICC and kappa per variable are available in the Supplemental materials stored on the OSF website. Differences in data extraction were identified, and corrected data were used. The effect sizes (Fisher Z) were transformed to z-scores to identify outliers (effect sizes > or < |3.29|). One effect size was identified as an outlier (z-score = 3.56, effect size ID 15; Chan & Fong, 2011) and excluded from the analysis; the remaining z-scores ranged from −2.50 to 2.70. A total of 112 effect sizes were meta-analyzed.
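The outlier screen amounts to standardizing the Fisher Z values and flagging any with |z| above 3.29, which can be sketched in a few lines (Python here for illustration; the analyses were run in R):

```python
import statistics

def flag_outliers(effect_sizes, cutoff=3.29):
    """Standardize the effect sizes (sample mean and SD) and return the
    indices of those whose |z| exceeds the cutoff."""
    mean = statistics.mean(effect_sizes)
    sd = statistics.stdev(effect_sizes)
    z_scores = [(es - mean) / sd for es in effect_sizes]
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]
```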

Quality assessment
The DB checklist (Downs & Black, 1998) assesses quality with four subscales: reporting (10 items), external validity (3 items), internal validity (7 items), and confounders (6 items). The scores on each subscale were classified as high, some, or low concern. The corresponding cutoffs were: high concern, 1-5 for reporting, 1 for external validity, 1-3 for internal validity, and 1-2 for confounders; some concern, 6-8 for reporting, 2 for external validity, 4-5 for internal validity, and 3-4 for confounders; low concern, 9-10 for reporting, 3 for external validity, 6-7 for internal validity, and 5-6 for confounders. The ICC for methodological quality indicated good reliability (ICC = .80). The mean methodological quality was 21.31 (SD 2.24, minimum 18, maximum 26). About half of the studies presented some level of concern in reporting, external validity, internal validity, and confounders. About half of the studies showed a low level of concern for quality issues. Most importantly, no study received a rating of high concern on any of the quality dimensions. The score per item on the DB is available in the Supplemental materials stored on the OSF website.
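The subscale cutoffs can be encoded as a simple lookup table; the sketch below (Python, illustrative only; the structure and names are ours, not the authors') mirrors the bands listed in the paragraph above:

```python
# Score bands per subscale (concern level -> inclusive score range).
CUTOFFS = {
    "reporting":         {"high": range(1, 6), "some": range(6, 9), "low": range(9, 11)},
    "external_validity": {"high": range(1, 2), "some": range(2, 3), "low": range(3, 4)},
    "internal_validity": {"high": range(1, 4), "some": range(4, 6), "low": range(6, 8)},
    "confounders":       {"high": range(1, 3), "some": range(3, 5), "low": range(5, 7)},
}

def concern_level(subscale, score):
    """Map an integer subscale score to its concern classification."""
    for level, band in CUTOFFS[subscale].items():
        if score in band:
            return level
    raise ValueError(f"score {score} out of range for {subscale}")
```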

Meta-analytic procedure
Using a multilevel meta-analytic approach (with effect sizes clustered within studies), a three-level random effects model (participants, effect sizes, studies) was fitted with all effect sizes (k = 112) to test whether a three-level model captured the variability in our data better than a two-level model (participants, effect sizes). The Knapp-Hartung adjustment for confidence intervals was applied to reduce the risk of false positives, and the restricted maximum likelihood (REML) method was used to account for between-study heterogeneity (Harrer et al., 2021). Model fit was compared between the full model and the reduced model with an ANOVA. The likelihood ratio test (LRT), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC) were used to compare the full three-level model with the reduced two-level model excluding the study level.
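The LRT, AIC, and BIC reported by such a model comparison reduce to standard formulas. The sketch below (Python, illustrative only; metafor computes these internally) assumes the two models differ by a single variance component, so the LRT statistic is referred to a chi-square with 1 df, a conservative choice given that the variance sits at the boundary of the parameter space:

```python
import math

def lrt_two_vs_three_level(ll_reduced, ll_full):
    """Likelihood-ratio test for the extra (study-level) variance
    component: LRT = 2 * (llf - llr); p from the chi-square(1)
    survival function, which equals erfc(sqrt(x / 2))."""
    lrt = 2.0 * (ll_full - ll_reduced)
    p = math.erfc(math.sqrt(lrt / 2.0))
    return lrt, p

def aic(ll, k):
    """Akaike Information Criterion; k = number of estimated parameters."""
    return 2 * k - 2 * ll

def bic(ll, k, n):
    """Bayesian Information Criterion; n = number of effect sizes."""
    return k * math.log(n) - 2 * ll
```

Lower AIC/BIC and a significant LRT would favor retaining the study level, as found here.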
Lastly, to explain the variance between studies and effect sizes, meta-regressions were conducted to identify differences in effect sizes associated with the focus of the intervention (hot or cold EFs), duration of the intervention, participation of parents (participating or not), etiology of ABI (TBI or non-TBI), time of measurement (post or follow-up), age at intervention, quality of the study, and number of sessions. The code used for the analyses is available in the Supplemental materials stored on the OSF website.

Forest plot and publication bias
Effect sizes were transformed to Fisher Z for analysis (because of distribution advantages; Borenstein, 2019) and combined per study to compute a forest plot (see Figure 2). In addition to the overall pooled effect size, we also computed the 95% confidence interval (CI) to estimate the precision of the pooled effect size (the boundaries within which the real effect size is expected) and the prediction interval to estimate the dispersion within which the effect size of a future study might be expected to fall (Borenstein, 2019).
To evaluate publication bias, the procedures described by Harrer et al. (2021) were followed. First, funnel plot asymmetry was assessed with the linear regression test of funnel plot asymmetry (Egger et al., 1997). Second, the Duval and Tweedie (2000) trim-and-fill procedure was used to examine how adjustment for possible bias would affect the combined effect size. Lastly, we conducted a p-curve analysis by plotting the percentage of exact p-values of all significant effects (p < .05). The p-curve indicates whether researchers used p-hacking to polish their results, visible as peaks of significant results near the conventional cutoff of p = .05 (Simonsohn et al., 2015).
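Egger's test is a regression of the standardized effect size on precision, with an intercept far from zero indexing funnel asymmetry. A minimal ordinary-least-squares sketch (Python, for illustration; the actual analysis used the R implementations cited in the Method):

```python
import math

def egger_test(effects, ses):
    """Egger's regression: standardized effect (es / se) on precision
    (1 / se). Returns the intercept and its t statistic (df = k - 2)."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (k - 2)      # residual variance
    se_int = math.sqrt(s2 * (1.0 / k + mx ** 2 / sxx))
    return intercept, intercept / se_int
```

A non-significant intercept t statistic, as reported below for the 23 studies, is consistent with a symmetric funnel.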

Characteristics of the studies included
Table 1 describes the characteristics of the 23 studies included in the analyses. A total of 970 participants were investigated. The mean age of the participants was 12 years and 2 months (SD 3.07; minimum 5 years 1 month, maximum 15 years 3 months). More than half of the studies (52%) included teenagers (age > 13 years). Most of the studies included participants with TBI only (65%), required parents to participate in the intervention (65%), and delivered an intervention targeting hot EFs (70%). The number of sessions varied greatly among studies (mean 23, SD 19.70, minimum 6, maximum 102). More than half of the interventions were delivered online (57%), and the rest face to face (43%). The mean sample size was N = 42 (SD 28, minimum 14, maximum 131). As can be seen in Table 1, results from study 11 were reported in four papers (Karver et al., 2014; Kurowski et al., 2014; Tlustos et al., 2016; Wade et al., 2015), and results from study 15 were reported in two papers (Hooft et al., 2005; van 't Hooft et al., 2007). Of the 23 studies included, 14 (61%) reported outcomes at post-treatment only, whereas 9 (39%) reported outcomes at post and follow-up. A trial registration number was reported in 48% of the studies. Of the 112 effect sizes, the informant was the child/adolescent for 62.5%, a parent or main caregiver for 36.6%, and the teacher for 0.9%.

Moderator analysis through meta-regression
Associations between potential moderators showed one high correlation, between etiology of ABI and delivery mode (r = .65; see Table 2). To decrease the risk of multicollinearity, delivery mode was left out of the meta-regression. Other significant correlations were observed between focus on hot or cold EFs and parent participation (r = .51) and number of sessions (r = −.53), indicating that interventions targeting hot EFs more often involved parents and comprised fewer sessions. The overall meta-regression was significant (F(8, 103) = 4.99, p < .0001). Lower methodological quality, inclusion of participants with a diagnosis of non-TBI, and parental participation in the intervention were associated with larger effect sizes (see Table 3). No significant associations were seen for age, number of sessions, post or follow-up assessment, or focus on hot or cold EFs.

Publication bias
Publication bias is a bias against the publication of papers with non-significant results. We computed the combined effect size across the various measures within each study. Figure 2 displays the forest plot with the Fisher Z effect size and standard error combined within each study. The random effects model estimated the overall Fisher Z = 0.18, 95%-CI [0.08 ~ 0.29], t = 3.65, p = .0014. This effect size is equivalent to a Cohen's d = 0.36 (CI 0.16 ~ 0.59), close to the combined effect size of the multilevel meta-analysis above. (Coding of the variables in Table 3: etiology of traumatic brain injury = 0, etiology of traumatic and/or non-traumatic brain injury = 1; parent participated = 1, not participated = 0; post assessment = 1, follow-up assessment = 0; cold executive functions = 0, hot executive functions = 1.)
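The reported conversion from the pooled Fisher Z to Cohen's d can be reproduced in two steps: back-transform Z to r with tanh, then convert r to d. A short sketch (Python for illustration; setting a = 4 assumes roughly equal group sizes):

```python
import math

def fisher_z_to_d(z, a=4.0):
    """Back-transform Fisher Z to r, then convert r to Cohen's d
    via d = sqrt(a) * r / sqrt(1 - r^2); a = 4 for equal groups."""
    r = math.tanh(z)
    return math.sqrt(a) * r / math.sqrt(1.0 - r ** 2)
```

With the pooled Fisher Z of 0.18, this yields d ≈ 0.36, matching the value reported above.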
To investigate publication bias, Egger's test was conducted. Egger's linear regression test of funnel plot asymmetry suggested the absence of publication bias (t(21) = 0.73, p = .47). In the Duval and Tweedie trim-and-fill procedure, no effect sizes had to be added. Because only one study showed a significant effect (Chan & Fong, 2011), no p-curve analysis could be conducted. The sample sizes of the individual RCTs seemed too small to find significant results, whereas the quantitative synthesis revealed an overall statistically significant outcome, which from an applied perspective might also be considered substantial. Publication bias based on p-hacking seemed improbable.

Discussion
The primary objective of this meta-analysis was to investigate the effectiveness of interventions aiming to improve hot and/or cold executive functions (EFs) in pediatric ABI populations. Results show that interventions are effective, with small to medium effect sizes according to conventional criteria, and that a three-level model (taking into account dependent effect sizes within studies) was more appropriate than a two-level model. The second objective was to identify intervention, participant, time-of-assessment, and study characteristics associated with effect sizes. We found that lower methodological quality, inclusion of participants with non-TBI, and parents' participation in the intervention predicted larger effect sizes, whereas age, number of sessions, a focus on hot or cold EFs, and time of assessment did not.

Intervention effectiveness
To the best of our knowledge, this is the first multilevel meta-analysis of RCTs on interventions to improve executive functioning that includes both hot and cold EFs in one model. Our meta-analysis shows that a three-level model was more appropriate than a two-level model to represent the variability in the data. Most of the variance (62.76%) was introduced by level two (effect sizes) and a considerable proportion (20.15%) by level three (study level), indicating that the effect size dependence introduced at the study level should be taken into account. Statistical independence is one of the core assumptions in meta-analysis (Harrer et al., 2021). When dependency between effect sizes is not taken into account, there is a risk of false positives (Harrer et al., 2021). Thus, the current multilevel meta-analysis is theoretically more in line with the studies in the field, in which multiple outcomes are reported per study and are therefore likely to be dependent.
By including all types of EFs in one model, we found that interventions in pediatric ABI can lead to improvements regardless of the EF domain. Because the three previous meta-analyses conducted separate analyses per EF domain, an overall estimate of treatment effectiveness was not provided (Corti et al., 2019; Robinson et al., 2014). Our results, based on 23 studies, are partly in line with the results from the three previous meta-analyses. Similar to our results, small to medium effect sizes were reported by Robinson et al. (2014) for attention (tasks n = 4, and behavior rating scales n = 8) and working memory (behavior rating scales n = 3), by Linden et al. (2016) for overall EFs (n = 3), and by Corti et al. (2019) for behavioral outcomes (n = 7). At the same time, and in contrast with our results, Robinson et al. (2014) reported large effect sizes for working memory (tasks n = 7) and non-significant effects for inhibition (n = 3), while Linden et al. (2016) and Corti et al. (2019) reported non-significant estimates for emotion regulation (n = 2) and cognitive outcomes (n = 7), respectively. The estimates provided in the previous meta-analyses were based on smaller numbers of studies and varied from not effective to large effect sizes. In contrast, we conducted one analysis including 23 studies, allowing us to estimate the effect size with more precision and to test multiple moderator effects.
Even small effect sizes can be important in (clinical) practice (McCartney & Rosenthal, 2000). In particular, EFs are required in everyday life, to learn new skills, solve problems, and establish autonomy (Zelazo & Carlson, 2012; Zelazo & Müller, 2002; Zelazo et al., 2010). Cohen (1988) indicated that effect sizes should be evaluated taking into account the context and domain under investigation. Effects that are small according to the conventional Cohen's criteria might be large relative to the impact of field-specific interventions (Kraft, 2020). In addition, the inclusion of non-RCTs has been associated with inflated effect sizes (Bakermans-Kranenburg et al., 2003). Children and adolescents with ABI commonly present deficits in EFs, which are a risk factor for mental health problems and hamper quality of life (Beauchamp & Anderson, 2010; Chavez-Arana, Catroppa, Yáñez-Téllez, Prieto-Corona, de León, et al., 2019). With an effect size of d = 0.38, the number needed to treat indicates that implementing an intervention with five pediatric patients with a diagnosis of ABI or their families leads to one additional patient or family experiencing substantial improvement in EFs (Harrer et al., 2021). Interventions aiming to improve EFs are effective and can benefit this population.
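The number needed to treat cited above can be reproduced from d = 0.38 with one common conversion, NNT = 1 / (2Φ(d/√2) − 1), where Φ is the standard normal CDF (Kraemer & Kupfer, 2006). The text does not state which conversion was used, so this is an illustrative sketch rather than the authors' exact calculation:

```python
import math

def nnt_from_d(d):
    """Number needed to treat from Cohen's d via the Kraemer & Kupfer (2006)
    conversion: NNT = 1 / (2 * Phi(d / sqrt(2)) - 1), with Phi the standard
    normal CDF. Phi(d / sqrt(2)) simplifies to 0.5 * (1 + erf(d / 2))."""
    phi = 0.5 * (1 + math.erf(d / 2))
    return 1 / (2 * phi - 1)

nnt = nnt_from_d(0.38)
print(round(nnt, 1), math.ceil(nnt))  # about 4.7, i.e., 5 patients
```

Rounding the result up gives the "one in five" interpretation used in the text.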

Quality of the study
Moderation analyses revealed that lower methodological quality was associated with larger effect sizes. The criteria used to assess the quality of the studies involved quality of reporting, external validity, internal validity, and confounders (Downs & Black, 1998). This score appeared to be useful to capture the nuances of the studies included in our meta-analysis. Thus, the cumulative differences in quality across studies (e.g., whether randomization was concealed until recruitment was completed) appeared to be one of the sources of heterogeneity. For example, some studies did not describe characteristics of the participants who dropped out (Braga et al., 2012; Cook et al., 2014; Wade et al., 2011), did not report the actual probability values (e.g., < 0.05; Séguin et al., 2017; Tlustos et al., 2016; Wade, Carey, et al., 2006), or did not blind the assessors of the intervention outcomes (Aguilar et al., 2019; Chan & Fong, 2011; Brown et al., 2014; Wade et al., 2017). We included only RCTs, because this design is considered the gold standard to investigate intervention effectiveness (Jones & Podolsky, 2015). However, the use of an RCT design is not sufficient to assure high-quality evidence, because a low-quality RCT can provide misleading evidence (Negrini et al., 2021). For this reason, the quality of the RCTs must be appraised (Negrini et al., 2021). By appraising the RCTs, we found that less strict methodologies were associated with larger effect sizes, indicating that methodological shortcomings such as not blinding assessors, not concealing the randomization assignment, or not accounting in the analyses for participants who dropped out of the intervention can lead to inflated effect sizes. In general, health-care professionals make decisions grounded in evidence commonly obtained from systematic reviews of RCTs (Kolaski et al., 2021). However, if the methodological quality of the RCTs is low, the evidence used for treatment decisions is questionable. Our results emphasize the importance of using high-quality research methodologies to provide trustworthy evidence for treatment decisions.

Participants' characteristics
Studies that included participants with non-TBI were more effective compared to studies that included TBI only. It may be more challenging to improve EF deficits associated with TBI than deficits associated with non-TBI (Emanuelson et al., 2003). TBI is characterized by diffuse injuries, which entail widely distributed damage to axons (Andriessen et al., 2010), and may lead to more persistent deficits in EFs. This is in line with previous evidence suggesting that children with TBI of all severities did not fully recover to their pre-injury level of functioning (Keenan et al., 2021). Two of the eight studies that included participants with non-TBI included such participants only (Barrera et al., 2018; Ruiter, 2016), whereas the other studies included a variety of brain injuries (along with TBI). The current study suggests that the efficacy of intervention programs is related to the etiology of the injury. This could also be due to deficits in EFs present prior to TBI onset (McKinlay et al., 2017). For example, children and adolescents with poor EFs are more likely to present disruptive and impulsive behavior and to make decisions that put their lives at risk, which also puts them at higher risk of experiencing a TBI (Hofmann et al., 2012; Oosterlaan et al., 1998; Zelazo & Carlson, 2012). Due to this reciprocal relationship between experiencing a TBI during early childhood and poor EFs, EFs may be more difficult to improve following TBI compared to non-TBI.
Age was not associated with treatment outcomes, indicating that interventions are effective regardless of the participants' age. Of note, the age of the participants in the current meta-analysis ranged from 5 to 15 years. There is empirical evidence that younger age (<3 years) at injury onset is associated with worse EF outcomes (Anderson et al., 2014). However, we found almost no studies investigating interventions in young children (Aguilar et al., 2019; Wade et al., 2017). Future research should study the effectiveness of interventions in young children.

Intervention characteristics
Parent participation predicted larger effect sizes in interventions aiming to improve EFs. Similar to our results, Brown et al. (2013) and Chavez-Arana et al. (2018) considered parent or caregiver participation a key ingredient in interventions focusing on hot EFs. The current meta-analysis offers a more replicable appraisal. For cold EFs, Dvorak and van Heugten (2018) suggested that implementing the intervention at school may reduce stress for parents, as they do not have to worry about finding time to organize the training at home. Our results reveal that larger effect sizes were found in interventions that include parent participation. Therefore, even if for practical reasons the intervention is delivered at school, the involvement of parents in the intervention seems to be relevant. The importance of parent participation may be due to the strong influence that context has on the development of EFs (Zelazo et al., 2010). For example, dysfunctional parenting practices and parental stress have been associated with poor recovery following ABI (Chavez-Arana, Catroppa, Yáñez-Téllez, Prieto-Corona, Amaya-Hernández, et al., 2019; Rashid et al., 2014). At the same time, factors such as effective communication within the family, better coping strategies, education about pediatric ABI, and authoritative parenting practices have been identified as protective factors (Keenan et al., 2021; Rashid et al., 2014). These factors are targeted in many of the interventions by teaching families strategies to promote authoritative parenting practices (e.g., daily routines, effective instructions), effective communication (e.g., acknowledging the different perspectives and meanings that each family member gives to an event), and coping skills (e.g., focusing on the present and noticing how day-to-day interactions influence adaptation), and by providing education about pediatric ABI (e.g., the recovery process; Chavez Arana et al., 2020; Brown et al., 2014; Hickey et al., 2016). Parents are a fundamental constant in the child's life, and their participation may facilitate frequent practice of EFs in different real-life contexts.
By including both hot and cold EFs in one model, we were able to compare effect sizes, and found no difference in efficacy between interventions focusing on hot versus cold EFs. Based on a previous narrative review (Chavez-Arana et al., 2018), we expected interventions targeting hot EFs to be more effective than interventions targeting cold EFs. In that review, Chavez-Arana et al. (2018) reported more improvements in studies targeting hot EFs (e.g., 75% of the studies targeting emotion regulation) than in studies targeting cold EFs (e.g., 55.5% of the studies targeting attention). However, by meta-analyzing the studies, we observed that the focus of the intervention on hot or cold EFs is not associated with treatment outcomes, which illustrates that narrative reviews may lead to incorrect conclusions.
The number of sessions was not associated with effect sizes. In a narrative review, Chavez-Arana et al. (2018) suggested that interventions delivered over a shorter period of time were more effective in improving cold EFs, whereas interventions delivered over an extended period of time were more effective in improving hot EFs. The number of sessions varied greatly among the interventions included in this meta-analysis (from 6 to 102), probably due to the different EFs targeted or the strategies used. That the number of sessions was not related to treatment effectiveness may be due to the different EF domains targeted by the interventions.

Time of assessment
Time of assessment did not explain variability in effect sizes. In general, outcomes measured immediately after the intervention are expected to produce larger effect sizes (Kraft, 2020). Contrary to expectations, we found no differences between post and follow-up assessments, suggesting that results are maintained from post-intervention to follow-up. That participants continue to use and practice the learned skills once the intervention is completed may explain why improvements are maintained. This maintenance of effects is good news from a cost-benefit perspective, as intervention effects do not seem to fade away with time.

Publication bias
There was no indication of publication bias. Many of the studies (48%) included in this meta-analysis reported a trial registration number, which may be related to the fact that researchers adhered to their original plan and avoided p-hacking. It also illustrates the importance of meta-analysis for synthesizing null findings from multiple underpowered RCTs. The field would profit from coordinated efforts to develop multi-site consortia in the planning stage of intervention evaluation, to increase the number of participants involved and create sufficiently powered RCTs capable of detecting the effect sizes to be expected in this complicated clinical area.

Limitations
Although the current meta-analysis is the first multilevel meta-analysis focusing on the effectiveness of interventions aiming to improve EFs, some limitations should be mentioned. First, we investigated primary study outcomes only. Second, to avoid multicollinearity, delivery mode could not be included in our model. Third, the clinical trials investigated in the current meta-analysis included different types of control groups (e.g., waitlist, alternative active intervention), and we did not test the type of control group as a moderator, because this was not initially planned and because the number of moderators tested had already reached a limit, considering the relatively modest number of effect sizes included in the meta-analysis. To enhance reproducibility and replicability, we made all "raw" data and the various analytic steps available in the Supplemental materials stored on the OSF website.

Future directions
Future meta-analyses should investigate whether the effects of interventions transfer to other settings and to outcomes that are not the main target of the intervention (e.g., academic skills); whether delivery mode is associated with treatment effectiveness; whether the type of control condition introduced heterogeneity into the meta-analysis; and whether an interaction between number of sessions and intervention focus on hot/cold EFs moderates treatment effects. Future RCT studies might try to avoid the weaknesses found in the current set of RCTs at the level of designs, measures, and statistical power, to improve the quality of evidence for application in clinical practice.

Conclusions
Although almost all individual studies showed non-significant effect sizes due to small samples, in combination, intervention studies aiming to improve EFs in children and adolescents with ABI are effective, with small to medium effect sizes according to conventional criteria. Lower methodological quality, parent participation, and the inclusion of participants with non-TBI predicted larger effect sizes. The results suggest that it is more difficult to improve EFs following TBI compared to non-TBI. Regardless of the focus on hot or cold EFs, parent participation might be recommended to enhance treatment effectiveness. Focus of the intervention (hot or cold EFs), participants' age, number of sessions, and time of assessment were not associated with effect sizes. Using high-quality research methodologies with well-powered designs is fundamental to provide trustworthy evidence for health-care professionals.
The following data were extracted by two independent reviewers (C.C-A. and C.S-J.): (a) participants' characteristics (age at intervention, etiology of ABI [TBI and/or non-TBI]); (b) intervention characteristics (focus on hot or cold EFs, delivery mode [online or face to face], number of sessions, parent participation [yes/no]); (c) measure characteristics (relevance of measure [primary or secondary outcome], whether higher scores indicate dysfunction [to identify effect sizes that needed to be reversed to indicate improvement], time of assessment [post or follow-up]); (d) data needed to calculate effect sizes; and (e) methodological quality of the study, assessed using 26 items of the Downs and Black checklist for Measuring Study Quality (DB; Downs & Black, 1998).

Figure 1. Flow diagram of the article selection.

Figure 2. Forest plot of the combined effect sizes of the 23 studies, showing the effect sizes in Fisher Z and the overall combined effect size (Fisher Z = 0.18, 95% CI [0.08 ~ 0.29]). This effect size is equivalent to a Cohen's d = 0.36 (CI 0.16 ~ 0.59).
ABI: Acquired brain injury; ACT: Acceptance and commitment therapy; AMAT-C: Amsterdam memory and attention training for children; CAPS: Counselor-assisted problem solving; Deficits: indicates whether presenting deficits or complaints prior to the intervention was an inclusion criterion; F-up: Follow-up; FTF: Face to face; I-InTERACT: Internet-based Interacting Together Everyday Recovery After Childhood TBI; IP: Intervention program; Lumosity: Lumosity Cognitive Training (Lumos Labs, Inc.); non-TBI: Non-traumatic brain injury; Q: Questionnaire; Signposts: Signposts for building better behavior; SSTP: Stepping Stones Triple P; TBI: Traumatic brain injury; T: Training; TOPS: Teen online problem solving; TP: Training program; UTD: Unable to determine.
… associated with parents participating in the intervention, and interventions targeting cold EFs were associated with a larger number of sessions. Including a follow-up assessment was associated with larger sample size (r = −.51) and higher methodological quality (r = −.50).
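The equivalence between the Fisher Z values and the Cohen's d values given in the Figure 2 caption can be checked with the standard conversions r = tanh(z) and d = 2r/√(1 − r²). A minimal sketch (the function name is ours; the source reports only the converted values):

```python
import math

def fisher_z_to_d(z):
    """Convert a Fisher z to a correlation (r = tanh(z)) and then to
    Cohen's d via d = 2r / sqrt(1 - r^2)."""
    r = math.tanh(z)
    return 2 * r / math.sqrt(1 - r ** 2)

# Point estimate and CI bounds from the forest plot caption:
print([round(fisher_z_to_d(z), 2) for z in (0.18, 0.08, 0.29)])  # [0.36, 0.16, 0.59]
```

This reproduces d = 0.36 (CI 0.16 ~ 0.59) from Fisher Z = 0.18 (CI 0.08 ~ 0.29).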

Table 1. Characteristics of the papers and studies included in analyses.

Table 2. Correlations among moderators of intervention effect sizes.

Table 3. Results of multilevel meta-regression on intervention effect sizes.
ABI: acquired brain injury; Cold: intervention focused on improving cold EFs; ci.lb: confidence interval lower bound; ci.ub: confidence interval upper bound; Hot: intervention focused on improving hot EFs.