Multi-Level Meta-Analysis of Physical Activity Interventions During Childhood: Effects of Physical Activity on Cognition and Academic Achievement

There is evidence that physical activity positively influences cognition and academic outcomes in childhood. This systematic review used a three-level meta-analytic approach, which handles nested effect sizes, to assess the impact of physical activity interventions. Ninety-two randomised control trials in typically developing children (5–12 years old, N = 25,334) were identified. Control group type and intervention characteristics including duration, frequency, and teacher qualification were explored as potential moderators. Results showed physical activity interventions improved on-task behaviour with a large effect size (g = 1.04, p = 0.03 (95% CI: 0.08–2.00), very low-certainty evidence) and led to moderate improvements in creativity (g = 0.70, p < 0.01 (0.20–1.20), low-certainty evidence). Small beneficial effects were found for fluid intelligence (g = 0.16, p = 0.03 (0.02, 0.30), moderate-certainty evidence) and working memory (g = 0.18, p = 0.01 (0.07–0.29), very low-certainty evidence), but no overall benefit was observed for attention, inhibitory control, planning, cognitive flexibility or academic outcomes. Heterogeneity was high, and moderator analyses indicated beneficial effects of physical activity (PA) with academic instruction of 6–10-week duration with moderate or moderate to vigorous intensity on mathematics outcomes and enriched PA programmes on language outcomes. In contrast, aerobic PA with moderate to vigorous intensity benefitted executive function outcomes. These results therefore suggest differential mechanisms of impact of different types of PA on different aspects of cognition.


Introduction
Cognitive control refers to a set of top-down cognitive processes, often called executive functions (EF), that allows the regulation of attention and behaviour when automatic responses are inappropriate (Diamond, 2013). Core cognitive control processes include working memory, inhibitory control and cognitive flexibility. Individual differences in cognitive control are associated with academic achievement (AA) in preschool and throughout the school years, even after controlling for IQ (Allan et al., 2014;Dumontheil & Klingberg, 2012;Jacob & Parkinson, 2015;Zelazo et al., 2016). There has therefore been a search for interventions that may reliably foster cognitive control skills and show transfer of benefits to academic outcomes (Diamond & Ling, 2016). One research area that has grown in recent years is the study of whether physical activity can improve cognitive processes (Dishman et al., 2006;Tomporowski et al., 2008).
There is evidence that physical activity (PA) helps protect the brain against neurological diseases in ageing such as Parkinson's disease and Alzheimer's dementia (Colcombe & Kramer, 2003; Dauwan et al., 2021). Children and older adults have been the focus of studies of the effects of regular PA practice on cognition because of the proposed malleability of developmental processes and interest in compensating for ageing processes during these time periods (Hötting & Röder, 2013; Kramer & Colcombe, 2018). The neurobiological hypothesis suggests that regular PA encourages changes in the central nervous system through the formation of new neurons in the hippocampus, blood vessel formation in the brain and increased grey matter volume in the areas of the brain responsible for learning and memory processes (Dishman et al., 2006; Singh et al., 2019). Evidence supporting the neurobiological hypothesis has mainly come from animal studies (e.g. Neeper et al., 1996; Oliff et al., 1998; Rhodes et al., 2003; Vaynman et al., 2004) and correlational and intervention brain imaging studies in older adults (Colcombe et al., 2006; Reiter et al., 2015; Ruscheweyh et al., 2011; Scheewe et al., 2013; Voss et al., 2010, 2013a). Within the neurobiological hypothesis, the metabolic demands of physical effort are moderated by the dose of the activity (e.g. frequency, intensity, session duration or duration of practice) (Audiffren & André, 2019).
The effects of regular PA may be influenced by individual characteristics of the participant, including age, fitness level or weight status, or by task and contextual factors. Considering the early malleability of brain regions associated with EFs, especially during childhood, children may experience different effects of regular PA than adults or older adults (Best & Miller, 2010). Task characteristics include aspects related to dose of PA such as frequency, intensity, session duration and duration of practice. Although some researchers have found that aerobic exercise has beneficial effects on cognition, in particular executive functions, others have argued that it does not (de Greeff et al., 2018; Diamond & Ling, 2016).
Beyond the intensity of PA, and the aerobic nature of the exercises, the type of PA can vary considerably. PA can be integrated with academic instruction, which may allow enhancing PA without losing academic instruction time (Mavilidi et al., 2018a, b;Sneck et al., 2019;Vazou & Skrade, 2017). The extent to which the PA is linked to the subject can vary. Examples of close integration would be asking children to give the answer to a sum by moving as many times as the answer (e.g. 2 + 3 = 5 star jumps) or to jump on letters to correctly spell a word. In other cases, the PA can be interspersed with the mathematics or spelling problems, for example when children have to run between workstations. PA can be implemented through holistic movement practices such as yoga (Vergeer and Biddle, 2021). Activities can vary in their cognitive demands. Cognitively enriched activities tend to go beyond drill practice and involve cognitively challenging tasks. For example, children may be asked to move in different ways between two parts of the hall or to play a game of soccer with new rules, with two balls and four goals (e.g. Kolovelonis et al., 2022). Context matters too (Álvarez-Bueno et al., 2017a, b). This may relate to the setting of practice, for example a school or sports club. Although settings are important, the experience of a teacher, irrespective of the setting, may have a greater influence on the effect of an intervention. In primary school, PA can be taught by specialist-trained teachers or generalist teachers. Evidence suggests that higher pedagogical qualifications in PA can elicit greater changes in cognitive skills and academic performance (Pesce et al., 2013;Sember et al., 2020;Tocci et al., 2022).
In general, only narrow transfer from the cognitive skill trained to real-life contexts has been shown in intervention studies (Diamond & Ling, 2019). It has been suggested that PA interventions constructed to target a specific cognitive skill such as working memory will therefore not have a broad transfer over to other cognitive skills such as inhibition, even though they are interrelated (e.g. Pesce et al., 2021a, b). A recent meta-analytic review investigated the effects of physical activity on brain structure and neurophysiological functioning in children (Meijer et al., 2020). The review focused on randomised control trials (RCTs) and cross-over design studies incorporating structural neuroimaging, functional magnetic resonance imaging (fMRI), electroencephalography (EEG) and tests of working memory, inhibitory control and switching. It was found that regular PA led to functional, but not structural, changes in the brain. More specifically, four studies measured white matter integrity in overweight (k = 2), deaf (k = 1) or healthy children (k = 1) and reported mixed findings. Ten studies measured EEG or fMRI data in healthy (k = 6) or clinical groups (k = 4) and reported positive changes in neurophysiological functioning associated with improved cognitive task performance (Meijer et al., 2020). This suggests that a cognitive pathway may be more pertinent in children than in ageing, specifically one that involves changes in learning and memory processes rather than brain structure (Pesce et al., 2021a, b).
Two theories rely on cognitive explanations for change. First, the skills acquisition theory postulates that the motor and cognitive complexity of PA influences cognitive processes. For example, PA can be considered cognitively engaging when it requires complex movement patterns rather than simple repetitive movements. It has been suggested that the response to practising complex tasks may interact with the response to the level of physical effort required. Second, the theory of embodied cognition emphasises the importance of grounding cognition in the body and suggests that mental processes are supported by interactions between the body, the brain and the external environment (Wilson & Foglia, 2017).
Whilst meta-analyses are now considering both quantitative and qualitative aspects of PA (e.g. Vazou et al., 2019), the main overarching question has been whether PA influences cognition and educational outcomes. There is systematic and meta-analytic evidence that sustained PA, applied in an experimental setting, positively influences cognition and academic outcomes in childhood (e.g. Donnelly et al., 2016), and notably, that it can lead to improvements in cognitive control and academic performance (Álvarez-Bueno et al., 2017a, b;de Greeff et al., 2018). However, these meta-analyses are not all in agreement with regard to which specific cognitive and academic outcomes are influenced by PA (Takacs & Kassai, 2019).
Recent meta-analytic evidence indicates that PA improves working memory (Álvarez-Bueno et al., 2017a;de Greeff et al., 2018;Takacs & Kassai, 2019), on-task behaviour (Álvarez-Bueno et al., 2017b;Watson et al., 2017) and mathematic performance (Álvarez-Bueno et al., 2017b;de Greeff et al., 2018;Sneck et al., 2019), but not reading or spelling skills (Álvarez-Bueno et al., 2017b;de Greeff et al., 2018). The meta-analytic evidence is more mixed with regard to attention, inhibitory control and cognitive flexibility benefits after PA (Álvarez-Bueno et al., 2017a;de Greeff et al., 2018;Takacs & Kassai, 2019;Xue et al., 2019). Overall, this pattern of results suggests that PA interventions may specifically impact fluid intelligence, the ability to reason quickly and think abstractly to solve problems, not relying on previous knowledge (Horn & Cattell, 1966;Meichenbaum, 2013), rather than crystallised intelligence. Creativity may play a role in problem-solving and fluid intelligence and is the focus of certain PA interventions (e.g. dance). Only one previous meta-analysis investigating PA programmes has included creativity and fluid intelligence outcomes and found small significant improvements with low heterogeneity (Álvarez-Bueno et al., 2017a). Álvarez-Bueno and colleagues included seven studies involving 724 children and adolescents for what they called higher-order metacognitive outcomes, which included measures of creativity and fluid intelligence. However, some studies that were included in this meta-analysis were not representative of the population including studies which focused on overweight children or a single ethnic background.
We identified three major limitations of previous meta-analyses. First, all previous studies applied random-effects models, which did not allow for the inclusion of clustered effect sizes. Where more than one effect size was included for an outcome from a study, these were averaged, which compromises the robustness of results (e.g. Álvarez-Bueno et al., 2017a). We seek to extend previous meta-analytic research by applying multi-level modelling (Cheung, 2019) to allow the consideration of multiple outcomes within studies to increase the sample size for each outcome investigated, which will allow us to consolidate previous contrasting findings in the PA literature for typically developing children. Second, only one review included metacognitive outcomes (Álvarez-Bueno et al., 2017a). Since 2017, there has been an increase in studies focusing on fluid intelligence and creativity during childhood. Third, in recent years, there has been a focus on holistic and real-world movement practices that have not been included in previous reviews (e.g. yoga, creative dance/movement, capoeira, gymnastics). Incorporating a broader range of PA that focuses on creativity and the whole person, and not just their physical health, will allow us to explore more broadly links between PA and cognitive and educational outcomes. The focus will be on the sustained impact of regular physical activities rather than on acute effects (see Best and Miller (2010) for a discussion of this distinction). Here, we report the main meta-analytic findings and consider the possible impact of commonly considered intervention characteristics (type of control group, age, frequency, intervention duration, intensity, session duration, teacher qualifications and type of activity). In a companion paper (Vasilopoulos et al., in press), we consider how cognitively engaging different types of PA are by focusing on characteristics of an intervention that may foster a creative practice (e.g. group activities, open-ended activities, varied tasks). In this way, we aim to provide an encompassing view from a dose-response perspective in this paper, with our second companion study seeking to expand skill acquisition and embodied cognition theories by focusing on the qualitative aspects of physical activity that could support cognitive development (Vazou et al., 2019).

Study Selection
This systematic review and meta-analysis was performed according to PRISMA guidelines (Page et al., 2021), and methods were pre-specified and documented in advance in a protocol that was published on the Open Science Framework database for preregistered reviews (https://osf.io/uvpb4). The primary search included seven electronic databases: PubMed, Education Resources Information Center, British Education Index, Australian Education Index, Applied Social Sciences Index and Abstracts, Web of Science and PsycINFO. In addition, we searched reference lists and citations of eligible studies and previous reviews and meta-analyses of this literature to identify any additional studies.
The search strategy was built around identifying key terms for (i) PA, (ii) EFs or AA and (iii) children (Table 1, see Online resource Table S1 for complete search strategy for each index). Medical Subject Headings (MeSH) terms, free text words and Boolean logic were used. The search was restricted to peer-reviewed publications written in English and published between 01/01/2000 and 30/09/2022, as there were no robust studies which focused on PA and educational outcomes for children before then (e.g. Álvarez-Bueno et al., 2017a; de Greeff et al., 2018). The searches were completed on 03/10/2022.
The eligibility criteria were determined based on the PICOS framework (Bowling & Ebrahim, 2009). Studies were eligible for the review if they (1) reported randomised control trials, (2) evaluated PA interventions with an objective outcome measure of cognition or AA in 5–12-year-old children and (3) were peer-reviewed and published in the English language. Studies targeting populations with a diagnosed developmental delay, a developmental disorder, obesity or physical or mental illnesses were excluded as the focus of this review is typical development (see Table 2 for a complete list of inclusion and exclusion criteria). One reviewer ran the preliminary search; the article identification, screening and selection process were performed by two independent reviewers. The search retrieved 12,735 unique articles, of which 239 articles were deemed relevant based on the screening of titles and abstracts. These 239 articles were further assessed for eligibility based on full texts, after which 92 articles met all inclusion criteria.

Data Extraction
The quality of the included studies was assessed by two reviewers (HJ and YW), and any discrepancies were resolved with the involvement of a third reviewer (FV) according to the Effective Public Health Practice Project (EPHPP) Quality Assessment Tool for Quantitative Studies (Thomas et al., 2004). This tool has been used extensively in meta-analyses and reliably assesses randomisation, blinding and measures of variability (Armijo-Olivo et al., 2012). Each study was rated as weak, moderate or strong on six components: selection bias, study design, confounders, blinding, data collection methods, withdrawals and drop-outs. As recommended (Thomas et al., 2004), studies were given an overall score of weak, and considered to have a high risk of bias, if two or more components were rated as weak. Studies were given an overall score of moderate, and considered to have a medium risk of bias, if fewer than four components were rated as strong and one component was rated as weak. Studies were given an overall rating of strong and considered to have a low risk of bias, if four or more components were rated as strong and no component was rated as weak. In addition, the certainty of the evidence was assessed at an outcome level applying the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach (Schünemann et al., 2009). The following items were considered: risk of bias, inconsistency of results, indirectness of evidence, imprecision and publication bias. Data extraction was completed by two reviewers (HJ and YW) and checked by another reviewer (FV). Unclear cases were resolved through discussion with a third reviewer (ID). The inter-rater reliability score (Cohen's kappa) was 80.7%. 
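The overall rating rules described above can be illustrated with a short sketch. This is not the scoring script used in the review; the function and variable names are hypothetical. The rules as worded leave two combinations unassigned (one weak rating with four or more strong ratings, and no weak rating with fewer than four strong ratings), so the sketch returns None for those rather than guessing.

```python
from collections import Counter

# The six EPHPP components named in the text.
COMPONENTS = (
    "selection bias",
    "study design",
    "confounders",
    "blinding",
    "data collection methods",
    "withdrawals and drop-outs",
)

# Mapping from overall rating to risk of bias, as stated in the text.
RISK_OF_BIAS = {"weak": "high", "moderate": "medium", "strong": "low"}


def overall_rating(ratings):
    """Return 'weak', 'moderate' or 'strong' for a dict of component ratings.

    Returns None when the combination is not covered by the rules
    as they are worded in the text (an explicit choice of this sketch).
    """
    missing = [c for c in COMPONENTS if c not in ratings]
    if missing:
        raise ValueError(f"missing component ratings: {missing}")
    counts = Counter(ratings[c] for c in COMPONENTS)
    weak, strong = counts["weak"], counts["strong"]
    if weak >= 2:
        return "weak"
    if weak == 1 and strong < 4:
        return "moderate"
    if weak == 0 and strong >= 4:
        return "strong"
    return None
```

For example, a study rated strong on all six components is classed as strong (low risk of bias), whereas a study with weak ratings for both blinding and confounders is classed as weak (high risk of bias).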
A standardised data extraction form was used to record the following: (1) general information (authors, country, study design and randomisation method), (2) participant demographics (mean age, age range, sex distribution and sample size), (3) intervention characteristics (type, frequency, duration, length, intensity, description of the instructor/level of expertise and setting), (4) outcome measures (construct, instrument, ranges of possible scores and interpretations, timing of administration and method of administration) and (5) descriptive statistics of the outcome measures (sample size for each measurement, mean/median, SD) or alternate metric of effects (e.g. Cohen's d, mean differences) for each available time point.
An ecological approach was taken when deciding categories for the duration of an intervention. Primary schools in the UK, USA, Australia, New Zealand and Scandinavia have similar holiday patterns with a break approximately every 6 weeks, which can allow for shorter interventions. Therefore, durations were classified as ≤ 6 weeks, 7-10 weeks, 11-24 weeks or ≥ 25 weeks. The frequency categories of PA sessions per week were based on various government guidelines for physical education (PE) provision for primary school children recommending a range of one to three sessions per week (Cleland et al., 2008; Davies et al., 2019; Lee, 2007). The frequency of sessions was categorised as 1, 2, 3 or > 3 sessions per week. The bout of each session (length of one teaching session in minutes) was categorised as < 20 min, 20-44 min, 45-60 min or > 60 min. Classification of teacher qualification was based on a recent meta-analysis investigating the quantity and quality of PE interventions during childhood (Sember et al., 2020). Classroom teachers and researchers were classified as having lower professional qualifications in relation to PE teaching skills, whilst exercise science researchers and PE teachers were considered as having higher professional qualifications. This classification is supported by research identifying a gap in specialised PE training of classroom teachers (Wright et al., 2020). The intensity of PA was categorised based on ratings provided in the studies, using categories established in the sports literature: low, low to moderate, moderate, moderate to vigorous or vigorous (Nettlefold et al., 2011; Tanaka et al., 2018). Finally, the duration of effects was categorised as short term (< 2 weeks), medium term (2-12 weeks) and long term (> 12 weeks). We considered immediate effects within a 2-week period due to the difficulty of collecting data immediately post-intervention in a school setting.
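As an illustration of how these moderator categories partition the data, the cut-offs can be written as simple binning functions. This is a minimal sketch rather than the coding script used in the review: the function names and label strings are hypothetical, and the handling of non-integer values (e.g. 6.5 weeks falling into the 7-10 week band) is an assumption that the text does not address.

```python
def duration_category(weeks):
    """Intervention duration band, in weeks."""
    if weeks <= 6:
        return "<= 6 weeks"
    if weeks <= 10:
        return "7-10 weeks"
    if weeks <= 24:
        return "11-24 weeks"
    return ">= 25 weeks"


def frequency_category(sessions_per_week):
    """Sessions per week: 1, 2, 3 or more than 3."""
    return str(sessions_per_week) if sessions_per_week <= 3 else "> 3"


def bout_category(minutes):
    """Length of one teaching session, in minutes."""
    if minutes < 20:
        return "< 20 min"
    if minutes < 45:
        return "20-44 min"
    if minutes <= 60:
        return "45-60 min"
    return "> 60 min"


def effect_duration_category(weeks_after_intervention):
    """Timing of outcome assessment relative to the end of the intervention."""
    if weeks_after_intervention < 2:
        return "short term"
    if weeks_after_intervention <= 12:
        return "medium term"
    return "long term"


# Instructor roles grouped as described in the text (keys are illustrative).
QUALIFICATION = {
    "classroom teacher": "lower",
    "researcher": "lower",
    "exercise science researcher": "higher",
    "PE teacher": "higher",
}
```

Under these rules, an 8-week programme delivered twice a week in 45-minute sessions by a PE teacher would be coded as 7-10 weeks, 2 sessions per week, 45-60 min and higher qualification.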

Data Analysis
Statistical analyses were performed in R (v.3.3.2) using the rma.mv function of the metafor package (Viechtbauer, 2010). Analyses were first conducted grouping EF outcomes and AA outcomes and assessing summary effect sizes for fluid intelligence, creativity and on-task behaviour measures. Then, further analyses considered each EF and AA sub-domain separately: mathematics, language, on-task-behaviour, creativity, attention, cognitive flexibility, inhibitory control, planning and working memory. Meta-analyses were conducted when there were at least two studies investigating the same outcome (Higgins et al., 2022;Valentine et al., 2010). A three-level multilevel meta-analytic approach was used to handle non-independent effect sizes and nested effect sizes, e.g. when including more than one measure from a single study (Cheung, 2019). This is a deviation from our planned analyses; we decided to adopt this novel approach which allows the inclusion of a larger number of individual effect sizes to strengthen research on the impact of PA interventions.
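For readers less familiar with the approach, the general form of a three-level random-effects model (cf. Cheung, 2019) can be sketched as follows. The notation is illustrative and is not taken from the analysis scripts: $y_{ij}$ denotes the $i$-th effect size in study $j$, $\mu$ the overall mean effect, $v_{ij}$ the known sampling variance (level 1), $\tau^2_{(2)}$ the variance between effect sizes within a study (level 2) and $\tau^2_{(3)}$ the variance between studies (level 3).

```latex
y_{ij} = \mu + u_{(2)ij} + u_{(3)j} + e_{ij},
\qquad
e_{ij} \sim N(0,\, v_{ij}), \quad
u_{(2)ij} \sim N(0,\, \tau^2_{(2)}), \quad
u_{(3)j} \sim N(0,\, \tau^2_{(3)})
```

Because effect sizes from the same study share the term $u_{(3)j}$, they are allowed to be correlated, which is why multiple effect sizes per study can be retained rather than averaged.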
Twelve studies included three conditions. For three studies, we included in the analyses the comparison between the PA intervention and the active control condition (which was sedentary), rather than the passive control condition, to control for effects associated with taking part in a novel intervention (Beck et al., 2016; D'Souza & Wiseheart, 2018; Frischen et al., 2019). Four further studies included two PA intervention conditions and a sedentary (or passive) control condition; in this case, we included in our analyses the comparison of each PA intervention against the control condition (Barnard et al., 2014; Egger et al., 2019; Gallotta et al., 2015; Koutsandreou et al., 2015). The remaining four studies included two PA intervention conditions and a PE business as usual control condition; again, we included in our analyses the comparison of each PA intervention against the PE control condition (DeBruijn et al., 2020; Oppici et al., 2020a, b; Pesce et al., 2013; Schmidt et al., 2015).
Effect sizes were obtained by first calculating Cohen's d, using means or standardised betas and the pre-test pooled SD as it reduces sampling error (or SD transformed from SE, or 95% CI if the SD was not available) (Morris, 2008). The formula used for Cohen's d calculation was $d = (M_1 - M_2)/SD_{\text{pooled}}$, where $SD_{\text{pooled}} = \sqrt{(SD_1^2 + SD_2^2)/2}$, and where 1 represents the intervention group and 2 represents the control group. To correct for small sample sizes, effect sizes were then transformed into Hedges' g using the formula $g = d \times \left[1 - \frac{3}{4(n_1 + n_2) - 9}\right]$ (Cumming, 2012). Only studies with sufficient data to calculate SD were included in this review.
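The effect size calculation described above can be illustrated with a minimal sketch. The analyses themselves were run in R; the Python functions below are hypothetical and serve only to make the arithmetic explicit. The conversions from SE and from a 95% CI to SD use common large-sample formulas and are an assumption, as the text does not state which conversion was applied.

```python
import math


def pooled_sd(sd1, sd2):
    """Pooled SD as defined in the text: sqrt((SD1^2 + SD2^2) / 2)."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def cohens_d(m1, m2, sd1, sd2):
    """Cohen's d; group 1 is the intervention, group 2 the control."""
    return (m1 - m2) / pooled_sd(sd1, sd2)


def hedges_g(d, n1, n2):
    """Small-sample correction: d * [1 - 3 / (4 * (n1 + n2) - 9)]."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def sd_from_se(se, n):
    """Assumed conversion: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci95(lower, upper, n):
    """Assumed large-sample conversion from a 95% CI of a mean."""
    return math.sqrt(n) * (upper - lower) / (2 * 1.96)


# Illustrative values only, not data from any included study.
d = cohens_d(m1=12.0, m2=10.0, sd1=4.0, sd2=4.0)
g = hedges_g(d, n1=20, n2=20)
```

With these illustrative values, d is 0.5 and the correction factor is 1 − 3/151, so g is slightly smaller than d; the correction shrinks towards 1 as the combined sample size grows.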
In a first set of analyses, multi-level meta-analyses investigated whether PA interventions, combining across types of interventions, could benefit EF outcomes (combining working memory, inhibitory control, cognitive flexibility and attention), AA outcomes, fluid intelligence, creativity or on-task behaviour. Analyses then tested whether the type of control group ((i) physically active vs. sedentary, or (ii) business as usual (which could be physically active such as PE) vs. intervention) or mode of testing (observer report, paper and pencil test or computerised test) explained any variance in effect sizes across studies for each outcome. Finally, analyses investigated cognitive and academic achievement sub-domains separately.
In a second set of analyses, exploratory meta-regressions were conducted to investigate whether differences in (a) duration of intervention, (b) frequency, (c) intensity, (d) session duration, (e) teacher qualifications, (f) duration of effects, (g) mean age of participants and (h) type of activity affected the impact of PA interventions.
Effect sizes were classified as small (0 ≤ g ≤ 0.50), moderate (0.50 < g ≤ 0.80) or large (> 0.80) (Higgins et al., 2022). Influential studies were identified by calculating Cook's distance and using $F_{0.5}(p, n - p)$ as a cut-off and by checking the Cook's distance plots (Cook, 1977). Sensitivity analysis was completed by removing the influential studies from the analysis (Harrer et al., 2021). Publication bias was assessed by using the Luis Furuya-Kanamori (LFK) index and the Doi plot, which are considered stronger than the funnel plot and Egger's test and can be used for meta-analyses with fewer than 10 studies (Furuya-Kanamori et al., 2018; Harrer et al., 2021). Major asymmetry is present when the LFK index is greater than 2 (Furuya-Kanamori et al., 2018). Heterogeneity of each effect was assessed using the $I^2$ and tau-squared ($\tau^2$) statistics. $I^2$ was categorised as low (< 30%), moderate (50%) or high (> 75%), and confidence intervals are reported to identify bias due to small samples (Ioannidis et al., 2007; von Hippel, 2015). Substantial heterogeneity is evident if $\tau^2$ is above 1.
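The classification thresholds above can be expressed as a short sketch. The function names are hypothetical. Applying the thresholds to the absolute value of g, and to the absolute value of the LFK index, is an assumption of this sketch: the text states the bands for non-negative g and the asymmetry rule only for an index greater than 2.

```python
def classify_effect_size(g):
    """Label an effect size using the bands given in the text.

    Assumption: the bands are applied to the magnitude of g.
    """
    size = abs(g)
    if size <= 0.50:
        return "small"
    if size <= 0.80:
        return "moderate"
    return "large"


def major_asymmetry(lfk_index):
    """Flag major asymmetry in the Doi plot.

    Assumption: the cut-off of 2 is applied to the absolute LFK index.
    """
    return abs(lfk_index) > 2


# Summary effects reported in the abstract.
labels = {
    "on-task behaviour": classify_effect_size(1.04),
    "creativity": classify_effect_size(0.70),
    "working memory": classify_effect_size(0.18),
    "fluid intelligence": classify_effect_size(0.16),
}
```

Applied to the summary effects reported in the abstract, these bands label on-task behaviour as large, creativity as moderate, and working memory and fluid intelligence as small.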

Results
This results section provides a description of study characteristics (n = 92), followed by intervention characteristics and meta-analytic findings. The details of the search results are summarised in Fig. 1.

Study Characteristics
A total of 92 studies were included in the meta-analysis. There were 14 studies which included two control groups (active and passive control groups), which allowed for 106 control groups to be included in the analysis, with a total of 25,334 participants. All studies which reported sex (93% of studies) included both sexes (mean percentage girls = 49.7%). Studies were conducted in 25 countries: the USA (n = 20), Italy (n = 10), Netherlands and Australia (n = 9), UK (n = 8), China and Denmark (n = 4), Norway, Spain and South Africa (n = 3), Brazil, Canada, Germany, India, Portugal, Sweden, Switzerland (n = 2), Chile, Croatia, FYR Macedonia, Japan, Mongolia, New Zealand, Poland and Tunisia (n = 1). Over one third of the studies (n = 26) included PA with academic instruction, whilst another third of studies implemented sports (n = 12), aerobic (n = 21) or cognitively enriched (n = 15) interventions. The remaining third of studies applied a mixture of dance (n = 9), PE interventions (e.g. increase in duration of or intensity of practice or reduction of sedentary time in PE classes, n = 6) and holistic movement practices (n = 3). Participants in the control groups took part in physical activity (e.g. PE, n = 44) or were sedentary (n = 53) and took part in an intervention (active control, n = 38) or not (passive control, n = 59). Studies which implemented PA breaks were classified as aerobic. Detailed characteristics of the studies included are provided in the Online resource Table S2.
The studies used a range of objective outcome measures, which are listed in Table 3.

Risk of Bias
Results of the risk of bias assessment, using the EPHPP Quality Assessment Tool for Quantitative Studies (Thomas et al., 2004), are shown in Fig. 2. The overall risk of bias varied but was generally low, see Online resource Fig. S1 for risk of bias by study. However, outcome assessors were blinded to the intervention condition in only six studies. Whilst in an educational context it can be difficult to blind children and teachers to the conditions they are in, there are fewer practical constraints to blinding the outcome assessors, beyond manpower. The included sample was representative of the general population in 16 out of the 92 studies.

Meta-Analysis of Overall PA Intervention Effects
Overall, 270 effect sizes on objective outcome measures were extracted and included in the meta-analyses. A total of 45 studies included EF outcomes, and 47 studies included AA outcomes, resulting in a total of 127 and 114 effect sizes, respectively; there were five studies with on-task behaviour measures resulting in six effect sizes, eight studies with fluid intelligence measures resulting in nine effect sizes, and five studies with creativity outcome measures resulting in 14 effect sizes (Table 4).
When combining across sub-domains, three-level meta-analysis indicated that PA interventions did not benefit executive function measures (g = 0.09, p = 0.24 (95% CI: −0.06 to 0.25), very low-certainty evidence) or academic achievement (g = 0.14, p = 0.14 (95% CI: −0.04 to 0.31), moderate-certainty evidence) (Table 4). Before running meta-analyses for each sub-domain, we investigated whether the type of control group ((a) physically active vs. sedentary, or (b) business as usual (which could be physically active such as PE) vs. intervention) or mode of testing (observer report, paper and pencil test or computerised test) explained any variance in effect sizes across studies. For AA, the type of control group (a: F(1, 112) = 3.13, p = 0.08; b: F(1, 112) = 21.03, p = 0.31) and mode of testing (F(2, 111) = 0.24, p = 0.79) did not contribute to the results. Similarly, the type of control group (a: F(1, 125) = 0.50, p = 0.48; b: F(1, 125) = 0.179, p = 0.18) and mode of testing (F(4, 122) = 1.41, p = 0.23) did not influence the effect observed on EF outcomes. As heterogeneity was high, further analyses focused on each sub-domain of cognitive and academic outcomes separately.
The three-level meta-analytic model showed small significant overall positive effects of PA interventions on fluid intelligence and working memory, as well as a moderate positive effect for creativity and a large positive effect for on-task behaviour (Table 4). There were small but not significant beneficial effects of PA on attention, cognitive flexibility, inhibitory control, planning, mathematics and language sub-domains (Fig. 3, see forest plots in Online resource Figs. S2-S6). The type of control group or mode of testing did not influence these results. Results are presented in more detail below for each sub-domain showing a significant meta-analytic effect.
Working memory studies reported a range of effect sizes (Fig. 4). Over 50% of studies used a highly trained teacher to implement the intervention twice per week in sessions lasting 45-60 min, with all results recorded immediately after the intervention took place. The 30 studies had a range of PA taught including holistic movement practices (n = 1), dance (n = 1), various sports (e.g. gymnastics, capoeira, tennis or football n = 6), aerobic (n = 6), PA with academic instruction (n = 4) and enriched programme (n = 6). Based on the GRADE approach (Online resource Table S3), it is uncertain that PA interventions improve working memory (very low-certainty evidence).
For creativity, studies tended to have a moderate or large positive effect (Fig. 5). Combined, they led to a moderate significant beneficial summary effect. Interventions implemented dance (n = 2) and enriched activities (n = 3). The type of control group (a: F(1, 12) = 3.22, p = 0.10; b: F(1, 12) = 0.03, p = 0.86) and mode of testing (F(1, 12) = 1.41, p = 0.26) did not contribute to the results. Based on the GRADE approach (Online resource Table S3), there is low-certainty evidence that PA interventions improve creativity.
Fluid intelligence studies reported mostly small positive effects (Fig. 6). All of the interventions bar one were taught by a highly trained professional. The eight studies had a range of PA taught including holistic movement practices (n = 2), dance (n = 1), aerobic (n = 2), PA with academic instruction (n = 2) or cognitively enriched programme (n = 1). Based on the GRADE approach (Online resource Table S3), there is moderate-certainty evidence that PA interventions improve fluid intelligence.
The outcome that benefited most from PA interventions was on-task behaviour performance with a large significant summary effect size favouring the intervention (Table 4, Fig. 7). Interventions were split 50:50 between moderate-intensity PA (MPA) and moderate-to-vigorous-intensity PA (MVPA); two of the five studies were taught by highly trained professionals (Lakes & Hoyt, 2004; Mavilidi et al., 2018a, b); most studies (four out of five) used a sedentary control group and trained PA with academic instructions. Results for on-task behaviour should be interpreted with caution because of various limitations: there was high heterogeneity, and the combined effect was in part driven by smaller studies with large effect sizes (Fig. 7). Based on the GRADE approach (Online resource Table S3), it is uncertain that PA interventions improve on-task behaviour (very low-certainty evidence).

Sensitivity Analyses
Sensitivity analyses were carried out on all domains of cognition and academic performance that had enough effect sizes not considered as outliers (> 2). Cook's distance was used to identify potentially influential studies for each outcome measure (see Online resource Figs. S7-S8). In the EF domain, removing influential studies had a large impact on estimated between-study heterogeneity for cognitive flexibility and planning, which reduced to lower rates (I²: 0-51.7%) (Table 4). The influential studies excluded from the cognitive flexibility (Hillman et al., 2014) and planning analyses (Tottori et al., 2019) applied aerobic activity, the latter in the form of high-intensity training. For attention, the influential study (Gallotta et al., 2015) applied cognitively enriched PA. It had a large effect size, and when it was removed, both within-study and between-study heterogeneity were much reduced. Heterogeneity could not be explained by sensitivity analysis for inhibitory control or working memory. This may have been driven by unexplored characteristics of the intervention, such as psychosocial variables, or differences in the tasks and environments that encourage a boost to cognition (Vazou et al., 2019).
Fig. 7 Forest plot for on-task behaviour
Removing influential studies from overall academic performance, mathematics and language did not have a large impact on estimated heterogeneity, indicating that the variability in effect sizes is due to true effect size differences between studies (Table 4). The effect sizes for overall AA, mathematics and language remained within the same orders of magnitude as initial estimates. For mathematics performance, the sensitivity model explained the within-study heterogeneity of our results, but between-study heterogeneity remained high once the influential study was removed. Removing influential studies had a large impact on heterogeneity for creativity, reducing it to zero with no overall effect on the results. An important caveat is that I² is unstable where k < 7 (Ioannidis et al., 2007; von Hippel, 2015).
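To make the logic of these sensitivity checks concrete, the following is a minimal sketch of how removing one influential effect size can change estimated heterogeneity. It is not the analysis code used in this study: it assumes a conventional two-level random-effects model with the DerSimonian-Laird estimator rather than the three-level model reported here, it uses simple leave-one-out re-estimation as a stand-in for Cook's distance, and the effect sizes and variances are hypothetical values chosen only for illustration.

```python
import math


def random_effects(g, v):
    """Pool effect sizes g (sampling variances v) with the DerSimonian-Laird estimator.

    Simplifying assumption: one effect size per study (two-level model),
    not the three-level model used in the paper.
    """
    k = len(g)
    w = [1.0 / vi for vi in v]                      # inverse-variance weights
    fixed = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, g))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (vi + tau2) for vi in v]         # random-effects weights
    pooled = sum(wi * gi for wi, gi in zip(w_star, g)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I² in percent
    return {"g": pooled, "se": se, "tau2": tau2, "i2": i2}


def leave_one_out(g, v):
    """Re-estimate the model k times, each time omitting one effect size."""
    return [
        random_effects(g[:i] + g[i + 1:], v[:i] + v[i + 1:])
        for i in range(len(g))
    ]


# Hypothetical data: four small effects and one large, imprecise effect.
g = [0.10, 0.15, 0.20, 0.12, 1.50]
v = [0.02, 0.03, 0.02, 0.025, 0.04]

full = random_effects(g, v)
loo = leave_one_out(g, v)

print(f"All effect sizes: g = {full['g']:.2f}, I2 = {full['i2']:.1f}%")
for i, res in enumerate(loo):
    print(f"Omitting effect {i}: g = {res['g']:.2f}, I2 = {res['i2']:.1f}%")
```

With these illustrative values, omitting the single large effect brings I² from roughly 90% down to 0%, which mirrors the pattern described above for creativity. With so few effect sizes, such swings also illustrate why I² should be read cautiously when k is small.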

Multi-Level Regressions Testing for Potential Moderators
Follow-up three-level meta-regression analyses were conducted on all outcomes, with and without influential effect sizes, to check whether between-study differences in effect sizes may have been driven by characteristics of the interventions: mean age, frequency, intervention duration, intensity, session duration, teacher qualifications, type of activity and study quality were entered as moderators in separate analyses (see Online resource Tables S4-S7). A small significant positive effect with narrow 95% CI and low to moderate heterogeneity were identified for EFs when PA was of moderate to vigorous intensity (g = 0.21 (95% CI: 0.07-0.34), n = 63 effect sizes), and a small effect was also specifically observed for aerobic interventions (g = 0.16 (95% CI: 0.03-0.28), n = 33) (Online resource Table S4). Moderator analyses were also run on EF sub-domains and no specific effects were found, whether influential effect sizes were included or not.
For creativity, the lower number of studies meant that not all categories could be considered. A significant beneficial effect of medium size was observed for interventions that occurred once a week (g = 0.47 (95% CI: 0.31-0.62), n = 9) (Online resource Table S6). The effect of PA interventions on fluid intelligence was not associated with the characteristics of the intervention considered here (Online resource Table S7). There was an insufficient sample size to run a moderator analysis for on-task behaviour.
Importantly, the effects of PA interventions on outcomes were not associated with quality rating, whether influential effect sizes were included or not (Online resource Tables S4-S7).
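For readers less familiar with the approach, the general form of a three-level meta-regression with a single moderator can be sketched as follows. This is the standard textbook formulation, given here as an assumption about the model family rather than as the exact specification fitted in this study; the notation (y, x, u, e, tau, v) is introduced only for this illustration. Here y_ij is the i-th effect size in study j, x_ij is a moderator such as intensity or frequency, level 2 captures within-study heterogeneity and level 3 captures between-study heterogeneity.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{(2)ij} + u_{(3)j} + e_{ij}, \\
u_{(2)ij} &\sim N\!\left(0, \tau^2_{(2)}\right), \qquad
u_{(3)j} \sim N\!\left(0, \tau^2_{(3)}\right), \qquad
e_{ij} \sim N\!\left(0, v_{ij}\right), \\[4pt]
I^2_{(2)} &= \frac{\tau^2_{(2)}}{\tau^2_{(2)} + \tau^2_{(3)} + \tilde{v}}, \qquad
I^2_{(3)} = \frac{\tau^2_{(3)}}{\tau^2_{(2)} + \tau^2_{(3)} + \tilde{v}}
\end{aligned}
```

In this formulation, v_ij is the known sampling variance of each effect size and \tilde{v} is a "typical" sampling variance across effect sizes. A moderator is judged to matter when \beta_1 differs reliably from zero, and the two I² terms apportion the total variability to the within-study and between-study levels.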

Robustness and Publication Bias
Analysis of the LFK indices and Doi plots indicated that evidence of publication bias ranged from none to major (see Online resource Figs. S9-S10). There was evidence of publication bias for three outcomes which showed statistically significant summary effects of PA (fluid intelligence, on-task behaviour and working memory), and results should therefore be interpreted with caution. Other outcomes which had evidence of publication bias include inhibitory control, creativity and planning; however, the latter outcome had very small sample sizes (k < 4) (Thornton, 2000). Moderator analysis for publication bias suggests that the effects of PA interventions on outcomes were not associated with publication bias, whether influential effect sizes were included or not (Online resource Tables S4-S7).

Discussion
This multi-level meta-analysis of 92 studies showed that physical activity interventions in childhood could lead to small benefits in cognition and academic performance. Benefits were more specifically observed in working memory and fluid intelligence, with small summary effect sizes, moderate effect size for creativity and a large effect size for on-task behaviour. Moderator analyses indicated that executive functions benefitted from aerobic and/or moderate to vigorous physical activity, whilst mathematics achievement benefited from regular (3 times a week) 20-44 min long moderate to vigorous PA combined with academic instruction, and language achievement from enriched PA programmes. The multi-level approach allowed us to study within- and between-study heterogeneity and include a greater number of effect sizes than in previous work. The present research therefore extends previous findings in recent similar meta-analyses, which reported that PA benefits different aspects of cognition and academic performance (Álvarez-Bueno et al., 2017a; Álvarez-Bueno et al., 2017b; de Greeff et al., 2018; García-Hermoso et al., 2021; Takacs & Kassai, 2019; Vazou et al., 2019; Watson et al., 2017).

Executive Functions
Across all PA interventions, we found no evidence that PA benefitted executive function. However, moderator analyses indicated that moderate to vigorous PA interventions, and/or aerobic interventions, had significant positive effects, with small effect sizes. Previous meta-analytic studies had mixed results and differed in the type of PA interventions they included. de Greeff et al. (2018) considered a narrow selection of aerobic and cognitively enriched PA interventions and found small beneficial effects on EF, but did not separately report effects of aerobic PA on EF. Other studies included a broader range of PA such as yoga and taekwondo (Álvarez-Bueno et al., 2017a, k = 21; García-Hermoso et al., 2021, k = 7; Vazou et al., 2019, k = 21; Xue et al., 2019, k = 19) and reported small significant effects on overall EFs. Two of these studies identified small to moderate effects on EFs when practicing aerobic exercise (Vazou et al., 2019, k = 2; Xue et al., 2019, k = 10). The latter categorised aerobic interventions into a sports category, with eight of the ten studies applying aerobic activities (Xue et al., 2019). Only one meta-analysis which included a variety of aerobic and cognitively engaging physical activities did not find effects on overall EFs in typically developing children (Takacs & Kassai, 2019, k = 12).
Notably, previous meta-analyses on this topic applied pooled effect sizes and/or random effects models, which do not allow all the outcomes available in each study to be analysed and ultimately result in a smaller sample size. The present study used a three-level meta-analytic approach, and a wider variety of PA interventions were included in our meta-analysis, with recent publication of intervention research assessing dance, creative movement and gymnastics (k = 8, effect sizes = 35). The effect of PA on EF was not moderated by other variables such as session length, duration of an intervention, frequency of practice, teacher qualification or type of activity. Frequency of practice, session duration and type of activity were found to matter in a previous meta-analysis (Xue et al., 2019; k = 19), although the number of studies included was considerably smaller than in the present meta-analysis.
Whilst we considered a range of executive function measures, namely attention, cognitive flexibility, inhibitory control, working memory and planning, working memory was the only EF sub-domain that showed significant gains after PA interventions in childhood. This small beneficial effect found on working memory is consistent with most meta-analytic evidence (Álvarez-Bueno et al., 2017a; de Greeff et al., 2018). The studies included here in the working memory summary effect included a wide variety of activities. Between-study heterogeneity was low, but within-study heterogeneity was high and not explained by influential effect sizes. The greater impact of PA interventions on working memory than on other aspects of EFs may be driven by the greater reliability of working memory measures across development compared to measures of attention, inhibition or cognitive flexibility (Ahmed et al., 2019). Importantly, far fewer studies have included planning, attention or cognitive flexibility than working memory or inhibitory control as outcome measures.
Only four studies included planning as an outcome measure; the two other meta-analyses which looked at planning similarly did not find benefits of PA (de Greeff et al., 2018, k = 4, including two studies on children experiencing obesity; Xue et al., 2019, k = 2). Two previous meta-analyses assessed the effects of PA on attention. One meta-analysis, which included a study with adolescent participants not included in the present study, found benefits of PA on attention (k = 6, Álvarez-Bueno et al., 2017a). The other meta-analysis reported a large significant summary effect of PA on attention; however, it only included two effect sizes from one study (de Greeff et al., 2018). Past meta-analyses of PA effects on inhibitory control had far fewer studies than the present meta-analysis, because of a recent increase in publication on this topic. Two of these meta-analyses focusing on childhood did not find benefits of PA (k = 7, de Greeff et al., 2018; k = 9, Takacs & Kassai, 2019). Two other meta-analyses which included around 30% of studies on adolescent samples reported a small significant summary effect for inhibition, suggesting the impact on inhibition of PA interventions may be greater in adolescence than childhood (k = 12, Álvarez-Bueno et al., 2017a; k = 15, Xue et al., 2019). Although these two studies did not find age as a continuous variable associated with the size of effects on inhibition, comparing children and adolescent samples may indicate differential impacts of PA on inhibitory control in future studies. A fifth meta-analysis focussing on inhibition, which reported a small significant summary effect, only included eight studies (Jackson et al., 2016).
Finally, whilst one meta-analysis did find a small significant effect on cognitive flexibility (k = 4, de Greeff et al., 2018), other meta-analyses found small non-significant benefits (k = 4, Álvarez-Bueno et al., 2017a; k = 8, Takacs & Kassai, 2019; k = 6, Xue et al., 2019), which is consistent with our results. It has been suggested that non-neurotypically developing children benefit more from PA interventions in the sub-domain of cognitive flexibility (Takacs & Kassai, 2019; Welsch et al., 2021).
Considering PA interventions involve the body, it might be more ecologically valid to use lab tests of EF which are physical (Doebel, 2020). For example, in the studies included in our meta-analysis, the Stroop test was a common measure of inhibition. An alternative test that may be more appropriate is the subtest 'statue' from the NEPSY-II, which was used in one study in this meta-analysis (Frischen et al., 2019). The test requires participants to hold a body position with eyes closed for 75 s and to not react to sound distracters (Anagnostou et al., 2013). As freezing in a position is often used in PE games, the statue test may provide a measure of near transfer reflecting gains in inhibitory control after PA interventions (D'Souza & Wiseheart, 2018; Diamond, 2012).
The EFs included in this meta-analysis have come to be known as 'cool' EFs because they are used in conditions that are emotionally neutral (Holfelder et al., 2020). Tasks measuring cool EFs include lab tasks and are decontextualised from a real-life setting; indeed, studies included in meta-analyses mainly used standardised pencil or computer lab tests (Carlson & Moses, 2001). In contrast, 'hot' EF tasks include an emotional or reward component. For example, a child may be asked to refrain from touching an appealing unattended toy for a period of time (Doebel, 2020). Tasks that measure hot EF processes include gambling tasks for adults (e.g. Bechara et al., 1994) and delay reward tasks for children (e.g. Prencipe & Zelazo, 2005), and so far have not been given much attention in studies assessing the impact of PA interventions on cognition in children. To bridge the gap between neuroscience and physical activity, research could include the socio-emotional experience in a real-world setting like education (Zelazo & Carlson, 2012) by incorporating tasks which measure hot EF processes (Pesce et al., 2021b). The movement literature has recently started moving towards this approach (e.g. Condello et al., 2021a, b).

Fluid Intelligence
A small beneficial effect of PA on fluid intelligence was observed in this meta-analysis. The number of studies assessing fluid intelligence was low (k = 8), and the PA types varied; however, they were mostly taught by a trained professional, heterogeneity was low, and the finding was overall found to have moderate certainty. Experimental tasks which are thought to best capture fluid intelligence, such as visuospatial matrix reasoning tasks, recruit brain regions closely overlapping with the multi-demand network. This network shows increased activation in a range of cognitively demanding tasks and has been proposed to organise goal-directed behaviour (Duncan, 2010; Duncan & Owen, 2000). The complexity of the tasks and games children engage in during PA interventions, which can include learning and following a new set of rules (Duncan, 2010), may foster fluid intelligence skills. Another possible mechanism is that improvements in working memory mediate fluid intelligence gains (Kyllonen & Christal, 1990; Fuster, 2015); however, more work needs to be done to confirm this relationship (Melby-Lervåg & Hulme, 2013).

Creativity
A moderate benefit of PA for creativity was observed, with low-certainty evidence. These results are consistent with previous meta-analytic studies (Álvarez-Bueno et al., 2017a; Rominger et al., 2022), although the latter focussed mainly on an adult population. It has been previously suggested that dopamine levels, which are thought to be affected by physical activity (Knab & Lightfoot, 2010; Meeusen & De Meirleir, 1995), influence divergent thinking (Chermahini & Hommel, 2010; Kulisevsky et al., 2009; Zabelina et al., 2016) through the default mode network in the brain (Beaty et al., 2014; Buckner et al., 2008; Dang et al., 2012; Kühn et al., 2014; Nagano-Saito et al., 2009). Our study found that the effect on creativity was moderated by frequency of practice. A moderate beneficial effect on creativity was found when practice occurred once per week instead of three times per week. Children might find it overwhelming to practice PA more than once per week, so that more frequent practice does not transfer to creative performance. Alternatively, the fact that the dose-response moderators were not found to be an important contributor for creativity may be because there are other mechanisms at play. All PA interventions that assessed creativity as an outcome applied creative movement in a PE or dance context, suggesting that gains in creativity may be driven by near transfer effects of creatively focussed movement practices. Creative PA and its effects on creativity performance are further explored in our companion paper (Vasilopoulos et al., in press).

On-task Behaviour
Consistent with previous findings, our results showed that PA interventions had a strong effect on students' on-task behaviour (Álvarez-Bueno et al., 2017b, k = 5; Watson et al., 2017, k = 5). Whilst relatively few studies (k = 5) assessed on-task behaviour, this result suggests that observer-based on-task behaviour measures may be more able to detect gains due to PA interventions than computerised or pen-and-pencil tasks. On-task behaviour may be a sensitive measure because it reflects a combination of cognitive abilities (e.g. sustained attention), self-regulation and engagement (Mavilidi et al., 2018a, b). This group of studies used consistent instruments (observer reports), education contexts (Australia or USA) and types of interventions (mainly PA with academic instruction). It has been suggested that physically active classes can offer a way for pupils to engage in academic content and therefore keep them on task (Watson et al., 2017). It could be that the release of adrenaline (epinephrine and norepinephrine) associated with PA boosts children's alertness and leads to a better focus on learning (Jensen, 2000; Taras, 2005). It could also be that regular PA fulfils a need to move in young children, and so when they have moved enough, they are better prepared to stay on task during academic work. However, a limitation of these results was that there was high heterogeneity, which could not be explained by sensitivity analysis or characteristics of the intervention, and the combined effect was in part driven by smaller studies with large effect sizes. Nevertheless, we suggest observer-rated on-task behaviour is a promising outcome measure of PA interventions, and further work should assess whether improvements in on-task behaviour may mediate improvements in cognitive task performance and academic achievement (Mavilidi et al., 2020).

Academic Achievement
There was no significant overall benefit of physical activity interventions on academic achievement. However, as for EF, moderation analyses indicated that interventions with certain characteristics did lead to beneficial effects, mostly in mathematics. Importantly, a different pattern was observed than for EF, suggesting that different types of PA interventions benefit different aspects of cognition and academic outcomes. Whilst EF showed significant benefits from moderate to vigorous and/or aerobic PA, mathematics achievement showed benefits from frequent (3/week) moderate or moderate to vigorous PA with academic instruction, and language achievement showed benefits from enriched PA programmes. A beneficial effect of only certain types of PA for mathematics performance is consistent with the mixed results found in previous meta-analyses which also looked at the effects of PA interventions on mathematics achievement in typically developing primary school-aged children (Álvarez-Bueno et al., 2017b, k = 16; Sneck et al., 2019, k = 11; de Greeff et al., 2018, k = 2).
Our analysis extends previous meta-analytic research on mathematics outcomes by including a significantly larger number of studies (k = 42) and effect sizes (54) with more varied physical activities and suggests specific benefits of PA with academic instruction for mathematics performance. The interventions varied in the extent to which mathematics was embedded in the physical activity. Future research could try to differentiate between types of PA with academic instruction to identify whether the observed beneficial effects may be driven by near transfer (increased mathematics instruction) or cumulative effects of temporary increased blood flow and oxygenation to the brain on mathematics practice and learning.
With regard to language, the present meta-analysis extended previous evidence by including the results of 15 studies published since 2018 (Álvarez-Bueno et al., 2017b; de Greeff et al., 2018) and 28 studies published since 2011 (Fedewa & Ahn, 2011), amounting to 33 additional effect sizes analysed with the multi-level approach. The lack of an overall significant summary effect confirms the findings of previous meta-analyses of PA interventions in typically developing children (Álvarez-Bueno et al., 2017a, b; de Greeff et al., 2018). However, the type of activity was found to moderate language performance. Specifically, enriched PA had a small beneficial effect. This could be because enriched practices focus on the learner's ability and scaffold PA practices to match their level. This allows a child to feel safe and interact with the PA environment physically and verbally, developing language ability.

Strengths and Limitations
A strength of this study is that it is the first meta-analysis using hierarchical modelling (Cheung, 2019), which allowed the modelling of nested variables and the inclusion of all outcomes, increasing the accuracy of results. Furthermore, exploratory moderator analyses were included to understand which aspects of PA interventions (e.g. teacher qualification, duration, frequency) may influence cognitive and educational outcomes. We also assessed the influence of sedentary vs. physically active control groups as well as active vs. business-as-usual control groups. In the current study, we specifically chose to focus on real-life settings (e.g. school-based interventions), which helps with generalisability. Only studies that used objective instruments of cognitive and academic performance were included, avoiding possible bias issues with self-report data. A limitation is that significant evidence for publication bias was found for three outcomes which had statistically significant effects (fluid intelligence, on-task behaviour and working memory).
Other limitations should be recognised when interpreting the results. Approximately half of the studies in our meta-analysis had small sample sizes consistent with feasibility studies which do not have the power to show a statistically significant improvement (Green et al., 2019). Indeed, there are barriers to the implementation of school-based PA intervention, such as time constraints and curriculum demands on teachers and pupils, and teacher training requirements of interventions (Naylor et al., 2015). Although the risk of bias was generally low, most studies (92%) did not implement blinding of outcome assessment and intervention delivery, which can be difficult to implement within an education context. Participant samples were also mostly not representative of the target population (79%). Only 18 of the included studies (24%) were preregistered trials. Furthermore, the certainty of the evidence was limited for most of the outcomes assessed here. Heterogeneity was also high and not always explained by influential studies or variation in the main characteristics of the interventions. Other moderators not explored in our study which may have impacted the results include characteristics of the intervention, such as psychosocial variables (e.g. motivation, enjoyment), or differences in the tasks and environments that encourage a boost to cognition (Vazou et al., 2019). It may be the qualitative aspect of the interventions (e.g. teaching strategies and pedagogical practices) that plays a role in influencing cognitive and educational outcomes (García-Hermoso et al., 2021; Diamond, 2012). In a companion paper, we further investigated whether other characteristics of the PA intervention, namely the extent to which interventions fostered creativity through, e.g. group activities, varied tasks, open-ended instructions, may better explain differences in effect sizes (Vasilopoulos et al., in press).
Initially, we planned to consider moderating effects of children's ethnicity and socioeconomic status due to the link between socioeconomic status, PA participation and achievement (Vasilopoulos & Ellefson, 2021). However, these analyses could not be carried out due to the limited availability of information reported in each study.

Conclusion
Using hierarchical modelling, which allowed the analysis of nested variables and the inclusion of all outcomes, increasing the accuracy of results, this study updated previous meta-analytic findings of the effects of PA on cognition and academic performance in pre-adolescent children. Notwithstanding the low grade of evidence, benefits were found to be largest for on-task behaviour, followed by creativity, working memory and fluid intelligence. Moderation analyses indicated specific beneficial effects of aerobic PA for EF, PA with academic instruction for mathematics, and enriched PA for language outcomes. Heterogeneity was high and not fully explained by influential studies or standard intervention characteristics. Considering how physical activity is taught (task and environment) may give a better understanding of what lies behind improvements in educational outcomes. Our results support the argument that sufficient time for physical education should be provided in school, as PA leads not only to health-related benefits, but also has a range of beneficial effects on cognition, creativity and on-task behaviour and more specifically academic achievement. Importantly, however, schools may want to tailor the type of PA carried out to improve specific outcomes. Researchers could start to incorporate tasks which measure 'hot' EF processes to better reflect the real-world context in which physically active interventions are implemented.