The Effectiveness of Interventions on Sustained Childhood Physical Activity: A Systematic Review and Meta-Analysis of Controlled Studies

Background Increased physical activity (PA) has been associated with a reduction in non-communicable disease risk factors and outcomes. However, interventions to increase childhood PA typically produce small to negligible effects. Recent reviews are limited by a lack of post-intervention follow-up measurement. This review aimed to examine measured effects at least six months post-intervention.
Methods and Findings We searched PubMed, MEDLINE, EMBASE, PsycINFO, ScienceDirect, SportDiscus and Google Scholar between 1st January 1991 and 1st November 2014 for controlled studies reporting measurement at least six months post-intervention for children aged 5 to 18 years. Fourteen studies met inclusion criteria; 12 reported moderate-to-vigorous PA (MVPA) (n = 5790) and 10 reported total PA (TPA) (n = 4855). We calculated overall effect estimates and 95% CIs using random effects modelling with inverse variance weighting. Mean difference was calculated for MVPA, with standardised mean difference calculated for TPA due to measurement variation. Meta-regression assessed heterogeneity by continuous-level variables. A negligible mean difference in MVPA existed in favour of the intervention group, amounting to 1.47 (95% CI -1.88, 4.82) mins/day compared to controls, while no difference was recorded on TPA. Sub-group analyses indicated larger effects on MVPA for males (2.65 mins/day: 95% CI 2.03, 3.27) than for females (-0.42 mins/day: 95% CI -7.77, 6.94), for community settings (2.67 mins/day: 95% CI 2.05, 3.28) than for school settings (1.70 mins/day: 95% CI -4.84, 8.25), and for treatment (4.47 mins/day: 95% CI -0.81, 9.76) than for population approaches (1.03 mins/day: 95% CI -2.54, 4.60). Meta-regression revealed no significant differences by factor on pooled effects. Significant heterogeneity existed between studies and potential for small study effects was present.
Conclusions Improved PA levels subsequent to intervention were not maintained six months post-intervention. A potentially useful avenue of future research is to specifically explore community treatment of high-risk individuals.
Review Registration PROSPERO CRD42014007545

The inception of many of the above risks has been observed in childhood [16,17], with a lack of PA leading to impaired childhood health outcomes [18], increased risk factors and subsequent ill-health outcomes in adulthood, and a compromised attitude towards PA [16,19]. PA behaviour tracks 'reasonably well' across time, although stability reduces in adolescence and periods of transition [20]. In addition, evidence indicates that PA levels enter a broad decline in later childhood and adolescence [21], resulting in insufficient levels of PA during transition into adulthood [22,23].
The effectiveness of interventions to increase childhood PA has been systematically reviewed, with reviews specifically investigating preventative [24,25], treatment-based [26], school-based [27,28] and community-based studies [29] as well as comparative policy reviews [30]. The magnitude of measured effects on levels of PA following intervention has typically been small and, when taking into account consistently high levels of heterogeneity, risk of small sample bias and an over-reliance on self-report measurement, caution is essential when interpreting positive findings. In addition, reviews typically included studies reporting measurement of PA or sedentary behaviour within limited times of day (e.g. school recess, travel time or after-school period), thereby failing to account for potential substitution [31].
These shortfalls were partially addressed in a recent systematic review by Metcalf, Henley and Wilkin [32], who investigated the effectiveness of interventions on levels of childhood PA across 30 controlled studies. Meta-analyses revealed only a small-to-negligible effect on levels of Moderate to Vigorous Physical Activity (MVPA) and Total Physical Activity (TPA) immediately following intervention as measured by accelerometry, highlighting the potential for self-report bias in previous reviews and the importance of drawing data from studies specifically reporting whole-day PA [32]. However, with the exception of Lai et al. [28], which focused exclusively on school-based interventions, published reviews provide little detail regarding the maintenance of effects on whole-day PA in children and therefore do not account for the potential effects of habit formation [33] and stage of change [34].

Aims
Given the shortfall in the literature, the primary objective was to conduct a systematic review to explore the effect of interventions on maintained whole-day childhood PA, including studies that measured physical activity level with either accelerometers or questionnaires. Furthermore, it was necessary to explore sustained effect sizes following a period of at least six months post-intervention.

Search Strategy
The search encompassed PubMed, MEDLINE, EMBASE, PsycINFO, ScienceDirect, SportDiscus and Google Scholar (first 1,000 results) for studies published between January 1991 and November 2014. Reference lists of included studies and relevant published reviews were hand searched for additional studies. Only English terms were used and only English language studies were included (see Table 1).

Study selection
Peer-reviewed studies were included if they utilised a trial design incorporating a non-PA control group, irrespective of whether randomisation was used. No restriction was applied regarding intervention duration, delivery personnel or setting. Inclusion required an intervention(s) targeting PA levels in non-clinical children or adolescents aged 5 to 18 years inclusive. Studies must have utilised a measure of MVPA or TPA spanning at least two domains of physical activity, obtained either by objective measurement or by a validated self-report measure. Finally, studies must have presented follow-up measurement data at least six months post-intervention for the same participants measured at baseline, with a follow-up measurement rate of at least 50% of baseline.
The lead researcher (JS) examined the titles of all studies identified from the initial database results and excluded all duplicates and all publications that were unambiguously irrelevant. Abstracts were then examined by the lead researcher (JS) and allocated to 'relevant,' 'irrelevant' and 'undecided' groups, with all undecided studies referred to a second researcher (PS) and resolved through discussion. Full text articles were then accessed and reviewed by the lead researcher (JS), with the second researcher (PS) cross-checking all included studies and a third researcher (CF) cross-checking a 10% sample of excluded studies.

Data extraction and standardisation
We extracted author(s), project title, nation, design, inclusion criteria, randomisation procedure where applicable, intervention and control descriptions, length of follow-up, losses to follow-up and/or drop out, measurement strategy, secondary outcome measures and results. Self-report or objective measurement was recorded, with the specific questionnaire or accelerometer and the length of the measurement period. Participant characteristics were extracted on relative gender percentages, baseline age, baseline BMI or zBMI scores as well as baseline TPA and MVPA levels. Extracted data were entered into an Excel spreadsheet [35] for the purposes of recording and standardisation. Measurement strategy and measurement tools, along with target outcome and quality of reporting, varied considerably between studies, necessitating a number of assumptions and transformations. TPA was measured using either an accelerometer [36-41] or questionnaire [42-44]. MVPA was also measured using an accelerometer [36-39,45,46] or questionnaire [43,44,47-50]. To permit meta-analysis on mean differences [51], MVPA effects were transformed into minutes per day. Where Moderate PA and Vigorous PA were presented separately [44] they were combined [52]. Where only Moderate or Vigorous PA was reported [48] this was taken to be sufficiently conceptually similar to MVPA and entered into the meta-analyses as an equivalent main effect.
Where MVPA was presented as a percentage of TPA [37,40] the means and standard deviations were multiplied out to provide minutes per day. If effects were given as amount of change [47], this change was added to baseline figures to arrive at a follow-up effect. Where TPA was presented on a log scale [36], means and standard deviations were transformed using standard procedures [52]. Where geometric means were reported [48], it was assumed that these corresponded to the arithmetic means. Where inter-quartile range was reported as the indication of dispersion [48], the quartile points were plotted on an assumed normal distribution and the corresponding standard deviations were entered into the analysis. Where data were presented for separate experimental groups, primarily by gender but also for staggered intervention cohorts, the numbers, means and standard deviations were combined for entry into the meta-analyses [52]. Where specific data were missing from a paper, two attempts were made to contact the corresponding author by email.

Table 1. Example Search Criteria for Databases.
child* OR adolescen* OR "young people"
AND "physical activity" OR sport* OR cycl* OR walk* OR "physical education" OR "television view*" OR "tv view*" OR sedentary OR danc* OR "physical inactivity" OR "physical fitness" OR lifestyle OR exercise OR screen time OR "active travel*" OR commut*
AND clinical trial OR control* trial OR random* OR trial OR evaluation OR effect* OR random* sample OR control*
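Two of the standardisation steps described in this section, combining separately reported groups and deriving a standard deviation from an inter-quartile range, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the formulas are the commonly used ones for these conversions (the pooled formula for merging two groups, and the normal-theory relationship IQR ≈ 1.349 × SD); the review does not state which exact formulas were applied, so this is an assumption rather than a reproduction of the authors' spreadsheet.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two reported subgroups (e.g. boys and girls) into one
    sample size, mean and standard deviation."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Within-group variation plus the variation between the two group means.
    var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
           + (n1 * n2 / n) * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)


def sd_from_iqr(q1, q3):
    """Approximate a standard deviation from the lower and upper quartiles,
    assuming the data are normally distributed (IQR is about 1.349 SDs)."""
    return (q3 - q1) / 1.349
```

For example, `combine_groups(10, 40.0, 10.0, 10, 60.0, 10.0)` merges two groups of ten whose means differ by 20 minutes; the combined standard deviation is larger than either group's own because it also reflects the gap between the group means.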

Statistical analysis
The group sizes, means and standard deviations were entered into Stata 13 [53], with MVPA and TPA analysed as separate outcomes. The effect sizes of all outcome-relevant studies were combined to provide the overall effect for both MVPA and TPA. The planned outputs were overall effect estimates and 95% confidence intervals using random effects modelling with inverse variance weighting. Random effects was chosen a priori as a moderate to high degree of heterogeneity was anticipated between studies [54]. Initial analysis of the papers revealed TPA to have been measured and reported using varied instruments, therefore the effect calculation for TPA used standardised mean difference, while mean difference was calculated for MVPA given the relative suitability of reported measurements to be standardised into mins/day.
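As an illustration of the pooling approach named above, the following Python sketch computes a per-study mean difference and then a random-effects pooled estimate with inverse variance weighting. It assumes the DerSimonian-Laird estimator for the between-study variance; the review reports using Stata 13 but does not name the estimator, so that choice, along with the function names, is an assumption made for this sketch.

```python
import math


def mean_difference(n_i, m_i, sd_i, n_c, m_c, sd_c):
    """Mean difference (intervention minus control) and its variance."""
    md = m_i - m_c
    var = sd_i ** 2 / n_i + sd_c ** 2 / n_c
    return md, var


def random_effects_pool(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

When studies disagree strongly, `tau2` grows and the weights become more equal across studies, which widens the confidence interval relative to a fixed-effect analysis. The `i2` value is the I² percentage used to describe heterogeneity.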

Subgroup analyses
A priori subgroup analyses were planned for: participant characteristics (gender, age and cohort size); intervention characteristics (prevention vs. treatment, PA included vs. PA not included, intervention duration and school vs. community setting), and outcome characteristics (objective vs. subjective measurement and post-intervention follow-up delay).

Literature search
The searches were conducted and completed in February 2014. The initial search of databases resulted in 15,696 identified studies, with 13 additional studies identified from relevant systematic reviews. Removal of duplicates and analysis of titles then allowed unambiguously ineligible studies to be excluded, leaving a sub-total of 1,493. Scrutiny of abstracts of the remaining studies revealed 138 potentially relevant studies. Full text articles were then reviewed, producing a total of 18 preliminarily included studies. Four further studies were excluded at the data extraction stage, leaving 14 studies for the final systematic review. A PRISMA flowchart [55] of the study selection process is provided in Fig 1.
Control characteristics. None of the control groups included a PA component, excepting those comparisons which were made between additional PA and 'normal practice' in which case the participants completed standard physical education classes within curriculum time. Differences in characteristics between baseline and intervention groups were reported in all cases, with no comparisons deemed to be at high risk of bias. Studies using a cluster-design reported methods to ensure groups were comparable at baseline.

Study quality
Quality was assessed using the Methodology Checklist for Randomised Controlled Trials [57].
Overall there was a high number of 'uncertain' verdicts against the papers, potentially indicating that the reporting of relevant information within the published articles was more of a concern than the actual methodological quality of the studies (Fig 2). Participants lost to follow-up ranged from 0% to 50%, with studies reporting analyses of attrition characteristics. Eight of the nine studies utilising a cluster-randomised design reported appropriate statistical techniques by which to account for clustering within the aggregate outcomes. A visual inspection of funnel plots for both outcomes suggested the possibility of small-study effects (Fig 3).

Overall effect estimates
The collated results from twelve included studies showed weak evidence for a small increase in MVPA in favour of the intervention group with a mean difference of 1.47 minutes per day (95% CI -1.88, 4.82; p = 0.39) (Fig 4). For the ten studies reporting TPA the analysis showed no difference between the pooled effects of the intervention and those for the control group, with a standardised mean difference of -0.13 (95% CI -0.74, 0.48; p = 0.67) (Fig 5).
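The reported p-values can be cross-checked against the reported confidence intervals with a short calculation. The sketch below is a minimal Python illustration that assumes the intervals are symmetric normal-approximation 95% intervals; the helper name `p_from_ci` is hypothetical, and small discrepancies in the second decimal place are expected because the published estimates are rounded.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided p-value
    from a point estimate and its 95% confidence interval."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Pooled values as reported in this section.
_, _, p_mvpa = p_from_ci(1.47, -1.88, 4.82)    # MVPA mean difference
_, _, p_tpa = p_from_ci(-0.13, -0.74, 0.48)    # TPA standardised mean difference
print(round(p_mvpa, 2), round(p_tpa, 2))
```

Under these assumptions both intervals give p-values close to the figures quoted in this section for MVPA and TPA.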

Subgroup analysis
There were no significant differences in outcomes across the majority of study-level characteristics, summarised in Table 3. Individual meta-regressions of MVPA and TPA by continuous-level covariates confirmed the lack of statistical significance. The exceptions were male participants, who showed a mean difference in MVPA between intervention and control groups at post-intervention follow-up of approximately 2.65 mins/day (p < 0.001), and community-based interventions, which also showed an effect (p < 0.01). However, Jago et al. [39], due to a significant effect and tight confidence intervals, accounted for the majority of the weighting within the pooled effects on these sub-groups; removal of this paper from the subgroup analysis produced non-significant results. The relative success of community-based interventions may be due in part to small-study effects (Fig 3), in which systematic bias is introduced into meta-analyses by publication bias against small-cohort studies with non-significant effects [52]. Lastly, the studies in the treatment-based subgroup [37,47], which approached significance (p < 0.10) for MVPA, and the Nemet et al. [42] study were all conducted in community settings, potentially indicating that treatment and community approaches may cluster to promote sustained PA.

Discussion
There was a statistically non-significant (p = 0.39) mean difference in favour of intervention, approximating to a mean improvement of 1.47 minutes per day of MVPA compared to controls, although this figure is well below the sensitivity threshold of the utilised measurement tools. This result falls well short of the recommended improvements of PA for children [1] and is unlikely to be clinically significant even if maintained over time. There was no statistically significant (p = 0.67) difference in standardised mean difference of TPA. In the case of Cui et al. [49], the control group was assessed at six months post-baseline, rather than post-intervention, although one-study-removed sensitivity analyses revealed no meaningful change to overall or sub-group effects. A similar analysis was conducted for Hovell et al. [48], given this paper's reporting of geometric, rather than arithmetic, means, with no differences found on the sub-group effects.
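A one-study-removed sensitivity analysis of the kind referred to above re-pools the data once per study, omitting that study each time, and compares the results with the full analysis. The Python sketch below shows the mechanics only. For brevity it uses a simple fixed-effect inverse-variance pool rather than the random-effects model used in the review, and the function names are hypothetical.

```python
import math


def inverse_variance_pool(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def leave_one_out(labels, effects, variances):
    """Re-pool the data with each study omitted in turn.

    Returns a dict mapping the omitted study's label to the pooled
    estimate and 95% confidence limits of the remaining studies."""
    results = {}
    for i, label in enumerate(labels):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        pooled, se = inverse_variance_pool(rest_effects, rest_variances)
        results[label] = (pooled, pooled - 1.96 * se, pooled + 1.96 * se)
    return results
```

A study is influential if omitting it moves the pooled estimate or its confidence limits across a decision-relevant boundary, such as the interval starting to include zero.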
In PA studies it is typically not possible to blind participants or instructors to allocation, introducing a potential source of bias into delivery [58]. In addition, the measurement was often conducted by researchers not blinded to allocation [59], although sub-group analysis revealed no difference between self-report and objective measures for MVPA or TPA. Levels of heterogeneity between studies that used self-report were consistently high across both outcomes (MVPA I² = 98%; TPA I² = 97%), potentially compromising the sensitivity of this measurement strategy to reliably demarcate significant from non-significant results in small studies.
Negligible effect on the main outcomes was consistent with Metcalf et al. [32], who conducted a meta-analysis on 30 studies measured by accelerometer immediately post-intervention, with Dobbins et al. [27], who reviewed 44 studies specifically regarding school-based interventions, and with Kamath et al. [24], who reviewed 18 studies on PA levels following interventions within a wider review into prevention of childhood obesity. Also concordant with Metcalf et al. [32], findings indicated that intervention duration was not associated with increased PA levels at follow-up, with an emergent trend that favoured studies implemented in a community setting, those that used a treatment approach and those with smaller cohort sizes, potentially implicating a cluster of factors associated with greater intervention success. However, it was not possible to distinguish between specific factors or rule out small study bias.
No evidence for harmful effects of intervention on PA was indicated.
The strengths of the current review lay in the specificity and uniqueness of the inclusion criteria regarding methodological approach, requiring follow-up measurement to have occurred at least six months post-intervention, presenting a meaningful analysis to the literature. Limitations included the relatively small number of included studies, which left subgroups underpowered within the analyses. In addition, the use of exclusively English language publications introduced a potential for English-language bias. While the use of a single researcher to conduct the primary identification and extraction procedure may be seen to constitute a weakness, the specificity of the inclusion criteria, particularly the clear requirement for a six-month post-intervention follow-up measurement, reduced the likelihood of selection error.
This review reinforced previous evidence that PA interventions have little measured effect on TPA or MVPA levels in children, either immediately post-intervention or at six-month follow-up. The possibility remains that the included studies, plus PA interventions in general, were ineffective due to insufficiencies in intensity, duration, delivery quality, theoretical grounding and implementation or measurement sensitivity. Although the benefits of PA in childhood are intuitive, evidence has yet to support this viewpoint and resources may be better invested in alternative approaches to achieve positive effects. In terms of recommendations for future research, we suggest that a rigorously implemented and reported follow-up measurement stage is incorporated into the method, as further publication of pre-post studies will not meaningfully add to the existing literature.
At the time of writing no publication had specifically investigated the maintenance of PA levels at follow-up; this represented an important gap in knowledge addressed by the current review. Sub-group analysis revealed a potential area of promise in the use of PA intervention to treat high-risk children, which warrants further investigation. The challenge remains to ensure that high methodological quality, particularly regarding measurement tools, is adhered to in future studies in order to build a meaningful evidence base.