Please don’t stop the music: A meta-analysis of the cognitive and academic benefits of instrumental musical training in childhood and adolescence

An extensive literature has investigated the impact of musical training on cognitive skills and academic achievement in children and adolescents. However, most of the studies have relied on cross-sectional designs, which makes it impossible to elucidate whether the observed differences are a consequence of the engagement in musical activities. Previous meta-analyses with longitudinal studies have also found inconsistent results, possibly due to their reliance on vague definitions of musical training. In addition, more evidence has appeared in recent years. The current meta-analysis investigates the impact of early programs that involve learning to play musical instruments on cognitive skills and academic achievement, as previous meta-analyses have not focused on this form of musical training. Following a systematic search, 34 independent samples of children and adolescents were included, with a total of 176 effect sizes and 5998 participants. All the studies had pre-post designs and, at least, one control group. Overall, we found a small but significant benefit (g Δ = 0.26) with short-term programs, regardless of whether they were randomized or not. In addition, a small advantage at baseline was observed in studies with self-selection (g pre = 0.28), indicating that participants who had the opportunity to select the activity consistently showed a slightly superior performance prior to the beginning of the intervention. Our findings support a nature and nurture approach to the relationship between instrumental training and cognitive skills. Nevertheless, evidence from well-conducted studies is still scarce and more studies are necessary to reach firmer conclusions.


Introduction
The literature about the effects of musical training on cognitive and brain function is growing rapidly. Multiple studies have documented that involvement in musical activities enhances auditory and sensorimotor processes (James et al., 2020; Kraus et al., 2014; Slater et al., 2015; for a review, see Herholz & Zatorre, 2012). However, whether musical training impacts general cognitive abilities (e.g., memory or attention) and academic achievement (especially in literacy and mathematics) is still debated. Playing an instrument is a complex task involving several perceptual modalities, sensorimotor integration, and higher-order cognitive processes. Moreover, structured instrumental learning is an effortful activity that needs to be maintained across long periods of time; it requires regular and motivated practice, learning of new and progressively more difficult material, and adapting to new contexts. Those characteristics have led some to propose that musical training is an optimal general cognitive training strategy that might have an impact beyond music performance itself, benefiting performance in daily life activities (e.g., Bugos et al., 2007). Extensive evidence has associated musicianship with advantages in general cognitive functions, often loosely related to musical skills, such as intelligence (Bugos, 2014; Schellenberg, 2006; Swaminathan et al., 2017), visuospatial abilities (Sluming et al., 2007), processing speed (Bugos, 2014; Jentzsch et al., 2014), executive control (Jentzsch et al., 2014; Medina & Barraza, 2019), attention and vigilance (Kaganovich et al., 2013; Rodrigues et al., 2013; Román-Caballero et al., 2021), and episodic and working memory (Talamini et al., 2017). Also, it might protect against the cognitive decline associated with aging (Román-Caballero et al., 2018). Unfortunately, most of the studies in the field are correlational, which does not allow establishing firm conclusions about the causal role of musical training in those advantages (Schellenberg, 2020).
A plausible alternative explanation for these results is that high-functioning children, with higher musical aptitude, higher socioeconomic status, and/or personality traits associated with cognitive improvements (e.g., openness to experience), are more likely to be interested in music and take music lessons (Corrigall et al., 2013; Swaminathan et al., 2017). Or perhaps individuals with better executive functions are more prepared to resist the temptation to abandon the continued effort that mastering an instrument entails. From this point of view, most of the cognitive and academic advantages observed in correlational studies and interventions without random assignment (where participants and their families chose musical activities) could be due to preexisting differences in children's intelligence, temperament, and environment. In addition, it has been argued that far transfer (i.e., the generalization of training in one domain to skills in a loosely related domain) rarely occurs with most types of cognitive training, because of the small overlap between domain-specific and domain-general abilities (Melby-Lervåg et al., 2016; Sala & Gobet, 2019). Extending this logic, it would be unlikely that musical training could enhance general cognitive abilities.
Other theoretical proposals have tried to reconcile both positions, arguing that expert musicians might have preexisting advantages (cognitive, personality, and/or musical aptitudes) that would promote the acquisition of musical skills and motivation to practice, while at the same time this long-term engagement would also result in multiple neural and cognitive changes (e.g., nature and nurture hypothesis, Wan & Schlaug, 2010). In this vein, the difference in magnitude between the effects observed in correlational studies (often Cohen's d around 0.8-1.0) and in experimental designs with random allocation (d ≈ 0.2; Corrigall et al., 2013; for a classic example, see Schellenberg, 2004) might be the consequence of musicians in correlational studies benefitting from both preexisting cognitive advantages and musical training itself. Only a small number of experimental studies comply with basic methodological standards, such as randomization, the inclusion of an active control group, and blinding of the assessment, and, in practice, most of them involve short interventions (1-1.5 years long) and relatively small samples (≈ 25 participants per group), with the subsequent lack of statistical power to detect small-to-medium effect sizes.¹ Under those conditions, it is perhaps unsurprising that the results have been inconsistent across studies, with some studies providing evidence of a positive impact of instrumental learning (Frischen et al., 2021; James et al., 2020; Schellenberg, 2004), while other studies have shown null effects (D'Souza & Wiseheart, 2018; Haywood et al., 2015). This is an ideal context for the application of a meta-analysis, as it allows a quantitative review of the literature and enables drawing firmer conclusions given an increased statistical power. Also, meta-analysis offers numerical estimators of the summary effect and between-studies consistency, which provide the opportunity to assess the relevance of interventions (and not only their statistical significance) and to identify potential moderators. Unfortunately, even at the meta-analytic level, there are inconsistent results concerning the impact of musical training in experimental and quasi-experimental studies (Butzlaff, 2000; Cooper, 2020; Gordon et al., 2015; Hetland, 2000; Jaschke et al., 2013; Sala & Gobet, 2017, 2020; Standley, 2008; Vaughn, 2000). Probably, one of the greatest sources of variability is the vague and inconsistent definition of musical training across meta-analyses (Jaschke et al., 2013), which usually combine highly heterogeneous musical interventions, including instrumental tuition, programs of music education such as Kindermusik, Orff, or Kodály methods, computerized training of musical skills, phonological training with music support, and listening programs, among others.
Although in the past some authors have called for analyzing each type of training program separately to reach reliable results (Jaschke et al., 2013), subsequent meta-analyses have continued to include and pool multiple types of interventions in the same analysis (Cooper, 2020; Gordon et al., 2015; Sala & Gobet, 2017, 2020). Arguably, studies examining the effects of formal programs in instrumental training are ideal for investigating the causal role of musical training on cognitive skills and academic achievement. Most correlational studies reporting effects of musical training have compared expert instrumentalists with non-musicians, suggesting that instrumental programs might be advantageous. Formal programs in which the participants learn to play a complex musical instrument and to read music notation are the most similar to the type of training that such expert musicians follow.² Additionally, although all types of musical training aim to promote musical skills (e.g., rhythm, pitch and timbre discrimination, singing, basic music notation, etc.), learning to play an instrument seems to pose greater cognitive demands than other musical activities, as it requires particularly intensive practice entailing hand dexterity, bimanual coordination, and core cognitive functions such as working memory and attention. For that reason, far transfer might be more probable with instrumental learning. Although some studies have reported cognitive improvements with non-instrumental interventions (for Kindermusik, Orff, Kodály or related methods, see Kaviani et al., 2014; Patscheke et al., 2016; for listening programs, see Bugos, 2010; Hole, 2013), there is evidence of greater benefits with instrumental programs, to such an extent that non-instrumental music programs have even been used as control conditions in some studies (see Bugos, 2010; James et al., 2020). Nevertheless, to the best of our knowledge, the impact of instrumental interventions has not been investigated separately in any previous meta-analysis, nor has it been tested as a moderator.
¹ A power analysis using G*Power 3.1 (Faul, Erdfelder, Buchner, & Lang, 2009) for a one-tailed t-test and an alpha of .05 indicated that around 310 participants per group would be necessary to achieve an acceptable power of .80 with a Cohen's d of 0.20 (small effect), and 51 participants per group for a d of 0.50 (medium). Required sample sizes are larger when two-tailed contrast statistics or higher power values are used.
R. Román-Caballero et al.
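As a rough check on the sample sizes quoted in footnote 1, the sketch below uses the normal approximation n = 2(z_α + z_power)² / d² for a two-group comparison of means. It is not the G*Power routine itself: the function name `n_per_group` and its arguments are illustrative assumptions, and because the approximation ignores the t distribution it can fall a participant or two short of exact values.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, tails=1):
    """Approximate per-group sample size for comparing two independent means.

    Normal approximation: n = 2 * (z_alpha + z_power)^2 / d^2.
    Hypothetical helper for illustration; exact t-based software
    (e.g., G*Power) may return slightly larger values.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)  # critical value of the test
    z_power = z.inv_cdf(power)              # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


small = n_per_group(0.20)                # small effect, one-tailed
medium = n_per_group(0.50)               # medium effect, one-tailed
two_tailed = n_per_group(0.20, tails=2)  # same effect, two-tailed test
```

Under this approximation the small-effect case agrees with the figure of about 310 per group, the medium-effect case lands within one participant of 51, and the two-tailed variant requires a larger sample, as the footnote notes.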
On the other hand, the most recent and comprehensive meta-analysis (i.e., Sala & Gobet, 2020), which included different musical interventions, found a positive small effect of musical training (g = 0.18, p < .001) that was reduced to null when characteristics of design quality (i.e., random allocation and active control) were taken into account (g ≈ 0). However, the difficulty of implementing methodologically rigorous designs adds to the inherent cost of instrumental interventions that require highly specialized material and professionals. This might explain why only a third of the studies included in Sala and Gobet's meta-analysis had instrumental programs (19 out of 54) and why many studies with instrumental programs have not used optimal experimental designs. Indeed, studies involving instrumental training were underrepresented among Sala and Gobet's studies with random assignment and/or with an active control group. More precisely, only 27% of the randomized studies (6 out of 22), 36% of those using an active control group (9 out of 25), and 31% of those using both randomization and an active control group (4 out of 13), had instrumental training. Given that non-instrumental interventions likely have a smaller impact on cognitive skills and academic achievement compared to instrumental ones, the greater representation of the former in Sala and Gobet's study may have led to conclusions mostly related to non-instrumental musical training. Thus, despite all the previous meta-analyses, the overall impact of formal instrumental learning remains uninvestigated. In addition, some outcomes included in the meta-analyses by Sala and Gobet (2017, 2020) were measures of skills trained with active control activities (e.g., phonological abilities with phonological training), and therefore should not be analyzed in a far-transfer meta-analysis (Bigand & Tillmann, 2021). Finally, new studies have appeared since the publication of the most recent meta-analysis (Sala & Gobet, 2020) and we additionally found some studies that have never been included in any previous meta-analysis, including some from unpublished doctoral theses (such as Nering, 2002; Pelletier, 1963).
Considering all the above, it seems crucial to carry out a new comprehensive meta-analysis that separately investigates the impact of instrumental learning programs on cognitive skills and academic achievement. The present work aims to address this issue by shedding light on the debate about the causal role of musical training in school-age children and adolescents. Accordingly, we analyzed the pre-posttest cognitive and academic changes in the available experimental and quasi-experimental studies that used formal training programs involving learning to play a musical instrument. While experimental studies with random assignment of the participants allow drawing causal inferences about the effects of musical training (and therefore represent the main source for the causal conclusions in the present meta-analysis), non-randomized longitudinal studies were also included for comparison purposes.

Literature search
A systematic search strategy was used following the recommendations of PRISMA (Moher et al., 2009). Firstly, we consulted PubMed, ProQuest, Scopus, Web of Science, and ProQuest Dissertation & Theses using the search syntax "music*" AND ("training" OR "instruction" OR "educati*" OR "practice") AND ("child*" OR "adolescen*"). Also, references from previous empirical studies, reviews, and meta-analyses on this subject were examined. The latest search was carried out in February 2021, without any time restriction. In total, 8560 potentially relevant results were found, among which 32 met the inclusion criteria described below and were included in our meta-analysis (Fig. 1). These studies included 34 independent samples, 179 effect sizes, and a total of 5998 participants.
² Structured singing training, such as that received by lyrical singers, is also comparable to the training of expert musicians. Nevertheless, in the literature, it is often difficult to distinguish between formal singing programs (intensive in terms of technique, music theory, and out-of-class practice) and interventions directed at a more diverse population with more informal approaches. It is also the case that singing interventions build on a capacity for singing already present in individuals without training, whereas learning an instrument entails learning completely new skills. Some studies have found smaller effects for vocal training in comparison to instrumental training (Guhn et al., 2020; Kinney, 2008; for a null difference, see Schellenberg, 2004). Although the comparison of formal instrumental and vocal training remains an open question that needs confirmation from studies with experimental designs, the most notable evidence to date is from the study of Guhn et al., who showed advantages for both instrumental and vocal musical training in a remarkably large sample of students (N ≈ 110,000) who chose to take part in music courses or not. This result held even after controlling for several confounding variables (cultural background, SES, sex, and prior academic achievement). Crucially, instrumental learning led to larger differences in comparison to vocal training (ds ranging from 0.12 to 0.31), a result that the authors attributed, among other factors, to the complexity involved in learning to play an instrument. They suggested that this complexity might have a particularly positive impact on executive functions and, through them, on other cognitive domains. Because for much of this literature it is very difficult to determine whether studies used formal or informal vocal training, and considering the evidence from Guhn et al. that cognitive benefits for instrumental training are likely larger than for vocal training, we decided to constrain the scope of our meta-analysis to studies with formal learning of musical instruments.

Selection criteria
The studies selected in the review had to meet the following criteria:
1. Published articles or theses that included musical training programs involving at least learning to play an instrument;
2. The design of the studies included pretest and posttest measures, regardless of whether there was a random assignment of children/adolescents to conditions or they themselves (or their parents or their teachers) selected the activity;
3. The studies included a comparison between a music-treated group and, at least, one control group (active or passive);
4. The participants had no previous formal musical training or instrumental learning prior to the program;
5. The studies contained sufficient information to calculate at least one effect size (mainly, means and standard deviations, t value, F value, and the standardized effect size itself; otherwise, authors were contacted and the studies were included if the information was provided);
6. The studies included at least one non-musical measure of academic and/or cognitive skills (note that near-transfer effects were not included);
7. At the moment of starting the training, participants were between 3 and 16 years old;
8. The participants of the study did not suffer from neurological or psychiatric conditions.
As the included studies used different instruments to assess the outcomes (with different scales) from study to study, we used a standardized estimator of the effect size: Hedges' g. There are multiple ways of estimating Hedges' g in pre-posttest designs with two groups (see below, Effect Size), the most common being the standardized mean difference with posttest measures only, which we will refer to as g post. An alternative index, proposed by Morris (2008), which we will refer to as g Δ, is the standardized mean change difference (i.e., the difference between the two groups in the change of the outcomes between pretest and posttest moments). An advantage of this index over g post is that it controls for preexisting differences at baseline. Collating both types of effect sizes, g post and g Δ, in a single meta-analysis requires making some assumptions. For instance, the variance of the pretest score is assumed to be equal to the variance of the posttest score. Similarly, both groups are assumed to be equivalent in baseline performance. However, it is arguable that these assumptions are not met in most circumstances. Previous research suggests that there are cognitive and personality differences in individuals who choose and continue with musical training as an activity (Corrigall et al., 2013). For that reason, and unlike previous reviews (Sala & Gobet, 2017, 2020; Vaughn, 2000), we constrained our review only to pre-posttest studies.
We included studies with random assignment (randomized studies), studies in which the children or their parents or their teachers selected the training group (self-selection studies), and studies with other allocation strategies that cannot be considered random (such as quasi-randomization; non-randomized studies), as all of them can offer valuable information for the debate. On the one hand, randomized studies allow the establishment of more conclusive causal inferences about the effects of training, as randomization of the individuals reduces bias due to preexisting differences in cognitive, academic or musical skills, or other confounds (e.g., personality traits; Corrigall et al., 2013). On the other hand, studies that allowed the participants to choose the program have a higher risk of selection effects, which might be observable in the overall difference in the pretest performance. Whereas we based the main inferences about causality, moderating variables and publication bias on randomized and non-randomized studies, the inclusion of self-selection studies was restricted to the assessment of baseline differences and the overall analysis for comparison purposes.

Effect size
We used the formula proposed by Morris (2008) for g Δ as an estimator of the effect size in the main analyses,

\[ g_{\Delta} = c_p \, \frac{(M_{\mathrm{post},T} - M_{\mathrm{pre},T}) - (M_{\mathrm{post},C} - M_{\mathrm{pre},C})}{SD_{\mathrm{pooled,pre}}} \]

where M post and M pre represent the mean scores at posttest and pretest, respectively, for the treatment group (T) and the control group (C), and SD pooled, pre is the pooled standard deviation for the pretest scores of both groups. Moreover, c p is a correction factor of the small sample bias, given by

\[ c_p = 1 - \frac{3}{4(N_T + N_C - 2) - 1} \]

where N T and N C are the number of participants in the treatment group and the control group. Positive values of g Δ represent greater benefits in favor of the treatment group, and negative values index the contrary. We multiplied by −1 those effects in which it was necessary to keep the mentioned direction. The g values were interpreted according to Cohen's criteria (Cohen, 1992): values close to 0.2, 0.5, and 0.8 or higher are interpreted as small, medium, and large effects, respectively. The variance of g Δ was calculated following the formula by Morris (2008),

\[ V_{g_{\Delta}} = 2 c_p^2 (1 - r) \left( \frac{N_T + N_C}{N_T N_C} \right) \left( \frac{N_T + N_C - 2}{N_T + N_C - 4} \right) \left( 1 + \frac{g_{\Delta}^2}{2 (1 - r) \left( \frac{N_T + N_C}{N_T N_C} \right)} \right) - g_{\Delta}^2 \]

where r is the correlation between pretest and posttest scores. We directly estimated r from raw data when they were available, or computed it from other reported statistics when these made it possible. In this way, we could extract 75 correlation coefficients and their respective variances from 14 studies, with a meta-analytic mean r of 0.71 (see Data S2 in https://osf.io/9y5tp/). This final value of r is close to the 0.70 that Rosenthal (1991) proposed as a conservative assumption when pre-posttest correlations are not available. Considering that, we conducted our analyses assuming r = 0.70.
Furthermore, as previous literature pointed to the existence of baseline differences between individuals who chose to take musical training and individuals who did not (Corrigall et al., 2013; Swaminathan et al., 2017), we were also interested in comparing the performance of both groups just at baseline. For that purpose, we calculated the traditional Hedges' g only with pretest scores (called here g pre),

\[ g_{\mathrm{pre}} = c_p \, \frac{M_{\mathrm{pre},T} - M_{\mathrm{pre},C}}{SD_{\mathrm{pooled,pre}}} \]
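The quantities described in this section can be illustrated with a short Python sketch. It assumes the standard forms of the Morris (2008) estimator, its small-sample correction and its sampling variance, together with the pooled pretest standard deviation; all function and argument names are hypothetical, and the code is independent of the authors' R scripts.

```python
import math


def correction_factor(n_t, n_c):
    """Small-sample bias correction c_p."""
    return 1 - 3 / (4 * (n_t + n_c - 2) - 1)


def pooled_pre_sd(sd_pre_t, sd_pre_c, n_t, n_c):
    """Pooled standard deviation of the pretest scores of both groups."""
    num = (n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2
    return math.sqrt(num / (n_t + n_c - 2))


def g_delta(m_pre_t, m_post_t, m_pre_c, m_post_c, sd_pre_t, sd_pre_c, n_t, n_c):
    """Standardized mean change difference (treatment gain minus control gain)."""
    gain_diff = (m_post_t - m_pre_t) - (m_post_c - m_pre_c)
    sd = pooled_pre_sd(sd_pre_t, sd_pre_c, n_t, n_c)
    return correction_factor(n_t, n_c) * gain_diff / sd


def var_g_delta(g, n_t, n_c, r=0.70):
    """Sampling variance of g_delta for an assumed pre-post correlation r."""
    cp = correction_factor(n_t, n_c)
    k = (n_t + n_c) / (n_t * n_c)
    df_ratio = (n_t + n_c - 2) / (n_t + n_c - 4)
    return 2 * cp ** 2 * (1 - r) * k * df_ratio * (1 + g ** 2 / (2 * (1 - r) * k)) - g ** 2


def g_pre(m_pre_t, m_pre_c, sd_pre_t, sd_pre_c, n_t, n_c):
    """Hedges' g computed on pretest scores only (baseline difference)."""
    sd = pooled_pre_sd(sd_pre_t, sd_pre_c, n_t, n_c)
    return correction_factor(n_t, n_c) * (m_pre_t - m_pre_c) / sd


# Invented example: 25 children per group, equal baselines, SD = 10.
g = g_delta(100, 105, 100, 102, 10, 10, 25, 25)
v = var_g_delta(g, 25, 25)
```

With these invented numbers the treatment group gains three points more than the control group on a scale with SD = 10, which gives a g Δ slightly below 0.30 after the small-sample correction, and a baseline g pre of zero.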

Meta-analysis, heterogeneity and moderator analysis
As is often the case in psychology meta-analyses, most of the included studies contributed more than one effect size from the same sample, which rendered the outcomes not independent. Most of the conventional meta-analytic procedures, however, assume independence between effect sizes. The robust variance estimation approach (RVE; Hedges et al., 2010) has been developed to deal with the correlated structure of outcomes. This method estimates the correlation matrix and sets the weights according to a correlated or a hierarchical structure. Simulation studies show that RVE is remarkably accurate in estimating the mean effect and the confidence interval, even with a small number of studies (m = 10) and when they include a large number of dependent estimates per study (k = 10; Hedges et al., 2010). We used the robumeta package for R (Fisher et al., 2017) to implement the RVE in the main analyses (all the data and the R script for the analyses are fully available in the Supplementary Material). We chose a correlated dependence model with small-sample corrections (Tipton, 2015).
First, we studied the overall impact of musical training, fitting an overall meta-analytic model with randomized and non-randomized studies, and then for each group of studies separately. For comparison, we repeated the analysis with self-selection studies (combined with the rest of studies and separately). The usual heterogeneity indexes, τ² and I², were computed. To identify studies with outlying outcomes, we fitted a multilevel model with the rma.mv() function of metafor (Viechtbauer, 2010) and estimated the Studentized residuals (>2) and Cook's distance (>4/n). For the analysis of differences at baseline, we fitted separate RVE models for randomized, non-randomized and self-selection studies using the g pre as effect size estimate.
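To make the heterogeneity indexes concrete, the following sketch computes Cochran's Q, τ² and I² with the conventional DerSimonian-Laird estimator for independent effect sizes. This is a swapped-in textbook method used only for illustration: it is not the RVE estimator applied in the analyses, and the function name and example values are assumptions.

```python
def dersimonian_laird(effects, variances):
    """Random-effects summary with DerSimonian-Laird tau^2 and I^2.

    Assumes independent effect sizes (one per study); illustrative only.
    """
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % of variability beyond sampling error
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return {"pooled": pooled, "q": q, "tau2": tau2, "i2": i2}


# Three invented studies with equal sampling variances.
result = dersimonian_laird([0.0, 0.3, 0.6], [0.04, 0.04, 0.04])
```

In this invented example the observed spread of effects exceeds what sampling error alone would produce, so τ² is positive and I² is a little over half.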
Then, we assessed the influence of the following moderating variables on effect sizes: (1) randomization (randomized vs. non-randomized studies; note that self-selection studies were not included in moderator analyses); (2) type of control group (active vs. passive); (3) whether there was blinding of assessors or the measure was computerized (yes/no); (4) age of the participants at the baseline (in years); (5) duration of the training program (in months); (6) between-groups baseline difference, measured as g pre; (7) low socioeconomic status (SES) of the sample (yes vs. no/not reported); and (8) the type of cognitive or academic outcome (mathematics, literacy,³ intelligence, processing speed, short-term memory, long-term memory, visuospatial abilities, phonological processing, and executive functions). Regarding the type of control, we conducted the analyses with the effects corresponding to the two comparisons (experimental vs. active control group, and experimental vs. passive control group) in those studies in which both were available in the same study.

Publication bias
Several lines of evidence indicate that multiple factors of the reporting and the publication procedure can drastically affect the results of a meta-analysis. Studies reporting significant and large effect sizes are more likely to be published or made available than statistically non-significant results or results that contradict an accepted theory (Carter et al., 2019). This phenomenon (called publication bias) leads to studies with null or negative estimates being less accessible and underrepresented in meta-analyses. Several methods have been developed to detect publication bias and correct for its adverse consequences over the final effect.
One popular approach is the visual inspection of small-study effects in a funnel plot and the use of the trim-and-fill method to correct the final estimate. The funnel plot is a display of the individual effect sizes on the x-axis against the corresponding standard errors on the y-axis. An asymmetric distribution can be a sign of publication bias, with missing studies in non-significant regions of the plot (Egger et al., 1997). The trim-and-fill method (Duval & Tweedie, 2000) detects (and removes) studies causing funnel plot asymmetry and then imputes missing studies to estimate a bias-corrected effect size. Alternatively, the precision-effect test and the precision-effect estimate with standard error procedures (PET and PEESE; Stanley & Doucouliagos, 2014) are based on a meta-regression approach to test for selective reporting and adjust for small-study effects. Both methods use a measure of precision as a covariate in the meta-analytic model (the standard error of the effect size in the case of PET, and sampling variance for PEESE), where the significance of the regression coefficient tests for publication bias, and the intercept of the model is taken as the true underlying effect. Thirdly, selection models (Vevea & Hedges, 1995) assume that the probability of publication depends on the p value. In our meta-analysis we used a selection model with a single cut point at p one-tailed = .025, which divides the range of possible p values into significant and non-significant values.
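The regression logic behind PET and PEESE can be sketched as a weighted least-squares line through the effect sizes, with inverse-variance weights and either the standard error (PET) or the sampling variance (PEESE) as the predictor. The code below is a minimal version that assumes independent effect sizes and omits the significance tests; it is not the RVE-based variant, and all names and data are hypothetical.

```python
import math


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pet_peese(effects, variances):
    """Return (intercept, slope) pairs for the PET and PEESE regressions."""
    weights = [1 / v for v in variances]
    std_errors = [math.sqrt(v) for v in variances]
    return {
        "pet": weighted_line(std_errors, effects, weights),    # predictor: SE
        "peese": weighted_line(variances, effects, weights),   # predictor: variance
    }


# Invented data in which less precise studies report larger effects:
# effect = 0.10 + 0.50 * SE exactly.
variances = [0.01, 0.04, 0.09, 0.16]
effects = [0.10 + 0.50 * math.sqrt(v) for v in variances]
fits = pet_peese(effects, variances)
```

In this constructed example the PET intercept recovers the value of 0.10 that the effects would take at a standard error of zero, while the positive slope reflects the small-study effect built into the data.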
The previous methods assume independent effect sizes in their original formulation. A way to account for dependence is to combine all the effect sizes coming from the same sample, generating an average estimate for each study, and conduct the classic methods on these aggregated estimates (Rodgers & Pustejovsky, 2020). In addition, some recent approaches directly handle the issue of dependence. For instance, the logic of PET-PEESE and other regression-based methods can be extended to multilevel models and RVE (Fernández-Castilla et al., 2021; Friese et al., 2017; Rodgers & Pustejovsky, 2020). Mathur and VanderWeele (2020) also proposed a sensitivity analysis that can be fitted with RVE. Assuming that positive results are more likely to be published than null or negative results by an unknown ratio (η, which is > 1 under publication bias), it is possible to estimate how strong this ratio would need to be to make the final effect negligible. Values of 1.5 are frequent in psychology literature, whereas values over 5 are rare (the 95th quantile of the estimated selection ratios, Mathur & VanderWeele, 2020).
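The idea behind the selection ratio η can be illustrated with a simplified fixed-effect calculation in which results that are not both positive and significant are up-weighted by η before pooling. This is only a sketch in the spirit of Mathur and VanderWeele's approach: it assumes independent estimates, a two-tailed .05 threshold for classifying results, and hypothetical names throughout, and it does not reproduce the robust, clustered estimator of the PublicationBias package.

```python
import math


def eta_weighted_estimate(effects, variances, eta, z_crit=1.959964):
    """Inverse-variance pooled estimate with non-affirmative results up-weighted by eta.

    A result is treated as affirmative when it is positive and significant
    (estimate / SE > z_crit); every other result receives eta times its usual weight.
    """
    num = 0.0
    den = 0.0
    for y, v in zip(effects, variances):
        affirmative = y / math.sqrt(v) > z_crit
        weight = (1.0 if affirmative else eta) / v
        num += weight * y
        den += weight
    return num / den


# Invented data: two positive significant results and two non-significant ones.
effects = [0.50, 0.60, 0.05, -0.10]
variances = [0.04, 0.04, 0.04, 0.04]
uncorrected = eta_weighted_estimate(effects, variances, eta=1)
corrected = eta_weighted_estimate(effects, variances, eta=5)
```

With η = 1 the function returns the ordinary inverse-variance mean; as η grows, the pooled estimate moves toward the mean of the non-affirmative results, which is how the sensitivity analysis asks how severe selection would have to be to make the effect negligible.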
Simulation studies show that ignoring dependence results in inflated Type I error (Rodgers & Pustejovsky, 2020). Although the methods that handle correlated effect sizes exhibit better performance, none of them stands as superior in terms of performance. Their performance depends on many parameters, such as the number of studies, heterogeneity, the degree of publication bias, and so on (Carter et al., 2019; Rodgers & Pustejovsky, 2020). A reasonable strategy is to combine several of them and interpret their results taking into account the conditions of the meta-analysis (Carter et al., 2019). In the present meta-analysis we chose four methods to test publication bias and adjust the mean estimate: (i) the trim-and-fill method (with the L0 and R0 estimators) and (ii) the selection model, both using aggregates, (iii) the RVE regression-based approaches (RVE PET and RVE PEESE), and (iv) Mathur and VanderWeele's sensitivity analysis. We used the MAd package in R (Del Re & Hoyt, 2014) to generate within-study aggregates, while we carried out Vevea and Hedges' selection model (1995) with the weightr package (Coburn & Vevea, 2019) and Mathur and VanderWeele's sensitivity analysis with the PublicationBias package (Mathur & VanderWeele, 2020). For the RVE meta-regression test, we chose a modified formula of the sampling variance and, in parallel, a variance-stabilizing transformation for the standardized mean difference to prevent the artifactual dependence between the effect size and its precision estimate (Pustejovsky & Rodgers, 2019; see Appendix A).
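The within-study aggregation step can be illustrated with the standard formula for the variance of a mean of equally correlated estimates. The sketch below assumes a common within-study correlation r between effect sizes and uses hypothetical names; it is meant to show the arithmetic, not to reproduce the MAd package.

```python
import math


def aggregate_study(effects, variances, r=0.50):
    """Average the effect sizes of one sample and compute the variance of that mean.

    Var(mean) = (1 / m^2) * (sum of v_i + sum over i != j of r * sqrt(v_i * v_j)),
    where r is the assumed correlation between effect sizes of the same sample.
    """
    m = len(effects)
    mean = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean, total / m ** 2


# Invented study contributing two outcomes with equal sampling variances.
mean_g, var_default = aggregate_study([0.20, 0.40], [0.04, 0.04], r=0.50)
_, var_low = aggregate_study([0.20, 0.40], [0.04, 0.04], r=0.30)
_, var_high = aggregate_study([0.20, 0.40], [0.04, 0.04], r=0.80)
```

A higher assumed correlation means the outcomes carry more overlapping information, so the aggregate is treated as less precise; varying r is therefore a natural check on how much the results depend on that assumption.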
Regarding the conditions of the present meta-analysis, previous comprehensive meta-analyses of the literature (Sala & Gobet, 2017a, 2020) revealed moderate heterogeneity (τ ≈ 0.2), a small sample of studies using instrumental programs (m ≈ 20), some evidence of publication bias, and a small uncorrected effect (g ≈ 0.20). Under similar conditions, trim-and-fill, selection model and the RVE meta-regression show acceptable Type I error rates when there is no publication bias (below a nominal level of 0.1, and RVE meta-regression below 0.05; Rodgers & Pustejovsky, 2020). When there is selective reporting, the three methods have low power, especially trim-and-fill, although selection model can detect publication bias more often. The limited power of RVE meta-regression was especially sensitive to heterogeneity and the size of the true effect, becoming lower with higher heterogeneity and smaller effects. Regarding the adjustment of the effect, the original PET-PEESE (which assumes independence) performed worse with smaller true effects and higher heterogeneity, consistently underestimating the true effect. Furthermore, its estimate should be interpreted with caution in small meta-analyses (with 20 studies or less; Stanley, 2017). Additionally, we conducted a simulation analysis with the software developed by Carter et al. (2019; http://www.shinyapps.org/apps/metaExplorer/) comparing the performance of the standard versions (not accounting for dependence) of trim-and-fill, PET-PEESE, and selection model under conditions similar to those in previous comprehensive meta-analyses (Sala & Gobet, 2017a, 2020; for further details, see Appendix B). Under the predefined conditions, the selection model achieved the best performance correcting the estimate (in terms of root mean square error and coverage) and trim-and-fill the worst, systematically overestimating the effect. The performance of PET-PEESE fell between both extremes. Finally, Mathur and VanderWeele's sensitivity analysis seems to be relatively unbiased with values of η below 20 (i.e., a publication probability 20 times higher for positive than null or negative results; Mathur & VanderWeele, 2020).

Sensitivity analyses
Only a subset of the studies reported sufficient information to compute the pre-posttest correlation with Equations (6) and (7). To confirm that the results of the meta-analysis do not hinge critically on our decision to assume a correlation of 0.70 for all the studies, we repeated the analyses estimating V gΔ with r = 0.50 and r = 0.60. In the same vein, we assumed a within-effects correlation of 0.50 to estimate the sampling variance of the aggregates in the publication bias assessment. We also conducted the analyses with correlations of 0.30 and 0.80. Moreover, we carried out sensitivity analyses following a multilevel Bayesian approach using the brms R package (Bürkner, 2017). The results of all the sensitivity analyses were similar to those reported here, showing far transfer with musical training (Appendices C and D), the modulating role of several variables on this effect (Appendix C), and little evidence of publication bias (Appendix E).

Results
Thirty-two empirical studies meeting the selection criteria were included in the systematic review, contributing a total of 179 cognitive/academic outcomes from 34 independent samples.⁴ As a consequence of our comprehensive search among the gray literature, we identified four theses and a report from a charity foundation (Haywood et al., 2015) that met our inclusion criteria. Moreover, fifteen of the studies were not included in the most recent meta-analysis by Sala and Gobet (2020), in part because their inclusion criteria excluded programs in which the participants self-selected the program (although some self-selection studies were included in their set: Degé et al., 2011; Geoghegan & Mitchelmore, 1996; Habibi et al., 2018; Hogan et al., 2018; Kempert et al., 2016; which contributed a null mean effect, g = 0.03). Ten of the new studies were programs that allowed the selection of the group, two were randomized, and three were non-randomized. Additionally, regarding the studies with instrumental training that the present meta-analysis has in common with the recent one by Sala and Gobet (17 studies), we identified 14 outcomes that had not been previously analyzed and that overall showed moderate effects in favor of musical training (mean g = 0.40).
Among all the independent samples, eight had random assignment of participants to groups, twelve were non-randomized, and fourteen were self-selection studies. Seven samples had both active and passive control groups, five only active, and 22 only passive. The main characteristics of the studies are summarized in Table 1. Regarding sample characteristics, the mean age of the samples included in our meta-analysis was 8 years (SD = 2.2; range: 3.9-14.7 years) and the mean duration of the programs was 17 months (SD = 16.3; range: 0.75-60 months). A total of 1664 children/adolescents took musical training, whereas 4334 were part of control groups (3670 in passive control groups and 664 in active control groups involving activities such as reading, drama, natural sciences lessons, visual arts, sports, dance, or non-musical computer-based programs).

Assessment of baseline differences
To test whether there were systematic baseline differences between participants in the treatment and control conditions, we conducted a meta-analysis of g pre. As expected, the mean effect size was non-significant for randomized studies, g pre = 0.00, 95% CI [−0.14, 0.15], p = .957; τ² = 0.01; I² = 14.20% (see Appendix G, Figure G.1), confirming that randomization had been successful in these studies. Also, there was no baseline difference in non-randomized studies without group selection, g pre = 0.03, 95% CI [−0.08, 0.14], p = .510; τ² = 0; I² = 0% (see Appendix G, Figure G.2). On the other hand, there was a positive and significant baseline difference in favor of musical training groups among self-selection studies, g pre = 0.29, 95% CI [0.12, 0.47], p = .003; τ² = 0.06; I² = 62.96% (see Appendix G, Figure G.3), which suggests that children/adolescents who voluntarily selected musical training as an extracurricular activity (over other programs such as sports or drama lessons) showed better initial cognitive and academic scores than their counterparts. A multilevel Bayesian approach using the brms R package (Bürkner, 2017) replicated these results. Whereas there was strong evidence in favor of the lack of difference at baseline in randomized studies, BF₁₀ = 0.09, and non-randomized studies, BF₁₀ = 0.06, it showed substantial evidence in favor of preexisting differences in self-selection studies, BF₁₀ = 7.20.
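To make the heterogeneity statistics reported above (τ² and I²) more concrete, the following is a minimal sketch of a univariate random-effects pooling with the DerSimonian-Laird estimator. It is an illustration under simplifying assumptions, not the model used in this meta-analysis (which relied on RVE and multilevel approaches to handle dependent effects). The function name and the input values are hypothetical.

```python
import math


def random_effects_summary(effects, variances):
    """Pool independent effect sizes with the DerSimonian-Laird estimator.

    Simplified univariate sketch: one effect per study, no dependence.
    """
    k = len(effects)
    # Inverse-variance (common-effect) weights
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and its degrees of freedom
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (tau^2), truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: percentage of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights and pooled estimate
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    return {
        "pooled": pooled,
        "se": se,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical baseline effects (g pre) and sampling variances, for illustration only
example = random_effects_summary(
    [0.45, 0.10, 0.38, 0.05, 0.52],
    [0.02, 0.03, 0.025, 0.04, 0.03],
)
print(example)
```

When all studies share the same effect, Q falls below its degrees of freedom and both τ² and I² are truncated to zero, which mirrors the pattern reported for the non-randomized studies without group selection.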

Moderator analyses
Most of the moderators (randomization, active control, blinding of assessors/computerized measures, age of the participants, duration of the program, baseline difference, and low SES) were not significant when they were individually added to the model of the randomized and non-randomized studies (Table 2). The exception was the type of outcome: overall, the results suggested that several academic or cognitive domains were more sensitive than others to the impact of learning to play an instrument (see Table 3). To find out which combination of moderators provided the best fit for the data, we carried out a backward stepwise selection (α exclusion = 0.10) with all the moderators. The best meta-regressive model did not retain any moderator.
When the influence of moderators was assessed only with randomized studies, no individual moderator significantly explained heterogeneity. However, when more complex structures of moderators were considered, the best meta-regressive model included age, baseline difference, and low SES (residual heterogeneity: τ² = 0.08; I² = 59.56%). The model suggests that the effect of musical training in randomized studies was smaller in older individuals and in individuals with higher performance at baseline, whereas larger effects were found with low SES.

Publication bias
Visual inspection of the funnel plot of the aggregates of randomized and non-randomized studies (self-selection studies were not included in publication bias analyses) revealed no clear asymmetry in the distribution of effects (Fig. 4). Consistent with this, trim-and-fill and the RVE meta-regression showed no evidence of asymmetry of the funnel plot (i.e., publication bias and small-study effects). Trim-and-fill with the R0 estimator detected one missing study (see white circle in Fig. 4), but not with the L0 estimator (no missing studies). Likewise, the regression coefficients for the standard error and the sampling variance were not significant in the RVE PET/PEESE meta-regressions (see Table 4). The likelihood ratio test of Vevea and Hedges's selection model was not indicative of publication bias either (p = .606). Finally, in Mathur and VanderWeele's sensitivity analysis, no value of η could render the estimate equal to 0 or non-significant. An η ≈ 9 was necessary to diminish the final estimate to g Δ = 0.10. Those results suggest that the meta-analytic conclusions are robust regardless of the severity of publication bias. Moreover, the multiple tests of publication bias yielded similar results when they were conducted only with randomized studies, suggesting little evidence of selective reporting or small-study effects (Table 5).
Regarding the adjustments provided by the multiple methods, the corrected effect differed only slightly from the uncorrected estimate (g Δ = 0.26) in most cases (see Table 4). Only regression-based methods yielded non-significant estimates and, among them, only PET returned a negligible corrected estimate (g Δ = 0.01, p = .958). Mathur and VanderWeele's sensitivity analysis yielded an effect nearly identical to the uncorrected one (g Δ = 0.23, p = .003) with the mean value of η in the psychology literature (η = 1.5). When the publication probability for positive results was five times the probability for null or negative results (the 95th quantile in psychology; Mathur & VanderWeele, 2020), the adjusted effect remained positive and significant (g Δ = 0.13, p = .003). Despite the smaller number of randomized studies (m = 8), the results were similar when the adjustments were applied in that group of studies (Table 5).
In summary, none of the methods detected substantial evidence of publication bias or small-study effects, including those with higher power, such as Vevea and Hedges's selection model. In addition, the corrected estimate was of similar size in most of the cases. Only the RVE PET approach showed a reduction in the effect. However, it is probable that the attenuation with RVE PET was a consequence of its worse performance under the observed conditions of moderate-to-high heterogeneity, small number of studies, and small effect size (Stanley, 2017; also see our performance simulation with Carter et al.'s software, Appendix B). Previous simulation studies showed that PET tends to underestimate the true effect under the conditions observed in our meta-analysis (Stanley, 2017). Therefore, taking all the approaches into consideration, the results suggest that the true underlying effect is non-zero.
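The logic of the η-based sensitivity analysis discussed in this section can be illustrated with a minimal sketch. It assumes the simplified common-effect form of the correction, in which non-affirmative results (null or negative) are up-weighted by η to compensate for their lower assumed probability of publication. It is not the robust version applied to the clustered estimates analyzed here, and the function names and data are hypothetical.

```python
from statistics import NormalDist


def is_affirmative(estimate, se, alpha=0.05):
    """True for a positive estimate with a two-tailed p-value below alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate / se > z_crit


def eta_corrected_mean(estimates, ses, eta):
    """Inverse-variance mean with non-affirmative studies up-weighted by eta.

    eta is the assumed ratio of publication probabilities
    (affirmative vs. non-affirmative results); eta = 1 means no selection.
    """
    numerator = 0.0
    denominator = 0.0
    for y, se in zip(estimates, ses):
        weight = 1.0 / se ** 2
        if not is_affirmative(y, se):
            weight *= eta  # compensate for under-representation
        numerator += weight * y
        denominator += weight
    return numerator / denominator


# Hypothetical effect sizes and standard errors, for illustration only
g = [0.60, 0.45, 0.10, -0.05, 0.30]
se = [0.20, 0.15, 0.18, 0.22, 0.25]

for eta in (1, 1.5, 5, 9):
    print(eta, round(eta_corrected_mean(g, se, eta), 3))
```

With η = 1 the function reduces to the ordinary inverse-variance mean; larger values of η pull the estimate toward the non-affirmative studies, which is why an estimate that stays positive across plausible η values is considered robust to this form of selection.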

Discussion
The present meta-analysis investigates the causal effects of learning to play an instrument on cognitive skills and academic achievement during the school years. Overall, a small benefit (g Δ = 0.26) was found with relatively short-term programs (with a mean duration of 17 months), regardless of whether or not there was a random assignment of participants to musical training versus control groups, and independently of the type of control group (i.e., active vs. passive). The fact that a positive result was also found in randomized designs taken alone supports the idea of a causal role of musical training in the observed improvements. Complementarily, it is important to note the detection of a bias in baseline performance in favor of music groups across studies in which the participants chose the training group (g pre = 0.29). This indicates that participants who self-selected to play an instrument consistently showed better performance prior to the beginning of the intervention, compared to those who decided to enrol in an alternative control activity. As this pretest disparity was small, and most of the studies were underpowered to detect it,⁵ it is perhaps unsurprising that the authors of those studies usually claimed to have matched groups. However, our meta-analytic evidence reveals that this was not the case. In contrast, and importantly, the pretest differences were null in randomized studies, as one would expect from truly random assignment of participants to groups. Furthermore, there was scarce evidence of publication bias and our conclusions remain valid under almost all the bias-correction methods that we applied (trim-and-fill, selection model, PEESE, and sensitivity analysis), except PET. Simulation studies have found that this last method performs poorly under conditions of moderate-to-high heterogeneity, a reduced number of studies, and a putative small true effect such as the one explored in the present meta-analysis. This reinforces the conclusion that current evidence supports a causal effect of instrumental musical training on cognitive skills and academic achievement in children and adolescents.

Fig. 4. Funnel plot with trim-and-fill of the aggregate effects of randomized and non-randomized studies (black circles). One missing study was imputed with trim-and-fill (white circle) using the R0 estimator. The contour of the funnel takes into account the heterogeneity of the trim-and-fill model. The light gray zones show effects between p = .10 and p = .05, and the dark gray zones show effects between p = .05 and p = .01.

Table 4
Tests of publication bias for randomized and non-randomized studies combined.
Our findings are in line with a nature and nurture approach (Wan & Schlaug, 2010). According to this view, preexisting cognitive advantages and higher levels of academic achievement, such as those we observed at baseline in self-selection studies, would facilitate the learning of musical skills. In addition, engagement in the complex activity of learning to play a musical instrument for a long period of time would lead to neurocognitive adaptations producing further enhancements in general cognitive skills and academic achievement. Even for expert music performance and the skills directly trained, deliberate practice seems insufficient to wholly explain individual differences (only ~30%; Hambrick et al., 2014), and part of the remaining variability might come from preexisting factors such as genetic factors and early musical experience (Seesjärvi et al., 2016). To our knowledge, only one experimental study has investigated far transfer with instrumental learning using a monozygotic co-twin control design (Nering, 2002). In this study, one of the twins was randomly selected to take a piano training program while the other was assigned to a waitlist group. After 7 months, experimental twins outperformed the control group in intelligence scores. Although the comparison group did not participate in an alternative activity (as control groups did in other experimental studies with positive outcomes; e.g., Frischen et al., 2021), the inclusion of monozygotic twin pairs with a common genotype and an early rearing environment supports the view that musical training has an impact on extra-musical cognitive skills even when genetic factors and shared environment are controlled.
Under a nature and nurture approach, it is not surprising that the differences reported previously between musicians and nonmusicians in correlational studies (g = 0.8-1; Corrigall et al., 2013) tend to be remarkably larger than the effects of short-term training in children that we observed in our meta-analysis of experimentally controlled studies (g ≈ 0.2). The combination of both initial differences and additional enhancements produced by the involvement in musical training for many years can explain the larger effect observed in correlational studies comparing adult musicians and non-musicians. In a recent study, Mankel and Bidelman (2018) reported similar findings with auditory processing, where listeners with inherently more adept auditory skills but no formal musical training showed better speech encoding than a low-musicality group, whereas formally trained musicians showed superior musicality and outperformed both groups of non-musicians on speech encoding. Taken together, their results suggest that preexisting factors may play a role in the relationship between musical experience and enhanced auditory functions, at the same time that musical training might provide an additional experience-dependent boost of preexisting differences.
Moreover, our results are in line with a recent longitudinal genetic study with over 1600 twins, including biological and adoptive adolescent siblings (Gustavson et al., 2021). Instrument engagement was highly heritable and genetically correlated with verbal abilities at 12 and 16 years of age, suggesting that a common set of genetic influences predisposes individuals towards both music engagement and high verbal intelligence (i.e., selection bias). However, instrument engagement was associated with later verbal ability (at 16 years old) even when controlling for 12-year full-scale intelligence or 12-year verbal intelligence, providing evidence for small direct benefits of musical training on later language abilities.

Table 5
Tests of publication bias for randomized studies.

⁵ A power analysis using G*Power 3.1 (Faul et al., 2009) for a two-tailed t-test and an alpha of .05 indicated that around 188 participants per group would be necessary to achieve an acceptable power of .80 with a Cohen's d of 0.29.
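The sample size in the power analysis above can be approximated with a short calculation. The sketch below uses the normal approximation for a two-tailed, two-sample comparison rather than the exact noncentral t computation that G*Power performs, so its result is expected to fall slightly below the G*Power figure. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sample, two-tailed test.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Baseline difference observed in self-selection studies (d = 0.29)
print(n_per_group(0.29))
```

Because the required sample grows with the inverse square of the effect size, halving the expected baseline difference would roughly quadruple the number of participants needed per group.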

R. Román-Caballero et al.
Following this logic, the less controlled the correlational studies, the larger one would expect the observed effect of musical training to be. For instance, Medina and Barraza (2019) observed an extremely large advantage in executive control for professional pianists (d = 1.51) using a visuospatial attentional task (i.e., the Attentional Networks Test or ANT; Fan et al., 2002), which correlated with the number of years of musical practice. This exceptionally large effect was likely inflated by the lack of control over several variables potentially enhancing attention. Indeed, in a similar study with an ANT-like task (i.e., the Attentional Networks Test for Interactions and Vigilance – executive and arousal components or ANTI-Vea; Luna et al., 2018), Román-Caballero et al. (2021) found a smaller difference (d = 0.25) when the effect of musical training was measured while controlling for a wide list of sociodemographic and lifestyle confounds. This inflation is still present in observational studies with large samples, such as Guhn et al. (2020; N ≈ 110,000), in which reductions of around 60% or more were observed in all the measures after controlling for multiple confounders (cultural background, socioeconomic status, sex, and prior academic achievement). Thus, the long history of training in professional musicians (about 12 years in Medina & Barraza, 2019) likely fosters their cognitive capacities, although in a more modest way than reported in uncontrolled cross-sectional studies.

Methodological quality and other musical programs
Unlike previous studies reporting an inverse relationship between design quality and the magnitude of the effects (see Sala & Gobet, 2017a, 2020), we did not find a significant reduction in the size of the outcome of randomized studies compared to non-randomized ones. On the contrary, the benefits for studies with random allocation were numerically greater (randomized: g Δ = 0.26, vs. self-selection: g Δ = 0.11). As noted above, this inconsistency could be a consequence of non-instrumental programs being overrepresented in the studies with higher methodological quality in the meta-analysis by Sala and Gobet (2020; only 31% of those with higher methodological quality involved instrumental training). Previous studies show that the benefits of non-instrumental interventions, such as preschool training of musical skills or active listening, are smaller than those of instrumental training (Bugos, 2010; James et al., 2020). A plausible explanation is that non-instrumental programs are less cognitively demanding and also that the skills they train are more restricted to the music domain compared to instrumental programs.
Indeed, a reanalysis of the data meta-analyzed by Sala and Gobet (2020) supports these impressions. Excluding studies with self-selection of the musical training program (Geoghegan & Mitchelmore, 1996; Hogan et al., 2018; Kempert et al., 2016), those with only posttest designs (five), and those excluded as outliers, we identified 30 non-instrumental studies included in Sala and Gobet's review, which used computerized training of musical skills (4 studies), phonological processing training with music support (1), and Kindermusik, Orff, Kodály or other related methods (25). We compared these non-instrumental studies to the 18 studies with instrumental programs with random or not self-selected assignment included in our meta-analysis. When design quality was not taken into account (i.e., when studies with randomized and non-randomized allocation, as well as active and passive controls, were analyzed), both non-instrumental and instrumental programs showed similar and significant benefits (g Δ, instrumental = 0.26, p < .001, vs. g Δ, non-instrumental = 0.20, p = .002). However, this result changed remarkably when we constrained the analyses to randomized studies, finding that only instrumental programs had a significant effect (g Δ, instrumental = 0.26, p = .013, vs. g Δ, non-instrumental = 0.11, p = .197). Similarly, when the analyses were constrained to studies with active control groups, instrumental programs outperformed non-instrumental interventions (g Δ, instrumental = 0.23 vs. g Δ, non-instrumental = 0.01). Therefore, it seems that the null result with high-quality designs reported by Sala and Gobet (2020) was biased by the overrepresentation of non-instrumental interventions (73% of the randomized studies) that, according to our reanalyses, do not seem to produce far transfer benefits. The confound between design quality and the type of musical training in the previous meta-analysis makes it necessary to take its conclusions with caution and limits their generalizability to all musical programs. And again, it reinforces the importance of analyzing instrumental learning programs separately, as in the present meta-analysis.
Altogether, our results support the preferred use of random allocation of participants and pre-posttest designs, rather than posttest-only designs, to shed light on the debate about the causal role of musical training. However, the duration of studies involving randomized programs tends to be short and many children assigned to the musical training group may not be motivated to learn to play an instrument, both of which might undermine any potential effects of training. For instance, Schellenberg (2004; Corrigall et al., 2013) reported that the participants randomly assigned to music lessons had minimal practice between lessons (about 10-15 min/week), which contrasts substantially with the practice at home of children who choose music as an extracurricular activity out of their own motivation. While further studies are necessary with randomized pre-posttest designs, active control groups, and blind assessment, the evidence from programs where the children select the activity is also interesting on its own, as these studies usually investigate the effects of longer-term interventions and in ecological situations (Habibi et al., 2018; Tervaniemi et al., 2018). In any case, when it comes to instrumental learning, and not musical education in general, conclusions such as "since there is no phenomenon, there is nothing to explain" (Sala & Gobet, 2020, p. 9) or "researchers and policymakers should seriously consider stopping spending resources for this type of research" (Sala & Gobet, 2017b) seem overly pessimistic in light of our results, and future research will be essential to clarify this debate.

Baseline differences, socioeconomic status, and age of the participants
Although randomization and the inclusion of an active control group did not explain between-studies variability in our meta-analysis, other moderators accounted for part of the heterogeneity. In the model of randomized studies, three variables were shown to be influential: baseline differences between groups, age of the participants, and SES. First, the larger the baseline difference, the smaller the observed effect of musical training. This could be due to children with a lower initial level of performance having a greater window of opportunity, and vice versa. Similar results have been found for general cognitive training (Jaeggi et al., 2011; Whitlock et al., 2012). Conversely, participants who had the chance to choose musical training programs showed better academic and cognitive scores at baseline, but their benefits (g Δ = 0.11) were numerically the smallest compared to those in random (g Δ = 0.26) or other types of non-random allocation (g Δ = 0.25), in which there was no pretest bias. However, when baseline performance is explicitly controlled in the model of self-selection studies, it predicts a similar pre-posttest difference under conditions of no pretest bias (fitted g Δ = 0.19 for self-selection studies vs. fitted g Δ = 0.22 for randomized studies). An alternative explanation for the pretest effect is a regression toward the mean of those samples of children who showed remarkably disparate scores at baseline (higher or lower).
In the same vein, participants with lower SES showed greater improvements compared to those with middle-high SES. Again, this might be the consequence of a larger margin for improvement for individuals whose development of cognitive skills and academic achievement is limited by their socioeconomic environments (Diamond, 2012). Therefore, this suggests that, although higher-functioning individuals are more likely to select and maintain musical practice for many years, children with a less favorable background can also benefit from musical training as long as they engage in it for enough time (Fasano et al., 2019; Portowitz et al., 2009; Tierney et al., 2015). If this finding is confirmed by future research, musical training could become an excellent candidate to contribute to reducing cognitive and academic differences due to social disparities.
Finally, the age of the participants at the beginning of the training program seems to modulate the impact of musical training. Our results are consistent with previous cross-sectional studies that show greater neural and cognitive advantages for earlier onsets of the training (Fauvel et al., 2014; Hanna-Pladdy & Gajewski, 2012; Schlaug et al., 1995; Vaquero et al., 2016). This relationship is suggestive of a sensitive period during which instrumental learning is likely to have stronger and more permanent effects on non-musical skills (White-Schwoch et al., 2013), perhaps as a consequence of greater neural plasticity earlier in development, and because those early neurocognitive changes might serve as a scaffold for future training (Vaquero et al., 2016).

Type of outcome and other moderators
Despite the identification of several moderators, heterogeneity remained moderate (I² = 59.56%). Part of this variability may be due to artifacts such as differences in measurement error (in relation to the reliability and validity of the tests) or reporting and transcriptional errors (e.g., inaccuracy in coding data, computational errors, errors in reading computer output, or typographical errors). Additionally, although the duration of the programs was known, the participants might have had different levels of engagement and differed in the amount of between-lessons practice. Unfortunately, this information is rarely reported in the studies, so it is hard, if not impossible, to detect this type of bias in most studies (especially when the outcomes are not outlier values). On the other hand, this heterogeneity may indicate the existence of other unknown variables that can modulate the final effect. In this sense, the type of outcome was a significant moderator when it was individually entered in the overall model, suggesting that the impact of the interventions is not the same for all cognitive and academic domains. Looking at Table 3, we observed that some cognitive abilities, such as executive functions (g Δ = 0.41), improved more than others. Unfortunately, the number of observations per type of enhanced cognitive abilities was low (only two out of nine were assessed in at least ten studies). Thus, the analysis was underpowered and the question needs to be addressed in future research.
The characteristics of the training program might also explain part of the observed variability: the method of instruction, the instrument learned, the music style, or whether the tuition was individual or in small groups. Studies such as Bianco et al. (2017) and Guhn et al. (2020) pointed out that each instrument involves idiosyncratic skills that might have specific cognitive and academic consequences. For example, Bianco et al. (2017) interpreted the differences between drummers and non-drummer musicians in a go/no-go task as a result of the greater amount of physical activity necessary to play drums. Likewise, Guhn et al. (2020) argued that vocal school music does not require learning musical notation or playing an instrument, which could explain the smaller academic improvements observed with vocal music (compared to instrumental learning programs). In the same vein, small-group learning has been shown to increase transfer compared to individual-learning programs (Pai et al., 2015). Although scarce, these studies open the door to further experimental research examining the influence of training program singularities.

Transfer in musical training
A relevant contribution of our meta-analysis is that the benefits of instrumental learning were observed in cognitive tasks and contexts quite distinct from musical performance. Computerized psychological tasks (e.g., go/no-go) and standardized tests of intelligence or academic achievement have little in common with playing a musical instrument at a concert or rehearsal. Thus, our findings are in accordance with the idea that, besides improving domain-specific abilities, involvement in the stimulating activity of playing a musical instrument enhances distant functions. Nevertheless, not all cognitive domains and academic achievement appear to be equally sensitive to instrumental training (see Table 3). For example, executive functions showed the most robust benefits. This is not surprising, as music-making places high demands on the abilities of control and self-regulation, monitoring, planning, and focused and sustained attention, among others. Some authors have expressed skepticism about far transfer (Roediger, 2013; Sala & Gobet, 2019; Thorndike, 1906). Specifically, Thorndike (1906) proposed that transfer only occurs when trained and untrained processes share features in common. Under this approach, he concluded that "the most common and surest source of general improvement of a capacity is to train it in many particular connections" (Thorndike, 1906, p. 248). Unlike many other cognitive activities involving highly specific contexts and tasks, music-making requires the coordination of several skills and sensory modalities and involves a wide and constantly augmented variety of stimuli, social situations, and types of performance. Therefore, musical training has singular characteristics that make it a plausible cognitive enhancer, even from a skeptical perspective.
One explanation for far transfer is that regular training in a particular basic cognitive process fosters the process itself and, as a consequence, affords advantages to any daily task that also hinges on the same skill. However, this explanation is undoubtedly simplistic, as evoking a "brain as a muscle" metaphor fails to explain why cognitive training programs sometimes fail to extend their benefits to other activities (Gathercole et al., 2019; Roediger, 2013; Simons et al., 2016; Taatgen, 2013). An alternative proposal conceptualizes transfer as the consequence of acquiring complex cognitive skills that can be applied to untrained tasks with some overlap (Gathercole et al., 2019; Taatgen, 2013). The cognitive routine framework (Gathercole et al., 2019) posits that training on unfamiliar or highly demanding tasks, such as learning to play an instrument, leads to the development of new complex cognitive skills. Transfer then occurs when one of these new skills can be applied to a novel activity. In the case of musical training, several studies have reported superior memory scores for adult musicians when they were compared to non-musician counterparts (Franklin et al., 2008; Jakobson et al., 2008; for longitudinal studies, see Portowitz et al., 2009; and Roden et al., 2012), but the evidence suggests that the advantage is largely due to more robust and efficient coding (such as an improved rehearsal mechanism, Franklin et al., 2008; or increased use of semantic information organization strategies, Jakobson et al., 2008). In line with these results, musical training could stimulate the development of singular strategies, such as mental rehearsal or semantic organization, that can be applied in several non-musical tasks. Accordingly, the expansion of cognitive capacities along with the development of new complex skills could explain the broad benefits observed with musical instrumental learning.
Finally, the cognitive and academic benefits of musical training across a wide range of areas may be in part a consequence of its noticeable impact on attention and executive functions. Attention and executive functions are engaged in many daily activities as well as in many of the tasks used in the studies included in the present meta-analysis for measuring academic achievement and cognitive functions. Therefore, any benefit in attention and executive functions might indirectly influence performance in those activities (i.e., acting as a mediating factor between musical training and non-musical skills; Hannon & Trainor, 2007; Moreno & Farzan, 2015; Román-Caballero et al., 2018).
Musical training and practice may also pose a unique type of ongoing challenge that might facilitate far transfer. No matter what level of technical and artistic mastery a musician achieves, there is always room for improvement. Furthermore, there are always new pieces, interpretations, styles, and genres of music to learn, as well as different musicians and ensembles to play with, adjust to, and learn from. Thus, improvement through the application of effortful control can remain a rewarding challenge throughout the lifespan. As a representative anecdote, when the virtuoso cellist Pau Casals was asked why he continued to practice four and five hours a day when he was eighty years old, he answered: "Because I think I am making progress" (Lyons, 1958).

Practical significance
Our results support the view that learning to play a musical instrument is an activity with cognitive and academic benefits, although they are fairly small. The overall effect in the present meta-analysis (g Δ = 0.26) indicates a probability of 57.3% that a randomly selected person from the musical training group will show higher cognitive skills and academic achievement than a person selected from the control group (only 7.3 percentage points above chance level). One pertinent question is the practical significance of this effect, as musical training is an effortful activity that takes many years. In this regard, Hunter and Schmidt (2015) claimed:
"The question for a treatment is really not whether it had an effect but whether the effect is as large as a theory predicts, whether the effect is large enough to be of practical importance, or whether the effect is larger or smaller than some other treatment or some variation of the treatment." (Hunter & Schmidt, 2015, pp. 246-247)
For example, Schellenberg (2004), in one of the first randomized studies with child participants, found that after 36 weeks of intervention the difference between the IQ gain of the keyboard and the passive control groups was only about 2 points. The benefit was even smaller when music participants were compared to children who took drama lessons (a gain difference of 1 IQ point). Similarly, our meta-analysis, which covered programs lasting 16 months on average, showed an overall increase corresponding to about 3 IQ points. This contribution is rather small and probably makes very little difference in daily life, so musical training might not be one of the first-choice interventions if the only purpose is cognitive enhancement. However, 1-1.5 years of musical training, in some cases with reduced engagement, might not be enough to produce substantial benefits on cognition. The few studies that assessed longer training periods have shown a greater impact of playing an instrument (Costa-Giomi, 1999, 2004: 36 months, mean g = 0.43; James et al., 2020: 48 months, mean g = 0.39; Portowitz et al., 2009: 24 months, mean g = 0.73), opening the possibility that the changes have practical relevance in the long run, with years of training. Furthermore, in contrast to the general population, learning to play an instrument might have significant daily life implications, both in the short and long term, for populations with lower cognitive development (fitted g Δ = 0.69, assuming one standard deviation below the comparison group at baseline), low SES (fitted g Δ = 0.41), or both (fitted g Δ = 0.97).
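The probability of 57.3% reported for the overall effect can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the figure is a probability of superiority (common-language effect size) obtained as Φ(g/√2), which in turn assumes normally distributed scores with equal variances in both groups. The text does not state which formula was used, and the function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probability_of_superiority(g: float) -> float:
    """Probability that a random member of the treated group outscores a
    random member of the control group, given a standardized mean
    difference g. Assumption: normal scores with equal variances, so the
    difference of two random scores has standard deviation sqrt(2)."""
    return normal_cdf(g / sqrt(2.0))


overall_g = 0.26  # overall effect reported in the meta-analysis
p = probability_of_superiority(overall_g)

print(f"Probability of superiority: {p:.1%}")      # 57.3%
print(f"Above chance level: {p - 0.5:.1%} points")  # 7.3% points
```

Under these assumptions, g = 0.26 gives approximately 0.573, matching the value in the text, and g = 0 gives exactly 0.5 (chance level).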
The earliest musical instruments date back more than 35,000 years, which indicates that human beings have practiced musical activities involving the use of instruments since the Upper Paleolithic (Conard et al., 2009). It is likely that music originally emerged as a cultural creation, an exaptation of the auditory system (i.e., pitch and timing processing) that had evolved for auditory scene analysis (Trainor, 2015). However, engaging in music-making could confer benefits, such as enhanced group cohesion, cooperation, and mood regulation, that may have led to music-specific adaptations and refined human musical skills (Savage et al., 2020; Trainor, 2015). Thus, although musical education might offer certain cognitive advantages, in the end, musicality and music traditions seem to be more strongly linked to other adaptive purposes, such as social bonding. It seems reasonable that the main functions of, and motives for, being involved in musical activities (and, subsequently, their more visible effects) are historically distinct from cognitive enhancement. Along these lines, several longitudinal studies have found benefits in such domains: emotional development and empathy (Rabinowitch et al., 2013), prosocial skills (Schellenberg et al., 2015), self-esteem (Costa-Giomi, 2004; Rickard et al., 2013), academic self-esteem (Degé et al., 2014; Degé & Schwarzer, 2018), and mood and quality of life (Seinfeld et al., 2013). Therefore, the cognitive benefits that could appear with musical training should not be taken as the principal value or goal of playing an instrument in most contexts, but as a precious supplementary effect that adds value to an ancient human facet with many other functions.

Conclusions
The present meta-analysis shows that learning to play an instrument during the school years has a modest but significant cognitive and academic impact. Longitudinal evidence suggests both a causal role of musical training and the existence of a self-selection bias, whereby children with favorable backgrounds and higher initial functioning are more likely to choose to learn to play or keep learning an instrument. The contrast of these findings with the null results reported for other types of musical and cognitive programs indicates, once again, the rareness of far transfer. Although the mechanisms for transfer remain unknown, instrumental learning in structured programs would be an optimal framework to investigate them. Finally, given that reliable evidence is still scarce, further studies in this field will be relevant to reach firmer conclusions.

Fig. 1. Flowchart of the studies included in the systematic review and meta-analysis.

Table 1
Characteristics of the studies included in the meta-analysis.

Table 3
Final effect of each type of cognitive/academic outcome.
Note. Significant results are depicted in bold; m = number of studies, k = number of outcomes, p = p value.

Table 2
Results of the meta-regressive analyses.