The cognitive and academic benefits of Cogmed: a meta-analysis

Cogmed Working Memory Training (CWMT) is a commercial cognitive-training program designed to foster working-memory capacity. Enhanced working-memory capacity is then supposed to increase one's overall cognitive function and academic achievement. This meta-analysis investigates the effects of CWMT on cognitive and academic outcomes. The inclusion criteria were met by 50 studies (637 effect sizes). Highly consistent near-zero effects were estimated in far-transfer measures of cognitive ability (e.g., attention and intelligence) and academic achievement (language ability and mathematics). By contrast, slightly heterogeneous small to medium effects were observed in memory tasks (i.e., near transfer). Moderator analysis showed that these effects were weaker for near-transfer measures not directly related to the trained tasks. These results highlight that, while near transfer occurs regularly, far transfer is rare or, possibly, nonexistent. Transfer thus appears to be a function of the degree of overlap between trained tasks and outcome tasks.


Cogmed Working Memory Training
Following this idea, some commercial cognitive-training computerized programs have been designed in the last two decades to boost WM capacity, overall cognitive ability, and everyday functioning. The most well-known, studied, and influential of such programs is Pearson's Cogmed Working Memory Training (hereafter CWMT; www.cogmed.com). Simons et al. (2016) classify CWMT as one of the five commercial cognitive-training programs whose effectiveness had been assessed in several publications (see also SharpBrains, 2015). The studies investigating the effects of CWMT are undoubtedly the most numerous and best designed (e.g., often including active controls) in this category (Simons et al., 2016, pp. 143-148).
CWMT is usually administered by school personnel or clinical practitioners who have been trained by Cogmed coaches. All three types of CWMT programs--Cogmed JM for preschoolers, Cogmed RM for older children, and Cogmed QM for adults--consist of 25, 30, and 45-min sessions over a five-week period. Trainees perform the training either in a school or rehabilitation environment, or at home under remote supervision (Simons et al., 2016). The training regimens include gamified verbal and visuo-spatial WM tasks that require trainees to recall increasingly longer sequences of information as their performance improves with practice (for more details, see Shipstead, Hicks, & Engle, 2012a; Shipstead, Redick, & Engle, 2012b; Simons et al., 2016).
The preschool training, Cogmed JM, involves training tasks that are linked together with a theme-park design. Preschoolers are required to direct their attention towards a sequence of items (i.e., an array of brightly colored fur-ball creatures), hold the sequence in their WM, and then select the items in their original order using a mouse or a touch pad. The duration of the WM tasks is adjusted based on the trainee's performance. Correct responses are reinforced with positive visual stimuli (e.g., smiles) and the duration of the intervals between stimuli and recall increases, whereas incorrect responses elicit negative visual stimuli (e.g., frowning). The training for school-aged children, Cogmed RM, involves training tasks presented on a space-themed interface. As in Cogmed JM, trainees are required to recall a sequence of items from memory. However, the tasks in Cogmed RM have more targets and longer sequences, and are thus more difficult than the Cogmed JM tasks. In addition, trainees' scores are presented on the screen so that they can challenge themselves and try to outperform their previous score. As an incentive and reward for task completion, a racing game called "RoboRacing", which involves collecting coins and racing against the clock, is presented at the end of each daily training session. The adult training, Cogmed QM, involves training exercises similar to those in Cogmed RM. However, in this version, trainees may need to concentrate more because the interface is less visually appealing and the emphasis on surpassing prior performance is less apparent. For more details, see Roche and Johnson (2014).
Notably, the software has been claimed to increase performance in academic, social, and professional settings (www.cogmed.com/how-is-cogmed-different). Cogmed avers that CWMT leads to improvements in attention, reading, mathematics, cognitive control, and cognitive functioning in daily life (Pearson, 2016). Nonetheless, Cogmed also acknowledges that further and more compelling evidence is needed, especially with regard to the presumed academic benefits of CWMT.
As in most cognitive-training programs, the findings regarding CWMT have been mixed, which has kept researchers from reaching a definite conclusion on the topic. While some scientists have expressed optimism due to promising results (e.g., Shinaver, Entwistle, & Söderqvist, 2014), others have highlighted the overall insufficient quality of the experimental design of the studies investigating the effects of CWMT (e.g., Simons et al., 2016). The lack of active controls (or any controls), non-random allocation of the participants to the groups, and small sample sizes are some of the major flaws that may bias the results of CWMT studies and introduce spurious variability in the pool of data.

The present study
This paper evaluates the impact of CWMT on people's cognitive and academic skills via meta-analysis. We focus on two primary goals. First, we evaluate the differential impact of CWMT on performance in cognitive tasks as a function of the type of transfer. The field has customarily distinguished between far-transfer and near-transfer effects (Barnett & Ceci, 2002). While the latter concern the performance on memory tasks (as proxies for WM capacity), the former deal with cognitive and academic tasks (e.g., fluid intelligence, attention, language, and mathematics). Furthermore, several memory tasks employed as outcome measures closely resemble the tasks included in CWMT. The training is thus expected to exert a stronger effect on such tasks than on those memory tasks that are not directly related to the trained tasks.
Second, we aim to quantify and explain the amount of variability in the findings in this literature. We employ moderator analysis to investigate the potential sources of within- and between-study heterogeneity. This analysis addresses a fundamental point: statistically accounting for the degree of true heterogeneity is the only reliable way to make some sense of the mixed results the field has produced so far.
To the best of our knowledge, only two meta-analyses have been carried out specifically on CWMT so far. Spencer-Smith and Klingberg's (2015) meta-analysis is somewhat limited in scope (it deals solely with subjective measures of inattention in daily life). Nutley & Ralph's (n.d.) meta-analysis is relatively outdated (only 16 studies are included, and the most recent ones are from 2012). Both these meta-analytic syntheses have reported results supporting the effectiveness of CWMT. Other meta-analyses have included some CWMT interventions within the broader context of WM training, showing less optimistic results (e.g., Melby-Lervåg, Redick, & Hulme, 2016; Sala & Gobet, 2017a). However, the number of CWMT interventions included in these latter meta-analyses is quite limited (19 in Melby-Lervåg et al., 2016; four in Sala & Gobet, 2017a), and no firm conclusion about CWMT effectiveness can be drawn from these syntheses. In the last few years, the number of eligible studies has more than doubled, and no previous meta-analysis has in fact been conclusive regarding the actual impact of CWMT on cognitive ability and academic skills. Given the large amount of experimental evidence collected so far, the prominent position of CWMT in the landscape of commercial cognitive-training programs, the importance of theoretical and potential practical (clinical and educational) implications, and the contradictory claims expressed by different researchers in the field, we think that an up-to-date meta-analytic synthesis implementing a sound modeling design is required.

Inclusion criteria
The studies were included according to the following five criteria:
1. The study included at least one group trained solely on CWMT and at least one control group not engaged in adaptive CWMT or any other adaptive WM-training program. This criterion was fundamental to isolate the variable of interest, that is, the impact of CWMT on performance in cognitive/academic tasks;
2. At least one cognitive/academic task was administered. Self-reported measures and parent/teacher rating questionnaires were excluded.¹ Also, when the control group was involved in activities closely related to one of the outcome measures (e.g., controls involved in a math course), the relevant effect sizes were excluded (e.g., tests of mathematical achievement);
3. The study included both pre- and post-test assessments;
4. The study reported new data (i.e., it did not report only duplicate results from previous studies);
5. The data reported in the study (or provided by the author) were sufficient to compute an effect size.
We searched for eligible published and unpublished articles through December 31st, 2017. We sent emails (n = 11) to researchers in the field asking for the necessary data to calculate the effect sizes. We received three positive replies. In total, we found 50 studies, conducted from 2005 to 2017, that met all the inclusion criteria. These studies included 637 effect sizes and a total of 3059 participants. The entire procedure is described in Fig. 1. The Supplemental materials available online contain the details of all the included studies and a list of the excluded studies.

Meta-analytic models
Each effect size was considered either near-transfer or far-transfer. The near-transfer effect sizes consisted of memory tasks referring to the Gsm construct as defined by the Cattell-Horn-Carroll model (CHC model; McGrew, 2009). Far-transfer effect sizes referred to all the other cognitive measures (for the details, see 3.4. Moderators section and the Supplemental materials available online). Two authors coded each effect size independently and reached 100% agreement.

Moderators
N. D. Aksayli, et al. Educational Research Review 27 (2019) 229-243
We chose seven potential moderators:
1. Allocation (dichotomous variable): Whether the participants were randomly assigned to the experimental and control groups;
2. Control group (active or non-active; dichotomous variable): Whether the CWMT group was compared to another cognitively demanding activity (e.g., non-adaptive training); no-contact groups and business-as-usual groups were considered as "non-active." Also, in one study (Hadwin & Richards, 2016) the control group was involved in non-cognitive tasks (cognitive behavioral therapy). This control group was labeled as "non-active" as well;
3. Baseline difference (continuous variable): The standardized mean difference corrected for upward bias (i.e., Hedges's g; see 3.5. Effect Size Calculation) between the experimental and control groups at pre-test assessment. This moderator was included to check whether part of the observed heterogeneity was due to regression to the mean;
4. Age (categorical variable): Whether the participants were children (16 years old or younger), adults (17-55 years old), or older adults (older than 55)²;
5. Population (dichotomous variable): Whether the participants were typical subjects not suffering from any clinical conditions (e.g., ADHD) or intellectual disabilities;
6. Measure (categorical variable): This moderator, which was added only in the far-transfer models, included (a) measures of cognitive skills such as fluid intelligence (Gf in the CHC nomenclature) and attentional skills (Gs/Gt); (b) measures of academic skills such as language ability (Language) and mathematical ability (Math); and (c) full-scale IQ (i.e., batteries including tests of verbal and non-verbal intelligence, and sometimes tests of Gs/Gt). Those effect sizes that did not fall into any of the above categories were labeled as (d) "miscellaneous." The first two authors coded each effect size for moderator variables independently. The Cohen's kappa was κ = 0.98. The two authors resolved every discrepancy by discussion;
7. Criterion (categorical variable): Whether the task resembled one of the training tasks in CWMT (very near transfer) or was a different memory task (lesser near transfer). This moderator was added only in the near-transfer models. The first two authors coded each effect size for moderator variables independently. The inter-rater agreement was 98%. The two authors resolved every discrepancy by discussion.
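The coding checks above report both Cohen's kappa and a raw percentage agreement. The two differ because kappa discounts the agreement expected by chance. A minimal Python sketch of the kappa calculation follows; the function name and the example codes are hypothetical and are not taken from the meta-analysis.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for four effect sizes (illustration only).
coder_1 = ["Gf", "Gf", "Math", "Math"]
coder_2 = ["Gf", "Gf", "Math", "Gf"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this toy example the raters agree on three of four items (75%), yet kappa is lower because half of that agreement would be expected by chance.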

Effect Size Calculation
The effect sizes were calculated for each eligible task reported in the primary studies. Those effect sizes that were redundant (e.g., sum of digit span forward and backward when the individual indexes were reported) were excluded.
The effect size used was Hedges's g. The formula for the effect size was:

$$ g = \frac{(M_{e\_post} - M_{e\_pre}) - (M_{c\_post} - M_{c\_pre})}{SD_{pooled\_pre}} \times \left(1 - \frac{3}{4N - 9}\right) \qquad (1) $$

where $M_{e\_post}$ and $M_{e\_pre}$ are the mean performance of the experimental group at post-test and pre-test, respectively, $M_{c\_post}$ and $M_{c\_pre}$ are the mean performance of the control group at post-test and pre-test, respectively, $SD_{pooled\_pre}$ is the pooled pre-test SD of the experimental group and the control group, and $N$ is the total sample size. The corresponding sampling error variance of the effect size³ was computed from $d$, the standardized mean difference (i.e., the first factor of Equation (1)), $N_e$, the size of the experimental group, and $N_c$, the size of the control group (Hedges & Olkin, 1985; Schmidt & Hunter, 2015, pp. 292-293).
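A minimal Python sketch of this calculation, assuming Equation (1) is the pre-post-control standardized mean difference multiplied by the small-sample correction 1 - 3/(4N - 9). The df-weighted pooling of the pre-test SDs and the variance formula (a common large-sample approximation in the style of Hedges & Olkin, 1985) are assumptions made for illustration and may differ from the formulas the authors used. The function name and the example numbers are hypothetical.

```python
import math


def hedges_g(m_e_pre, m_e_post, m_c_pre, m_c_post,
             sd_e_pre, sd_c_pre, n_e, n_c):
    """Return (g, var_g) for a pre-post design with a control group."""
    n = n_e + n_c  # total sample size N
    # Assumption: pre-test SDs are pooled with the usual df-weighted formula.
    sd_pooled_pre = math.sqrt(
        ((n_e - 1) * sd_e_pre**2 + (n_c - 1) * sd_c_pre**2) / (n - 2)
    )
    # d: difference between the two groups' pre-to-post gains, in pre-test SD units.
    d = ((m_e_post - m_e_pre) - (m_c_post - m_c_pre)) / sd_pooled_pre
    # Small-sample correction for upward bias.
    correction = 1 - 3 / (4 * n - 9)
    g = d * correction
    # Assumption: large-sample variance approximation, used here as a stand-in
    # for the paper's own sampling error variance formula.
    var_g = ((n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))) * correction**2
    return g, var_g


# Hypothetical study: both groups start at 10; the trained group gains 3
# points, the controls gain 1; pre-test SD is 4 in both groups; 20 per group.
g_example, var_example = hedges_g(10, 13, 10, 11, 4, 4, 20, 20)
```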

Modeling approach
We employed robust variance estimation (RVE) with hierarchical weights and small-sample corrections to calculate the overall effect size and perform meta-regression analysis (Hedges, Tipton, & Johnson, 2010; Tanner-Smith & Tipton, 2014; Tanner-Smith, Tipton, & Polanin, 2016). RVE models nested effect sizes (i.e., effect sizes extracted from the same study) and calculates robust standard errors. RVE also estimates the within-cluster (ω²) and between-cluster (τ²) variance components, which express the amount of true heterogeneity in the dataset. We thus grouped all the effect sizes extracted from one study into the same cluster. The robumeta R package (Fisher, Tipton, & Zhipeng, 2017) was used to run the analyses.
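The core idea of RVE can be illustrated with a deliberately simplified Python sketch for an intercept-only model: effect sizes are averaged with given weights, and the standard error is built from residuals summed within each study, so that effects from the same study are not treated as independent. This is not the robumeta implementation. It takes the weights as given (no hierarchical weights), applies no small-sample correction, and does not estimate ω² or τ². All names and numbers are hypothetical.

```python
from collections import defaultdict
import math


def robust_mean(effects, weights, study_ids):
    """Weighted mean effect with a cluster-robust (sandwich) standard error."""
    total_w = sum(weights)
    mean = sum(w * g for w, g in zip(weights, effects)) / total_w
    # Sum the weighted residuals within each study (cluster) ...
    cluster_sums = defaultdict(float)
    for g, w, sid in zip(effects, weights, study_ids):
        cluster_sums[sid] += w * (g - mean)
    # ... and square per cluster, so within-study dependence is respected.
    variance = sum(s**2 for s in cluster_sums.values()) / total_w**2
    return mean, math.sqrt(variance)


# Hypothetical data: study A contributes two effect sizes, study B one.
effects = [0.2, 0.4, 0.6]
weights = [1.0, 1.0, 2.0]
study_ids = ["A", "A", "B"]
mean_g, se_g = robust_mean(effects, weights, study_ids)
```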

Sensitivity analysis
To test the robustness of the results, we performed Viechtbauer and Cheung's (2010) influential case analysis (run with metafor R package; Viechtbauer, 2010). This analysis evaluated whether some effect sizes were outliers or exerted an unusually strong influence on the overall effect sizes.⁴ The meta-analytic models were thus run both with and without influential effect sizes.
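The logic of checking for influential effect sizes can be sketched with a simple leave-one-out comparison in Python. This is a much simpler stand-in for Viechtbauer and Cheung's (2010) diagnostics, not a reimplementation of them: it only measures how far an inverse-variance weighted mean moves when each effect size is dropped. Function names and data are hypothetical.

```python
def pooled_mean(effects, variances):
    """Inverse-variance weighted mean of the effect sizes."""
    weights = [1 / v for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)


def leave_one_out_shifts(effects, variances):
    """Change in the pooled mean caused by including each effect size."""
    full = pooled_mean(effects, variances)
    shifts = []
    for i in range(len(effects)):
        rest_g = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        shifts.append(full - pooled_mean(rest_g, rest_v))
    return shifts


# Hypothetical data: three small effects and one extreme effect.
effects = [0.1, 0.2, 0.15, 2.3]
variances = [0.04, 0.04, 0.04, 0.04]
shifts = leave_one_out_shifts(effects, variances)
most_influential = max(range(len(shifts)), key=lambda i: abs(shifts[i]))
```

Because the weights are inverse variances, an extreme effect size with a very large sampling variance moves the pooled mean only slightly under this scheme.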
Once the influential effect sizes were removed, we used Cheung and Chan's (2014) weighted-samplewise correction to merge the effect sizes extracted from the same paper. (For more details on the procedure, see the Supplemental material available online.) We then ran several publication bias analyses.⁵ Running multiple publication bias analyses is recommended to test the robustness of the naïve (i.e., uncorrected) estimates (Kepes & McDaniel, 2015). First, we visually inspected the funnel plots for possible asymmetries. Second, we used the trim-and-fill analysis (Duval & Tweedie, 2000). This method estimates whether some smaller-than-average effects have been suppressed and calculates a corrected overall effect size based on the asymmetry observed in the funnel plots. We used all three estimators (L0, R0, and Q0) described in Duval and Tweedie (2000; run with metafor R package). The three estimators differ from each other only regarding the type of nonparametric test they implement. Using three different estimators is recommended in order to increase the reliability of the corrected overall effect sizes. Finally, since trim-and-fill analysis sometimes provides false negatives (i.e., no effect sizes filled in the presence of publication bias; Simonsohn, Nelson, & Simmons, 2014), we used the PET-PEESE estimates as a further method to assess publication bias (Stanley & Doucouliagos, 2014). The PET estimator is the intercept of a weighted linear regression where the dependent variable is the effect size, the independent variable is the standard error, and the weight is the inverse of the standard error squared (i.e., precision). The PEESE estimator is obtained by replacing the standard error with the standard error squared as the independent variable. If PET suggests the presence of a real effect (i.e., intercept different from zero; p < .100, one-tailed), the PEESE estimator must be preferred over the PET estimator (Stanley, 2017; Stanley & Doucouliagos, 2014).
² The type of CWMT (JM, RM, and QM) was not added as a moderator because it was confounded with age.
³ It is worth noting that the most accurate formula for the calculation of sampling error variance in repeated measures designs with control groups requires pre-post-test correlations (Schmidt & Hunter, 2015, pp. 343-355). Such information is rarely provided in the included primary studies (only Honoré & Noël, 2017 report pre-post-test correlations). That said, we think that the formula we used is an acceptable approximation. In the supplemental materials, we report the R code to reproduce the results with Schmidt and Hunter's (2015) formula assuming a realistic pre-post-test correlation (r = 0.650). The only appreciable difference is a further reduction of the amount of true heterogeneity.
⁴ A few very-near-transfer effect sizes were excessively large (e.g., g > 2). However, the influential case analysis did not detect them because the relevant sampling variances were too high. These effect sizes were excluded. For the details, see 4. Results section.
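The PET and PEESE estimators described above are intercepts of weighted regressions, which can be sketched in a few lines of Python. The sketch assumes one effect size and one standard error per observation and uses the closed-form solution for a weighted simple regression. It returns the two intercepts only and does not compute the p-value needed for the PET-based decision rule. Function names and data are hypothetical.

```python
def wls_intercept(y, x, weights):
    """Intercept of the weighted least-squares line y = b0 + b1 * x."""
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        raise ValueError("the predictor must vary across effect sizes")
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar


def pet_peese(effects, std_errors):
    """Return the (PET, PEESE) intercepts, weighting by 1 / SE squared."""
    weights = [1 / se**2 for se in std_errors]
    # PET: regress the effect size on the standard error.
    pet = wls_intercept(effects, std_errors, weights)
    # PEESE: regress the effect size on the squared standard error.
    peese = wls_intercept(effects, [se**2 for se in std_errors], weights)
    return pet, peese


# Hypothetical data in which less precise studies report larger effects.
std_errors = [0.10, 0.15, 0.20, 0.30]
effects = [0.05, 0.12, 0.20, 0.35]
pet_estimate, peese_estimate = pet_peese(effects, std_errors)
```

Both intercepts estimate the effect a hypothetical study with a standard error of zero would show, which is why they are read as bias-corrected overall effects.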

Follow-up effects
A subsample of the studies reported both immediate post-test effects and follow-up effects. Two studies (Foy & Mann, 2014;Roberts et al., 2016) reported only follow-up effects. The effect sizes were calculated by replacing the numerator in formula (1) with the difference between the follow-up mean and the pre-test mean in the two groups. The analyses described above were run for follow-up effects as well.
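Written out, and assuming formula (1) has the form of a standardized difference in gains multiplied by the small-sample correction 1 - 3/(4N - 9), the follow-up effect size would be as follows. The subscript "fu" for the follow-up means is notation introduced here for illustration.

$$ g_{follow\text{-}up} = \frac{(M_{e\_fu} - M_{e\_pre}) - (M_{c\_fu} - M_{c\_pre})}{SD_{pooled\_pre}} \times \left(1 - \frac{3}{4N - 9}\right) $$

Only the numerator changes; the denominator remains the pooled pre-test SD, so post-test and follow-up effects are expressed in the same units.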
Furthermore, we ran additional analyses to test the robustness of the effects from post-test to follow-up. We included only those studies that tested the participants at both post-test and follow-up. These analyses are reported in the Supplemental materials available online (Tables S1-S4).

Immediate post-test
The RVE model included all the effect sizes related to far-transfer measures, that is, those measures that shared no overlap with the training tasks. The overall effect size was ḡ = 0.048, SE = 0.031, 95% CI [-0.017; 0.113], m = 39, k = 194, df = 16.62, p = .135, ω² = 0.000, τ² = 0.006. We ran a meta-regression model including all the moderators. The only significant moderator was Baseline (b = −0.345, SE = 0.082, p = .001, ω² = 0.000, τ² = 0.000). This does not mean that the small positive effect found (ḡ = 0.048) is attributable to regression to the mean. In fact, the overall effect size at baseline was near-zero (ḡ = 0.011). Rather, only the low between-study heterogeneity observed (τ² = 0.006) was affected by the differences at baseline.
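As a rough check on how the reported statistics fit together, and assuming the interval is the usual t-based one with the reported degrees of freedom, the overall far-transfer interval can be approximately reproduced. The critical value of about 2.11 for df = 16.62 is an approximation interpolated from standard t tables.

$$ \bar{g} \pm t_{0.975,\,df} \times SE \approx 0.048 \pm 2.11 \times 0.031 \approx 0.048 \pm 0.065 $$

This gives roughly [-0.017; 0.113], in line with the reported interval, which includes zero.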
We found one influential case (g = 0.195, ID = 59; see Supplemental materials available online). After excluding this effect, the overall effect size was ḡ = 0.044, SE = 0.031, 95% CI [-0.022; 0.111], m = 38, k = 193, df = 15.92, p = .175, ω² = 0.000, τ² = 0.006. Again, Baseline was the only significant moderator (b = −0.345, SE = 0.082, p = .001, ω² = 0.000, τ² = 0.000). These results showed that the overall far-transfer effect was not significantly different from zero and that the observed true heterogeneity was only due to a statistical artifact (i.e., regression to the mean). Since all the observed true heterogeneity was accounted for, no variance was left to be explained and thus no other potential moderator could have affected the outcomes.
With regard to publication bias, the funnel plot looked slightly asymmetrical (a few extreme effects were observed on the right of the mean but not on the left; Fig. 2).
The funnel plot looked approximately symmetrical (Fig. 3). We found two influential cases (g = 2.343, ID = 383 and g = 2.265, ID = 195). Another effect size was excluded because it was considered as an outlier (g = 2.152, ID = 552). After excluding these effects, the overall effect size was ḡ = 0.427, SE = 0.047, 95%  These analyses thus showed that the overall near-transfer effect was medium (ḡ = 0.444). This effect was robust to the exclusion of the influential cases. Finally, the effect was consistent. In fact, most of the observed true heterogeneity was accounted for by a few influential/extreme effects, the Baseline moderator (i.e., regression to the mean), and the degree of overlap between the trained task and the transfer task (moderator Criterion). The residual heterogeneity was low (ω² = 0.014, τ² = 0.030).

Analysis of criterion moderator.
The degree of overlap between the training tasks of CWMT and the memory tasks was a significant moderator. We thus ran separate analyses for very-near-transfer measures and lesser-near-transfer measures. First, we ran an RVE model including all the effect sizes related to very-near-transfer measures. The overall effect size was ḡ = 0.566, SE = 0.046, 95% CI [0.472; 0.660], m = 42, k = 154, df = 23.81, p < .001, ω² = 0.061, τ² = 0.040. We ran a meta-regression model including all the moderators. No moderator was significant.
These results showed that the effect on those tasks that closely resembled the CWMT training tasks was greater than the overall near-transfer effect (ḡ = 0.566 and ḡ = 0.444, respectively). The model was homogeneous, especially when influential cases and outliers were removed and baseline differences were controlled for.
Second, we ran an RVE model including all the effect sizes related to lesser-near-transfer measures. The overall effect size was ḡ = 0.246, SE = 0.069, 95% CI [0.102; 0.391], m = 33, k = 93, df = 19.49, p = .002, ω² = 0.000, τ² = 0.091. We ran a meta-regression model including all the moderators. Baseline was a significant moderator (b = −0.598, SE = 0.169, p = .006). Also, Age was a significant moderator, with the effect of the training significantly higher in the children than in the adults and older adults (b = 0.318, SE = 0.117, p = .018). These moderators explained nearly all the observed true heterogeneity (ω² = 0.000, τ² = 0.018). The overall effect size in the sample of children was ḡ = 0.457, SE = 0.082, p < .001. However, the higher effect size in children is probably due to a large extent to regression to the mean. In fact, the overall effect size at baseline was significantly negative (ḡ = −0.198, SE = 0.054, p = .004). In other words, this effect was a statistical artifact due to a certain amount of collinearity between Baseline and Age moderators.
The analyses thus showed a very clear pattern of results. While the very-near-transfer overall effect was medium (about ḡ = 0.450 at least), the lesser-near-transfer effect was significantly smaller (about ḡ = 0.250 at most). This pattern of results was in line with the hypothesis according to which transfer is a function of the extent to which the trained task and the target task overlap (i.e., share common features).
The pattern of results regarding near-transfer effects at follow-up was thus the same as that at immediate post-test: significant overall effects and some true heterogeneity mainly explained by Baseline and Criterion moderators. The only difference was the size of the overall effect. In fact, the post-test overall near-transfer effect was somewhat greater than the follow-up one (ḡ = 0.444 and ḡ = 0.365, respectively).

Analysis of criterion moderator.
As with the immediate post-test effects, the degree of overlap between the training tasks of CWMT and the memory tasks was a significant moderator at follow-up. We thus ran separate analyses for the very-near-transfer measures and lesser-near-transfer measures. First, we ran an RVE model including all the effect sizes related to very-near-transfer measures. The overall effect size was ḡ = 0.487, SE = 0.091, 95% CI [0.292; 0.682], m = 22, k = 60, df = 13.27, p < .001, ω² = 0.017, τ² = 0.118. We ran a meta-regression model including all the moderators. No moderator was significant.
We found one influential case (g = 1.998, ID = 99) and another effect was excluded because, like in the previous model, it was a The funnel plot looked slightly asymmetrical (Fig. 6).
The results at follow-up were very similar to the ones observed immediately after the post-test. Once again, the only difference was the size of the effects. The uncorrected overall effect at follow-up was slightly smaller than its counterpart at immediate post-test (ḡ = 0.487 and ḡ = 0.566, respectively). This pattern of results was slightly magnified in the publication-bias-corrected estimates. While the corrected estimates were about ḡ = 0.450 at immediate post-test, they ranged between ḡ = 0.234 and ḡ = 0.380 at follow-up. This difference probably depended on post-test-to-follow-up attrition rate (i.e., only a portion of the studies reported follow-up effects).

Discussion
The present paper aimed to analyze the impact of CWMT on people's cognitive function and academic achievement. While the training regimen increased the performance on memory tasks, no appreciable effect was found in far-transfer tasks (no estimate significantly different from zero). The overall effect was estimated to be around zero at follow-up.⁶ These outcomes corroborate the results reported in most recent meta-analyses and systematic reviews in the broader field of WM training (e.g., Dougherty, Hamovits, & Tidwell, 2016; Gillam, Holbrook, Mecham, & Weller, 2018; Melby-Lervåg et al., 2016; Sala & Gobet, 2017a, b; Soveri, Antfolk, Karlsson, Salo, & Laine, 2017). Conversely, our findings contradict the more optimistic conclusions of those meta-analyses and systematic reviews specifically examining the impact of CWMT on far-transfer measures (Nutley & Ralph, n.d.; Pearson, 2016; Shinaver et al., 2014; Spencer-Smith & Klingberg, 2015).
Fig. 6. Funnel plot of observed outcomes (gs) and standard errors of very-near-transfer measures at follow-up.
The discrepancy between our findings and the ones reported in the latter reviews and meta-analyses stems from several factors. First, and most obvious, our meta-analysis includes many more studies than previous meta-analyses. The inclusion of the most recent studies has increased the precision of the estimated effects and corrected the early optimistic findings. Second, our inclusion criteria are stricter and have led to the exclusion of those studies whose experimental design does not meet a minimum standard of quality (e.g., inclusion of a control group and exclusion of subjective measures of cognitive/academic skill). In fact, poor design quality is often associated with more optimistic results in the field of cognitive training (Simons et al., 2016). Third, unlike the previous meta-analyses, we have employed a set of up-to-date meta-analytic techniques (e.g., RVE) and diagnostics (e.g., influential-case analysis and multiple publication-bias analyses). These methods are necessary to produce unbiased and more reliable estimates (Appelbaum et al., 2018). The quality of the modeling approach may thus explain the difference between our meta-analysis and the others in the field.
Crucially, the far-transfer meta-analytic models did not exhibit any true between-study or within-study heterogeneity (ω² = 0.000 and τ² = 0.000 when baseline differences are controlled for). Overall, the results regarding far transfer are consistent with a nonphenomenon: no generalized effects occurred regardless of any potential moderator (e.g., outcome measure, age, or type of population). This consistency is, as far as we are concerned, the most significant novel aspect regarding this particular field of research. In fact, from a statistical point of view at the very least, the results referring to far-transfer effects were not mixed at all. Therefore, the idea that WM-training programs such as CWMT confer stronger benefits on low-WM individuals (e.g., Klingberg et al., 2002; Weicker et al., 2016) is not supported. Overall, these findings thus corroborate the hypothesis according to which the lack of broad generalization of cognitive skills acquired by training is an invariant of human cognition. To date, the empirical evidence indicates that the possibility of enhancing general cognition by training is scientifically implausible (e.g., Moreau, Macnamara, & Hambrick, 2018). As pointed out by some scholars (e.g., Engle, 2015), human cognition is the product of a biological system. Thus, it is very unlikely that any short-term cognitive-training program could significantly affect it. CWMT appears to be no exception. Consequently, to date, CWMT cannot be recommended as an educational tool at any age and for any population. Furthermore, since the overall idea of fostering cognitive skills by training seems substantially implausible, these findings cast some doubt on the claimed positive effects of other commercial cognitive-training programs (e.g., Neuroracer, Cognifit, and Lumosity, just to mention some). In this respect, the present meta-analysis is in line with the general skepticism about the alleged benefits of commercial cognitive-training programs expressed by Simons et al. (2016) and reported in large trials (e.g., ACTIVE; Rebok et al., 2014). Future studies will help to refute or corroborate this view. Finally, given the consistent lack of broad generalization of skills, it is our conviction that other types of intervention should be preferred in order to improve academic achievement. More promising examples include teaching learning strategies (for a review, see McCabe, Redick, & Engle, 2016) and
⁶ Interestingly, the lack of follow-up far-transfer effects also seems to reject the hypothesis that some time is needed in order for generalized benefits to occur (e.g., Pearson, 2016, p. 17).
The near-transfer effects deserve a more nuanced discussion. The training program increased performance on memory tasks immediately after post-test (ḡ = 0.444) and this improvement remained significant after several months, although it slightly decreased (ḡ = 0.365 at follow-up). The models showed some amount of within- and between-study true heterogeneity. Most of this heterogeneity was explained by the between-group differences at baseline and similarity between the training tasks in CWMT and memory tasks. As expected, the participants improved the most in those memory tasks whose demands and visual stimuli were very similar to the trained tasks (very-near-transfer; ḡ = 0.566 and ḡ = 0.487 immediately after post-test and at follow-up, respectively). These effects appeared to be relatively robust to publication bias (realistically, no more than 0.100-0.150 standardized mean difference of bias). As already highlighted by many researchers in the field (e.g., Shipstead et al., 2012a, b; Simons et al., 2016) and some of the authors of the primary studies included in this meta-analysis (e.g., Brehmer, Westerberg, & Bäckman, 2012), these effects should not be interpreted as evidence of memory enhancement. Rather, such effects denote improvement in the ability to perform the trained tasks.
CWMT also seems to exert a modest effect on the participants' performance on those memory tasks not included in the training program or related to the trained tasks (lesser-near-transfer). The overall effect sizes were small but significantly different from zero; they remained significantly different from zero for a few months after the end of the training, although some decrease was observed (ḡ = 0.246, p = .002 and ḡ = 0.176, p = .005, at post-test and follow-up, respectively; see also Tables S1-S4 in the Supplemental material available online). These effects were also highly consistent (ω² = 0.000 and τ² ≤ 0.001 after controlling for baseline differences and excluding the few influential cases and outliers). Nevertheless, some evidence of publication bias was found at follow-up (e.g., the PET estimate was ḡ = 0.089).

Does CWMT enhance working memory?
The findings regarding far-transfer and very-near-transfer effects are easily interpretable: far transfer does not occur with CWMT, and very-near transfer indicates that the acquired skills can be used in highly similar tasks. In contrast, no straightforward explanation is possible for the improvements on those memory tasks not directly related to the CWMT training tasks. One possibility is that the observed lesser-near-transfer effects stem from genuine cognitive enhancement. That is, CWMT may slightly increase WM capacity. The alternative possibility is that CWMT makes the participants better at performing a certain class of tasks. For example, Shipstead et al. (2012a) noted that, even though complex span tasks (e.g., odd one out) are usually categorized as lesser-near-transfer (using our nomenclature), they still share some degree of overlap with the trained tasks (mostly simple-span tasks). Thus, people undergoing CWMT may simply acquire the ability to perform such tasks slightly more efficiently than controls. That would explain the small observed effect sizes in the near-transfer measures and the concurrent absence of far transfer.
In line with Shipstead et al. (2012a, b), our opinion is that CWMT does not foster WM capacity, any other core cognitive mechanism, or academic skills. Two considerations lead us to this conclusion. First, the effect sizes observed in lesser-near-transfer measures are quite small and tend to diminish a few months after the end of the training. This result can be accounted for by the moderate, yet meaningful, degree of overlap between the trained tasks and memory tasks. Second, and most crucially, WM capacity is a major predictor of academic achievement and is highly correlated with fluid intelligence. Also, as seen earlier, low WM capacity is comorbid with several learning disabilities. Enhanced WM capacity is supposed to make information processing more efficient, which, in turn, should bring a wide set of benefits in academic, professional, and social contexts (see Pearson, 2016). Thus, if the CWMT program were enhancing the participants' WM capacity, improvements in other cognitive and academic tasks should have been observed at either the post-test or the follow-up assessment. However, this was not the case.
That being said, we think that the topic deserves further investigation. Specifically, the field would substantially benefit from the study of the impact of the training on latent factors rather than observed variables. Cognitive skills are commonly defined as the shared variance between many different cognitive tasks (e.g., Strata II and III of the CHC model; McGrew, 2009). Improvements on a latent factor extracted from a broad set of memory tasks would represent far more compelling evidence of cognitive enhancement than that often provided in the reviewed primary studies, which are based on few observed measures. Such an experimental design would dramatically contribute to settling the debate regarding the true significance of near transfer induced by CWMT and any other cognitive-training program.

Conclusions
This meta-analysis has examined the impact of CWMT on people's performance on cognitive tests. Small to null effects were observed on far-transfer measures (i.e., fluid intelligence, attention, and mathematical/language skills). The findings were highly consistent (i.e., very low or no true heterogeneity). Thus, CWMT had no appreciable impact on overall cognitive ability or academic skills.
More robust effects were found on measures of WM capacity very similar to the trained tasks (e.g., digit span and span board tasks). Nevertheless, CWMT exerted only a small effect on measures of WM capacity not directly linked to the trained tasks. Differences at baseline accounted for most of the observed true heterogeneity. Because of the small size of the effects and the lack of generalization across other cognitive and academic skills, the presumed benefits of CWMT on WM capacity remain doubtful. Future studies should test the effect of CWMT on latent factors estimated from many different measures of WM capacity.