Checking the “Academic Selection” argument. Chess players outperform non-chess players in cognitive skills related to intelligence: A meta-analysis

Article history: Received 27 May 2016; Received in revised form 23 December 2016; Accepted 24 January 2017; Available online 3 February 2017

Substantial research in the psychology of expertise has shown that experts in several fields (e.g., science, mathematics) perform better than non-experts on standardized tests of intelligence. This evidence suggests that intelligence plays an important role in the acquisition of expertise. However, a counterargument is that the difference between experts and non-experts is due not to individuals' traits but to academic selection processes. For instance, in science, high scores on standardized tests (e.g., SAT and then GRE) are needed to be admitted to a university program for training. Thus, the "academic selection process" hypothesis is that expert vs. non-expert differences in cognitive ability reflect ability-related differences in access to training opportunities. To test this hypothesis, we focused on a domain in which there are no selection processes based on test scores: chess. This meta-analysis revealed that chess players outperformed non-chess players in intelligence-related skills (d = 0.49). Therefore, this outcome does not corroborate the academic selection process argument and, consequently, supports the idea that access to training alone cannot explain expert performance.


Introduction
Whether intelligence plays a significant role in determining expert performance has been intensely debated for more than a century. Starting with Galton's (1869) theory of hereditary genius, the idea that expert performance requires intelligence, and hence that experts are on average more intelligent than laypeople, has received substantial support. Barron (1963) suggested that in some areas, such as physics and mathematics, the minimum threshold to achieve graduation is an IQ of 120. More recently, a longitudinal study showed that the SAT scores of gifted children at the age of 13 predicted future success (e.g., number of scientific publications and patents) in their respective fields (Lubinski, 2009; Robertson, Smeets, Lubinski, & Benbow, 2010). Finally, in a meta-analytic review, Hunter and Hunter (1984) reported a strong positive correlation (r = 0.53) between job performance and general intelligence (g). Moreover, the strength of this relationship was positively related to job complexity. These results were confirmed in subsequent meta-analyses (Hunter, Schmidt, & Le, 2006; Schmidt & Hunter, 1998; Schmidt, Shaffer, & Oh, 2008). Ericsson (2014) has recently offered a competing view of expertise in which he argues that the difference in cognitive ability between experts and non-experts may be due to academic selection processes. Admission into undergraduate and postgraduate education programs is often based on test scores correlating with general intelligence measures (e.g., SAT for undergraduate studies, GRE for graduate studies, LSAT for law school). Therefore, Ericsson (2014) claims that if experts, such as scientists or musicians, score better on intelligence tests than non-experts, this may only be because individuals with higher cognitive ability had access to the training that made them experts.
Both views recognize the evidence that experts tend to have greater cognitive ability than their non-expert counterparts in many fields. However, they differ in what is assumed to account for the relationship between expertise and cognitive ability. The traditional view holds that cognitive ability is directly predictive of expertise. That is, experts have greater cognitive ability than non-experts because cognitive ability accounts for some of the variance in performance. The competing view (Ericsson, 2014) holds that cognitive ability predicts access to training, which in turn predicts expertise. That is, according to this view, experts have greater cognitive ability than non-experts because those with greater cognitive ability were selected for training; cognitive ability is not directly predictive of expert performance.
To test these competing views, we chose a field, chess, in which there is no selection process that limits access to training. According to the academic selection process hypothesis (Ericsson, 2014), if there are no barriers to training based on cognitive ability, then there should be no difference in cognitive ability between chess players who are training in this area of expertise and non-chess players who are not engaged in such training.

Chess and expertise
Thanks to a reliable indicator of players' strength (Elo rating; Elo, 1978) and a balance between tactical calculation and strategic thinking (Hambrick et al., 2014), chess is widely considered an archetypal domain for the study of expertise, i.e., the ability to perform better than the majority of people in a particular domain. In fact, to emphasize the idea that chess is an ideal model environment for research in the psychology of expertise, Simon and Chase (1973) indicated that the role chess plays in cognitive science is comparable to that of Drosophila (i.e., the fruit fly) in the field of genetics.
Chess is also an excellent model environment in which to investigate the importance of cognitive ability and deliberate practice (the engagement in specific structured activities designed to improve performance in a field) in the acquisition of expertise (Gobet, De Voogt, & Retschitzki, 2004; Grabner, 2014). Achieving mastery in chess requires intensive practice, but other factors seem to play an important role, too. For example, Campitelli and Gobet (2011) found that the amount of deliberate practice necessary to reach expertise in chess varied massively between players, suggesting that deliberate practice alone cannot explain individual differences in expert performance in chess. Since chess is a complex intellectual activity, general intelligence and more specific cognitive abilities, such as visuospatial ability, short-term and working memory, planning, processing speed, and problem-solving skills, have been considered as potential predictors of individual differences in chess skill (Bilalić, McLeod, & Gobet, 2007; Burgoyne et al., 2016; de Bruin, Kok, Leppink, & Camp, 2014; Frydman & Lynn, 1992; Jastrzembski, Charness, & Vasyukova, 2006; Schneider, Gruber, Gold, & Opwis, 1993; Waters, Gobet, & Leyden, 2002).
Unlike other fields of expertise, access to chess practice and instruction does not depend on performance on standardized tests. That is, chess is an activity open to many people without restriction. Joining a chess club does not require any preliminary testing, and participation in courses and tournaments is often open to players of any level and relatively affordable. Certainly, a few chess players are funded by chess federations and admitted to elite tournaments, but the selection is always based mainly (if not completely) on players' chess performance. Chess is thus an ideal domain to test Ericsson's (2014) argument that differences in cognitive ability between experts and non-experts are due to performance on standardized tests limiting access to the domain, given that no such limitation applies to the field of chess.

Purpose of the present study
To test the academic selection hypothesis, we investigated whether cognitive ability differences emerge between those who have entered a field and those who have not, in a field where there is no selection test: chess. We conducted a meta-analysis across all the studies meeting our inclusion criteria that measured cognitive ability. This meta-analysis examines studies of natural groups (those who have chosen to enter the field and those who have not), and thus our results cannot address the causal structure of the relationship between chess player status and overall cognitive ability. However, finding a relationship would support the notion that cognitive ability is a strong candidate to play a role in expert performance, and that expert vs. non-expert differences in cognitive ability cannot be assumed to be due only to academic selection processes.

Method
We designed the random-effects meta-analysis and report the results in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher, Liberati, Tetzlaff, & Altman, 2009). The entire procedure is summarized in Fig. 1.

Literature search
A systematic search strategy was used to find the relevant studies. The Google Scholar, ERIC, PsycINFO, JSTOR, Scopus, and ProQuest Dissertations & Theses databases were searched to identify potentially relevant studies. We used two specific combinations of keywords: (a) chess AND (intelligence OR memory OR planning OR cognitive OR ability); and (b) (chess OR Elo OR DWZ OR Fide) AND (intelligence OR memory OR planning OR cognitive OR ability). In addition, previous narrative reviews were examined, and we e-mailed researchers in the field (n = 7) asking for unpublished studies and data.

Inclusion criteria
To be included in our meta-analysis, a study needed to:
1. report a comparison between chess players and non-chess players; 3
2. include at least one measure of intelligence; 4 and
3. report an effect size, or the authors needed to provide enough information to calculate an effect size.
We found seven studies, conducted between 1987 and April 1, 2016, that met all the inclusion criteria. These studies included seven independent samples and 19 effect sizes, 5 with a total of 485 participants (Table 1).

2.3.1. Anderson (2004)
In this study, chess players (n = 80) and non-chess players (n = 46), aged 11-14 years, were administered the d2 Test of Attention (Brickenkamp & Zillmer, 1998). This test measures processing speed and the discrimination of similar visual stimuli. No measure of chess skill was provided because the chess players, probably due to their young age, did not have an Elo rating (or equivalent). The participants were from two middle schools that offered open chess courses to their students. The chess players were those who had chosen to enroll in a chess course. The chess players and non-chess players were matched on age and years of education.

2.3.2. Campitelli and Labollita (2016)
In this study, chess players (n = 25) and non-chess players (n = 25), aged 15-55 years, were administered the Tower of London (a measure of planning ability; Shallice, 1982) and the Cognitive Reflection Test (Campitelli & Gerrans, 2014). According to a commonly accepted metric (Elo > 2000; see Gobet & Simon, 2000), all the chess players can be considered chess experts, and the sample included some international grandmasters (Elo > 2500). The chess players and non-chess players were matched on age and level of education. 6

2.3.3. Doll and Mayr (1987)
In this study, chess players (n = 27) and non-chess players (n = 88), aged 18-51 years, were administered the Berlin Intelligence Structure test (BIS; Jäger, 1982), a test of general intelligence. The BIS consists of seven subscales measuring three content-related skills (verbal, numerical, and figural) and four operational skills (processing speed, memory, creativity, and information processing capacity). All the chess players in the sample were titled players (Masters, FIDE Masters, and International Masters). The chess players and non-chess players were matched on age.
2.3.4. Hänggi, Brütsch, Siegel, and Jäncke (2014)
In this study, chess players (n = 20) and non-chess players (n = 20), aged 19-41 years, were administered Raven's Advanced Progressive Matrices (Raven, 1998), a mental rotation task (Peters et al., 1995), and a block-tapping test (a measure of visuospatial short-term memory; Schellig, 1997). The chess players were all experts, and several were international grandmasters (Elo > 2500). The two groups were matched on age and level of education.
2.3.5. Unterrainer, Kaller, Halsband, and Rahm (2006)
In this study, chess players (n = 25) and non-chess players (n = 25), mean age 29.3 years, were administered the Tower of London, the Standard Progressive Matrices (Raven, 1960), the digit span test, and the Corsi block-tapping test (Milner, 1971). Chess ratings ranged from 1250 to 2100 (amateurs, intermediates, and experts). The chess players and non-chess players were matched on age and level of education.
2.3.6. Unterrainer, Kaller, Leonhart, and Rahm (2011)
In this study, chess players (n = 30) and non-chess players (n = 30), aged 20-50 years, were administered the Tower of London with a time limit. 7

In a seventh study (2011), chess players (n = 22) and non-chess players (n = 22), aged 7-11 years, were administered the Wisconsin Card Sorting Task (WCST). 8 This test measures problem-solving and the ability to adapt to changing rules. Probably due to the young age of the participants, no measure of chess skill was available. The chess players were from a local chess club. The chess players and non-chess players were matched on age and years of education.

3 Studies with experimental treatments were not included. See Sala and Gobet (2016) for a meta-analytic review of studies that incorporated an experimental chess treatment.

4 The measures of cognitive ability included in the present meta-analysis are varied, but the majority correlate moderately or strongly with general intelligence. For example, the approximate correlation between full-scale IQ and Raven's Progressive Matrices is 0.80 (Jensen, 1998); between full-scale IQ and mental rotation, 0.41 (Ozer, 1987); between full-scale IQ and block-tapping, 0.38 (Orsini, 1994); between full-scale IQ and the Tower of London, 0.46 (Morice & Delahunty, 1996); between the SAT and the Cognitive Reflection Test, 0.44 (Frederick, 2005); between full-scale IQ and perseverative errors in the Wisconsin Card Sorting Task, −0.30 (Ardila, Pineda, & Rosselli, 2000); and between full-scale IQ and 2-choice processing speed tasks, 0.41 (Vernon, 1983).

5 To account for effect sizes from dependent samples, we adjusted (i.e., lowered) the weight of such samples using the method designed by Cheung and Chan (2004). For a list of the adjusted Ns, see the data file openly available at https://osf.io/t3uk2/.

Effect size
The standardized mean difference (Cohen's d) in cognitive ability scores between the chess players and non-chess players was used as the effect size. The effect sizes were then corrected for attenuation due to measurement unreliability (Schmidt & Hunter, 2015) using the following two formulas: d′ = d / a and Se′ = Se / a, where d′ is the corrected Cohen's d, Se′ the corrected standard error (Se), and a the square root of the reliability coefficient (see the data file openly available at https://osf.io/t3uk2/ for a list of the coefficients and corrected effect sizes and standard errors). Finally, the Comprehensive Meta-Analysis (Version 3.0; Biostat, Englewood, NJ) and Metafor (Viechtbauer, 2010) software packages were used to conduct the meta-analytic, publication bias, and outlier analyses.
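To make the correction concrete, here is a minimal numeric sketch; the d, Se, and reliability values below are invented for illustration and do not come from any study in the sample:

```python
import math

def correct_for_attenuation(d, se, reliability):
    """Hunter-Schmidt correction for attenuation: divide the observed
    Cohen's d and its standard error by a, the square root of the
    reliability coefficient of the cognitive measure."""
    a = math.sqrt(reliability)
    return d / a, se / a

# Invented values: an observed d of 0.40 from a test with reliability
# 0.81 (so a = 0.9) yields d' = 0.40 / 0.9 and Se' = 0.15 / 0.9
d_corrected, se_corrected = correct_for_attenuation(0.40, 0.15, 0.81)
```

Because a is always at most 1, the correction can only increase (never decrease) the magnitude of an effect size and its standard error.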

Main model
The random-effects meta-analytic overall effect size was d = 0.49, CI [0.26; 0.72], k = 19, p < 0.001. The forest plot is shown in Fig. 2. The I² statistic, which specifies the percentage of between-study variability in effect sizes due to heterogeneity rather than random error, was 66.56.
To evaluate whether any single effect size exerted a strong influence on the meta-analytic overall effect size, we performed a one-study-removed analysis (Borenstein, Hedges, Higgins, & Rothstein, 2009; Kepes, McDaniel, Brannick, & Banks, 2013). This analysis showed that the maximum difference between the overall effect size (d = 0.49) and the estimates obtained with each study removed was small.
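For reference, I² is computed from Cochran's Q and the number of effect sizes. A minimal sketch follows; the Q value used is hypothetical, chosen only to illustrate the arithmetic (we do not report Q for the main model in the text):

```python
def i_squared(q, k):
    """I^2 (Higgins & Thompson): percentage of between-study variability
    attributable to true heterogeneity rather than sampling error.
    q is Cochran's Q statistic; k is the number of effect sizes
    (degrees of freedom = k - 1). Floored at 0 when Q < df."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0

# With k = 19 effect sizes, a hypothetical Q of 53.8 gives an I^2 of
# roughly 66.5, in the range conventionally read as moderate-to-high
example = i_squared(q=53.8, k=19)
```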

Outlier analysis
To test whether some effect sizes had an unusually large influence on the overall results, Viechtbauer and Cheung's (2010) outlier detection analysis was performed. No outliers were found.

Publication bias analysis
Publication bias occurs when studies with small samples and small effect sizes are systematically suppressed from the literature. To investigate whether our results were affected by publication bias, we examined a contour-enhanced funnel plot (Peters, Sutton, Jones, Abrams, & Rushton, 2008) depicting the relation between effect size and standard error, and performed Duval and Tweedie's (2000) trim-and-fill analysis. In the presence of publication bias, effect sizes are missing from the bottom-left part of the funnel plot (small effect sizes with high standard errors). That is, when the standard error is high, larger-than-average effect sizes (those on the bottom right) are more likely to be published than smaller-than-average effect sizes (those on the bottom left). The trim-and-fill analysis estimates the number of studies missing from the funnel plot and imputes the missing effect sizes, based on the asymmetry of the observed data, to create a more symmetrical funnel plot.
The funnel plot was approximately symmetrical around the meta-analytic mean (d = 0.49; Fig. 4), suggesting no publication bias. The trim-and-fill analysis estimated no missing effect sizes either left or right of the mean. Cumulative meta-analysis (Borenstein et al., 2009; Schmidt & Hunter, 2015) showed that the small-N effect sizes did not appreciably affect the overall effect size, again suggesting no publication bias (Fig. 5). Finally, using multiple methods for the detection of publication bias is important to test the robustness of results (Kepes & McDaniel, 2015). We thus ran two additional publication bias analyses: (a) selection models (moderate two-tailed selection and severe two-tailed selection; 9 Vevea & Woods, 2005); and (b) PET-PEESE (Stanley & Doucouliagos, 2014). The two analyses showed either minimal differences between the overall effect size (d = 0.49) and the point estimates, or non-significant point estimates (PET-PEESE). All the results are summarized in Table 2.

7 While the groups were matched on general intelligence, an additional measure of cognitive ability was administered (Tower of London), and thus we included it. Given the pattern of results (chess players tending to have higher intelligence than non-chess players), it is likely that our overall effect size would be larger had the study authors not matched the chess players and non-chess players on general intelligence.

8 The perseverative errors score is the only measure from the WCST correlating with measures of full-scale IQ (see Ardila, Pineda, & Rosselli, 2000). Thus, only the data concerning perseverative errors were used to extrapolate an effect size.
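As background on the PET-PEESE procedure: PET and PEESE are weighted least-squares meta-regressions of effect sizes on their standard errors (PET) or sampling variances (PEESE), with inverse-variance weights; the intercept estimates the effect adjusted for small-study bias. A rough sketch under fabricated data (not the effect sizes analyzed here):

```python
import numpy as np

def pet_peese_intercepts(d, se):
    """Stanley-Doucouliagos precision-effect tests: regress effect sizes
    on SE (PET) or SE^2 (PEESE) with inverse-variance weights 1/SE^2;
    each regression's intercept is a bias-adjusted effect estimate."""
    d = np.asarray(d, dtype=float)
    se = np.asarray(se, dtype=float)
    w = 1.0 / se**2

    def wls_intercept(predictor):
        # Weighted least squares via the normal equations
        X = np.column_stack([np.ones_like(predictor), predictor])
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ d)
        return beta[0]

    return wls_intercept(se), wls_intercept(se**2)

# Fabricated funnel in which d is exactly proportional to SE: the PET
# intercept is 0, i.e., no effect remains once small-study bias is removed
pet, peese = pet_peese_intercepts([0.2, 0.4, 0.6, 0.8], [0.10, 0.20, 0.30, 0.40])
```

In practice, the PET intercept is tested first; only if it is significant is the (less biased but more variable) PEESE intercept reported as the adjusted estimate.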

Additional meta-analytic models
Doll and Mayr (1987) was the only study that did not control for the participants' level of education. Thus, we ran a random-effects meta-analytic model without the effect sizes extracted from that study.
The random-effects meta-analytic overall effect size was d = 0.38, CI [0.10; 0.66], k = 12, p = 0.009. The degree of heterogeneity was I² = 60.41. One-study-removed analysis showed that the maximum difference between the overall effect size and the estimates obtained with each study removed was small (Fig. 6). With regard to publication bias analyses, the pattern of results was similar to the main model, that is, small differences between the overall effect size and the point estimates (the PET-PEESE point estimates were still non-significant). All the results are summarized in Table 3, while the cumulative meta-analysis is shown in Fig. 7.
Finally, four studies (Doll & Mayr, 1987; Campitelli & Labollita, 2016; Hänggi et al., 2014; Unterrainer et al., 2006) reported statistically dependent effect sizes (k = 16). To evaluate whether those effect sizes meaningfully affected the overall effect size, we calculated the weighted average effect sizes for three of these studies, while for Doll and Mayr (1987) we used the reported full-scale measure of the BIS (see Section 2.3 and Table 1). We then entered the merged effect sizes into a new meta-analytic model. The random-effects meta-analytic overall effect size was slightly smaller but still significant, d = 0.39, CI [0.06; 0.73], k = 7, p = 0.022. Due to the scarcity of effect sizes, no additional analysis was performed for this model.

9 The two-tailed selections were preferred over the one-tailed selections because of the particular features of the studies included in this meta-analysis. In fact, several studies used the data analyzed here as control variables (e.g., Hänggi et al., 2014), or tried to support the null hypothesis (i.e., no significant difference between chess players and non-chess players; Unterrainer et al., 2011). Moreover, some studies reported more than one variable regardless of the significance of the outcomes (e.g., Doll & Mayr, 1987). Thus, there is no reason to think that high p-values were more likely to be suppressed from the literature (as assumed by one-tailed selections).

Fig. 6. Forest plot of the one-study-removed analysis without Doll and Mayr (1987). Each row reports the overall effect size (with 95% CI and p-value) that the model would estimate if the effect size of the corresponding study were removed from the analysis. The ds are sorted by magnitude (from smallest to largest).

Note. N = number of participants; k = number of effect sizes; d = random-effects meta-analytic mean; 95% CI = 95% confidence interval; 80% Cr. I = 80% credibility interval; Q = weighted sum of squared deviations from the mean; I² = ratio of true heterogeneity; τ = between-sample standard deviation; osr = one-study-removed: minimum, maximum, and median overall effect sizes; t&f = trim-and-fill; sm m = two-tailed moderate selection model; sm s = two-tailed severe selection model; PET = precision-effect test (one-tailed p-value in brackets); PEESE = precision-effect estimate with standard error (one-tailed p-value in brackets). a One effect size filled right of the mean; no missing effect sizes left of the mean.

Discussion
Ericsson (2014) argued that observed differences in cognitive ability between experts and non-experts are likely due to academic selection processes. That is, those who have higher cognitive ability are more likely to become experts because admission tests limit opportunities for those with lower cognitive ability to enter the field and receive training. Chess does not use academic selection mechanisms, which makes it an ideal domain to test whether cognitive ability differences emerge without the presence of confounding access limitations. We analyzed the performance of chess players compared to non-chess players in cognitive abilities related to general intelligence. The results demonstrated that chess players' overall cognitive ability is higher than that of age-matched comparison groups (d = 0.49, p < 0.001). Importantly, the difference in cognitive ability between chess players and non-chess players does not seem to be due to level of education, because this variable was controlled for in almost all the reviewed studies. When the only study not controlling for level of education (i.e., Doll & Mayr, 1987) was excluded from the analysis, the overall effect size remained significant (d = 0.38, p = 0.009). Thus, we found no evidence for Ericsson's (2014) claim that expert vs. non-expert differences in cognitive ability only reflect ability-related differences in access to training opportunities.
This finding, combined with the results of Burgoyne et al. (2016), a meta-analysis that found a significant positive correlation between chess skill and cognitive ability, provides evidence in favour of the idea that cognitive ability, to some extent, accounts for the acquisition of chess skill. Given that deliberate practice appears to be necessary, but not sufficient, to achieve high levels of expert performance in chess (Campitelli & Gobet, 2011;Hambrick et al., 2014), it is important to identify what other factors influence expertise acquisition in chess.

4.1. What is the size of the "true effect"?
The advantage of chess players in overall cognitive ability remains statistically significant in all the meta-analytic models, although the estimates vary somewhat across models. Evaluating the most likely estimate of the effect size depends mainly on the reliability of the results of Doll and Mayr (1987). Unfortunately, not enough information was provided to verify whether the educational level of the participants affected those results. If we assume that Doll and Mayr's (1987) results are trustworthy, then the probable mean estimate lies between 0.49 and 0.40 (the effect size of the main model and its smallest point estimate from the publication bias analyses, respectively). If we assume that these results are unreliable, then the mean estimate reasonably lies between 0.44 (the estimate of the trim-and-fill analysis) and 0.35 (the estimate of the two-tailed moderate selection model). 10 Either way, the size of the effect representing the superiority of chess players over non-chess players in overall cognitive ability appears to be approximately medium (Cohen, 1988).

Limitations of the study
The total numbers of effect sizes (k = 19) and studies (n = 7; n = 6 without Doll & Mayr, 1987) are relatively small. For this reason, the outcome of the study, although statistically significant, must be interpreted with caution. In addition, although chess players outperformed non-chess players in almost all the reviewed cases, regardless of the participants' age and the type of cognitive ability measured, the limited number of studies and effect sizes prevented us from testing for moderating effects of age and type of ability. For example, the difference between chess players and non-chess players may be more pronounced in youth than in adulthood (or vice versa). Moreover, this meta-analysis estimates an overall effect size from indicators of different cognitive abilities. Although the moderate degree of heterogeneity between effect sizes (I² = 66.56) suggests that these indicators are sufficiently related to be aggregated in the same model (see footnote 4), the results support only the hypothesis that chess players tend to have higher overall cognitive ability; little can be inferred about differences between chess players and non-chess players in particular cognitive abilities. For example, chess players may excel only in specific cognitive abilities (e.g., visuospatial ability, working memory) but not in measures of verbal reasoning. Additionally, it is yet to be determined whether the difference between chess players' and non-chess players' cognitive ability is stronger when non-chess players are compared to chess experts (Elo > 2000) and masters (Elo > 2200), or to amateurs and intermediate players.
Finally, our results specifically speak to the domain of chess. It is possible that the academic selection hypothesis holds in other intellectual domains. However, our results indicate that, at a minimum, the academic selection hypothesis does not hold for chess and therefore cannot be generalized to all intellectual domains. Future research is needed to test whether the academic selection hypothesis accurately reflects expertise in any domain. 10

10 The value provided by the severe two-tailed selection model (i.e., 0.30) is very likely an underestimation. First, the trim-and-fill analysis found a missing study right of the mean (point estimate 0.44), which means the overall meta-analytic mean (d = 0.38) may be an underestimation rather than an overestimation. Second, as mentioned in footnote 9, the assumption that non-significant effect sizes (especially the ones with p-values between 0.10 and 0.90) have been systematically suppressed from the literature is probably wrong. For these reasons, we consider the moderate selection model point estimate (i.e., 0.35) more reliable than the severe selection model one, and sufficiently conservative.

Recommendations for future research and conclusions
As mentioned above, the number of studies comparing chess players to non-chess players on cognitive skills is quite small. Additional studies are needed that compare chess players of different ages and ratings to matched non-chess players on a variety of cognitive tests. A larger number of studies examining variables such as level of skill, the age of players, and a broad range of cognitive abilities would allow us to test more sophisticated models and hypotheses.
Based on the available evidence, we demonstrated that chess players tend to perform better than non-chess players on measures of cognitive ability. Consequently, cognitive ability is a strong candidate to play an important role, together with deliberate practice, in skill acquisition in chess. Contrary to Ericsson's (2014) hypothesis, this difference in cognitive ability cannot be explained by academic selection processes, because no such processes operate in chess. As previously mentioned, deliberate practice alone is insufficient to account for differences in chess skill among chess players (Campitelli & Gobet, 2011;Hambrick et al., 2014). In line with that finding, this study suggests that cognitive ability may also play an important role. To determine whether the effect found in this study goes above and beyond the effect of deliberate practice on chess skill, studies should follow Bilalić et al.'s (2007) recommendation of measuring both deliberate practice and general abilities.