Self-concept in poor readers: a systematic review and meta-analysis

Background The aims of this systematic review and meta-analyses were to determine if there is a statistically reliable association between poor reading and poor self-concept, and if such an association is moderated by domain of self-concept, type of reading impairment, or contextual factors including age, gender, reading instruction, and school environment. Methodology We searched 10 key databases for published and unpublished studies, as well as reference lists of included studies, and studies that cited included studies. We calculated standardised mean differences (SMDs) and 95% confidence intervals for one primary outcome (average self-concept) and 10 secondary outcomes (10 domains of self-concept). We assessed the data for risk of bias, heterogeneity, sensitivity, reporting bias, and quality of evidence. Results Thirteen studies with 3,348 participants met our selection criteria. Meta-analyses revealed statistically significant SMDs for average self-concept (−0.57) and five domains of self-concept (reading/writing/spelling: −1.03; academic: −0.67; math: −0.64; behaviour: −0.32; physical appearance: −0.28). The quality of evidence for the primary outcome was moderate, and for secondary outcomes was low, due to lack of data. Conclusions These outcomes suggest a probable moderate association between poor reading and average self-concept; a possible strong association between poor reading and reading-writing-spelling self-concept; and possible moderate associations between poor reading and self-concept in the self-concept domains of academia, mathematics, behaviour, and physical appearance.


INTRODUCTION
The ability to read is a normally-distributed cognitive skill, and hence 16 percent of people have reading skills that fall more than one standard deviation below the level expected for their age or grade (Shaywitz et al., 1992). Over the last decade or so, clinicians, teachers, and researchers have become increasingly concerned that people with poor reading are at increased risk for poor emotional health. This concern is supported by Similarly, Zeleke found that most included studies (N = 30) found a reliable difference between groups with and without learning disability for academic self-concept (89%), but not for general self-concept (68%) or social self-concept (70%). Unfortunately, neither of these reviews reported if the overall effect sizes were statistically reliable, or if the groups with learning disability included poor readers. Thus, these reviews do not provide direct evidence for the strength or statistical reliability of the association between poor reading and poor self-concept.
These reviews do, however, suggest a second potential explanation for the mixed outcomes of previous studies. Both reviews found larger group-effect sizes for academic self-concept than general self-concept and social self-concept. This pattern of results suggests that poor reading may be more closely associated with some domains of self-concept (e.g., academic self-concept) than others (e.g., social self-concept). Thus, previous studies of the association between poor reading and poor self-concept may have produced mixed outcomes because poor reading is associated with some domains of self-concept but not others (Explanation 2).
It is also possible that these mixed findings emerged because some studies did not recruit participants with the ''right'' type of reading problem. People with poor reading have different reading difficulties. Some find it hard to read words accurately via phonological recoding (i.e., the ability to use letter-sound rules to read new words), some via visual word recognition (or ''whole word reading''), and some via both phonological recoding and visual word recognition (Castles & Coltheart, 1993; McArthur et al., 2013; Peterson, Pennington & Olson, 2013; Stuart & Stainthorp, 2015; Ziegler et al., 2008). In contrast, some children have no problems with phonological recoding or visual word recognition, but struggle to read texts fluently (Meisinger, Bloom & Hynd, 2010) or understand the meaning of texts (Nation et al., 2010). It is possible that some of these reading difficulties are more closely associated with poor self-concept than others. Hence, the type of reading problem (or problems) experienced by a sample of poor readers may determine whether or not a study finds an association between poor reading and poor self-concept (Explanation 3).
The strength of such an association, if it exists, may also depend on contextual factors, such as the age of participants, their gender, the type of reading instruction that they have received, and their learning environment. Regarding age, there is evidence that self-concept fluctuates across the lifespan, dropping from childhood to adolescence, increasing throughout adulthood, and then declining in older age (Marsh, 1989; Robins & Trzesniewski, 2005). There is also evidence that reading self-concept, in particular, starts to decline after the first three years of instruction (Chapman & Tunmer, 1995), further supporting the idea that age may modulate the association between self-concept and reading. There may also be effects of gender on self-concept, as suggested by reports of poorer academic self-concept in females than males (Katzir, Kim & Dotan, 2018), and increased age-related declines in academic self-concept in girls compared to boys (De Fraine, Van Damme & Onghena, 2007). Type of reading instruction may also affect the strength of the association between reading and self-concept: Tunmer & Chapman (2002) and Chapman & Tunmer (2003) reported that children who were taught to read via word-level instruction had higher reading and academic self-concept than children instructed using text-based approaches. More broadly, self-concept (both general and academic) may be modulated by a child's school environment (Srivastava & Joshi, 2011; Yaratan & Yucesoylu, 2010). This evidence suggests that contextual factors (including age, gender, reading instruction, and school environment) may determine whether or not a study finds an association between poor reading and poor self-concept (Explanation 4).
In sum, we currently do not know if poor reading is associated with poor self-concept because of inconsistent findings in the existing literature. These mixed findings might arise for a number of reasons: (1) poor reading is not associated with poor self-concept, producing spurious and unreliable outcomes; (2) poor reading is associated with some types of self-concept but not others; (3) poor self-concept is associated with some types of reading problems but not others; (4) poor reading is associated with poor self-concept in some contexts (age, gender, reading instruction, school environment) but not others. The aim of this study was to conduct a systematic review and meta-analysis to determine if there is a reliable association between poor reading and poor self-concept (Explanation 1), and if so, if this association is moderated by domain of poor self-concept (Explanation 2), type of poor reading (Explanation 3), or one or more contextual factors (age, gender, reading instruction, school environment; Explanation 4).

METHODS
The methods, analyses, and reporting procedures used in this review were guided by the rigorous standards used by Cochrane Reviews to summarise evidence across intervention studies. Minor adjustments were made to the methods to cater for the cross-sectional studies that were included in this review.

Differences between the registered protocol and review
This review differed from the pre-registered protocol in four respects (McArthur et al., 2016b). First, we stated that we would conduct a subgroup analysis to determine if the strength of the association between poor reading and self-concept differs for different types (i.e., subgroups) of self-concept (e.g., reading versus academic versus social versus parent/home). In the current review, a subgroup analysis involved: (1) allocating each accepted study to the appropriate subgroup (e.g., academic self-concept); (2) calculating the mean standardised mean difference (SMD) between poor readers and a control group across all studies in each subgroup; and (3) comparing the SMDs of the subgroups to identify any statistically significant differences. Unfortunately, as explained below, there were not enough studies (i.e., at least 10) to allow us to statistically compare the strength of the associations between poor reading and different domains of self-concept.
Second, we planned to use a second subgroup analysis to determine if poor readers with comorbid impairments (e.g., language or attention problems) are more likely to have poor self-concept than poor readers without comorbid impairments. This was an error in logic since we also aimed to exclude poor readers with comorbid impairments, a common approach in studies of poor readers that is used to minimise confounding effects. Thus, this subgroup analysis was not attempted. Third, we planned to search seven sources for grey literature. We searched these sources to the best of our ability, but our efforts were hampered by poor search tools (opengrey.eu, base-search.net, trove.nla.gov.au, phcris.org.au/roar/, worldcat.org/) and non-relevant content (opendoar.org, research.allacademic.com/). We would not recommend these sources for future systematic reviews.
Fourth, based on the suggestions of a reviewer, we added contextual factors to the review that had not been included in the registered protocol (age, gender, reading instruction, and school environment).

Types of studies
This review included studies that compared self-concept in one or more groups of poor readers to appropriate control data. Control data could be provided by a matched group of typical readers or a standardised normative measure. Studies could be cross-sectional studies or intervention studies. In the latter case, data were collected from the initial assessment session prior to intervention. Only studies that used groups of at least 11 participants were included in the review. This (lenient) criterion was calculated from the smallest N needed to detect a very large group effect (Cohen's d = 1.3) with a power of 0.8 and significance of 0.05 (two-tailed test; AI Therapy Statistics' Sample Size Calculator).
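The sample-size criterion above can be approximated by hand. The sketch below is not the calculator used in the review: it assumes the criterion refers to the size of each group, uses the normal approximation for a two-group comparison, and then adds a common small-sample correction (z²/4) to approximate the t-test. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> tuple[int, int]:
    """Return (normal-approximation n, small-sample-corrected n) per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)
    n_normal = 2 * ((z_alpha + z_power) / d) ** 2
    # Assumed correction: add z_alpha^2 / 4 to approximate the t-test.
    n_corrected = n_normal + z_alpha ** 2 / 4
    return math.ceil(n_normal), math.ceil(n_corrected)


approx, corrected = n_per_group(d=1.3)
print(approx, corrected)
```

Under these assumptions the corrected estimate is 11 participants per group, in line with the criterion stated above.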

Types of participants
Participants were English-speaking children, adolescents, or adults whose word reading accuracy or reading fluency was either one grade or year (for children) or one standard deviation (for children, adolescents, and adults) below the mean level of typical readers for no known reason. Specifically, they did not have a comorbid developmental disorder (e.g., autism, language impairment, attention deficit hyperactivity disorder, attention deficit disorder); a physical problem (e.g., impaired vision); or a neurological problem (e.g., brain damage) that could explain their reading difficulty.
This review focused on English-speaking poor readers because English is a non-transparent written language, meaning that many words cannot be read accurately using the letter-sound rules. This contrasts with transparent languages, such as Spanish and Italian, which can be read accurately using the letter-sound rules. The non-transparency of English makes it harder to learn to read than transparent languages, making reading failure more severe and obvious (Seymour et al., 2003). Severity of reading failure correlates with academic self-concept (McArthur et al., 2016a). Thus, the strength of the relationship between poor reading and self-concept may vary between languages. This review therefore focused on poor readers who spoke English as their primary language at school or work, who lived in a country where English was the official language, and who were receiving reading instruction in English. We did not include studies that included non-English speaking participants who had just arrived in an English-speaking country.
It is noteworthy that the reading criteria used in this review did not include poor reading comprehension on its own (i.e., without evidence of poor reading accuracy or fluency) because poor reading comprehension can arise from poor spoken language comprehension rather than poor reading (Gough & Tunmer, 1986). There is evidence that poor spoken language is associated with poor self-concept, raising the risk that poor spoken language, but not poor reading, could be responsible for an apparent association between poor reading comprehension and poor self-concept.
It is also noteworthy that in line with the latest Diagnostic and Statistical Manual of Mental Disorders (5th Edition), we did not include IQ as a criterion to identify a specific learning problem for reading. We also did not exclude participants based on age, gender, or socioeconomic status (SES) since reading difficulties are experienced by people across all these demographic variables.

Types of self-concept measures
We only included studies that indexed self-concept with standardised and normed measures that were administered directly to poor readers (i.e., not to carers or teachers). We excluded studies that used indirect self-concept measures administered to significant others since it is difficult for others to estimate a person's true perception of self, and because teachers and peers' perceptions of the academic and social competence of children with learning problems are typically negative (Kavale & Forness, 1996). We excluded studies that did not include standardised and normed assessments of self-concept since non-standardised and non-normed measures are less likely to have established reliability and validity than normed assessments, and are less able to reliably indicate if performance falls within or below the average range. If a study included both direct and indirect self-concept measures, or both standardised-normed measures and non-standardised-normed measures, only the direct and standardised-normed indices were included in the analysis.

Types of outcome measures
Primary outcomes. The primary outcome was ''average self-concept'', which was calculated for each study by taking the mean of scores for all self-concept assessments administered to each group (e.g., poor readers) in that study. This primary outcome was used to test Explanation 1.
Secondary outcomes. There are many different domains of self-concept. To identify the most relevant domains for this review, we were guided by the assessments used by the included studies. The second last column of Table 1 shows all these assessments, which could be categorised into 10 domains, as shown in the final column of Table 1: reading/writing/spelling self-concept, academic self-concept, school self-concept, work self-concept, math self-concept, behaviour self-concept, social self-concept, athletic self-concept, physical appearance self-concept, and global self-concept. These secondary outcomes were used to test Explanation 2.
It is noteworthy that the last of these domains (global self-concept) represents the perception of oneself in general, which does not represent a specific domain of self-concept per se. We retained this category as a secondary reliability check for the primary outcome (average self-concept), which indexed the perception of oneself across multiple domains for many studies (see Table 1).
Timing of outcome measures. Primary and secondary outcomes were assessed at the same time as reading. We did not include studies that measured reading and self-concept at

We used the following search terms (or the equivalent for unpublished study databases). For poor readers, we used the terms: (1) dyslexia, (2) poor reading, (3) reading disability or difficulty or disorder or impairment or deficit or delay, (4) learning disability or difficulty or disorder or impairment or deficit or delay. For self-concept, we used the terms: (1) self-concept, (2) self esteem, (3) self confidence. For example, the search terms entered into PsycInfo were:
1. (dyslexi* or (poor adj1 read*) or ((read* or learn*) adj1 (dis* or diff* or impair* or def* or delay)) or (word blind*))
2. self and (concept or esteem or confidence)
3. 1 & 2
4. Limit to English Language and Human

Searching other resources
The reference lists of included studies were reviewed to identify further relevant studies. We also identified and reviewed studies that cited included studies using Google Scholar.

Data collection and analysis
Data were collected and analysed according to Cochrane Review procedures. All statistics were calculated with Cochrane Review's RevMan meta-analysis tool.

Selection of studies
Studies identified by the searches were first checked for duplicates, which were removed. Each study author was paired with another to form a ''review pair'' (GM with DF, NF with NB, MB with NB). Each author in each pair initially screened non-duplicates for eligibility using titles and abstracts. Works that did not include 'reading' or 'dyslexia' were removed since extensive pilot testing established that such works never include poor readers. Each study author compared their included and excluded studies with their review partner. Any inconsistencies were discussed in detail until the source of the mismatch was resolved to the satisfaction of both parties. If no agreement could be found, then a referee was used to make a final decision (the first author for review pairs that did not include GM, and NB for review pairs that did include GM). Full-text versions of eligible studies were downloaded and again reviewed by review pairs. Each pair compared accepted and rejected studies, discussed any mismatches, and resolved any inconsistencies. Studies identified via the reference lists and citing studies were also reviewed by two authors, again with any mismatches discussed in person and resolved.

Data extraction and management
Data were extracted using a customised form included in Appendix S1. The form collected descriptive data (author name, year of publication, reading assessments, any subtests of the assessment, reliability coefficients of assessment, self-concept assessments, any subtests of the assessment, reliability coefficients of assessment) and group data (number of groups in the study, group type, group size, and group means and standard deviations for outcome measures).
Data were extracted by two people. Any inconsistencies between data extracted were discussed and resolved between the pair. Authors of studies were contacted if there was any ambiguity about data (e.g., missing data; see below). A table of correspondence with study authors is included in Appendix S2. Data were entered into Cochrane's RevMan by the first author and double-checked by the second author.

Dealing with missing data
If a study had missing data (e.g., means, SDs), we requested that data from the corresponding author (see Appendix S2). If this request failed, we contacted the co-authors. If an appeal for missing data did not result in a full data set, we only included data for participants whose results were known.

Data synthesis
Multiple groups. If a study tested multiple groups of poor readers on a particular outcome, we calculated the average mean, SD, and N across these groups before comparing to the mean, SD, and N of the control group. We did the same if a study used multiple groups of controls before comparing to the poor readers.
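The review does not state the exact formula used to combine groups. As an illustration only, the sketch below uses the standard formula for merging the summary statistics of two groups into one, which reproduces the N, mean, and SD that would be obtained from the pooled raw scores. The function name and the scores are hypothetical.

```python
import math
import statistics


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two groups' summary statistics into one (N, mean, SD)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Within-group variation plus the variation due to the gap between the two means.
    ss = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2 + (n1 * n2 / n) * (m1 - m2) ** 2
    return n, mean, math.sqrt(ss / (n - 1))


# Hypothetical raw scores for two groups of poor readers in the same study.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 6.0, 8.0, 10.0]
n, mean, sd = combine_groups(
    len(group_a), statistics.mean(group_a), statistics.stdev(group_a),
    len(group_b), statistics.mean(group_b), statistics.stdev(group_b),
)
print(n, round(mean, 3), round(sd, 3))
```

The combined values match those computed directly from the concatenated scores, which is the property that makes this formula preferable to a simple average of the two SDs.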
Multiple tests. If a study measured an outcome with more than one assessment that used the same scale (e.g., scaled scores with a mean of 10 and SD of 3), we calculated the average mean and SD across the two assessments. If the assessments used different scales (e.g., one used scaled scores and one used z scores), we (1) used RevMan to calculate the SMDs for each measure separately, (2) calculated the mean SMDs for the two measures, (3) removed the original data entries for the two assessments, and (4) inserted a new entry that used the mean SMD for the experimental group, 0 for the control mean, 1 for the SDs of both groups, and the N of the study.

Group effects
All studies reported continuous data. Different studies used different assessments to measure outcomes that used different scales (see Table 1 for measures used in each study). We therefore used standardised mean differences (SMDs) with 95% confidence intervals (CIs) calculated from means and SDs for groups with poor reading and typical reading. We used a random effects model to compare SMDs of groups (rather than a fixed effects model) since we predicted that different studies would use different measures to assess self-concept, which would introduce heterogeneity between study outcomes in effect sizes. Random effects models adjust estimates to incorporate heterogeneity more effectively than fixed effects models, which presume similar effects between studies.
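To make the SMD concrete, the sketch below computes a small-sample-adjusted SMD (Hedges' g) and its 95% CI from group means and SDs. It is an illustration rather than the RevMan implementation: the adjustment factor and standard-error formula are the ones commonly documented for this statistic and are assumed here, and the example scores are hypothetical.

```python
import math


def hedges_g(n1, m1, sd1, n2, m2, sd2):
    """Standardised mean difference (group 1 minus group 2) with a 95% CI."""
    n = n1 + n2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n - 2))
    correction = 1 - 3 / (4 * n - 9)  # small-sample bias adjustment
    g = correction * (m1 - m2) / pooled_sd
    se = math.sqrt(n / (n1 * n2) + g ** 2 / (2 * (n - 3.94)))
    return g, se, (g - 1.96 * se, g + 1.96 * se)


# Hypothetical scores: 20 poor readers (mean 45) versus 20 typical readers (mean 50).
g, se, ci = hedges_g(20, 45.0, 10.0, 20, 50.0, 10.0)
print(round(g, 2), round(se, 2), [round(x, 2) for x in ci])
```

With poor readers entered as group 1, a negative value indicates lower self-concept scores in poor readers than in controls.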
We considered SMDs of 0.20, 0.50 and 0.80 to represent small, moderate, and large group effects, respectively (Cohen, 1992). In line with Schünemann et al. (2011), we considered 95% CIs to be narrow if the range was around 0.10; medium if the range was around 0.30; and wide if over 0.60. These 95% CI ranges translate to high precision, moderate precision, and low precision in data. We considered group effects with a P value less than or equal to 0.05 to be statistically significant and hence statistically reliable.

Subgroup analyses
Six subgroup analyses were required to determine if there were statistically significant differences between: (1) domains of self-concept (reading/spelling/writing, academic, math, global, behavioural, school, physical appearance, work, social, athletic, and home; Explanation 2); (2) types of reading impairment (phonological dyslexia; surface dyslexia; mixed dyslexia; poor comprehenders; Explanation 3); (3) age groups (children aged up to 12 years; adolescents aged from 13 to 17 years; and adults aged 18 years and above); (4) gender types (female, male); (5) reading instruction types (word level versus text level); and (6) school environments (government school, private school, learning specialist school). In line with Cochrane Review standards, we planned to compare subgroups that comprised at least 10 studies to ensure adequate power (Deeks, Higgins & Altman, 2011).

Risk of bias
We used an adapted version of the Newcastle Ottawa Scale (NOS; Wells et al., 2014) to determine risk of bias in the individual studies (see Table 2 and Appendix S3). We used this scale instead of Cochrane's Risk of Bias procedure because the latter was designed for intervention studies rather than cross-sectional correlational studies, which were the focus of the current review. Two independent authors (GM and NB) made ratings using this scale, which has a maximum of 9 stars/points. Studies were evaluated based on three tiers of ratings: Low (0 to 3 stars); medium (4 to 7 stars); and high (8 to 11 stars). Any mismatches between authors were discussed and resolved.

Heterogeneity
We used a Chi² test with a P value of 0.10 to examine the degree of consistency in the effect sizes found by the included studies (i.e., heterogeneity; Deeks, Higgins & Altman, 2011). Further, we used the I² statistic (with a cut-off value of 70%) to estimate the percentage of variance in the effects owing to heterogeneity rather than chance. For any outcome that had an I² statistic greater than 70%, we (1) double-checked the data, (2) reconsidered the validity and reliability of the measures, and (3) examined outlier studies to see if there was an obvious reason for the outlying result. If something was identified in step (3), we redid the meta-analysis with the offending study removed.

Table 2 Risk of bias ratings for each included study. See Appendix S3 for meaning of a and b ratings, along with allocation of stars (*). Lower scores represent higher risk of bias (1-4 high risk; 5-7 moderate risk; 8-10 low risk).

Sensitivity analysis
We conducted two sensitivity analyses:
1. Removal of any studies with 10 or fewer participants in experimental and control groups.
2. Comparison of fixed effects and random effects meta-analyses for outcomes with high heterogeneity.

Reporting bias
We used funnel plots to explore reporting bias for any outcome that had data from more than 10 studies which did not have similar standard errors for their effect sizes (Sterne, Egger & Moher, 2011).

Quality of evidence
We used a modified version of GRADE (Schünemann et al., 2011), adjusted to suit cross-sectional studies rather than intervention studies, to assess the overall quality of evidence for each outcome. When rating the evidence for each outcome, we started with a high rating. This rating was then downgraded one or two levels (to medium or low) or upgraded one level across six factors:
1. Risk of bias: No downgrade (0) if 75% or more of the studies contributing to an outcome are low in the majority of biases. Downgrade one level (−1) if 50% to 74% of studies contributing to an outcome are low in the majority of biases. Downgrade two levels (−2) if fewer than 50% of studies contributing to an outcome are low in the majority of biases.
2. Heterogeneity: No downgrade (0) if I² is less than 70%, or if I² is greater than 70% but the heterogeneity analysis suggests it did not affect the reliability of results. Downgrade one level (−1) if I² is 70% to 85% and the heterogeneity analysis suggests it does affect the reliability of results. Downgrade two levels (−2) if I² is greater than 85% and the heterogeneity analysis suggests it does affect the reliability of results.
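The two downgrade rules listed above can be written as simple decision functions. This is a minimal sketch under stated assumptions: the function names are illustrative, percentages are passed as numbers from 0 to 100, and values between 74% and 75% (which the stated bands do not cover) are treated as a one-level downgrade.

```python
def risk_of_bias_downgrade(pct_low_bias_studies: float) -> int:
    """Levels to subtract, given the % of studies rated low on most biases."""
    if pct_low_bias_studies >= 75:
        return 0
    if pct_low_bias_studies >= 50:
        return -1
    return -2


def heterogeneity_downgrade(i_squared: float, affects_reliability: bool) -> int:
    """Levels to subtract, given I-squared (in %) and the heterogeneity analysis."""
    if i_squared < 70 or not affects_reliability:
        return 0
    return -1 if i_squared <= 85 else -2


print(risk_of_bias_downgrade(60), heterogeneity_downgrade(90, True))
```

For example, an outcome with I² above 70% whose heterogeneity analysis suggests the results are still reliable would receive no heterogeneity downgrade under these rules.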

Certainty of outcomes
We interpreted the certainty of each outcome based on Ryan, Santesso & Hill's (2016) guide for interpreting the certainty of treatment effects based on GRADE ratings. We modified this guide for use with cross-sectional data, and to take statistical significance into account. For outcomes that were statistically significant and based on high quality of evidence, we concluded that an effect was certain (e.g., a moderate association). For outcomes that were statistically significant and based on moderate quality of evidence, we concluded that an effect was probable (e.g., a probable moderate association). For outcomes that were statistically significant but with low quality of evidence, we concluded that the effect was possible (e.g., possible moderate association). For outcomes that were not statistically significant and had low quality of evidence, we concluded the effect was unlikely (e.g., an unlikely moderate association).

RESULTS
Figure 1 shows a flow diagram of the search results. Searches of databases of published works identified 6,506 candidate studies. Searches of grey literature identified eight candidate studies. Searches of citations revealed 13 candidate studies. Together the searches revealed 6,527 candidate studies. Removal of duplicate studies resulted in 5,068 candidate studies. Double screening of titles and abstracts of these studies reduced this number to 443. Double examination of the full texts of these studies identified 97 papers. Double review of potential studies from reference lists and citations identified no studies that matched the selection criteria. One study was excluded during the data extraction phase due to lack of accessible data. This left us with 13 accepted studies. Two studies were reported in the same published article (Boetsch, Green & Pennington, 1996): one study focused on adults (hereafter Boetsch, Green & Pennington, 1996 (adults)) and one focused on children (hereafter Boetsch, Green & Pennington, 1996 (children)). Thus the 13 accepted studies were reported in 12 research outputs.

Participants
Details of the participants in each included study are shown in Table 1.
Reading ability. The criteria used to recruit poor readers differed between studies. Two studies used a significant difference between actual and expected reading level (Boetsch, Green & Pennington, 1996 (adults); Boetsch, Green & Pennington, 1996 (children)); two studies used bottom 20-25% percentile cut-offs for age or grade (Chapman, Tunmer & Prochnow, 2001; Holmes, 2001); and two studies selected poor readers based on a reading level more than 1 SD below that expected for age (McArthur et al., 2016a; Taylor, Hume & Welsh, 2010). Some studies used a reading grade that was 1 year (Pih, 1984) or 2 grades (Murray, 1978), or ''far lower'' than the expected grade (Gold & Johnson, 1982), or was below the 6th grade (Palmieri, 1981). Other studies identified poor readers if their reading was more than 18 months (Kerwin, 1976) or 3 years (Robinson & Conway, 1990) below the age mean. One study used the British Psychological Society criteria, which was word-reading difficulties that were ''severe and persistent'' (Frederickson & Jacobs, 2001). There are three things to note about these criteria: (1) the various criteria well represent the range of criteria used to identify poor readers in reading research; (2) the criteria used by a study did not determine its inclusion in this review, which had its own criteria for reading (see Participants above); and hence (3) data presented by all these studies showed that the reading scores of these samples fell more than one SD below the level expected for their age.

Types of reading impairment
As is shown in Table 1 (Poor-Reader Type column), no study reported poor readers' type of reading difficulty. Examination of the tests used to assess the reading skills of participants suggests that all samples had a combination of different reading difficulties.

Outcome measures
The measures used by each study to measure primary and secondary outcomes are shown in Table 1 (see Self-Concept Assessment column). Measures used to assess the primary outcome (average self-concept) and secondary outcomes (different domains of self-concept) include the Adult Self Perception Profile (Boetsch, Green & Pennington, 1996 adults), the Self Perception Profile for Learning Disabled Children (Boetsch, Green

the study did not assess participants on a quantitative reading test; the study did not include a self-concept assessment with known validity or reliability; the study only assessed reading using a measure of reading comprehension; the study recruited poor readers with comorbid problems; the study focused on poor readers that did not speak English.

Primary outcome
Group effect. The outcomes of the random effects model for the primary outcome are shown in Fig. 2 and Table 3. The SMD for average self-concept was calculated from 13 studies and 3,348 participants. The number of domains from which average self-concept was calculated varied between studies: 10 domains for two studies (Boetsch, Green & Pennington, 1996-adults; Boetsch, Green & Pennington, 1996-children); six domains for two studies (Frederickson & Jacobs, 2001; Robinson & Conway, 1990); five domains for one study (Palmieri, 1981); four domains for one study (McArthur et al., 2016a); and one domain for five studies (Chapman, Tunmer & Prochnow, 2001; Holmes, 2001; Kerwin, 1976; Pih, 1984; Murray, 1978; Taylor, Hume & Welsh, 2010). The SMD for average self-concept was −0.57 (95% CI [−0.81 to −0.33]; Z = 4.65; P < 0.001). Note that a negative effect size for self-concept indicates poorer scores in poor readers.

Risk of bias in included studies.
Table 2 shows the results of the risk of bias assessments for each study included for the primary outcome. All studies were rated as truly or somewhat representative of the average in the target population. The sample size of the majority of studies was not justified. All but one study used a standardized reading assessment with data reported. All studies used English poor readers, and around half controlled for additional factors such as attention, age, sex, SES, or neurological or medical problems. All studies used a normed index of self-concept, many read items aloud to the participants, and many provided self-report data in addition to parent or teacher report. The total risk of bias scores indicated that five studies had high scores (low risk), and seven studies had medium scores (medium risk). No studies had low scores (high risk).

Heterogeneity. The heterogeneity for the primary outcome was greater than 70% and statistically significant (Chi² = 55.66; df = 12; P < 0.001; I² = 78%). Thus, we (1) double-checked the data, (2) reconsidered the validity and reliability of the measures, and (3) examined outlier studies to see if there was an obvious reason for the outlying result. The last step revealed a single study with a positive effect for average self-concept (0.20; Palmieri, 1981). When we removed this study from the SMD calculation it strengthened the SMD somewhat (−0.62; 95% CI [−0.86 to −0.38]; Z = 5.02; P < 0.001) but the heterogeneity was not reduced (Chi² = 50.35; df = 11; P < 0.001; I² = 78%).
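The I² values reported for the primary outcome can be recovered from the Chi² statistic (Cochran's Q) and its degrees of freedom. The following is a minimal sketch, assuming the standard Higgins formula I² = (Q − df)/Q; the function name `i_squared` is illustrative and does not come from the review.

```python
def i_squared(q: float, df: int) -> float:
    """Return I² as a percentage, given Cochran's Q and its degrees of freedom.

    Assumes the standard definition I² = (Q - df) / Q, truncated at zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Heterogeneity statistics reported in the text for average self-concept.
all_studies = i_squared(55.66, 12)       # all 13 studies
without_outlier = i_squared(50.35, 11)   # Palmieri (1981) removed

print(f"I² (all studies): {all_studies:.1f}%")
print(f"I² (outlier removed): {without_outlier:.1f}%")
```

Both values round to 78%, in line with the figures reported for the full and the reduced set of studies.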
Sensitivity. Since all studies had more than 10 participants, the sensitivity analysis involved comparing our planned random effects analysis to a fixed effects analysis (see Table 4). Under the fixed effects model, the SMD for average self-concept was stronger than under the random effects model (−0.61; 95% CI [−0.72 to −0.50]; Z = 11.35; P < 0.001), but the heterogeneity remained exactly the same (Chi² = 55.66; df = 12; P < 0.001; I² = 78%). This sensitivity analysis and the heterogeneity analysis suggested that the primary outcome effect size was reliable despite the heterogeneity. We therefore based our conclusions on the random-effects model, since it adjusts estimates to incorporate heterogeneity (Deeks, Higgins & Altman, 2011).
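To illustrate how a fixed effects and a random effects analysis can differ, the sketch below pools a set of SMDs both ways. It assumes inverse-variance weighting with the DerSimonian-Laird estimate of between-study variance, which is a common choice but is not confirmed by the text as the method used in this review. The study values are hypothetical and are not the data from the included studies; all function names are illustrative.

```python
import math


def inverse_variance_pool(effects, std_errors, tau2=0.0):
    """Pooled effect and its standard error; tau2 = 0 gives the fixed effects model."""
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def cochran_q(effects, std_errors):
    """Cochran's Q, computed from the fixed effects weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed, _ = inverse_variance_pool(effects, std_errors)
    return sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))


def dersimonian_laird_tau2(effects, std_errors):
    """Between-study variance estimated with the DerSimonian-Laird method."""
    weights = [1.0 / se ** 2 for se in std_errors]
    q = cochran_q(effects, std_errors)
    df = len(effects) - 1
    scale = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - df) / scale)


# Hypothetical SMDs and standard errors, NOT the data from the included studies.
smds = [-1.10, -0.80, -0.60, -0.45, -0.20, 0.20]
ses = [0.30, 0.15, 0.10, 0.25, 0.12, 0.35]

tau2 = dersimonian_laird_tau2(smds, ses)
fixed_smd, fixed_se = inverse_variance_pool(smds, ses)
random_smd, random_se = inverse_variance_pool(smds, ses, tau2)

for label, est, se in [("fixed", fixed_smd, fixed_se), ("random", random_smd, random_se)]:
    low, high = est - 1.96 * se, est + 1.96 * se
    print(f"{label}: SMD = {est:.2f}, 95% CI [{low:.2f} to {high:.2f}]")
```

Under these assumptions Q is built from the fixed effects weights, so Q and I² do not depend on the choice of model. This is consistent with the identical heterogeneity statistics reported for the two analyses; only the pooled estimate and the width of its confidence interval change.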
Reporting bias. Since our primary outcome had data for more than 10 studies (N = 13 studies) which had varying standard errors, we examined the primary outcome for reporting biases using a funnel plot (see Fig. 3). While this plot did not show the expected inverted funnel shape (i.e., studies with greater precision cluster more closely around the SMD than studies with less precision), neither did it illustrate asymmetry due to (1) an absence of imprecise studies with small SMDs, or (2) a preponderance of imprecise studies with large SMDs. Thus, the available data, albeit limited, do not suggest reporting bias for the primary outcome.
Quality of evidence. The GRADE rating for average self-concept is shown in Table 5. The quality of evidence for the primary outcome was reduced to moderate by imprecision due to a wide confidence interval [−0.81 to −0.33].
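The imprecision downgrade can be checked against the confidence interval. The sketch below assumes that the thresholds in the Table 5 notes (0 to 0.3, 0.3 to 0.6, 0.6+) refer to the full width of the 95% confidence interval, and that a width exactly on a boundary falls into the higher band; neither detail is stated in the text, and the function name is illustrative.

```python
def imprecision_downgrade(ci_lower: float, ci_upper: float) -> int:
    """Map the width of a confidence interval to a GRADE downgrade (0, -1, or -2).

    Assumption: thresholds apply to the full CI width (upper minus lower),
    and boundary values (0.3, 0.6) fall into the higher band.
    """
    width = ci_upper - ci_lower
    if width < 0.3:
        return 0
    if width < 0.6:
        return -1
    return -2


# Average self-concept: 95% CI [-0.81 to -0.33], a width of about 0.48.
print(imprecision_downgrade(-0.81, -0.33))

# Home self-concept: 95% CI [-0.29 to 0.91], a width of about 1.20.
print(imprecision_downgrade(-0.29, 0.91))
```

Under these assumptions the primary outcome is downgraded by one level, which agrees with the moderate rating reported above.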

Certainty of outcome.
Based on the quality of evidence ratings, SMDs, and statistical significance, we concluded that there is a probable moderate association between poor reading and average self-concept.

Secondary outcomes
Group effect. The number of studies contributing to the secondary outcomes varied considerably, ranging from global self-concept (10 studies) to work and home self-concept (a single study each; see Table 1). The same was true for the number of participants, ranging from 2,595 participants (academic self-concept; note that participants include the normative sample) to 36 participants (work self-concept). Figure 4 shows the SMDs (with confidence intervals) for each secondary outcome. In decreasing order of strength, the SMDs for the self-concept domains were: reading-spelling-writing […]; and home self-concept (0.31; 95% CI [−0.29 to 0.91]). This order was interesting since it suggested that poor reading is most closely related to self-concept domains that are related to academia (e.g., reading/spelling/writing, academic, math).

Table 5 Quality of evidence (GRADE) rating table. For each outcome, the initial rating is high. This was increased or decreased according to the ratings of six factors (see following notes). The final rating is high, medium, or low quality of evidence, which defines the certainty of each outcome, based on the guidelines of Ryan, Santesso & Hill (2016). The following criteria were used to calculate the ratings (McArthur et al., 2018, Table 6):
"Note.
1. Risk of bias: No downgrade (0) if 75% + studies contributing to an outcome are low in majority of biases. Downgrade one level (−1) if 50% to 74% of studies contributing to an outcome are low in majority of biases. Downgrade two levels (−2) if fewer than 50% studies contributing to an outcome are low in majority of biases.
2. Heterogeneity: No downgrade (0) if I² less than 70% or I² greater than 70% but assessment of heterogeneity and sensitivity analyses suggest the outcome is reliable. Downgraded one level (−1) if I² 70% to 85% and heterogeneity and sensitivity analyses suggest that it does affect reliability of results. Downgraded two levels (−2) if I² greater than 85% and heterogeneity and sensitivity analyses suggest it does affect reliability of results.
3. Indirectness: No downgrade if study directly measures outcomes of interest in the population of interest. Downgraded by one level if outcome or population are not measured directly. Downgraded two levels (−2) if outcome and population are not measured directly.
4. Imprecision: No downgrade (0) if confidence interval 0 to 0.3. Downgrade one level (−1) if confidence interval 0.3 to 0.6. Downgrade two levels (−2) if confidence interval 0.6 +.
5. Publication bias: No downgrade (0) if funnel plot done on more than 10 studies (Sterne, Egger & Moher, 2011), and no bias detected. Downgrade one level (−1) if funnel plot cannot be constructed (too few studies) but bias not suspected. Downgrade two levels (−2) if funnel plot not possible (too few studies) and bias suspected.
6. Other factors: Upgrade one level (+1) if large effect size (0.8+) or no plausible confounds."
Table columns: Self-concept outcome; Risk of bias (note 1); Heterogeneity (note 2); Indirectness (note 3); Imprecision (note 4); Publication bias (note 5); Other (note 6); GRADE; Certainty of outcome.
Table rows: Primary outcome (Average); Secondary outcomes.
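The way the six factor ratings described for Table 5 combine into a final rating can be sketched as follows. This is a sketch under stated assumptions, not the authors' procedure: it assumes the adjustments are simply summed from a starting rating of high, and that the scale is bounded at low and high, following the three-level scale named in the table caption. The per-factor values in the examples are inferred from the prose of this review rather than copied from the table, and the second example is a hypothetical secondary outcome.

```python
LEVELS = ["low", "moderate", "high"]


def grade_rating(adjustments: dict) -> str:
    """Combine per-factor adjustments into a final quality-of-evidence rating.

    Assumptions: the rating starts at 'high', adjustments are summed, and the
    result is clamped to the three levels low / moderate / high.
    """
    start = LEVELS.index("high")
    score = start + sum(adjustments.values())
    score = max(0, min(len(LEVELS) - 1, score))
    return LEVELS[score]


# Primary outcome, with factor values inferred from the text:
# only imprecision is downgraded (wide confidence interval).
average_self_concept = {
    "risk_of_bias": 0,
    "heterogeneity": 0,
    "indirectness": 0,
    "imprecision": -1,
    "publication_bias": 0,
    "other": 0,
}

# Hypothetical secondary outcome: imprecise, and too few studies for a funnel plot.
hypothetical_secondary = dict(average_self_concept, publication_bias=-1)

print(grade_rating(average_self_concept))
print(grade_rating(hypothetical_secondary))
```

With these inputs the primary outcome is rated moderate and the hypothetical secondary outcome low, matching the pattern described in the text: imprecision alone lowers the primary outcome by one level, while imprecision together with an unassessable publication bias lowers a secondary outcome by two.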

Risk of bias.
Studies contributing to each secondary outcome were the same as those to the primary outcome. Hence, the studies contributing to each secondary outcome were a mix of low and medium risk of bias (see Table 2).
Heterogeneity. The heterogeneity of each secondary outcome is shown in Table 3. Heterogeneity was higher than 70% for five of these outcomes (reading-spelling-writing, academic, math, global, and school), so we again (1) double-checked the data, (2) reconsidered the validity and reliability of the measures, and (3) examined outlier studies to see if there was an obvious reason for the outlying result.
Sensitivity. Since all studies had more than 10 participants, the sensitivity analysis involved comparing our planned random effects analysis to a fixed effects analysis (see Table 4). The SMDs for the latter were the same or somewhat higher than the random effects analysis, and the heterogeneity remained the same. This suggested that the effect sizes were reliable despite the heterogeneity.
Reporting bias. No secondary outcome had data from more than 10 studies and so none were examined for reporting bias.
Quality of evidence. The GRADE ratings for the different domains of self-concept are shown in Table 5. The quality of evidence for all these outcomes was low, primarily due to imprecision of data (i.e., large confidence intervals) and because there were not enough studies to assess any of these outcomes for publication bias.

Certainty of outcomes.
Based on the quality of evidence ratings, SMDs, and statistical significance, we concluded that there was a possible strong association between poor reading and reading-writing-spelling self-concept. We also concluded that there was a possible moderate association between poor reading and self-concept in the academic, mathematics, global, behavioural, and physical appearance domains. Due to the low quality of evidence, we concluded that the small and moderate associations between poor reading and school, work, social, athletic, and home self-concept were unlikely.

Subgroup analyses
As outlined above, six subgroup analyses were required for this review. In line with Cochrane guidelines, we planned to compare subgroups if they comprised at least 10 studies. None of the subgroups included this minimum number of studies. It is noteworthy that the heterogeneity of the outcomes for the subgroups with the largest number of studies (9 and 7 for global and academic self-concept, respectively) was high (i.e., I² greater than 70%). This review therefore lacked the power and reliability required for any subgroup analyses.

Summary of main results
Inconsistent findings in the existing literature obscure whether poor reading is associated with poor self-concept. In this systematic review, which identified 13 studies comprising 3,348 participants, we examined four explanations for these mixed findings: poor reading is not associated with poor self-concept (Explanation 1); poor reading is only associated with certain types of self-concept (Explanation 2); poor self-concept is only associated with certain types of reading problems (Explanation 3); and poor reading is more strongly associated with poor self-concept in some contexts than in others (Explanation 4).
A meta-analysis of the primary outcome data revealed that the association between poor reading and average self-concept was statistically significant and moderately strong. The reliability of this finding was supported by the association between poor reading and global self-concept, which was almost identical in size. These findings suggest a probable moderate association between poor reading and average self-concept, which fails to support Explanation 1 as an explanation for the mixed findings in the literature.
Subsequent meta-analyses of the secondary outcomes revealed that the association between poor reading and reading/writing/spelling self-concept was statistically significant and large, and that there were statistically significant and moderate associations between poor reading and self-concept domains of academia, mathematics, behaviour, and physical appearance. In contrast, the evidence for associations between poor reading and self-concept domains of school, work, social life, athletics, and home was of poor quality. A lack of studies (i.e., at least ten per subgroup) prevented a statistical comparison of the associations between poor reading and different self-concept domains, which was planned to address Explanation 2. However, ranking these associations in order of strength suggested that poor reading was most closely associated with the domains of self-concept that focus most on reading and academia (i.e., reading-spelling-writing, academia, math). More studies of sufficient quality are needed to test this suggestion statistically.
Unfortunately, no study reported the specific type or types of poor reading that challenged the poor readers in their samples. Similarly, all bar one study failed to report on the type of reading instruction received by poor readers. The majority of included studies did report on the age, gender, and school environment of their poor-reading samples, but no subgroup within these contextual factors comprised 10 studies, prohibiting statistical comparisons between subgroups. Thus, we could not assess if poor self-concept is associated with some types of reading impairment and not others (Explanation 3). Nor could we assess if the contextual factors of age, gender, reading instruction, or school environment influenced the strength of the association between poor reading and poor self-concept. Many more quality studies are required to determine if type of reading impairment or contextual factors affect the association between poor reading and poor self-concept.

Overall completeness and applicability of evidence
The outcomes of the 13 studies in this review appear applicable to English-speaking poor readers for at least four reasons. First, studies were conducted in each of the major English-speaking countries in the world, specifically (in alphabetical order) Australia (two studies), New Zealand (one study), the UK (two studies), and the USA (seven studies).
Second, eight of the nine studies that reported statistics for gender recruited more males than females. This is representative of studies of poor readers in general (Miles, Haslum & Wheeler, 1998). Some researchers claim that this recruitment bias reflects a higher incidence of reading difficulties in boys than girls (e.g., Miles, Haslum & Wheeler, 1998). However, others have suggested that more boys than girls are recruited for studies because (1) boys with poor reading are more likely to misbehave when they are frustrated or bored than girls, and hence their failure is more apparent (Shaywitz et al., 1990); and (2) societies are more concerned about the academic success of boys than girls, raising awareness of failure in boys relative to girls (Sadker & Sadker, 2010). These suggestions are supported by studies reporting that girls and boys are equally likely to have poor reading (e.g., Shaywitz et al., 1990), and that girls and boys do not differ in their reading-related cognitive processes (e.g., Jiménez et al., 2011). Thus, while this review is representative of a gender recruitment bias in studies of poor readers, this bias needs to be avoided in future studies.
Third, many, but not all, poor readers in the included studies were reported to have IQ scores within or above the mean range. This reflects the type of poor reader who gains the most attention in reading research, namely, people with poor reading despite average intelligence (a condition that has been referred to as "specific reading disability" or "developmental dyslexia"). However, as mentioned in the Methods, IQ is no longer used as a diagnostic criterion for learning difficulties in reading. Thus, the outcomes of this review are applicable to poor readers with various levels of IQ.
Finally, four studies in this review recruited adult poor readers, and nine studies recruited children with poor reading. This is representative of research on poor reading, which typically focuses on children. However, many children with poor reading carry their reading challenges into adulthood. Hence, it would be helpful if more studies of adult poor readers included measures of self-concept and its domains.

Quality of the evidence
As shown in Table 5, the quality of evidence in this review was based on five factors. The first was risk of bias. As illustrated by Table 2, all studies had a low-risk judgement for the majority of the biases assessed in this review. The second factor was heterogeneity (see Table 3), which was high (i.e., above 70%) for the primary outcome (average self-concept) and five of the 11 secondary outcomes. This was unsurprising given the limited number of studies that met the basic research criteria required for studies of poor readers. To determine the degree to which this heterogeneity may compromise the reliability of each outcome, we conducted heterogeneity and sensitivity analyses. The results suggested that the outcomes with high heterogeneity were indeed reliable, and hence the heterogeneity did not compromise the quality of evidence.
The third factor was the directness of measures, which could not compromise the results because this review's criteria dictated that only studies using direct assessments of both reading and self-concept were included. In contrast, the fourth factor (imprecision) did affect the quality of the results because all outcomes had confidence intervals that were rated as wide or very wide. The fifth factor, reporting bias, was not an issue for the primary outcome (average self-concept). However, it was an unknown factor for the secondary outcomes, which did not have enough studies to produce valid funnel plots.
In sum, the quality of evidence for this review was supported by analyses of risk of bias (low to moderate), heterogeneity (existent but not a threat to reliability), and reporting bias (for the primary outcome), but was challenged somewhat by imprecision (i.e., large confidence intervals) and by unknown reporting bias for secondary outcomes.

Potential biases in the review process
There are three reasons why there appeared to be minimal bias affecting the results of this review. First, the funnel plot of the outcome with the requisite number of studies (N > 10; average self-concept) suggested no evidence of reporting bias or bias owing to outliers. Second, a comparison of effects using fixed- and random-effects analyses revealed very similar results for all primary and secondary outcomes, suggesting statistical reliability. Third, sensitivity analyses for all outcomes produced similar results to the original analyses, again supporting the reliability of the outcomes.

Agreements and disagreements with other studies or reviews
One aim of this review was to determine if there was conflicting evidence for poor self-concept in poor readers because there is no reliable association between poor reading and poor self-concept (Explanation 1). The outcomes of this review failed to support this hypothesis, instead finding a statistically significant moderate association between poor reading and both average self-concept and global self-concept. These outcomes favour previous studies that found an association between poor reading and poor self-concept (e.g., Alexander-Passe, 2006) over those that did not (e.g., Tam & Hawkins, 2012).
At the same time, this review validates the conflict between studies that did and did not find poor self-concept in poor readers, since the secondary outcomes (i.e., the different domains of self-concept) produced inconsistent findings. Ranking these domains in order of strength of association with poor reading revealed that this association was strongest for domains of self-concept that focused on self-perceptions relating to reading and academia (i.e., reading/writing/spelling, academic, math). While this result may appear utterly predictable, it was not predicted by numerous studies in this review that did not assess poor readers for self-concept domains related to reading or academia (see Table 1). This finding provides preliminary support for the idea that the existing literature comprises mixed findings about self-concept in poor readers because poor reading is more closely associated with some types of self-concept than others (Explanation 2).

Implications for research and theory
In terms of future research, this review revealed that many more studies are needed to understand the association between poor reading and self-concept. It is important that these studies actually test the reading skills of their poor readers to confirm that they have reading problems. We were surprised by the number of studies that we had to exclude because participants were not actually tested objectively and/or recently for their reading ability.
It is noteworthy that we also excluded a handful of studies that assessed self-concept with non-standardised measures with unknown reliability and validity. It might be argued that this decision was too stringent, given that some non-standardised and non-validated assessments comprise items similar to standardised and validated measures. Our decision was guided by the same principle as the exclusion of studies that did not test participants' reading: maximising quality of data. Now that a statistically reliable association between poor reading and poor self-concept has been supported by good quality data, it might be of use to see if the same results emerged including studies that used non-validated self-concept measures employing similar items to validated measures.
Another observation made during this review was the number of studies that used a general measure of self-concept without measuring specific domains of self-concept. Given the review outcomes, which suggest that poor reading is most strongly related to self-concept in the domains of reading and academia, we would suggest that future research consider different domains of self-concept in addition to, or even rather than, a global or average measure of self-concept.
We were not surprised to find that the studies included in this review failed to report the type of reading impairments that characterised their sample of poor readers. Unfortunately, only a small minority of reading studies recruit or report on the specific types of poor reading that characterise the poor readers in their samples. Future research focusing on any aspect of poor readers, including their self-concept, would do well to report this information.
It would also be helpful if future research conducted reviews like this for other languages. For reasons outlined under Types of Participants above, this review focused solely on English-speaking poor readers. It would be interesting to see if similar reviews done in different languages produced different outcomes, since these may provide clues about the mechanisms responsible for an association between poor reading and poor self-concept. For example, if this association was significantly weaker in poor readers who read a language that was quicker and easier to learn than English (e.g., Italian), we might hypothesise that poor reading is more obvious in English than Italian, which may lead to more negative feedback about poor reading in English than Italian, and hence poorer self-concept.
Finally, as outlined in the Introduction, one theoretical impetus of this review was to determine if self-concept might be a mechanism linking poor reading to anxiety, which we found to be moderately and statistically-significantly associated with poor reading in a previous systematic review (Francis et al., 2019). The current review similarly found a statistically-significant moderate association between poor reading and poor self-concept. Whilst these moderate and significant associations are by no means evidence for a causal relationship between poor reading, poor self-concept, and anxiety, these associations do support the further investigation of self-concept as a potential factor linking poor reading to anxiety. We are currently conducting a case series intervention study and a cross-sectional study to explore this possibility.

CONCLUSIONS
In sum, this review assessed four possible explanations for the mixed evidence for an association between poor reading and poor self-concept: (1) poor reading is not reliably associated with poor self-concept; (2) poor reading is associated with some types of self-concept but not others; (3) poor self-concept is associated with some types of reading impairment but not others; and (4) the strength of the association between poor reading and poor self-concept may be affected by contextual factors such as age, gender, reading instruction, and school environment. The outcomes of this review and meta-analyses failed to support the first explanation: there was a statistically significant moderate association between poor reading and average self-concept as well as global self-concept. The outcomes provided preliminary support for the second explanation: self-concept in domains of reading and academia was more strongly associated with poor reading than self-concept in other domains. Unfortunately, due to lack of reporting or lack of studies, this review was unable to assess if type of reading impairment or contextual factors (age, gender, reading instruction, school environment) influence the strength of the association between poor reading and poor self-concept.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
One author (Deanna A. Francis) on this manuscript received a Macquarie University Research Excellence Scholarship (MQRES). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: Macquarie University Research Excellence Scholarship (MQRES).