The implementation of widely used multi-tiered service delivery models, such as response to intervention (RTI), can provide increasing numbers of students with access to evidence-based instructional practices, universal and systematic screenings, and progress monitoring (e.g., Fletcher and Vaughn 2009; Fuchs and Fuchs 2006). Research conducted on the effectiveness of multi-tiered interventions in reading has shown overall improvements in reading outcomes for participating students (Mathes et al. 2005; O’Connor et al. 2005; Vaughn et al. 2009; Vellutino et al. 1996), as well as evidence that the incidence of reading disability may be reduced (Bollman et al. 2007; Carney and Stiefel 2008; O’Connor et al. 2005; O'Connor et al. 2013; Wanzek and Vaughn 2011), particularly for students in kindergarten and first grade.

Supplemental reading interventions implemented within RTI models are intended to provide targeted reading instruction to meet the needs of students who are at risk for or demonstrate reading difficulties. Less extensive, or Tier 2 type, interventions provide additional instruction for students who are not making adequate progress within the core, or Tier 1 type, instruction. Though Tier 2 type interventions may play a different role in the upper grades when beginning reading instruction diminishes in the core curriculum (Vaughn et al. 2010), at the early elementary level these less extensive interventions are preventative in nature, with the goals of early identification of children at risk for reading failure, implementation of a relatively brief dosage of intervention to allow these students to get on track with reading achievement, and identification of students who have more significant difficulties that may require more extensive interventions.

In the early grades, several meta-analyses have confirmed the value of instruction in foundational skills such as phonological awareness, phonics and word recognition, and reading fluency, along with attention to higher-order instruction in language and comprehension, in helping students learn to read (National Early Literacy Panel 2008; National Reading Panel 2000). Examining studies across the elementary and secondary grades, Swanson and colleagues (Swanson 1999; Swanson and Hoskyn 2000; Swanson et al. 1999) noted higher effect sizes on word recognition measures when interventions used direct instruction, whereas effects were higher on comprehension measures when both strategy and direct instruction were used. Interventions provided in small groups with task scaffolding and student interaction also yielded higher effects on reading outcomes. A more recent meta-analysis of reading interventions examined effects for responders and low responders to intervention (Tran et al. 2011). Thirteen studies were located, all conducted at the early or upper elementary levels. Effect sizes were moderated by pretest scores, but there were no moderating effects for the intensity-related variables of intervention duration, session length, number of sessions, or instructional group size.

The intent of these interventions is to accelerate student reading achievement and assist students in meeting grade-level expectations. Thus, the tiers of intervention in an RTI model are designed to increase in intensity according to student need, often with consideration for the type of instruction, instructional group size, and the dosage of intervention (Vaughn et al. 2012). Furthermore, the intent is that with increasingly intensive tiers of intervention, students are first provided opportunities to respond to interventions that are less intensive (Tier 2) before more intensive and extensive interventions (Tier 3) are implemented. Each of the meta-analyses mentioned earlier included interventions at all levels of instruction, including Tier 1 core reading instruction and supplemental Tier 2 or Tier 3 type interventions. Recent syntheses of research have addressed the features, components, and associated student outcomes of the more intensive or extensive, Tier 3 type reading interventions at the early elementary (Wanzek and Vaughn 2007) and upper elementary/secondary levels (Wanzek et al. 2013), but a systematic review of less extensive, Tier 2 type interventions at the early elementary level has not been conducted. Yet, the research on these less extensive interventions at the early elementary level is more prevalent than the research on extensive interventions.

In their synthesis of early elementary (K-3) studies, Wanzek and Vaughn (2007) used interventions provided for 100 or more sessions (the equivalent of 20 weeks of daily intervention) as a proxy for intensiveness, explaining that it was the most reliable method of identifying and coding articles. The authors reported reading outcomes for study participants in the 18 studies identified, as well as the intensity features of these extensive interventions (i.e., duration of intervention, instructional group size, grade level, level of standardization) associated with high effect sizes. Findings revealed positive outcomes for students with reading difficulties and disabilities who participated in extensive interventions, with mean effect sizes ranging from 0.34 to 0.56 across various reading constructs. Effect sizes were larger when the intervention involved students in kindergarten or first grade and when the intervention was administered in the smallest group sizes (Wanzek and Vaughn 2007). Given the synthesis’s emphasis on extensive, Tier 3 type interventions, studies were also coded for the level of standardization in the intervention approach. Standardized interventions specified the elements of reading instruction with well-defined daily lessons and materials selection. Conversely, problem-solving (non-standardized) interventions were defined as more individualized, with daily lessons planned based on student needs. Studies examining the effects of non-standardized interventions were not available in the corpus of studies included in the synthesis; thus, all findings represented standardized interventions. However, the authors reported no differences in effect between highly standardized interventions (i.e., few or no modifications to the curricula) and those with less standardization (i.e., opportunities for the teacher to respond to students’ needs in the skills and strategies taught).

Wanzek and colleagues (Wanzek et al. 2013) extended the 2007 examination of extensive, Tier 3 type reading interventions with early elementary students to include students in the upper grades (Grades 4 through 12). Instruction in the foundational reading skills tends to fade in the general education reading instruction in these upper grades (Authors; Kent 2014). The criterion for extensive interventions for this synthesis was 75 sessions instead of 100, due to the type of instruction provided for secondary students; however, data for interventions of 100 or more sessions were disaggregated to contrast findings with the previous early elementary synthesis. Overall, the findings of the 19 studies indicated a small, positive effect for extensive interventions on reading comprehension, word reading, fluency, and spelling outcome measures. No evidence was found that intervention effectiveness differed by instructional group size, relative number of hours of intervention, or grade level of intervention, though only a small number of studies could be included in the moderator analyses.

The findings related to extensive interventions from kindergarten to twelfth grade have provided pertinent information for research-based decisions related to reading interventions. However, the larger corpus of studies on less extensive interventions more typical of Tier 2 has not been synthesized. These less intensive interventions are perhaps more frequently implemented, particularly in the earliest grades (Gersten et al. 2008; Wanzek and Cavanaugh 2012), because they allow for initial examination of students’ response to intervention and the identification of students in need of more intensive intervention. Additionally, in the early elementary grades, all students are learning to read, with general education instruction expected to include both foundational skills and higher-level language and comprehension concepts (National Governors Association and Council of Chief State School Officers 2010). Educators are continually faced with decisions regarding the most effective ways to implement these interventions. As with extensive interventions, implementation decisions such as the focus of the instruction that should be provided, the time that should be allocated, the most effective and feasible implementers, and the size of the instructional group arise when considering less extensive interventions. In an RTI model, these decisions are fundamental to intervention implementation, and in the case of Tier 2 type, less extensive interventions, they may ultimately determine who will be referred for more intensive interventions and/or special education.

The purpose of the meta-analyses reported in this paper is to extend previous work on extensive, Tier 3 type interventions in the early elementary grades (Wanzek and Vaughn 2007) to the Tier 2 type interventions that synthesis did not include, examining the effects of less extensive interventions—occurring for 15–99 sessions—for students with or at risk for reading difficulties in kindergarten through third grade. We sought to identify the overall effects of these interventions on students’ foundational skills, language, and comprehension, as well as the intervention features that may be associated with improved outcomes.

We see this synthesis as filling a gap in the reading intervention research by examining the features, components, and outcomes related specifically to less extensive interventions that meet the Institute of Education Sciences (IES) RTI and multi-tier intervention practice guide criteria for a Tier 2 intervention. According to the IES practice guide, Tier 2 interventions typically meet for 20 to 40 min, between three and five times a week, for a minimum of 5 weeks (Gersten et al. 2008). The IES reading practice guide recommendations also specify that Tier 2 instruction should be highly systematic and interactive and that instruction should focus on vocabulary and comprehension components in addition to phonemic awareness, decoding, and fluency. These recommendations were based on a summary of 11 high-quality research studies as well as panel expertise; however, a systematic literature search and meta-analysis of all available studies has not been conducted. We review all of the research meeting our criteria for less extensive interventions as a means of elaborating the knowledge base on the effectiveness of these Tier 2 type interventions.

Research Questions

None of the previous syntheses have provided an examination of the features and components of less extensive reading interventions for struggling readers in kindergarten to third grade. Therefore, these meta-analyses address the following questions:

  1. What are the effects of less extensive reading interventions (i.e., 15–99 sessions) for students with reading difficulties?

  2. What features (e.g., focus of instruction, group size) of these less extensive interventions are related to student outcomes?

Method

Studies were identified through a comprehensive search of the literature. First, we conducted an electronic search of ERIC and PsycINFO to identify studies published between 1995 and 2013, the same starting year (1995) used in the previous, related synthesis of extensive reading interventions (Wanzek and Vaughn 2007), and extended to 2013 to reflect the most current research. We searched abstracts for key population search terms and roots (reading difficult*, learning disabil*, at-risk, dyslex*) in conjunction with reading search terms and roots (reading, interven*, phon*, fluency, vocab*, comprehen*) to yield the maximum number of potentially relevant articles. Second, a hand search of eleven major journals commonly reporting reading intervention research for students with reading difficulties (Exceptional Children, Elementary School Journal, Journal of Educational Psychology, Journal of Learning Disabilities, Journal of Special Education, Learning Disabilities Research and Practice, Reading and Writing, Reading Research Quarterly, Remedial and Special Education, Scientific Studies of Reading, School Psychology Review) was conducted for 2013 to ensure full coverage.

Figure 1 provides an overview of the search process. The initial search yielded 37,523 abstracts for screening. Our keywords identified many abstracts from research in other disciplines (e.g., aphasia, dementia) that are related to terms such as at-risk, disability, fluency, and comprehension. Thus, 37,127 were disqualified based on the abstract information. We examined the full text of the remaining articles (n = 396) and found a total of 69 articles describing 72 studies that met all selection criteria for the meta-analyses. We applied criteria similar to those of Wanzek and Vaughn (2007), except for the difference in the number of sessions:

Fig. 1

Manuscript search flow diagram. Articles were excluded during the eligibility phase if they did not meet all of the following criteria: (1) participants identified with or at risk for reading difficulties; (2) participants enrolled in kindergarten through third grade (or ages 5 to 9); (3) intervention targeted early literacy in English, provided between 15 and 99 sessions, and was not part of the general education curriculum; (4) research design included experimental, quasi-experimental, or single subject designs that demonstrate experimental control (AB designs excluded); (5) the dependent variables addressed outcomes related to reading

  1. The study was published in a peer-reviewed journal and written in English.

  2. Participants were students identified with a learning disability, reading difficulty, or as at-risk for reading difficulties (e.g., students with low achievement, low phonemic awareness, low income, language disorders). We included studies with additional participants when more than 50 % of the participants were targeted students or disaggregated data were provided for students identified with learning disabilities, reading difficulties, or as at-risk.

  3. The participants were enrolled in kindergarten through third grade (ages 5–9). Studies with additional participants were included when more than 50 % of the participants were in kindergarten through third grade or disaggregated data were provided for students in the targeted grade range.

  4. Interventions targeted early literacy in English and were provided as part of the school programming (not home, clinic, or camp programs).

  5. Interventions were provided for 15 to 99 sessions and were not part of the general education curriculum provided to all students.

  6. At least one of the dependent variables addressed a reading outcome in phonological awareness, phonics and word recognition, reading fluency, vocabulary, oral language, or reading comprehension.

  7. The research design was experimental or quasi-experimental and data were provided to calculate effect sizes (see “Effect Size Calculation” section).

These criteria were selected to identify studies that had been through the peer review process, had the features required to address the research questions (Tier 2 type supplemental reading interventions for students with reading difficulties in the early elementary grades, reading outcome data), and had sufficient data for conducting a meta-analysis.

Coding Procedures

We utilized the same coding document used by Wanzek and colleagues (Wanzek and Vaughn 2007; Wanzek et al. 2013) to extract and classify pertinent information from each study; the coding document had been developed based on elements specified in the What Works Clearinghouse Design and Implementation Assessment Device (IES 2011). We coded information in the following categories: (a) participants, (b) methodology, (c) intervention and comparison descriptions, (d) clarity of causal inference, (e) measures, and (f) findings. Participant information was coded using four forced-choice items (socioeconomic status, use of criteria for classifying students with disabilities, risk type, and gender) and two open-ended items (age or grades as described in text and risk type as described in text). Similarly, methodology information was gathered using a combination of forced-choice (e.g., research design, assignment method, fidelity of implementation, and pretest scores) and open-ended items (selection criteria). Intervention/comparison group information was coded using nine open-ended items (e.g., site of intervention, role of person implementing intervention, hours of intervention, duration of intervention). A written description of the treatment and comparison conditions was also provided. Information on clarity of causal inferences was gathered using six items for studies with random assignment (e.g., sample sizes, attrition rates, statistical assumptions) and nine items for quasi-experimental designs (e.g., equating procedures, attrition rates, statistical assumptions). Additional items allowed coders to describe the measures, indicate measurement contaminants, and record findings including data for effect size calculation.

Three people received four-part coding training: (a) instruction on the meaning of each item with several examples provided, (b) modeling of the processes by the trainer (researcher with experience coding), (c) practice coding with discussion of discrepancies among coders, and (d) a reliability test in which the three coders coded the same article independently and their responses were compared to the trainer’s. Responses from each coder were used to calculate the percentage of agreement (i.e., agreements divided by agreements plus disagreements). An interrater reliability of 90 % was established as the lowest allowable threshold for each coder; actual reliabilities ranged from 92 to 97 % across the coding categories. In addition, two raters independently coded each study. When discrepancies occurred, meetings took place to discuss the coding and reach consensus.
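
For illustration, the agreement calculation described above can be expressed as a short function. This is a minimal sketch of the percentage-of-agreement formula (agreements divided by agreements plus disagreements); the item codes in the example are invented.

```python
# Illustrative sketch of percentage of agreement between two coders.
def percent_agreement(coder_a, coder_b):
    """Return interrater agreement as a percentage for paired item codes."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same set of items.")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * agreements / len(coder_a)

# Example: 9 of 10 items coded identically -> 90.0, the minimum threshold used here.
print(percent_agreement([1, 2, 2, 0, 1, 3, 1, 1, 0, 2],
                        [1, 2, 2, 0, 1, 3, 1, 2, 0, 2]))
```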

Effect Size Calculation

For all studies, the Hedges (1981) procedure for calculating an unbiased effect size from Cohen’s d, known as Hedges’s g, was used. Hedges’s g was calculated using the means and standard deviations for treatment and comparison groups when such data were provided. In some cases, Cohen’s d effect sizes, t test results, or analysis of variance results were reported and means and standard deviations were not available. For these effects, Cohen’s d or the t or F statistics and the treatment and comparison group sample sizes were used to calculate Hedges’s g. Each estimate of Hedges’s g was weighted by the inverse of its variance to account for potential bias in studies with smaller samples. All effects were computed using the Comprehensive Meta-Analysis (version 2.2.064) software (Borenstein et al. 2011). See Borenstein et al. (2009) for formulas implemented in the Comprehensive Meta-Analysis software for computing mean effects and their variance, Q statistics, and tests of the effects of moderators.
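
As an illustration of these computations, the sketch below implements the standard formulas for Cohen’s d, the Hedges (1981) small-sample correction, the variance of g, and the inverse-variance weight (see Borenstein et al. 2009). The actual analyses were run in Comprehensive Meta-Analysis; this code and the example values are illustrative only.

```python
import math

def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Hedges's g and its variance from group means, SDs, and sample sizes."""
    df = n_t + n_c - 2
    s_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / s_pooled                      # Cohen's d
    j = 1 - 3 / (4 * df - 1)                        # small-sample correction factor
    g = j * d
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return g, j**2 * var_d

def hedges_g_from_t(t, n_t, n_c):
    """Recover d from an independent-samples t statistic, then correct to g."""
    d = t * math.sqrt(1 / n_t + 1 / n_c)
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d

# Invented example: treatment mean 52 vs. comparison mean 47, SD 10, n = 25 per group.
g, v = hedges_g(52.0, 10.0, 25, 47.0, 10.0, 25)
print(g, v, 1 / v)   # effect size, its variance, and the inverse-variance weight
```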

Meta-Analysis Procedures

Studies were included in the meta-analyses if they used a treatment-comparison experimental or quasi-experimental design and reported sufficient information to allow effect sizes to be computed. Nearly all studies used multiple outcome measures. These measures were coded as standardized (e.g., norm-referenced measures) or not-standardized (e.g., intervention or researcher-developed measures without norms) and by whether they measured foundational reading skills (e.g., phonemic awareness, phonics, word recognition, fluency) or language and/or comprehension. Four separate meta-analyses were conducted for standardized and not-standardized measures of foundational skills and language/comprehension. Standardized and not-standardized measures were meta-analyzed separately because previous reading intervention research has shown that effect sizes from these two types of measures differ in magnitude (Swanson et al. 1999; Willingham 2007; Scammacca et al. 2013).

As recommended by Borenstein et al. (2009), dependence of effect sizes for studies that included more than one outcome measure that qualified for inclusion in any of the four meta-analyses was resolved by averaging the effect sizes from all measures and including the average and its standard error in the meta-analysis. To resolve the dependence in studies where more than one treatment group was contrasted with the same comparison group, a weighted mean effect size was computed that weighted effects by the sample size of each group (Borenstein et al. 2009). The variance of this combined effect was also computed, taking into account the proportion of all study participants who were shared members of the comparison group.
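
The averaging step for dependent outcomes can be sketched as follows, using the composite-effect formulas described by Borenstein et al. (2009). The correlation among outcomes (r) is not reported here and is treated as an assumed parameter; the separate adjustment for multiple treatment groups sharing one comparison group is not shown.

```python
import math

def composite_effect(effects, variances, r=1.0):
    """Mean of m correlated effect sizes from one study and its variance.

    r is the assumed correlation among the outcome measures; r = 1 is the most
    conservative (largest-variance) choice when the true correlation is unknown.
    """
    m = len(effects)
    mean_g = sum(effects) / m
    var = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                var += r * math.sqrt(variances[i] * variances[j])
    return mean_g, var / m**2

# Invented example: two reading measures from the same treatment-comparison contrast.
print(composite_effect([0.45, 0.60], [0.04, 0.05], r=1.0))
```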

A random-effects model was used to analyze the effect sizes and compute estimates of mean effects and standard errors. This model allows for generalizations to be made beyond the studies included in the analysis to the population of studies from which they come and is therefore preferred over a fixed-effects model (Card 2012). Recent methodological innovations in meta-analysis, such as multilevel modeling (Hox 2002) and structural equation modeling (Cheung 2008), were considered as approaches to the random-effects analyses of the effect sizes. However, the categorical nature of the moderators of interest significantly limited the ability to implement multilevel modeling or structural equation modeling, leading us to take a traditional approach to the meta-analysis. Mean effect size statistics and their standard errors were computed, and heterogeneity of variance was evaluated using the Q statistic. When statistically significant variance was found, moderator variables were introduced into the random-effects models, resulting in mixed-effects models. Moderators included (a) intervention type (foundational skills only or multi-component), (b) size of instructional group (one-on-one, group of two to three students, or group of four to five students), (c) grade level of students (kindergarten, first grade, or second and third grades), (d) implementer of the intervention (researcher or school personnel), and (e) total hours of intervention (categorized as 1–10, 11–20, 21–30, 31–40, and 41 or more). Size of instructional group, grade level, and total hours of intervention could not be treated as continuous variables because of the manner in which this information was presented in the studies included in this report. The size of instructional groups was typically reported as a range in the categories listed above. Second and third grade data were combined because in some studies these students were given intervention together, and the number of studies that treated them separately was too small to allow for a meaningful comparison of effect sizes. The total hours of intervention were often reported as a range or as a mean and standard deviation.
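
As a minimal sketch of the random-effects computations described here, the function below pools effect sizes using the DerSimonian-Laird estimator of the between-study variance and returns the Q statistic. The published analyses were conducted in Comprehensive Meta-Analysis, so this is an illustration of the standard formulas (Borenstein et al. 2009) rather than a reproduction of those computations; moderator (mixed-effects) analyses compare such pooled estimates across subgroups with a Q-between test.

```python
def random_effects_summary(effects, variances):
    """Random-effects mean, SE, Q, and tau^2 via the DerSimonian-Laird estimator."""
    k = len(effects)
    w = [1 / v for v in variances]                       # fixed-effect weights
    fe_mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fe_mean) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-study variance
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    re_mean = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = (1 / sum(w_star)) ** 0.5
    return re_mean, se, q, tau2

# Invented example: five study-level effect sizes and variances.
print(random_effects_summary([0.30, 0.55, 0.45, 0.70, 0.20],
                             [0.05, 0.03, 0.04, 0.06, 0.02]))
```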

In some cases, studies failed to provide sufficient data to code all moderator variables. These studies were included in the overall estimate of the mean effect size in each meta-analysis, but were dropped from the moderator analysis for the variable(s) where data were missing. Additionally, levels of each moderator were included in the moderator analysis only if k ≥ 5 for that level because statistical power is very low when fewer than five studies are included in an analysis (Borenstein et al. 2009).

Results

Study Features

Table 1 provides the key features of each study. Of the 72 studies that met criteria for meta-analysis, there were 37 experimental studies, 30 quasi-experimental designs, and five studies with treatment and comparison conditions where assignment of students was unclear. There were 6617 students represented, with sample sizes across studies ranging from 20 to 881 students. Nineteen studies examined interventions provided in kindergarten only, with 27 studies in first grade only, seven in second grade only, two in third grade only, and 17 studies implementing interventions in multiple grades (14 of which included second and/or third grade participants). The samples were largely students identified only as at-risk for reading difficulties (65 %), generally based on deficits in pre-literacy skills. This is probably due to the large number of studies conducted at the early literacy levels (grades K-1). Twenty studies included students with reading difficulties based on deficits noted on print reading measures. Only six studies included samples of students with identified reading disabilities only. The majority (58 %) of the studies examined populations of students with low socioeconomic status. Most of the remaining studies worked with a mix of students, reporting that one third to one half of the sample came from low socioeconomic backgrounds. Nine studies did not report information related to the socioeconomic status of the samples. The interventions that were implemented in these studies were provided to participants for between 15 and 99 sessions over approximately 4–32 weeks. Sixty-nine of the studies implemented intervention sessions of between 10 and 60 min with sessions of 20–30 min occurring most frequently (n = 39 studies). There were two summer school studies that implemented sessions of 120 or 190 min. A variety of implementers were noted in the studies, including general education teachers, special education teachers, researchers, and paraprofessionals. Fidelity of implementation was measured and reported in 40 of the studies.

Table 1 Summary of studies in the meta-analyses

Meta-Analytic Findings

Foundational Reading Skills on Standardized Measures

The estimate of the mean effect size across the 63 studies included in the analyses was 0.49 (p < 0.001; 95 % CI = 0.38, 0.59), indicating a moderate positive effect of intervention on students’ foundational reading skills. The variance as measured by the Q-statistic was statistically significant (Q = 187.55, df = 62, p < 0.001).

Analyses were conducted to determine whether differences in mean effect size between studies could be explained by one or more moderator variables. There were 31 effect sizes from foundational skill interventions (mean ES = 0.47) and 32 effect sizes from multi-component interventions (mean ES = 0.50). Thirty effect sizes were from interventions implemented 1:1 (mean ES = 0.50) while ten effect sizes were from small group interventions of two to three students (mean ES = 0.61) and seven from small group interventions of four to five students (mean ES = 0.44). Eighteen effect sizes represented researcher implementation of the intervention (mean ES = 0.52) and 42 effect sizes represented school personnel implementation (mean ES = 0.50). There were 12 effect sizes for the kindergarten level (mean ES = 0.54), 26 effect sizes for first grade (mean ES = 0.50), and 15 effect sizes for second and third grade (mean ES = 0.40). For the hours of intervention moderator, there were 11 effect sizes to represent interventions for 1–10 h (mean ES = 0.60), nine effect sizes each representing 11–20 h (mean ES = 0.36) and 21–30 h (mean ES = 0.50), eight effect sizes representing 31–40 h (mean ES = 0.75), and six effect sizes for interventions greater than 40 h (mean ES = 0.20). No statistically significant differences were found between groups based on any moderator variable, meaning there was no evidence that intervention effectiveness differed by intervention type, size of instructional group, grade level, implementer, or the number of hours of intervention. Table 2 presents the effect sizes by moderator, standard errors, and Q between statistics.

Table 2 Results from moderator analysis of standardized foundational reading skills measures

Foundational Reading Skills on Not-Standardized Measures

The mean effect size estimate for the 33 studies that included not-standardized measures of foundational reading skills was 0.62 (p = 0.004; 95 % CI = 0.47, 0.78), indicating a moderate positive effect of intervention on students’ development of foundational reading skills. The variance associated with the effect sizes was statistically significant (Q = 88.50, df = 32, p < 0.001). Only the moderator variables for intervention type, group size, and implementer type had a sufficient number of studies to allow for analysis. For intervention type, there were 21 effect sizes related to foundational skills interventions (mean ES = 0.59) and 12 effect sizes for multi-component interventions (mean ES = 0.67). The group size moderator was represented by 15 effect sizes for 1:1 intervention (mean ES = 0.56) and eight effect sizes for small groups of two to three students (mean ES = 0.71). Thirteen effect sizes represented researcher implementation (mean ES = 0.55) and 20 effect sizes represented school personnel implementation (mean ES = 0.70). None of the variables explained a statistically significant amount of variance. See Table 3 for effect sizes by moderator, standard errors, and Q between statistics.

Table 3 Results from moderator analysis of not-standardized foundational reading skills measures

Language/Comprehension on Standardized Measures

The 31 studies that included standardized measures of language and comprehension had a mean effect size estimate of 0.38 (p = 0.005; 95 % CI = 0.25, 0.51), indicating a small to moderate positive effect of intervention on students’ language/comprehension. Statistically significant variance was present (Q = 77.00, df = 30, p < 0.001); however, the results of moderator analysis indicated that none of the moderator variables explained a significant amount of the variance. There were ten effect sizes for foundational skill interventions (mean ES = 0.44) and 20 for multi-component interventions (mean ES = 0.35). Seventeen effect sizes represented 1:1 intervention (mean ES = 0.43) with five effect sizes related to small groups of two to three students (mean ES = 0.32) and six effect sizes representing small groups of four to five students (mean ES = 0.18). Six effect sizes came from studies with researcher implemented interventions (mean ES = 0.16), and 24 effect sizes were from studies with school personnel implementation (mean ES = 0.45). Six effect sizes were at the kindergarten level (mean ES = 0.34). Eleven effect sizes were at the first grade level (mean ES = 0.25), and eight effect sizes were at the second and third grade level (mean ES = 0.51). No moderator analysis for total hours of intervention could be conducted because the number of studies was fewer than five at each level with the exception of studies that provided more than 40 h of intervention. See Table 4 for effect sizes by moderator with standard errors and Q between statistics.

Table 4 Results from moderator analysis of standardized language/comprehension measures

Language/Comprehension on Not-Standardized Measures

Only six studies provided effect sizes for not-standardized measures of language and comprehension. The mean effect size estimate was 1.03 (p < 0.001; 95 % CI = 0.52, 1.53), indicating a large positive effect of intervention on students’ language/comprehension ability. The variance was statistically significant (Q = 17.64, df = 5, p = 0.003); however, given the small number of studies in this meta-analysis, no moderator analyses could be conducted.

Publication Bias

Publication bias was evaluated by using the trim-and-fill approach (Card 2012). This approach builds on visual inspection of a funnel plot of effect sizes for asymmetry, adding an iterative process that seeks to correct any asymmetry found. Asymmetry can be evidence of the omission of null or very small effect sizes in studies that were conducted but not published. Trim-and-fill analysis deletes the effect sizes causing the asymmetry, calculates a mean effect size, and then returns the deleted effect sizes. Effect sizes for unpublished studies that may have been omitted are imputed, and the analysis repeats until the plot is symmetrical. The results indicate whether estimates of mean effect size may be biased by the exclusion of effect sizes from unpublished research.
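
To make the procedure concrete, the following is a simplified sketch of the trim-and-fill idea using the Duval and Tweedie L0 estimator of the number of missing studies. It assumes the suppressed studies would fall on the small-effect side of the funnel and pools with a fixed-effect mean for brevity, whereas the reported analyses used random-effects models in Comprehensive Meta-Analysis; the example data are invented, and real analyses should rely on established implementations (e.g., CMA or metafor's trimfill in R).

```python
def fe_mean(effects, variances):
    """Fixed-effect (inverse-variance weighted) mean, used as the funnel center."""
    w = [1 / v for v in variances]
    return sum(wi * y for wi, y in zip(w, effects)) / sum(w)

def trim_and_fill(effects, variances, max_iter=25):
    """Simplified right-side trim-and-fill sketch (Duval-Tweedie L0 estimator)."""
    n = len(effects)
    order = sorted(range(n), key=lambda i: effects[i])   # indices, smallest to largest effect
    k0 = 0
    center = fe_mean(effects, variances)
    for _ in range(max_iter):
        kept = order[: n - k0]                           # trim the k0 largest effects
        center = fe_mean([effects[i] for i in kept], [variances[i] for i in kept])
        dev = [effects[i] - center for i in range(n)]
        by_abs = sorted(range(n), key=lambda i: abs(dev[i]))
        ranks = {i: r + 1 for r, i in enumerate(by_abs)}
        t_n = sum(ranks[i] for i in range(n) if dev[i] > 0)
        new_k0 = max(0, round((4 * t_n - n * (n + 1)) / (2 * n - 1)))
        if new_k0 == k0:
            break
        k0 = new_k0
    # "Fill": mirror the k0 trimmed studies about the center and re-pool.
    filled_effects = list(effects) + [2 * center - effects[i] for i in order[n - k0:]]
    filled_vars = list(variances) + [variances[i] for i in order[n - k0:]]
    return k0, fe_mean(filled_effects, filled_vars)

# Hypothetical example: eight observed effects with small/null effects underrepresented.
g = [0.15, 0.30, 0.35, 0.40, 0.45, 0.55, 0.70, 0.90]
v = [0.09, 0.05, 0.04, 0.03, 0.04, 0.02, 0.06, 0.08]
print(trim_and_fill(g, v))   # (estimated number of missing studies, adjusted mean)
```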

In the present meta-analyses, results indicated that publication bias affected the mean effect size estimates for three of the four meta-analyses, suggesting there may be studies missing from these meta-analyses that were never published or that were not electronically identified through the search systems. In the meta-analysis of standardized foundational reading skills outcome measures, the trim-and-fill analyses found evidence of publication bias that suggested that 16 studies may be missing. The mean effect size calculated using imputed values for the missing studies was 0.32 (95 % CI = 0.21, 0.43). The meta-analysis of not-standardized foundational reading skills outcome measures indicated that publication bias did not affect the mean effect size estimate. For the meta-analyses of language/comprehension outcome measures, the trim-and-fill analysis suggested that eight studies may be missing from the standardized language/comprehension meta-analysis and one study may be missing from the not-standardized language/comprehension meta-analysis. The mean effect size estimate for standardized measures, including imputed values for missing studies, was 0.22 (95 % CI = 0.08, 0.36). For the not-standardized measures, the estimated mean effect size, including imputed values for the missing study, was 0.75 (95 % CI = 0.22, 1.28). Given these results from the publication bias analysis, the true mean effects may be somewhat lower than reported in the original analyses.

Discussion

These meta-analyses are the first to provide a summary of the effectiveness of less extensive (Tier 2 type) interventions for students with reading difficulties in the early elementary grades (kindergarten through third grade). In an RTI model, less extensive interventions may be implemented to examine students’ initial response to intervention and/or need for more intensive or extensive interventions. Thus, more students with reading difficulties are likely to receive Tier 2 type interventions than more intensive or extensive interventions. Overall, the research demonstrated moderate, positive effects of less extensive interventions on both standardized and not-standardized measures of foundational reading skills such as phonemic awareness, decoding, word identification, decoding fluency, word identification fluency, and text reading fluency. Smaller effects were noted for less extensive interventions on standardized measures of language/comprehension, with the majority of the standardized measures assessing reading comprehension. The highest effects in these studies of less extensive interventions were found on not-standardized language/comprehension measures; however, there were only six studies that incorporated these types of measures. Thus, there is evidence that less extensive interventions may positively affect student reading outcomes in a variety of domains, with the highest effects and confidence for foundational reading skills.

The small to moderate effects of less extensive interventions for students in kindergarten through third grade are similar to the findings of the previous synthesis on extensive (100 or more sessions) interventions at these grade levels (Wanzek and Vaughn 2007). Wanzek and Vaughn found mean effect sizes from 0.34 to 0.56 on measures of foundational reading skills following extensive intervention. A mean effect size of 0.46 was also noted in the 2007 synthesis on measures of reading comprehension, though the authors did not separate the effects of standardized measures. Thus, a variety of reading intervention implementations have been shown to improve student reading outcomes in the earliest grades.

Significant variance was found in each of the current meta-analyses, suggesting that the outcomes in the corpus of studies varied. The noted variance among the studies was not significantly explained by intervention type, instructional group size, grade level, implementer, or total hours of intervention provided in the studies. In terms of our research questions, these findings would suggest that the intervention main effects remained consistent even when these implementation features were considered, at least on the standardized outcome measures where the number of effect sizes available allowed us to statistically examine the greatest number of moderators. However, as Borenstein et al. (2009) note, inadequate statistical power due to small ks can lead to findings of no differences in moderator analyses when significant differences actually exist. Statistical power was below 0.90 for many of the moderator analyses. Therefore, our discussion of the implications of the moderator findings should be considered in light of the related limitations within the available research, including insufficient power.

In terms of intervention type, studies that implemented only foundational reading skills instruction as well as studies that implemented multi-component interventions measured effects on outcomes for both foundational reading skills and language/comprehension. We examined whether variance in outcomes would be explained by these two types of interventions. Findings revealed that intervention type did not significantly explain variance in effects on these outcome measures. At least for students in the early stages of reading (kindergarten through third grade), these findings may indicate that there were no differences in immediate effects related to whether students received intervention in foundational reading skills instruction only or whether they received a multi-component intervention with both foundational reading skills and comprehension/language instruction. Nearly all of the multi-component interventions included comprehension instruction, whereas only about half of the studies included vocabulary instruction. This finding differs from research in the upper elementary and secondary grades where the highest effect sizes have been noted with multi-component interventions (Kamil et al. 2008; Scammacca et al. 2007; Torgesen et al. 2007; Wanzek et al. 2010). Approximately two thirds of the studies implementing multi-component interventions were conducted in Grades K-1 only. Only seven studies implemented multi-component interventions with second and third graders only. As a result, our findings may be weighted towards studies in the earliest grade levels. It may be that for the youngest students, an intervention emphasis on foundational reading skills yields positive effects on comprehension due to their very beginning reading level, and the addition of comprehension instruction does not significantly increase immediate outcomes for students at this early level. More multi-component studies at second and third grade could provide further information on whether the addition of vocabulary and comprehension instruction in a less extensive intervention would differentially impact any of the outcomes when compared to less extensive interventions with a foundational reading skill emphasis only. There were too few studies (n = 3) implementing vocabulary or comprehension intervention only to further compare the differential effects of these reading intervention components. Nonetheless, these findings are aligned with IES practice guide recommendations for implementing Tier 2 interventions in three or more critical reading areas (Gersten et al. 2008).

The findings related to intervention type also align with the simple view of reading (Gough and Tunmer 1986; Tunmer and Chapman 2012) emphasizing the importance of foundational reading skills in the early acquisition of reading with higher level processes such as listening or language comprehension increasing in importance as students progress as readers (Kershaw and Schatschneider 2012; Tilstra et al. 2009). We do not interpret the findings from this synthesis as suggesting that the emphasis on multi-component reading interventions that include comprehension and language are unnecessary in the early grades, for two reasons: (a) the effects on outcomes were not statistically different between the intervention types, and (b) the effects of components emphasizing language and comprehension require more extensive time and may yield benefits that are realized later.

Group size also did not significantly explain variance in the effect sizes. In other words, similar student outcomes were noted for intervention that was provided 1:1, in groups of two to three students, and in groups of four to five students. We note that the sample sizes for the studies available for this synthesis allowed us to examine only 1:1 and small group instruction (either two to three students or four to five students), group sizes that have been found to improve student outcomes in previous research (Elbaum et al. 2000; Lou et al. 1996; Swanson et al. 1999; Vaughn et al. 2003). There were only three studies with larger group sizes (greater than five students), preventing us from examining a large group category in the meta-analyses. By way of comparison, the study-level effect sizes for the three studies including larger group sizes indicated that two of the studies had negative effects on foundational reading skills (−0.35, −0.04) and one study had a large positive effect (0.87) compared to the mean effect sizes ranging from 0.44 to 0.61 for the studies with smaller group sizes. No data for other types of measures were given for these three studies implementing larger group sizes. The mean effect size for 1:1 instruction in the previous synthesis of extensive interventions at the early elementary level was 0.51 (Wanzek and Vaughn 2007), very similar to the mean effect sizes of 0.53 to 0.59 found in the current meta-analyses. Thus, fairly consistent results are noted for 1:1 instruction across less extensive and extensive types of intervention. In contrast to the current study, the synthesis of extensive interventions could not examine the effects of small group due to a lack of studies (Wanzek and Vaughn 2007). Consequently, we cannot compare findings between the less extensive (Tier 2 type) interventions and previous research on extensive (Tier 3 type) interventions related to small group instruction, though Wanzek and Vaughn noted the three extensive intervention studies with the largest group sizes also reported the lowest effects. We also note that Tran et al. (2011) found no moderating effects for 1:1 versus small group instruction in a meta-analysis of student response to interventions.

We did not find that grade level was a significant moderator of effects. There were no differences in student outcomes for these less extensive interventions based on grade level for any of the measure types. Although the studies that included only kindergarten students focused largely on foundational reading skills in the interventions, all other grade levels had a mix of studies either focusing on foundational skills instruction or providing multi-component interventions. These results may differ from the findings on more extensive interventions. In the synthesis of extensive interventions, there was a trend in the effect sizes for larger effects in first grade compared to second and third grade, suggesting the benefits of early extensive intervention (Wanzek and Vaughn 2007); however, the moderation of this variable was not examined for statistical significance in that synthesis.

Across studies, we noted researchers, general education teachers, special education teachers, reading specialists, and paraprofessionals implementing the interventions. Examination of researcher versus school personnel implementers yielded no differences in effects on student reading achievement. These less extensive interventions appeared to be feasibly implemented by a variety of implementers. Fidelity information was reported in only about half of the studies, but was generally high when reported, perhaps accounting for the lack of differences in research staff and school staff implementation. Unfortunately, we were unable to further examine whether there were differences in student outcomes based on the qualifications of the school personnel implementing the less extensive intervention (e.g., general education teacher, special education teacher, paraprofessional) due to the small number of studies utilizing each type of personnel. Additional studies examining variations in school staff implementation would allow for a more nuanced analysis that could provide more detailed implementation findings for educators.

The variable related to the total hours of intervention could be examined only for the standardized foundational reading skills meta-analysis. Interventions in this corpus were implemented for 4–80 h (session lengths of 10–60 min with 30-min sessions as the most frequent) with no differences in effects on student foundational reading skills, suggesting these foundational skills may be positively affected in a relatively short amount of time. Tran and colleagues (2011) also noted no moderating effects for dosage variables (number of weeks, number of sessions) when examining student response to interventions. However, the lack of precise information on dosage for most of the studies prevented us from examining this potential moderator for other outcomes and also prevented modeling it as a continuous variable, which would have provided a stronger analysis and relevant implications. The lack of detail provided in most manuscripts on total dosage of intervention for participants has been noted previously (Wanzek and Vaughn 2007; Wanzek et al. 2013). This is a variable that could be examined in more detail with clear practical implications if future publications incorporate more specific dosage information for the participants.

Limitations and Future Research

These meta-analyses reveal a relatively large number of studies examining less extensive interventions for the early elementary grades (n = 72) compared to previous work on extensive interventions at these grade levels (n = 18; Wanzek and Vaughn 2007); however, even with this relatively large sample, there were too few studies to adequately examine the effects of various moderators on student outcomes. The majority of studies in the current meta-analyses were at the kindergarten or first grade level, indicating better understanding of Tier 2 type interventions in those grades and less knowledge about the efficacy of Tier 2 type interventions in Grades 2 and 3. Additional studies with students in second and third grade would allow improved understanding of the impact on language and comprehension outcomes. The findings of these meta-analyses suggest confidence in less extensive interventions to improve foundational skills such as phonological awareness, phonics, and word recognition. Smaller effects were noted for standardized language/comprehension measures. There were large effects noted on the not-standardized language/comprehension measures, though the confidence interval demonstrated small to large effects once publication bias was taken into account. These findings highlight opportunities for future research in the development of high-impact interventions for improving reading comprehension and also in the development and use of standardized comprehension measures. This future research could assist educators in decision-making for students with reading difficulties in the early grades. In addition, the publication bias findings of smaller possible effects for standardized foundational skill measures as well as standardized and not-standardized language/comprehension measures suggest that additional, unsuccessful, unpublished research may exist on this topic. Although it is not possible to access most manuscripts that authors do not publish, the smaller effects and confidence intervals noted in the findings should be considered. As noted earlier, our findings suggest the highest confidence for early elementary, less extensive interventions resulting in improved foundational skills.

In practice, there are large numbers of general education classroom teachers providing less extensive interventions in the schools (Kent 2014; Wanzek and Cavanaugh 2012). However, there is limited information on the effects of implementation by these general education teachers, who often have to simultaneously provide appropriate educational activities for other students not participating in the intervention. We were unable to examine differences among the types of school personnel implementing interventions due to small numbers of effect sizes available for each type of personnel. Further research on the effects of interventions provided by classroom teachers versus supplemental personnel is needed to improve our understanding of these treatments within the realities of classroom instruction.

Finally, we noted that variance in student outcomes was not explained by the moderators included in our analyses. Increased detail in publications regarding instructional implementation and the intensity of intervention would help researchers examine differences among studies that may be relevant to student outcomes and further contribute to decision-making in practice. For example, lack of detail in the number of hours of intervention in the corpus of studies limited the number of studies that could be included in the moderator analyses. Yet, hours of intervention is a key variable that schools currently consider when designing appropriate interventions for students. Additionally, there is not a universally accepted method for identifying students with reading difficulties, and, often, the description of the method for selecting students with reading difficulties lacks detail. These issues result in wide variation in samples across studies that cannot be controlled for in the models. Detail regarding Tier 1 instruction is also limited in many studies. As has been noted in previous research, information on the quality of the core classroom reading instruction that students receive along with the targeted intervention would also help differentiate instructional characteristics that are most correlated with improved student achievement (Hill et al. 2012).

Summary

The research on less extensive interventions for early elementary students suggests that interventions focusing on foundational reading skills, as well as multi-component interventions that also include comprehension instruction, are effective in improving reading outcomes, particularly in the area of foundational skills, for students at risk for or with reading difficulties. These interventions are effective at each of the early grade levels (K-3) and can be feasibly implemented by a variety of implementers. In addition, the research supports intervention provided 1:1 and in small groups of five or fewer students.