Evaluation of a two-phase experimental study of a small group (“MultiLit”) reading intervention for older low-progress readers

The study reported here examined the efficacy of a small group (Tier 2 in a three-tier Response to Intervention model) literacy intervention for older low-progress readers (in Years 3–6). This article focuses on the second phase of a two-phase, crossover randomized control trial involving 26 students. In Phase 1, the experimental group (E1) received the 1 h literacy intervention daily for three school terms. The control group received regular classroom instruction. In Phase 2, the original control group received the intervention (E2). At the end of Phase 1, there was a statistically significant difference between groups and a large treatment effect on one of five measures, the Martin and Pratt Non-word Reading Test of phonological recoding. At the end of Phase 2, the large effect on phonological recoding was confirmed for the second experimental group, and there were also statistically significant differences with moderate or large effect sizes on four other measures: single word reading, fluency, passage reading accuracy, and comprehension.

Subjects: Classroom Practice, Educational Research, Inclusion and Special Educational Needs, Middle School Education, Primary/Elementary Education, Research Methods in Education


PUBLIC INTEREST STATEMENT
This study contributes to the evidence that effective reading instruction based on scientific research can accelerate the reading skills of low-achieving students from socially disadvantaged backgrounds. It also shows that older students (Years 3–6) respond well to comprehensive reading interventions designed for their age group. After students had three terms of a literacy intervention, their reading skills had increased significantly on five measures: decoding (phonics), word reading, fluency, accuracy, and comprehension.

Introduction
A large number of children in developed, English-speaking countries struggle to learn to read at even a functional level. The Progress in International Reading Literacy Study (PIRLS) is an international assessment of literacy of Year 4 students. In PIRLS 2011, the proportion of students who achieved at the minimum literacy benchmark or below ranged from 16% in the United States and Canada to 24% in Australia and 25% in New Zealand (Thomson et al., 2012). Another international survey, the Program for International Student Assessment (PISA), tests student literacy at age 15 years. In PISA 2012, the proportion of students achieving at the lowest literacy level or below ranged from 10.9% in Canada to 14.2% in Australia and 16.7% in the United Kingdom (Organisation for Economic Cooperation and Development, 2013).
Students from socioeconomically disadvantaged backgrounds are more likely to have low literacy achievement (Australian Curriculum, Assessment and Reporting Authority, 2013; Thomson, De Bortoli, & Buckley, 2013). The quality of reading instruction and intervention is a strong mediating factor in the literacy gap associated with socioeconomic status (Buckingham, Beaman, & Wheldall, 2013; Buckingham, Wheldall, & Beaman-Wheldall, 2013). Large-scale surveys of literacy research in the United States (National Institute of Child Health and Human Development, 2000), Australia (Department of Education, Science and Training [DEST], 2005), and England (Rose, 2006) have concluded that the best scientific evidence supports the finding that effective reading instruction has five "pillars": phonemic awareness, phonics, fluency, vocabulary, and comprehension. Each of these elements is necessary for the successful, early acquisition of reading skills and general literacy development. They are essential components of both effective classroom teaching and reading interventions for struggling readers.
The importance of early intervention for struggling readers is well recognized (Feinstein, 2007; Reynolds, Wheldall, & Madelaine, 2011; Stanovich, 1986; Torgesen, 2005). Many schools have at least one formal early reading intervention program, such as Reading Recovery, which targets Year 1 students (Clay, 1993; New South Wales Department of Education and Communities, 2013; Reading Recovery Council of North America, 2013; Tanner et al., 2011). Yet the statistics presented above indicate that at Year 4, substantial numbers of students are still in need of literacy support, whether because they missed out on early reading intervention, the intervention was ineffective, their reading difficulties were identified later, or they are students who require ongoing literacy support. There is therefore a need for literacy interventions aimed at older (Year 3 and above), low-progress readers. Low-progress readers are students whose literacy skills are well below those of their classmates, around the lowest 25% of their age cohort (Pogorzelski & Wheldall, 2005).
The MultiLit reading intervention was designed specifically for older, low-progress readers. It exists in a number of formats. The MultiLit Reading Tutor Program is a 30–40 min a day, one-to-one format program, which is implemented in schools and at the MultiLit Literacy Centre. The MultiLit "Schoolwise" Program is conducted in tutorial centers which students attend for 3 h a day, five days a week. Students work in groups and individually with teachers. Evaluation of these programs has shown them to be highly effective (Wheldall, 2009; Wheldall & Beaman, 2000, 2010).
The growing body of research supporting a Response to Intervention (RtI) approach to teaching and assessment indicates that there is a missing step in reading intervention offered in schools. In an RtI model, students are provided with increasingly intensive "tiers" or levels of instruction, depending on their reading progress. In a three-tier RtI model, Tier 1 is whole class instruction, Tier 2 is small group instruction, and Tier 3 is individual instruction. Students who are not making good progress in reading in class are provided with supplementary instruction in a small group. Students who are still struggling to make reading progress in the small group are provided with specialist one-to-one instruction (Gersten et al., 2009). A review of reading interventions by Slavin, Lake, Davis, and Madden (2011) found that small group instruction with a strong phonics emphasis can be beneficial to students whose reading difficulties are not extreme. The RtI approach is therefore both effective and cost-effective. Small group interventions allow more students to be given extra reading support, reserving the most intensive (and expensive) one-to-one instruction for the few students with serious reading difficulties.
The MultiLit small group program was developed as a Tier 2 reading intervention for students in Year 3 and above. A randomised control trial of the small group MultiLit program over three terms is described in Buckingham, Beaman, and Wheldall (2012). Classroom teachers identified the lowest 20 readers in each year (a total of 80 students), who were then given screening tests by trained testers. The 12 students with the lowest screening test scores from each year were selected for participation in the trial and randomly assigned to either the experimental or the control group. The control group had their usual classroom literacy instruction, while the experimental group attended MultiLit lessons for 1 h a day, four days a week, for three terms. All students in the control and experimental groups were given a battery of tests pre-intervention, after two terms of intervention, and after three terms of intervention, and the results were compared.
At the end of three terms, the initial trial showed strong, statistically significant, positive results in phonological recoding only, with a very large treatment effect size (partial η² = .520). There were small treatment effects on single word reading (.057) and spelling (.037). Treatment fidelity had not reached an optimal level until the 14th week of the intervention, so on this basis the school and researchers decided a second implementation would be worthwhile. In the second phase of the intervention, the original experimental group returned to their usual classroom literacy lessons and the control group replaced them in the small group MultiLit program, becoming a new experimental group. As there was some attrition of students from the school after the initial trial, 12 of whom were Year 6 students leaving for high school and another five of whom were students who moved away during the trial, the sample for the two-phase crossover study is smaller. This article focuses on the subset of students who participated in both phases of the trial. It compares the findings of the first and second implementations of the MultiLit intervention, evaluated as a two-phase, crossover study over six school terms.

Participants
Participants were 26 students from Years 3 to 6 in a public primary school with a high proportion of socioeconomically disadvantaged students, located in a large New South Wales regional town. Participants in the two-phase, crossover study are a subset of the participants in the initial (three-term) randomised control trial (each school term is approximately 10 weeks). There were 30 participants in the initial three-term trial: 15 in the first experimental group and 15 in the first control group. Several students left the school during the second three-term phase of the study: one from the first experimental group (E1) and two from the second experimental group (E2). In order to maintain comparability of the two groups, the data from their matched pairs have also been excluded. As a result of these departures and exclusions, a total of 26 students in two randomised, matched groups participated in the six-term, two-phase crossover study.
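The pair-wise exclusion rule described above can be sketched in a few lines of Python. This is an illustration only: the student identifiers and the pairing are hypothetical, and the particular arrangement of departures (two of the three leavers belonging to the same pair) is an assumption chosen because it is one arrangement consistent with the reported totals of 30 and 26. The article does not report which pairs were affected.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (E1 member, E2 member); identifiers are hypothetical


def retain_complete_pairs(pairs: List[Pair], departed: Set[str]) -> List[Pair]:
    """Keep only the matched pairs in which neither member left the school."""
    return [p for p in pairs if p[0] not in departed and p[1] not in departed]


# 15 hypothetical matched pairs, i.e. the 30 students of the initial trial.
pairs = [(f"E1-{i:02d}", f"E2-{i:02d}") for i in range(1, 16)]

# Assumed departures: one E1 student and two E2 students, as reported.
# Placing two of them in the same pair is an assumption, not a reported fact.
departed = {"E1-03", "E2-03", "E2-09"}

retained = retain_complete_pairs(pairs, departed)
n_students = 2 * len(retained)  # students remaining in complete pairs
```

Dropping whole pairs rather than single students keeps the two groups the same size and preserves the matching on which the comparison relies.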

Procedure
In Phase 1 of the study, students in the first experimental group (E1) were withdrawn from class to participate in the group MultiLit program for 1 h a day, four days a week, for three terms (27 weeks) during class literacy time. Students in the control group remained in their usual classrooms (detailed in Buckingham et al., 2012). In Phase 2 of the study, which took place over the next three terms, the first control group became the second experimental group (E2) and participated in the group MultiLit program. The first experimental group returned to their usual classroom literacy lessons and became the Phase 2 control group. All participants were given a battery of tests before commencement of the intervention, again at the end of Phase 1 when the first experimental group (E1) had completed the intervention, and a third time at the end of Phase 2, when the second experimental group (E2) had completed the intervention. At the end of the study, both groups had participated in the group MultiLit program for three terms.

Intervention
The MultiLit program components are described in Buckingham et al. (2012). They were Word Attack (Accuracy), Word Attack (Fluency), Sight Words, and Reinforced Reading. The content and delivery of each component of the program were essentially the same in Phases 1 and 2 of the study. However, some small changes in program implementation took place in Phase 2, including smaller group size and changes to the placement procedure. A brief description of the program is provided in Appendix 2.

Group size
In Phase 1, students in the MultiLit program were in instructional groups of six students for the first two terms, reduced to four students for the third term when the Year 6 students had left the school (Year 6 is the final year of primary school in NSW). In Phase 2, students in the MultiLit program were in instructional groups of four students for all three terms of the intervention.

Placement
In Phase 1, before beginning the MultiLit intervention, students were given the MultiLit Placement Test in order to determine the appropriate starting level of Word Attack (Accuracy) instruction. Students were allocated to instructional groups according to their starting level. The MultiLit Placement Test procedure in Phase 1 was to start instruction for each group at the lowest level required by any one group member, and then to proceed with instruction through each consecutive level. In Phase 1, almost all students were placed at the lowest level to start the program. It became apparent that this was too low for some students, the older ones in particular, whose knowledge of phonics was uneven rather than consistently low.
In Phase 2, the placement procedure was changed to take this into account. Phase 2 MultiLit students were taught only the individual levels of the Word Attack program each group member had failed. Consecutive instruction of each level continued from the level failed by all group members. This change in procedure allowed some groups to quickly progress through the most basic Word Attack levels and move to their substantive instructional level. As a consequence, Phase 2 MultiLit students completed the Word Attack components more quickly and moved on to the additional program components developing fluency and comprehension.
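The difference between the two placement procedures can be made concrete with a short Python sketch. It is one reading of the description above, not the MultiLit Placement Test itself: the integer level numbers, the total of eight levels, and the student labels are hypothetical, and the Phase 2 rule is interpreted as "teach only levels failed by at least one member until the first level failed by every member, then teach consecutively".

```python
from typing import Dict, List, Set


def phase1_levels(failed: Dict[str, Set[int]], n_levels: int) -> List[int]:
    """Phase 1: start at the lowest level any member failed, then teach
    every consecutive level up to the last one."""
    any_failed = set().union(*failed.values())
    if not any_failed:
        return []
    return list(range(min(any_failed), n_levels + 1))


def phase2_levels(failed: Dict[str, Set[int]], n_levels: int) -> List[int]:
    """Phase 2 (as interpreted here): below the first level failed by all
    members, teach only levels failed by someone; from that level onward,
    teach every consecutive level."""
    any_failed = set().union(*failed.values())
    all_failed = set.intersection(*failed.values()) if failed else set()
    start = min(all_failed) if all_failed else n_levels + 1
    targeted = [lvl for lvl in sorted(any_failed) if lvl < start]
    return targeted + list(range(start, n_levels + 1))


# Hypothetical group with uneven phonics knowledge (levels numbered 1 to 8).
group = {
    "student_a": {2, 5, 6, 7, 8},
    "student_b": {3, 5, 6, 7, 8},
    "student_c": {5, 6, 7, 8},
}
p1 = phase1_levels(group, n_levels=8)  # [2, 3, 4, 5, 6, 7, 8]
p2 = phase2_levels(group, n_levels=8)  # [2, 3, 5, 6, 7, 8]
```

In this made-up group the Phase 2 rule skips level 4, which every member had already passed, so the group reaches its shared instructional level (level 5) one level sooner.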

Analysis
To compare the progress made by the experimental groups and their respective control or comparison groups, analyses of covariance (ANCOVA) were employed for each measure at post-test 1 (after three terms), with pre-test scores as the covariate in each analysis (with some exceptions detailed below). The alpha level was set at 1% (p < .01) to allow for family-wise comparisons in lieu of the use of a Bonferroni correction (Howell, 2008). In the second phase of the study, culminating in post-test 2, repeated measures t-tests were employed to evaluate the gains made by each group separately, again employing an alpha level of 1% (p < .01).
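As an illustration of the kind of analysis described, the following Python sketch computes a one-way ANCOVA with a single covariate from sums of squares and cross-products. It is a textbook formulation under the usual assumption of a common within-group slope, not the authors' analysis script, and the scores are invented for the example. It also returns partial η² (SS_effect / (SS_effect + SS_error)), the effect size reported for the earlier three-term trial.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, Sequence, Tuple


@dataclass
class AncovaResult:
    ss_effect: float       # adjusted between-groups sum of squares
    ss_error: float        # adjusted within-groups sum of squares
    df_error: int
    f_stat: float
    partial_eta_sq: float


def _sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Sxx, Sxy, Syy) about the means of x and y."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def one_way_ancova(
    groups: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> AncovaResult:
    """groups maps a group name to (pre-test scores, post-test scores)."""
    all_pre = [v for pre, _ in groups.values() for v in pre]
    all_post = [v for _, post in groups.values() for v in post]
    sxx_t, sxy_t, syy_t = _sums(all_pre, all_post)

    # Pool the within-group sums of squares and cross-products.
    sxx_w = sxy_w = syy_w = 0.0
    for pre, post in groups.values():
        sxx, sxy, syy = _sums(pre, post)
        sxx_w, sxy_w, syy_w = sxx_w + sxx, sxy_w + sxy, syy_w + syy

    ss_total_adj = syy_t - sxy_t ** 2 / sxx_t  # post-test adjusted for pre-test
    ss_error = syy_w - sxy_w ** 2 / sxx_w      # residual with a common slope
    ss_effect = ss_total_adj - ss_error
    df_effect = len(groups) - 1
    df_error = len(all_pre) - len(groups) - 1  # one df is spent on the covariate
    f_stat = (ss_effect / df_effect) / (ss_error / df_error)
    return AncovaResult(ss_effect, ss_error, df_error, f_stat,
                        ss_effect / (ss_effect + ss_error))


# Invented scores, for illustration only.
example = one_way_ancova({
    "control": ([10, 12, 14, 16], [11, 14, 14, 17]),
    "treatment": ([10, 12, 14, 16], [15, 18, 18, 21]),
})
```

The F statistic would then be compared against the critical value for (1, df_error) degrees of freedom at the chosen alpha level; in practice a statistics package would normally be used for this step.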
Treatment effects were also calculated for each measure in each phase of the study, using Cohen's d.
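For readers unfamiliar with Cohen's d, a minimal Python sketch follows. The article does not state which variant of d was computed, so the pooled-standard-deviation form for two independent groups shown here is an assumption; a repeated-measures variant for within-group gains is not shown. The verbal labels use Cohen's conventional benchmarks (0.2, 0.5, 0.8), and the scores in the example are invented.

```python
from math import sqrt
from statistics import fmean, variance
from typing import Sequence


def cohens_d(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (fmean(treatment) - fmean(control)) / sqrt(pooled_var)


def describe_effect(d: float) -> str:
    """Label an effect size using Cohen's conventional benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Invented scores, for illustration only.
d_example = cohens_d([4, 6, 8], [2, 4, 6])
label_example = describe_effect(d_example)
```

Under these benchmarks, a value such as d = .94 falls in the "large" band.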

Results
The Phase 1 experimental group/Phase 2 control group will be called "E1," and the Phase 1 control group/Phase 2 experimental group will be called "E2." Means and standard deviations for all measures (raw scores) for the Phase 1 experimental group (E1) and Phase 2 experimental group (E2) at pre-test, post-test 1 (after three terms), and post-test 2 (after six terms) are shown in Table 1. Table 1 shows that the E1 group means were slightly lower than those for the E2 group at pre-test on all measures, but none of these differences was statistically significant. (The subsequent ANCOVAs take these initial small group differences into account.) ANCOVAs were conducted on the scores for each measure separately at post-test 1 with pre-test scores as the covariate for all measures except for the Neale Analysis of Reading Ability Accuracy and Comprehension components, as this test was not administered in the pre-test battery. Pre-test scores for the Burt (which correlates highly with the Neale) were used as the covariate for the two Neale measures.
For the second phase of the study, gains were evaluated using repeated measures t-tests, for the E1 and E2 groups separately.
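A repeated measures (paired) t-test of this kind can be sketched in plain Python as follows. This is a generic illustration rather than the authors' procedure, and the scores are invented. The group size of 13 (and hence 12 degrees of freedom) is inferred from the 26 participants in two matched groups, and the constant below is the standard two-tailed critical value of t for alpha = .01 with 12 degrees of freedom taken from t tables. For reference, a Bonferroni correction of .05 across five measures would give .05 / 5 = .01, the same threshold.

```python
from math import sqrt
from statistics import fmean, stdev
from typing import Sequence, Tuple

# Two-tailed critical t for alpha = .01 with df = 12 (standard t tables).
# df = 12 assumes 13 students per group, inferred from 26 matched participants.
T_CRIT_ALPHA_01_DF_12 = 3.055


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired scores."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return fmean(diffs) / standard_error, n - 1


# Invented scores, for illustration only (five students, so df = 4 here).
t_example, df_example = paired_t([1, 2, 3, 4, 5], [3, 3, 5, 7, 7])
```

For a group of 13 students, a gain would be judged significant at the stated alpha level when the absolute value of t exceeds T_CRIT_ALPHA_01_DF_12.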
Cohen's d was calculated for each measure at post-test 1 and post-test 2 to determine the size of the treatment effect. Results of all of these analyses are summarized in Table 1.

Results at the end of Phase 1: group means and treatment effects
Statistically significant, positive treatment effects at the stated alpha level (p < .01) were found for one measure, the Martin and Pratt Nonword Reading Test. The treatment effect size for this measure was large (d = .94). No significant differences were found between the group means for the other measures and there were negligible treatment effects. These analyses confirmed for the reduced groups the results reported for the original group by Buckingham et al. (2012).

Results at the end of Phase 2: group means and treatment effects
For Phase 2, the two groups, E1 and E2, were analysed separately. The E1 group (the original group who received treatment in Phase 1 and who were now returned to regular classroom instruction) failed to make further gains that were statistically significant on any measure at the stated alpha level. Effect sizes for all but one measure, the Wheldall Assessment of Reading Passages (WARP), were small or, in two cases, indicated a loss, i.e., mean scores were lower at post-test 2 than at post-test 1. In the case of the WARP, the effect size was large (d = .92), but it should be emphasized that the gain was not statistically significant.
For the E2 group, however, who received treatment (MultiLit instruction) during Phase 2 of the study, the results were very different. Statistically significant gains at the stated alpha level were made on all five measures. Moreover, the effect sizes were all large, ranging from 1.27 to 3.80 (see Table 1). In summary, at the end of Phase 2, the E2 group had made significant gains on all measures, with large effect sizes, whereas the E1 group made no significant gains.

Discussion
In this two-phase crossover study of a small group literacy intervention for older, low-progress readers (MultiLit), the second experimental group to receive the intervention appeared to display a much stronger response than the first experimental group. At the end of Phase 1, there was a statistically significant, positive effect of the intervention on only one of the five measures, the Martin and Pratt Nonword Reading Test. At the end of Phase 2, analyses of gains for the two groups separately showed statistically significant, positive gains for the second experimental group on all five measures (with large effect sizes), whereas the original experimental group (E1) (now experiencing control conditions) made no statistically significant gains.
It should be noted that a powerful effect of the program on phonological recoding was demonstrable in both phases of the intervention. At the commencement of the study, the two groups were very similar in terms of pre-test scores on the Martin and Pratt. At post-test 1, there was a statistically significant difference between the means and evidence for a large effect size (d = .94). In the second phase of the study, the initial control group, now experimental group 2, received the program and, as a result, made large gains, catching up with the gains made by the original experimental group. This provides strong evidence for the particular efficacy of the program on phonological recoding, arguably the most important skill to be mastered by older low-progress readers.
In Phase 1 of the study, there was a strong emphasis on phonics. All groups started at the lowest level of the Word Attack program. Approximately half of each lesson was spent on phonics and this component of the program was the earliest to achieve treatment fidelity (Week 10). The skills learnt in the Word Attack component of the program most closely relate to those measured in the Martin and Pratt Nonword Reading Test; this is likely to explain the strong results on this measure of phonological recoding.
The results of Phase 2 were similarly strong for phonological recoding, but were also strong and significant for single word reading (Burt), fluency (WARP), passage reading accuracy (Neale), and comprehension (Neale). The powerful results in Phase 2 might be attributed to a number of factors.
Group size was smaller in Phase 2 than in most of Phase 1. For two terms of the three-term Phase 1 intervention, the MultiLit students were in groups of six, decreasing to groups of four when the Year 6 students left the school. In Phase 2, MultiLit students were in groups of four for the entire intervention. In the smaller groups, testing time was shorter (allowing increased teaching time), there was a narrower range of ability levels in each group and, perhaps most importantly, the amount of time for each student to do reinforced reading was greater. All of these had a potentially positive impact on the program's efficacy.
Changes to MultiLit program implementation in Phase 2 may also have influenced the results. As noted in the method section, the placement of groups on the Word Attack component of the program was revised in Phase 2 so that all students would reach their substantive instructional level more quickly. This change in the placement procedure allowed students to progress through and complete the Word Attack component earlier in the intervention. Students were then able to expend more time on activities designed to develop the higher order skills of fluency and comprehension. The Phase 2 results provide evidence of the positive effect of this change in instructional focus.
There was also a noticeable difference in the behaviour of the Phase 1 and Phase 2 experimental groups. It is not clear whether this was a cohort effect, a function of the changes in group size and program implementation, or perhaps a third factor, instructional quality. Although treatment fidelity data were not collected in the Phase 2 implementation, it is likely that the MultiLit instructors were more proficient in teaching the program in Phase 2, and this positively influenced both behaviour and learning.
The lack of significant differences between the two groups at the end of Phase 1 in all measures except the Martin and Pratt was not because the experimental group had not made progress over the period of the intervention; rather, it was because both groups made similar progress in this period. This suggests that all students had benefitted from another, unmeasured factor. During Phase 1, the school was undergoing substantial reforms to its teaching processes in all classrooms, including adopting explicit teaching methods in literacy (but without an increase in phonics instruction). This may have contributed to the control group's improved performance. Furthermore, during the trial all classrooms had fewer low-progress readers (half of whom were in MultiLit for 1 h each day during literacy time), which may have positively affected the instruction received by control students.
This two-phase, crossover study of the small group MultiLit program had some limitations. The final sample size was not large (26 students) and was confined to one school, and the measures used do not cover the full range of literacy skills. Nonetheless, the study has provided evidence of the efficacy of the intervention, showing statistically significant and educationally important increases in both first- and second-order reading skills, especially in the second implementation. These results contribute to the research evidence on reading interventions for older students.
The study also reinforces the necessity of good experimental trials. The randomised control trial methodology is an important feature of this study. Without a control group for comparison, the Phase 1 results would have appeared to be stronger than they really were. It also demonstrates the benefits of trialling new programs over a realistic period of time. Even though the MultiLit program was based on the best available research and had good evidence of efficacy in other formats and in other settings, the initial results of the small-group school program trialled in this study were strong only in phonological recoding. It was not until the second phase that highly positive results were yielded, indicating that abandoning programs too early can be imprudent. Ethical judgements are required (if the Phase 1 implementation had shown an adverse effect on students' reading skills, it would not have been repeated), but it may be too much to expect immediate strong benefits from even the most well-designed program.
Further research on this program to support the results would be ideal, but this study adds to the evidence that a comprehensive literacy intervention which explicitly develops the five essential skills of reading can markedly improve literacy skills among older low-progress readers.