An Open, Pilot Study of the Understanding Words Reading Intervention Program

The aim of this study was to assess the clinical efficacy of a new reading intervention program, Understanding Words, for struggling readers in an open trial design. Twenty-five participants who had poor reading skills and typically had a mix of coexisting developmental disorders completed the 40-hr program over 20 weeks. Significant gains were achieved on measures of word identification, phonological decoding, and reading comprehension. Growth in reading ability per hour of intervention matched the average reported in the literature. Individual analysis showed that 84% of the sample returned to the average range on a measure of phonological decoding and 52% to 56% achieved the same gains in reading comprehension. Limitations of study design and future research directions are also discussed.

have claimed that this knowledge is not being shared sufficiently in teacher training programs or applied effectively in schools (Coltheart & Prior, 2007; Pressley, 2002; Torgesen, Wagner, Rashotte, Herron, & Lindamood, 2010). A recent study has provided experimental support for these claims (Wright & Conlon, in press). The study evaluated the effectiveness of learning support services designed to target word-level reading skills in eight Australian primary schools. Student growth in word identification, phonological decoding, prose reading accuracy, phonological awareness, and pseudohomophone discrimination was measured at each of four school terms in a group of students with word-level reading difficulties (WLRD; n = 61) and a group of good readers (n = 52). The WLRD group was compared with good readers for two reasons. First, the "response-to-intervention" study was part of a broader study that required a good reader comparison to investigate sensory processing mechanisms in WLRD. Second, and most important, the aim of the study was to determine how effective learning support services were for students with WLRD. It was not an investigation of the efficacy of a particular program or type of instruction. An effectiveness study differs from an efficacy study in that the aim is to determine if an intervention can be delivered effectively under typical school conditions, which individuals with what characteristics are likely to respond positively, and how likely it is that the response will be clinically significant (Chambless & Hollon, 1998; Kendall, 1999; Seligman, 1995).
As the WLRD group was compared with a group of skilled readers, the analyses of interest were the time by group interactions. Nonsignificant interactions were found on a combined measure of word identification and nonword reading (Basic Reading Cluster from the Woodcock Diagnostic Reading Battery [WDRB]; Woodcock, 1997) and prose reading accuracy (Neale Analysis of Reading Ability-III; Neale, 1999). A significant interaction was found for a pseudohomophone discrimination task; however, the effect was in the negative direction, indicating that the WLRD became relatively worse than the control group over time. Furthermore, just 3 out of 61 students (4% of the WLRD sample) made sufficient word-level reading improvements to meet the preset criterion for clinical significance (posttest standard reading score of ≥92) and 7 out of 61 students (11% of the WLRD sample) met the same criterion for clinically significant change in prose reading accuracy.
Although the authors acknowledged the difficulty posed by potential differences in the type of instruction provided by individual schools, it was concluded that the data showed that the learning support services provided in some Australian schools were insufficient to produce real and meaningful change in reading skills for students who have reading difficulties. The data also point to the need for continued investigation into reading programs that can be effectively and cost-effectively employed in real-world settings. This article reports on the development and an open trial of a reading intervention program, Understanding Words (Wright, 2005), that was designed for such a purpose.

Program Development: The Essential Elements of Instruction
As noted earlier, best-practice guidelines recommend inclusion of seven types of instruction in reading intervention (e.g., NICHHD, 2000;Reynolds et al., 2010). Each type of instruction will be reviewed briefly to provide a rationale for their inclusion in Understanding Words (Wright, 2005).
Phonics. Experimental data (e.g., De Graff, Bosman, Hasselman, & Verhoeven, 2009;Hatcher et al., 1994;Johnson & Watson, 2006;Torgesen et al., 2001) and reviews (Bowey, 2006;Bus & Van Ijzendoorn, 1999;Castles & Coltheart, 2004;Ehri et al., 2001;NICHHD, 2000;Swanson, 1999;Torgerson et al., 2006) show that systematic teaching of synthetic phonics is a necessary component of intervention for struggling readers. Synthetic phonics explicitly teaches GPCs and encourages the child to use that knowledge to identify novel words by "decoding" the sounds made by each letter and thereafter blending the sounds into the whole word. Instruction works best when it is both systematic and cumulative. Systematic refers to approaches where GPCs are taught in a prespecified sequence. Cumulative implies that new knowledge in the teaching sequence builds on the previous and that practicing new skills includes review of previous knowledge (Torgerson et al., 2006).
Spelling. Spelling training can lead to improvements in reading for taught rules (e.g., Ehri & Wilce, 1980; Kohnen, Nickels, Brunsdon, & Coltheart, 2008). For example, Ehri and Wilce (1987) gave spelling training to kindergarten students who had letter-name knowledge but who could not spell words with consonant clusters. The training consisted of teaching students to spell CV, VC, CVC, CCV, CCVC, VCC, and CVCC words (mostly nonwords) made of the consonants T, S, N, L, K, P, and the long vowels (A, E, I, and O) using letter tiles. At posttest, trained students were better at reading words that contained the same letter-sounds as the trained spelling words relative to control participants who were taught letter-sound knowledge but not spelling. There is also considerable data from neuropsychological case studies showing that spelling training leads to improvements in reading skill (e.g., Brunsdon, Coltheart, & Nickels, 2005; Brunsdon, Hannan, Coltheart, & Nickels, 2002; Kohnen et al., 2008).
Irregular words. Evidence supports inclusion of irregular words as a separate component of reading teaching for several reasons (Reynolds et al., 2010). First, one study has recently suggested that a bank of high-frequency words taught in the context of a phonics program will allow students to read approximately 90% of monosyllabic words encountered in texts (Solity & Vousden, 2009). Second, some English words can only be read via a direct lexical representation (e.g., yacht), and models of skilled reading (e.g., Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) have shown the importance of having direct access to lexical representations of irregular words. Third, some students have specific difficulties with reading irregular words (e.g., Castles & Coltheart, 1993), and neuropsychological studies have shown that lexical reading can be improved with training (Brunsdon et al., 2002;Rowse & Wilshire, 2007). A recent review has suggested there is sufficient evidence to advocate for the inclusion of at least the 100 most frequent irregular words in early reading programs (Reynolds et al., 2010).
Fluency. Although reading fluency is recognized as being an important skill and an important goal of reading instruction (e.g., NICHHD, 2000;Reynolds et al., 2010), there is little evidence to suggest how that goal may be achieved. Two instructional practices have typically been used to improve reading fluency: guided repeated oral reading and silent reading. Of the two, the National Reading Panel (NICHHD, 2000) suggested that guided repeated oral reading might have an effect on reading accuracy, fluency, and comprehension.
Vocabulary. There is a reciprocal relationship between vocabulary and reading comprehension (Beck, Perfetti, & McKeown, 1982). Poor comprehenders have relatively low vocabulary (Nation, Clarke, Marshall, & Durand, 2004), and they have relative difficulty using context to infer the meaning of novel words (Cain, Oakhill, & Lemmon, 2004). Vocabulary also predicts individual variation in reading comprehension (Carroll, Snowling, Hulme, & Stevenson, 2003;Muter, Hulme, Snowling, & Stevenson, 2005). A large vocabulary may also benefit word-level skills via better specified phonological representations (Metsala & Walley, 1998). Vocabulary is also associated with irregular word reading (Byrne, Freebody, & Gates, 1992;Ricketts, Nation, & Bishop, 2007) and predicts response to intervention (Al Otaiba & Fuchs, 2002;Hatcher et al., 2006). Vocabulary can be taught via a direct, definition method or by using the contextual method (teaching a strategy to derive meanings of words from written context; for example, Beck et al., 1982;Nash & Snowling, 2006). Current evidence suggests that the contextual method works relatively better than the definition method (Nash & Snowling, 2006). However, using this method presumes a level of text-reading accuracy that many struggling readers do not have, and there is no existing evidence from controlled studies to indicate how to best teach vocabulary to this population.
Comprehension strategies. In addition to vocabulary, text comprehension strategies have been identified as an important factor in reading comprehension (e.g., NICHHD, 2000). Text comprehension is improved when teaching uses a variety of strategies such as question answering, summarizing, and question generation (NICHHD, 2000).

Understanding Words
The intervention program described in the current study was designed to meet the needs of Australian schools for effective and cost-effective intervention for struggling readers (Wright & Conlon, 2009;Wright & Conlon, in press). With effectiveness in mind, all seven types of reading instruction reviewed earlier were included in the program curriculum. With cost-effectiveness in mind, standardized administration scripts were written to allow paraprofessionals to deliver the program in small-group format. The purpose of the current study was to run an open trial to determine the feasibility of using the program with the types of struggling readers typically seen in real-world learning support settings.

Method

Participants
Participants were 25 Australian schoolchildren (18 boys) referred by medical practitioners or teachers to a private clinic due to concerns regarding their reading. The mean age of participants was 8.76 years (SD = .96) and ages ranged from 7.0 to 9.9 years. Twelve participants had one or more disorders in addition to a reading problem: 11 had a diagnosis of ADHD and 1 of Asperger's syndrome. Of the 11 children with ADHD, 2 were funded under the speech/language impairment (SLI) category by their local school district and 1 had Asperger's disorder. One of the two children with SLI also had a phonological disorder that affected articulation in addition to semantic and grammatical weaknesses. The second child also had semantic-grammatical impairment, although receptive vocabulary was within normal limits. One additional child had subthreshold ADHD, that is, the child did not satisfy Diagnostic and Statistical Manual of Mental Disorders (4th ed., American Psychiatric Association, 1994; DSM-IV) criteria for the disorder but did have clinically significant problems regulating cognition and behavior (Barkley, 2006). Seven of the children with ADHD were taking a stimulant medication.
Parents gave informed written consent consistent with the guidelines of the National Health and Medical Research Council of Australia and research use of the data was approved by Griffith University Human Research Ethics Committee.

Measures
Intellectual ability. The Picture Concepts and Matrix Reasoning subtests of the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003) were used to estimate intellectual ability. Picture Concepts requires the child to select one picture from each of two to three rows of pictures to form a group with common characteristics (α = .86 for internal consistency; r = .76 for test-retest reliability). Matrix Reasoning requires the child to identify the missing portion of an incomplete visual matrix (α = .89 for internal consistency; r = .85 for test-retest reliability). The scaled scores were combined to form a nonverbal fluid reasoning index (Flanagan & Kaufman, 2009). IQ-achievement discrepancy was not used to identify children for intervention because existing evidence shows limited differences between poor readers with and without low IQ on reading-related factors (e.g., Fletcher et al., 1994;Flowers, Meyer, Lovato, Wood, & Felton, 2001;Pennington, Gilger, Olson, & DeFries, 1992;Stanovich & Siegel, 1994). However, an estimate of IQ was necessary for description of the sample (Horner et al., 2005). Nonverbal IQ was used because verbal IQ scores are confounded with reading scores (Stanovich, 1986;Sternberg & Grigorenko, 2002).
Oral language skill. The Listening Comprehension subtest from the Wechsler Individual Achievement Test-Second Edition (WIAT-II; Wechsler, 2001) was used as a screening measure of oral language skill (α = .8 for internal consistency; r = .91 for test-retest reliability). Participants were required to complete three subtests. In Receptive Vocabulary, the examiner gave the participant a word and the child was required to select one of four pictures that best matched the word. The Expressive Vocabulary subtest required the child to look at a picture and, when given a verbal prompt by the examiner, to provide a single word that meant the same thing. The Sentence Comprehension subtest required the participant to select one of four pictures that matched a spoken sentence. The overall Listening Comprehension score was expressed in standard score units with a mean of 100 and a standard deviation of 15.
The Vocabulary subtest from WISC-IV (Wechsler, 2003) was used to provide a measure of expressive vocabulary. Participants were required to define orally presented words that increase in difficulty. The Vocabulary scaled score distribution has a mean of 10 and a standard deviation of 3 (α = .89 for internal consistency; r = .92 for test-retest reliability). These data were obtained for adequate sample description (Horner et al., 2005).
Word recognition. The Word Reading subtest from the WIAT-II (Wechsler, 2001) was used to assess single word recognition skill. The task required participants to pronounce single letters and single words. Word reading ability was expressed as a standard score with a mean of 100 and a standard deviation of 10 (r xx = .97 for internal consistency; r = .98 for test-retest reliability; maximum raw score = 131).
Word decoding. The ability to use phonological information to decode words was assessed with the Pseudoword Decoding Word Attack subtest from the WIAT-II (Wechsler, 2001). Pseudowords or "nonwords" are used to determine how well the individual can decode and pronounce words they have not seen before. Nonwords are letter strings that resemble English words and conform to the spelling and sound structure of English but do not make sense (e.g., leb, ruckid, and unfrodding). Scores were expressed in standard units with a mean of 100 and a standard deviation of 10 (r xx = .97 for internal consistency; r = .95 for test-retest reliability; maximum raw score = 55).
Reading comprehension. The ability to comprehend written text was assessed using the Reading Comprehension subtest from the WIAT-II (Wechsler, 2001). Participants were required to read a series of graded passages and sentences and answer orally presented questions about the text. Incorrect words were not supplied but the child could refer back to the passage while answering the questions. The unit of measurement was a standard score with a mean of 100 and a standard deviation of 10 (r xx = .95 for internal consistency; r = .93 for test-retest reliability). It is recognized that reading comprehension is a complex task and no test adequately samples all aspects of comprehension (Bowyer-Crane & Cutting & Scarborough, 2006). The WIAT-II was chosen because it samples the widest range of comprehension skills of tests that have Australian norms.
Phonological awareness. The Elision and Blending Words subtests from the Comprehensive Test of Phonological Processing (CTOPP; Wagner, Torgesen, & Rashotte, 1999) were used to measure phonological awareness. The subtests were combined to form a phonological awareness composite (mean = 100; SD = 15; average internal consistency across all ages = .95).

Procedure
Upon presentation to the clinic, all participants were assessed on the cognitive and reading measures in a single session. Testing was conducted by the first author and took approximately 1.5 hr. Each child then received 40 sessions of reading intervention using the Understanding Words materials over 20 weeks. The sessions were taught one-on-one and lasted for 1 hr. At the conclusion of the intervention, each child was assessed with the same battery of tests as at pretest, except for the measures of intellectual ability and oral language skill.

Teaching Procedures
The teaching curriculum of Understanding Words (Wright, 2005) contains seven strands: phonological awareness, phonics, spelling, fluency, irregular words, vocabulary, and comprehension strategies. A brief summary of the types of activities used in each strand is provided below.
Phonological awareness. Phonological awareness took up approximately the first 5 min of the initial sessions. Teaching ceased when children could identify initial, medial, and final phonemes in CVC, CCVC, and CVCC words and when they could blend and segment phonemes in VC, CVC, CCVC, CVCC, and CCVCC words.
Phonics. Approximately 15 min of each session was devoted to phonics. A maximum of one new GPC was introduced per session. Students were explicitly taught the new GPC, and the GPC and the act of phonological decoding were reinforced via reading of word lists. The words in each list consisted of the new GPC and GPCs previously mastered. Errors were corrected using a cascade of prompts. First, the student was stopped and asked to look carefully at the word and to decode. If they failed to produce the correct phoneme for a grapheme, the instructor asked "look at (grapheme); what sound does it make?" If correct, the student was directed to decode the whole word and descriptive praise was provided on success (e.g., "Well done. I like how you looked carefully at all the letters that time and decoded it. Good job."). If incorrect, the instructor supplied the GPC and then prompted the student to decode the word (e.g., "That letter makes /t/; What sound?; Now decode the word on your own"). Descriptive praise was provided on success. If the student could not read the word after the preceding prompts, the word name was provided to the student (e.g., "That word is tap; What word?"). If the student could not successfully blend a word after producing the correct letter-sounds, they were provided with descriptive praise (e.g., "Well done. You said all the letter sounds"). The instructor then modeled the letter-sounds and directed the student to blend (e.g., "Listen as I say the sounds. I'll say the sounds; you say the word; /t/ /a/ /p/"). If the student could not successfully blend after two attempts, the whole word was provided to the student (e.g., "That word is tap; What word?").
Irregular words. Approximately 5 min of each session was devoted to high-frequency irregular words (those that cannot be or can only partially be identified using phonological decoding strategies, e.g., "put"). The words were selected from the Children's Printed Word Database (Masterton, Stuart, Dixon, & Lovejoy, 2003) and were taught using a combination of flashcards and spelling, methods that have been shown to be effective in improving lexical processing in single cases (e.g., Kohnen et al., 2008). Errors were corrected immediately with a verbal prompt (e.g., "That word is where; What word?").
Spelling. Spelling activities occupied approximately 5 min. Spelling served three purposes. First, having students spell familiar regular words required them to segment the word into phonemes; thus, phonemic awareness was continued throughout the program without having to spend time on stand-alone activities. Second, spelling was used to reinforce new and old GPCs. When a new GPC was introduced, students were asked to spell unfamiliar regular words that included the new GPC and previously mastered GPCs. Third, spelling was used to reinforce the orthographic patterns in irregular words (see above).
Fluency. Repeated oral reading of sentences and stories was used to address fluency. The sentences and stories were all part of the program. They were written to be as decodable as possible and to contain as many of the irregular words as possible. For example, if students had learned all of the single-letter sounds, that the digraph "ai" represents /ae/, and the irregular word "put," they might read the sentence: Ted put his bag on the train. These activities comprised 10 to 15 min of session time. No prompts were provided unless the student (a) made an error, (b) did not attend to punctuation (e.g., ran sentences together), or (c) read accurately but not fluently. In case of an error, the instructor waited until the end of the sentence to see if the student self-corrected. If they did not, irregular words were corrected immediately with a verbal prompt (e.g., "That word is where; What word?"). Regular words were prompted as per the procedure described previously in the section titled "Phonics." When students did not attend to punctuation, the instructor stopped them and modeled reading the sentence appropriately. The students then reread the section of text until they attained mastery. If reading was accurate but not fluent, the instructor asked explicit questions about sentence content (e.g., Who ran? What did the boy do? How did the boy run? To where did he run? for the sentence "The boy ran quickly to the shop"). The student then reread the section of text until they attained mastery.
Vocabulary. The teacher defines a new word for the student(s). For example: A different word for end is finish. What is a different word for end? (Finish) What is a different word for finish? (End). The original word is then presented in an oral sentence and the students have to repeat the sentence using the synonym. For example, Jack saw Maggie at the end. Say that. Now say that with a different word for end (Jack saw Maggie at the finish). The activity is completed with a discussion of how and where the children might use the new word in daily life. These activities comprise up to 5 min of the teaching session.
Comprehension strategies. Comprehension begins with oral activities as recent research (e.g., Clarke, Snowling, Truelove, & Hulme, 2010) has shown that oral activities have advantages over reading activities in promoting reading comprehension. Students engage in oral comprehension of simple and then complex sentences. Oral sentences are used to teach how to make cohesive and predictive inferences. Finally, consistent with the recommendations of the National Reading Panel (NICHHD, 2000), students are asked explicit questions following reading of texts and engaged in question generation activities following reading. These activities take approximately 5 min of each session.

Fidelity of Treatment Method
Understanding Words (Wright, 2005) contains detailed scripts that guide administration of the program. All lessons were provided by the author of the program, so no other checks on fidelity were used beyond following the standardized format of the program.

Pretreatment Scores
Mean pretreatment scores on all measures are presented in Table 1. The results show that, as a group, the participants had substantial difficulty with reading comprehension, nonword decoding, single word identification skills, and phonological awareness. In terms of intellectual and language abilities, Table 1 shows that the sample was approximately average in intellectual ability and that its vocabulary was within the average range.

Reading Gains per Hour of Instruction
An alternative measure of treatment effectiveness that provides information about absolute treatment gains and the efficiency of the treatment is the number of standard reading score units gained per hour of intervention (McGuiness, McGuiness, & McGuiness, 1996). The points-per-hour metric is becoming a common index for comparing program effectiveness (e.g., Duff et al., 2008;Torgesen et al., 2001).
Gain scores were calculated on the basis of 40 hr of instruction. The gain in word identification scores was .23 standard score point/hr, and gains for nonword decoding and reading comprehension were both .39 standard score point/hr of intervention. These gains are equivalent to other short-term interventions (e.g., Duff et al., 2008; Torgesen et al., 2001) described in the literature.
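
The points-per-hour metric is simple arithmetic, and a minimal sketch may make it concrete. The function names and the example pretest and posttest scores below are hypothetical and chosen only for illustration; they are not data from this study. The sketch assumes that the rate is the pre- to posttest difference in standard score points divided by the hours of instruction.

```python
def gain_per_hour(pretest: float, posttest: float, hours: float) -> float:
    """Standard score points gained per hour of intervention."""
    if hours <= 0:
        raise ValueError("hours of instruction must be positive")
    return (posttest - pretest) / hours


def total_gain(rate_per_hour: float, hours: float) -> float:
    """Convert a points-per-hour rate back into a total standard score gain."""
    return rate_per_hour * hours


# Hypothetical child: standard score rises from 82.0 to 91.2 over 40 hr.
example_rate = gain_per_hour(82.0, 91.2, 40)

# A rate of .23 point/hr sustained over a 40-hr program corresponds to
# a total gain of about 9.2 standard score points.
example_total = total_gain(0.23, 40)
```

Expressing gains as a rate allows programs of different lengths to be compared on a common scale, although it assumes growth is roughly linear across the hours of instruction.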

Individual Response
Analyses of group data arguably obscure significant effects in individual participants (T. C. Campbell, 2005;Jacobson & Truax, 1991). To avoid this problem, the data from all participants were evaluated individually using reliable change and clinical significance criteria.
The reliable change index (RCI; Christensen & Mendoza, 1986; Jacobson & Truax, 1991) specifies how great a change is required from pre- to posttest for that change to be considered statistically reliable. The RCI takes into account measurement error (reliability of the test) and sample variability (pretest standard deviation).
The clinical significance of change has been defined as "the practical or applied value or importance of the effect of an intervention-that is, whether the intervention makes a real (i.e., genuine, palpable, practical, noticeable) difference in everyday life" (Kazdin, 1999, p. 332). Although some quantitative measures have been used (Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999), preset criteria that are acceptable to client, therapist, researcher, or society are frequently used as benchmarks of successful treatment (T. C. Campbell, 2005). In the reading literature, Torgesen (2000) proposed that a posttreatment standard score of <90 represents inadequate response to intervention. This study therefore adopted a posttest standard score of ≥90 as an indicator of clinically significant response.

Reliable Change
The RCI for each individual was calculated using the following formula.
$$\mathrm{RCI} = \frac{X_{\text{posttest}} - X_{\text{pretest}}}{SE}$$

where $X_{\text{pretest}}$ = individual pretest score and $X_{\text{posttest}}$ = individual posttest score. The index score for each individual is the difference between the pre- and posttest scores divided by the standard error of measurement. The standard error of measurement is calculated in the following way:

$$SE = SD_{\text{pre}} \sqrt{1 - r_{xx}}$$

where $SD_{\text{pre}}$ = standard deviation of the group at pretest and $r_{xx}$ = reliability of the measurement instrument (Christensen & Mendoza, 1986). To determine whether a significant change has occurred, a cutoff score is produced at the .05 level of statistical significance, which corresponds to a change of 1.96 standard deviations multiplied by the $S_{\text{diff}}$ score. Individuals with an RCI score greater than this value are considered to have made a reliable change due to the treatment.
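
The reliable change computation can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the analysis script used in this study: the function names are hypothetical, the example values are invented, and the standard error of the difference is assumed to follow the usual Jacobson and Truax (1991) form, S_diff = sqrt(2 × SE²), which is not spelled out in the text above.

```python
import math


def standard_error_of_measurement(sd_pre: float, r_xx: float) -> float:
    """SE = SD_pre * sqrt(1 - r_xx)."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_pre * math.sqrt(1.0 - r_xx)


def critical_change(sd_pre: float, r_xx: float, z: float = 1.96) -> float:
    """Smallest pre- to posttest change (in score units) counted as reliable.

    Assumption: S_diff = sqrt(2 * SE**2), as in Jacobson and Truax (1991).
    """
    se = standard_error_of_measurement(sd_pre, r_xx)
    s_diff = math.sqrt(2.0 * se ** 2)
    return z * s_diff


def is_reliable_change(pretest: float, posttest: float,
                       sd_pre: float, r_xx: float) -> bool:
    """True when the observed gain exceeds the critical value."""
    return (posttest - pretest) > critical_change(sd_pre, r_xx)


# Hypothetical values: pretest SD of 10 and test reliability of .96.
cutoff = critical_change(sd_pre=10.0, r_xx=0.96)
improved = is_reliable_change(80.0, 86.0, sd_pre=10.0, r_xx=0.96)
```

With these invented values the standard error of measurement is 2.0 and the critical value is roughly 5.5 standard score points, so a 6-point gain would count as reliable whereas a 5-point gain would not. Higher test reliability or a smaller pretest standard deviation lowers the amount of change required.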

$$\mathrm{RCI}_{\text{critical value}} = 1.96 \times S_{\text{diff}}$$

Table 2 shows the RCI for the six outcome variables. Seventeen participants (68%) achieved

Clinical Significance
Based on our criteria of clinical significance (reliable change and a minimum standard score of 90), a total of 14 participants (56%) achieved clinically significant change on the word identification measure, 21 (84%) made clinically significant improvements in the nonword decoding measure, and 13 (52%) made clinically significant improvements in reading comprehension. Seven of 25 (28%) participants made clinically significant change in all three measures. Fourteen of 25 (56%) made clinically significant change in both the nonword decoding and reading comprehension measures.
It could be argued that the clinical significance criteria set for reading comprehension were too stringent because reading comprehension is heavily influenced by general verbal ability and adopting an arbitrary standard score cutoff or grade-level criterion would assume that all children have average verbal ability (Torgesen, 2000). This is clearly not the case for our sample and, when one takes into account that even special instruction fails to bring the verbal intelligence of some children into the average range (Lee, Brooks-Gunn, Schnur, & Liaw, 1990), the use of such criteria may be seen as unrealistic. An alternative is to adopt criteria that require reliable change from pre- to posttest but that take as the final outcome reading comprehension consistent with the child's verbal ability (Torgesen, 2000). For this study, we used pretest Listening Comprehension scores from the WIAT-II as a measure of verbal ability. A posttest reading comprehension standardized score at or within .5 standard deviations of the child's listening comprehension was then used as an alternative indicator of meaningful improvement. Although many arguments could be made about how to measure verbal ability, the advantage of using the WIAT-II Listening Comprehension measure was that it is part of the same test battery as (and therefore conormed with) the reading comprehension measure.
When these criteria were used, 56% of the sample (14 in total) made a clinically meaningful response. Thus, changing the criteria did not lead to an appreciable change in the number of children classified as having made a clinically meaningful response to intervention.
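
The two decision rules for reading comprehension can be written out explicitly. The sketch below is illustrative only: the function names and example scores are hypothetical, the scale standard deviation is passed in as a parameter rather than assumed, and "at or within .5 standard deviations" is read here as a posttest score no more than half a standard deviation below the child's listening comprehension score, which is one plausible reading of the criterion.

```python
def meets_fixed_cutoff(reliable_change: bool, posttest: float,
                       cutoff: float = 90.0) -> bool:
    """Criterion 1: reliable change AND posttest standard score >= cutoff."""
    return reliable_change and posttest >= cutoff


def meets_verbal_ability_criterion(reliable_change: bool,
                                   posttest_reading_comp: float,
                                   listening_comp: float,
                                   scale_sd: float,
                                   tolerance_sd: float = 0.5) -> bool:
    """Criterion 2: reliable change AND reading comprehension no more than
    tolerance_sd standard deviations below listening comprehension.

    Assumption: scores above listening comprehension also satisfy the rule.
    """
    floor = listening_comp - tolerance_sd * scale_sd
    return reliable_change and posttest_reading_comp >= floor


# Hypothetical child: reliable change, posttest reading comprehension of 84,
# pretest listening comprehension of 88, scale SD of 15.
fixed = meets_fixed_cutoff(True, 84.0)
relative = meets_verbal_ability_criterion(True, 84.0, 88.0, scale_sd=15.0)
```

In this invented case the child falls short of the fixed cutoff of 90 but satisfies the verbal ability criterion, which shows how the two rules can classify the same child differently even when overall response rates are similar.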

Discussion
The aim of the current study was to assess the clinical efficacy of a reading intervention program for struggling readers in an open trial design. Twenty-five participants who had poor reading skills and typically had a mix of coexisting developmental disorders completed the 40-hr program over 20 weeks.
The data showed substantial gains in measures of word identification, phonological decoding, and reading comprehension from pre- to posttest. All gains were statistically significant. The data show that a reading treatment program with a heavy word-level emphasis on phonological skills and phonics can produce rapid gains in phonological decoding ability in most children, including those with complex developmental profiles. This finding is consistent with previous research (e.g., Duff et al., 2008; Hatcher et al., 1994, 2006; Torgesen et al., 2001; Wright & Conlon, 2009). The data also show that a program that combines oral language activities with word-level training can lead to significant gains in reading comprehension in struggling readers.
Additional evidence for the effectiveness of the intervention was provided by a measure of standard score gains per hour of intervention. According to a review by Torgesen et al. (2001), the average gain per hour in terms of word-reading ability in published studies is around .20 of a standard score point. It is important to note that the studies reviewed by Torgesen et al. were mostly typical of randomized controlled trials in that they included more homogeneous groups and had tighter controls on participants' abilities than used in the current open trial design. Despite the fact that this study arguably used children with more complex needs than the typical trial reviewed by Torgesen et al., a standard score gain/hr of .23 was found for word identification and gains of .30 of a standard score unit/hr were found for both phonological decoding and reading comprehension.
As analyses of group data may obscure significant effects in individual participants (T. C. Campbell, 2005; Jacobson & Truax, 1991), we also analyzed individual data using reliable change and clinical significance criteria. The reliable change index (RCI) is expressed in standard score units and is the amount of change participants had to make from pre- to posttest to meet the reliable change criterion. The data showed that 56% of the sample made meaningful improvements (posttest standard score of ≥90) on the word identification measure. That more children did not respond meaningfully probably reflects the test itself more than the intervention. The word identification measure is heavy on irregular words that can only be read via a lexical procedure (Coltheart et al., 2001), and improvements are likely only with word-specific training (which did not occur) or with greater exposure to text. Given the short-term nature of the intervention, the reading volume of children was unlikely to change substantially, so we did not have an a priori expectation of finding a large change on this measure. In contrast, it was fully expected that the largest word-level gains would be in phonological decoding, as the largest component of the intervention involved phonics, a method that teaches phonological decoding explicitly. That 84% of the sample returned to the average range on the phonological decoding measure following intervention was therefore very pleasing.
It is sometimes suggested (e.g., Wright & Conlon, 2009) that improving word-level skills in struggling readers will lead indirectly to improvements in reading comprehension with little or no oral language or specific comprehension training. The current data do not support this view. About one sixth of intervention time was spent on vocabulary, oral language, and reading comprehension activities. That only 52% to 56% of the sample made a clinically significant change suggests that substantial oral language and comprehension work is required to make meaningful improvements in reading comprehension in complex populations. Future iterations of the Understanding Words program (Wright, 2005), and perhaps others, will need to involve more time spent on vocabulary and other oral language comprehension tasks.
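The individual-level analysis rests on two criteria: a reliable change criterion and a clinical significance cutoff (posttest standard score of ≥90). The sketch below shows one common way to compute the reliable change threshold, following the general approach of Jacobson and Truax (1991). The standard deviation of 15, the test-retest reliability of .90, and the rule that a responder must meet both criteria are assumptions made for illustration; they are not values or rules taken from this study.

```python
import math

# Sketch of a Jacobson and Truax (1991) style responder classification.
# The SD, reliability, and "both criteria" rule are illustrative assumptions.


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is test-retest reliability."""
    return sd * math.sqrt(1.0 - reliability)


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest pre- to posttest change (standard score units) that
    exceeds measurement error at the chosen z level."""
    s_diff = math.sqrt(2.0) * standard_error_of_measurement(sd, reliability)
    return z * s_diff


def is_responder(pre, post, sd=15.0, reliability=0.90, cutoff=90.0):
    """True if the change is reliable AND the posttest score is in the
    average range (>= cutoff)."""
    reliable = (post - pre) >= reliable_change_threshold(sd, reliability)
    in_average_range = post >= cutoff
    return reliable and in_average_range


# Hypothetical (pretest, posttest) pairs, not data from the study.
for pre, post in [(78, 93), (85, 92), (70, 85)]:
    print(pre, post, is_responder(pre, post))
```

With these assumed values, the threshold is about 13 standard score points. A child moving from 78 to 93 would meet both criteria, whereas a move from 85 to 92 would not be a reliable change, and a move from 70 to 85 would be reliable but would remain below the average range.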

Limitations
The current data indicate that Understanding Words (Wright, 2005) shows promise as an intervention for struggling readers and that additional research is worthwhile. However, several sources of variance were not controlled in this study.
Maturation is a normal part of child development, and children typically become better at most things as they age. It is possible that the reading improvements seen in this study were due to normal maturation rather than to the treatment. Evidence against maturation being a factor is that improvements occurred in age-standardized scores. Nevertheless, age-standardized scores are not a substitute for a control group, and maturation effects will need to be accounted for in future research.
Regression to the mean refers to a statistical phenomenon in which scores are likely to regress toward the mean on repeat testing (D. Campbell & Kenny, 1999). Because of regression to the mean, the average score in a group of participants with low scores at pretest is likely to improve on repeat testing for purely statistical reasons. One way to limit regression to the mean is to use outcome measures that have high test-retest reliability. In this study, all measures had test-retest statistics in the high range; however, because of the lack of a control group with comparable initial severity of reading problem, it remains possible that the intervention effects occurred purely for statistical reasons. Future research addressing the efficacy of Understanding Words and/or whether any program can be delivered effectively in the real world will need to better control for regression to the mean through inclusion of a reading-level control group.
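The link between test-retest reliability and regression to the mean can be illustrated with the classic prediction that, in the absence of any true change, the expected retest score moves toward the population mean in proportion to one minus the reliability. The sketch below assumes a population mean of 100, equal variances at both testing occasions, and hypothetical pretest scores and reliabilities; none of these values come from the study.

```python
# Sketch: expected retest score under regression to the mean, no true change.
# Assumptions: population mean of 100, equal pretest and retest variances,
# and hypothetical pretest scores and reliability values.


def expected_retest_score(pretest, reliability, population_mean=100.0):
    """Expected retest = mean + r * (pretest - mean)."""
    return population_mean + reliability * (pretest - population_mean)


def expected_regression_gain(pretest, reliability, population_mean=100.0):
    """Gain expected from regression to the mean alone."""
    return expected_retest_score(pretest, reliability, population_mean) - pretest


for r in (0.90, 0.70):
    gain = expected_regression_gain(80.0, r)
    print(f"reliability {r:.2f}: expected gain from regression alone = {gain:.1f}")
```

For a hypothetical pretest score of 80, a reliability of .90 implies an expected gain of about 2 standard score points from regression alone, compared with about 6 points at a reliability of .70. This is why highly reliable measures reduce, but do not eliminate, the problem, and why a comparable control group is still needed.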
The absence of a control group means that strong claims cannot be made about program efficacy. It is possible that the reading gains were due to nonspecific treatment variables and not specifically due to the nature or content of the intervention program. Forthcoming case series data with double baseline controls and a randomized trial with wait list controls will provide data on which stronger claims may be made. Given the multidimensional nature of the intervention, future research will also be required to tease out which components of the intervention are necessary for treatment gains to occur. It will also be useful for future research to compare the effectiveness of Understanding Words with the support/intervention provided to children in real-world school settings, in a way similar to that in which Wright and Conlon (in press) evaluated the effectiveness of school-based learning support services for reading difficulties.
In conclusion, the data from the current open trial are encouraging for the promise of Understanding Words as a reading intervention. Further research, some of which is already underway, will investigate (a) whether the program can be run in groups, (b) whether it can be administered effectively by parents and paraprofessionals, and (c) whether it can be administered in an Internet-based format, given the need for cost-effective programs in schools.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research and/or authorship of this article.