Language profiles and literacy outcomes of children with resolving, emerging, or persisting language impairments

Background. Children with language impairment (LI) show heterogeneity in development. We tracked children from pre-school to middle childhood to characterize three developmental trajectories: resolving, persisting and emerging LI.
Methods. We analyzed data from children identified as having preschool LI, or being at family risk of dyslexia, together with typically developing controls at three time points: t1 (age 3;09), t3 (5;08) and t5 (8;01). Language measures are reported at t1, t3 and t5, and literacy abilities at t3 and t5. A research diagnosis of LI (irrespective of recruitment group) was validated at t1 by a composite language score derived from measures of receptive and expressive grammar and vocabulary; a score falling 1 SD below the mean of the typical language group on comparable measures at t3 and t5 was used to determine whether a child had LI at later time points and then to classify LIs as resolving, persisting or emerging.
Results. Persisting preschool LIs were more severe and pervasive than resolving LIs. Language and literacy outcomes were relatively poor for those with persisting LI, and relatively good for those with resolving LI. A significant proportion of children with average language abilities in preschool had LIs that emerged in middle childhood – a high proportion of these children were at family risk of dyslexia. There were more boys in the persisting and resolving LI groups. Children with early LIs which resolved by the start of formal literacy instruction tended to have good literacy outcomes; children with late-emerging difficulties that persisted developed reading difficulties.
Conclusions. Children with late-emerging LI are relatively common and are hard to detect in the preschool years. Our findings show that children whose LIs persist to the point of formal literacy instruction frequently experience reading difficulties.

Resolving and persisting trajectories are not the only patterns of development observed among children with language difficulties. Studies of early language delay (ELD or 'late-talkers') have also revealed a late-onset or emerging trajectory (e.g., Dale, Price, Bishop, & Plomin, 2003). Henrichs et al. (2011), following infants from 1½ to 2½ years, and Zambrana, Pons, Eadie, and Ystrom (2014), following children from 3 to 5 years, reported similar prevalence rates of 3% for persisting LI, 5-6% for resolving LI, and 6% for emerging LI. Examining predictors of the different patterns, Zambrana et al. (2014) reported gender differences with boys being most common in the persistent group, less common in the transient group and least common in the emerging group. Persistent LI was associated with a range of risk factors including social adversities. Interestingly, they also reported an association between the late-onset trajectory and a family history of literacy problems. As preschool phonological impairments are common in children at family risk of dyslexia (e.g., Nash, Hulme, Gooch, & Snowling, 2013), such deficits may have downstream effects not only on the development of reading but also, as suggested in a phonological theory of SLI proposed by Chiat (2001), on lexical and syntactic development.
A limitation of current findings regarding the trajectories of LI is that data come primarily from parental report measures. In addition, while the focus has been on early language milestones, a key issue is how these children's literacy skills develop. It is well documented that children with LI typically experience literacy difficulties (e.g., Catts, Adlof, Hogan, & Ellis Weismer, 2005) but the developmental relationship between early LI and later literacy difficulties is not straightforward (Bishop & Snowling, 2004). First, according to the 'critical age hypothesis', it is only when language difficulties are present at the point of reading instruction that they have detrimental effects on reading development. Bishop and colleagues initially considered that the impact was via expressive phonological difficulties, but later implicated broader aspects of language (Bishop & Adams, 1990; Stothard et al., 1998). Second, studies of children at family risk of dyslexia have revealed not only that early oral language difficulties often presage poor literacy but also that their effects are mediated by poor phonological awareness (e.g., Torppa, Lyytinen, Erskine, Eklund, & Lyytinen, 2010). Third, the association reported by Zambrana et al. (2014) between an emerging LI trajectory and family risk of literacy problems highlights a further possibility: that some children with preschool phonological impairments associated with family risk of dyslexia may in addition experience downstream effects of these deficits on broader oral language skills.
The main aim of the present study was to use objective measures of language and literacy to validate previous findings regarding resolving, emerging, and persisting trajectories of LI. Given the well-established finding that preschool language impairments are a risk factor for dyslexia (Snowling & Melby-Lervag, 2016; for a review) we assessed children at high risk of dyslexia either because they were at familial risk or because they had a preschool LI. The children were followed from 3½ through 5½ to 8 years. A particular focus here is on children with late-emerging LI. Based on previous research, we hypothesized that the persisting and resolving trajectories would be more common among boys than girls, that the emerging trajectory would be strongly associated with family risk of dyslexia and that the persisting trajectory would be associated with a great number of adversities including social disadvantage (Hoff, 2006 for discussion). In terms of literacy outcomes, we predicted that children who showed resolving language impairment would be least at risk of reading and spelling problems, while children with emerging and persisting impairments would be more severely affected, consistent with their poorer language skills at the time of formal reading instruction.

Participants
Children participating in this study were part of the Wellcome Language and Reading Project. Ethical permission for the project was granted by the Psychology Department, University of York, and the NHS Research Ethics Committee. Informed written consent was obtained from the children's parents. Children were recruited because they were at risk of developing dyslexia (owing to a family risk (FR), or because parents were concerned they had a preschool language impairment (LI)) or because they were typically developing with typical language development (TL). Children were categorized as FR if they had a parent or sibling who could be classified as dyslexic (see Nash et al., 2013, for further details).
Language impairment. At each time point, criteria were applied to determine which children could be classified as language impaired. Following recruitment, children were classified as LI or not according to the conventions normally used (see below). However, because we wished to investigate novel categories of language impairment, it was important to ensure reliability of measurement (in particular to guard against regression to the mean which could be said to characterize the 'resolving' group). We therefore followed a procedure which defines LI as falling -1 SD below the mean of a composite language measure at t1, t3, t5. The composite measure contained the same constructs at each time point: expressive vocabulary, receptive grammar and expressive grammar. There were identical tests at t1 and t3; by necessity, two of the three specific measures were different at t5 (see details of tests used to form composites below). LI classification was made without regard to nonverbal ability.
Language impairment at t1. Children were classified as LI if they 'failed' at least 2/4 language tests. Three subtests from the Clinical Evaluation of Language Fundamentals were used (CELF-Preschool 2 UK - Wiig, Secord, & Semel, 2006): Basic Concepts, Expressive Vocabulary, and Sentence Structure; a scaled score of 7 or below counted as a fail. For the Test of Early Grammatical Impairment (TEGI - Rice & Wexler, 2001), failure of the screener constituted a fail (see Nash et al., 2013). For this study, we validated this clinical classification at t1 using scores on the composite language measure (Expressive Vocabulary, Sentence Structure, TEGI, op cit). All but two children who were clinically classified also fulfilled the criterion of falling -1 SD below the mean of a composite language measure for the whole sample. These two cases were dropped so that all cases at t1 in this paper fulfilled the dual criteria of (i) clinical classification and (ii) a score 1 SD below the mean on a reliable composite language measure. Given that children with language difficulties are heavily overrepresented in this sample, this criterion of -1 SD below the total sample mean can be regarded as quite a conservative definition of LI.
Language impairment at t3 and t5. A child was classified as LI if their score fell 1 SD below the mean of a composite measure (t3: Expressive Vocabulary, Sentence Structure, TEGI, op cit; t5: Expressive Vocabulary (op cit), Test for Reception of Grammar (TROG-II - Bishop, 2003) and Formulated Sentences (CELF 4)).
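The composite criterion can be illustrated with a short sketch. This is not the authors' analysis code: it assumes the composite is the unweighted mean of z-standardized test scores, that "1 SD below the mean" means at or below -1 SD, and that the population SD formula is used. The function names and the scores are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize raw scores against the mean and SD of the same list."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def composite(*tests):
    """Average the z-scores of several tests, child by child (assumed unweighted)."""
    standardized = [z_scores(t) for t in tests]
    return [mean(child) for child in zip(*standardized)]

def flag_li(scores, reference=None, cutoff_sd=-1.0):
    """Flag scores at or below cutoff_sd SDs from the reference mean.

    reference defaults to the whole sample; pass the scores of a
    comparison group to standardize against that group instead.
    """
    ref = scores if reference is None else reference
    m, sd = mean(ref), pstdev(ref)
    return [(s - m) / sd <= cutoff_sd for s in scores]

# Hypothetical raw scores for six children on three tests.
vocab = [10, 12, 14, 16, 18, 2]
grammar = [5, 6, 7, 8, 9, 1]
tegi = [50, 60, 70, 80, 90, 10]
li_flags = flag_li(composite(vocab, grammar, tegi))
```

Passing a `reference` list makes it possible to compare the two reference populations mentioned in the paper (the whole sample versus the typical language group) without changing the rest of the sketch.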
Language trajectories. LI status at t1, t3, and t5 was used to characterize each child's language trajectory. Children with typical language (TL) were those who never reached LI classification. 'Resolving' LI applied to those who had LI at t1 and/or t3, but not t5; 'emerging' LI to those who had no LI at t1, but LI at t5 (regardless of t3 status); and 'persisting' LI to those who had LI at t1 and t5 (regardless of t3 status). We placed more emphasis on t1 (the age of first 'diagnosis') and t5 when we expected language development to have stabilized. Data from t3 allow us to assess the 'critical age' hypothesis. It should be noted that we did not use any measures of phonological processing in setting the criteria for LI or for defining language trajectory.
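The classification rules above can be written as a small decision function. This is a sketch of the stated definitions only; the function name and the string labels are illustrative choices, not taken from the study materials.

```python
def classify_trajectory(li_t1: bool, li_t3: bool, li_t5: bool) -> str:
    """Map LI status at t1, t3 and t5 onto the four trajectory labels."""
    if li_t5:
        # LI at the final time point: persisting if it was also present
        # at t1, otherwise emerging (t3 status is ignored in both cases).
        return "persisting" if li_t1 else "emerging"
    if li_t1 or li_t3:
        # LI at t1 and/or t3 that is no longer present at t5.
        return "resolving"
    # Never classified as LI at any time point.
    return "TL"

# Example: no LI at t1, LI at t3 and t5.
example_label = classify_trajectory(False, True, True)
```

Because t5 status is checked first, every one of the eight possible status combinations receives exactly one label.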

Tests and procedures
Children were assessed individually by a member of the research team at each time point (t1, t3, t5), at home or school. The tests were administered as part of a larger assessment session, lasting approximately 1½-2 hr per child (with appropriate rest breaks given). The following standardized tests were administered as per instructions in the manuals (more details of tests can be found in Nash et al., 2013 and Thompson et al., 2015).
Nonverbal ability. At t1, nonverbal ability was measured using two subtests from the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III - Wechsler, 2003a): Block Design and Object Assembly. At t5, two subtests from the Wechsler Intelligence Scale for Children (WISC-IV - Wechsler, 2003b) were used: Block Design and Matrices. At both time points, composite nonverbal IQ scores were calculated based on the mean of z-standardized scores for the two subtests; the average standard scores based on the subtests were also calculated.
Receptive language was assessed at t1 using the Basic Concepts subtest (CELF-Preschool 2 UK), and at t5 with the Receptive One Word Picture Vocabulary Test (Brownell, 2000).
Grammar. At t1 and t3, receptive grammar was assessed using the Sentence Structure subtest (CELF 4). Inflectional morphology was measured via two subtests of the TEGI (Rice & Wexler, 2001): the third person and past tense probes. The percentage of correct responses across both probes is reported.
At t5, receptive grammar was assessed using the Test for Reception of Grammar (TROG-II - Bishop, 2003); we report the number of blocks successfully completed (4 items per block). Expressive grammar was measured using the Formulated Sentences subtest (CELF 4).
Phoneme awareness. At t3 and t5, the Phoneme Deletion subtest from the YARC (Hulme et al., 2009) was administered. At t5, 12 items were added to extend the test (five words with picture support and seven nonwords without picture support) to guard against ceiling effects.

Rapid automatized naming
Rapid automatized naming (RAN) was measured at t3 and t5: children named aloud a series of five objects repeated at random in an 8 by 5 matrix. RAN Objects was indexed by a rate score: the number of objects correctly named per second.

Word-level literacy
The Single Word Reading Test (SWRT - Foster, 2007) was used to assess word reading accuracy at t3 and t5.
The Spelling subtest of the Wechsler Individual Assessment Test (WIAT-II - Wechsler, 2005) was administered at t5.

Prose reading
The Passage Reading subtest of the York Assessment for Reading Comprehension (YARC) was administered at t5. Children scoring <30 on the SWRT began at Beginner Level; children scoring ≥30 on the SWRT began at Level 3. They read the passages aloud in their own time and then answered questions about them. Reading Comprehension is the total number of comprehension questions answered correctly. Raw scores were based on all seven passages, with full credit awarded for passages below basal and maximum errors assumed for passages above ceiling. Standard scores were based on the two highest passages (as per the manual).

Results
Our sample focused on dyslexia and purposefully oversampled for LI. Table 1 shows the language composite scores (with 95% confidence intervals) derived from measures of expressive language, receptive grammar, and expressive grammar at the three time points (t1, t3, t5) for each language trajectory group. At all time points, the TL group is significantly different from each of the other groups and there is no overlap in the confidence intervals. At t1, the resolving and persisting LI groups are not significantly different as confirmed by the overlapping confidence intervals; moreover, these do not overlap with those of the emerging group. At t5, the confidence intervals for the resolving and persisting LI groups no longer overlap but those of the emerging and persisting LI groups do (neither show overlap with those of the resolving group). At t3, each of the groups differs from the others; there is a step function such that scores for TL > resolving > emerging > persisting and there is no overlap between the confidence intervals around the means for each group. Table 2 shows the age, SES, gender and family risk (FR) status of each language trajectory, along with information regarding the percentage of each group who had speech difficulties at recruitment. Of the sample, 34% (n = 75) were LI at some point. The remaining children were considered to have typical language (TL). Of the LIs present at t1, 22% had resolved and 78% were persisting at t5. Using data from t1, t3, and t5 to describe trajectories of LI, 16% could be classed as resolving, 28% emerging, and 56% persisting. If a more stringent cut-off of -1.5 SD below the mean is taken to classify a child as LI, 6/42 children from the persisting group and 7/12 children from the resolving group no longer fulfill criteria for LI at t1. However, all of the children in the persisting group fulfill this strict criterion at t5.
It seems therefore that using a more conservative criterion reduces the false positive rate (i.e., misclassifying as LI at time 1 children who go on to be free of LI approximately 4 years later) such that only 2% of the sample now show this profile, but it has little effect on those whose language impairments persist. Using the stricter criterion at t5 reduces the number in the emerging group from 21 to 16, making up 7.3% of the sample.
Gender was associated with LI trajectory (χ²(3) = 10.68, p = .014) with more boys in the persisting than the TL group (OR = 3.15, p < .005). Importantly, although there was no significant overall association between FR status and LI trajectory (χ²(3) = 6.67, p = .083), there were more children at family risk of dyslexia in the emerging than the TL group (OR = 3.80, p < .014).
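As a consistency check, the reported percentages can be reproduced from the group sizes given in the text (12 resolving, 21 emerging, 42 persisting; 75 children with LI at some point). The sketch assumes that all 12 children in the resolving group met LI criteria at t1, which is implied by the "7/12" figure but not stated outright; the variable names are illustrative.

```python
# Group sizes as reported in the text.
counts = {"resolving": 12, "emerging": 21, "persisting": 42}
ever_li = sum(counts.values())  # children with LI at some point

# Share of each trajectory among children who ever had LI.
shares = {group: round(100 * n / ever_li) for group, n in counts.items()}

# Outcome at t5 for children with LI at t1
# (assumes every resolving child had LI at t1).
li_at_t1 = counts["resolving"] + counts["persisting"]
resolved_pct = round(100 * counts["resolving"] / li_at_t1)
persisted_pct = round(100 * counts["persisting"] / li_at_t1)

# Children still classed as resolving under the stricter -1.5 SD cut-off at t1.
strict_resolving = counts["resolving"] - 7
```

Under these assumptions the shares come out at 16%, 28% and 56%, and the t1-to-t5 split at 22% resolved versus 78% persisting, matching the figures in the text.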
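For readers unfamiliar with the odds ratios quoted above, the following sketch shows how an odds ratio is obtained from a 2 × 2 table. The counts are invented purely for illustration and are not the values in Table 2; the function and parameter names are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Invented counts for illustration only (NOT taken from the study):
# 30 boys and 10 girls in one group, 20 boys and 20 girls in a comparison group.
example_or = odds_ratio(30, 10, 20, 20)
```

An odds ratio above 1 indicates that the characteristic (here, being a boy) is more common in the first group than in the comparison group.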

Characterization of language trajectories
Raw scores for language and nonverbal abilities are given in Table 3. The standard scores in Table 4 (calculated where appropriate) indicate how the groups are performing relative to age norms.
From the raw scores, the general pattern is for the LI groups to perform below the TL group, but not always similarly to one another. To evaluate the differences in raw scores between groups, Cohen's d was calculated, correcting for unequal sample sizes. We interpret d = 0.8 as a large, d = 0.5 a moderate, and d = 0.3 a small effect size. Asterisks in Table 3 represent significance levels of group comparisons after controlling for multiple comparisons (Bonferroni correction α = .004); where appropriate, transformations were applied to improve the normality of variables with significantly non-normal distributions before testing for group differences.
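A minimal sketch of an effect-size calculation of this kind is given below. The paper does not specify the exact correction used for unequal sample sizes; the sketch assumes the common approach of pooling the two variances weighted by their degrees of freedom. Treating 0.8, 0.5 and 0.3 as lower bounds of bins, and the "negligible" label, are also assumptions, as are the function names and example data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d using a pooled SD weighted by each group's degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

def effect_label(d):
    """Label the magnitude of d using the thresholds quoted in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "small"
    return "negligible"  # label not used in the paper; added for completeness

# Hypothetical raw scores for two groups of unequal size.
d_example = cohens_d([2, 4, 6], [1, 2, 3, 4, 5])
label_example = effect_label(d_example)
```

Weighting by degrees of freedom stops a small group with an unusual spread from dominating the pooled SD, which matters here because the trajectory groups differ considerably in size.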
On nonverbal IQ, the LI groups perform similarly to one another at t1 with lower scores than the TL group but the emerging and persisting groups show a substantial decline by t5 to perform lower than the TL and resolving groups. As might be expected, all LI groups perform worse than the TL group on language measures at every time point, with medium to large effect sizes. It is important to note that, at t1, the emerging group always performs better than the resolving group with a large effect size for receptive grammar. However, at t3, this pattern reverses with the emerging group scoring less well than the resolving group on all measures. These deficits in the emerging group increase over time, so that by t5, they are performing worse than the resolving group and similarly to the persisting group on all measures.
At t1, the resolving group (at this time point classified as having LI) performs significantly worse than the TL group on all language measures, with large effect sizes. Standard scores indicate particular difficulties with grammar while expressive vocabulary is in the average range. At t5, the standard Finally, the persisting group shows the most severe difficulties on all language tests at t1 through t5 with standard scores in the below-average range. Compared with the resolving group at t1, they have lower nonverbal ability (the effect is moderate but not significant) and perform significantly worse on expressive vocabulary and basic concepts (a language comprehension task; although effect sizes are moderate to small). The effect size for receptive grammar is large but favors the persisting group. By t3, the resolving group performs significantly better than the persisting group on all measures, with large effect sizes.

Literacy outcomes for children with resolving, emerging, and persisting language difficulties
Raw scores for literacy-related measures are given in Table 5, and standard scores in Table 6. Once again Cohen's d was calculated as a measure of the differences between groups. Asterisks in Table 5 represent the significance levels of group comparisons after controlling for multiple comparisons (Bonferroni correction α = .004); where appropriate, transformations were applied to improve the normality of variables before testing for group differences. As predicted, the scores of the resolving group were similar to those of the TL group and both the emerging and persisting groups showed deficits on measures of literacy and related measures, as expected given their concurrent LI status. Literacy scores for the resolving group are not significantly different from those of the TL group on any measure and effect sizes are generally small. In contrast, the emerging and persisting LI groups perform worse than the TL and resolving groups and similarly to each other (effect sizes for the contrast between the resolving and emerging groups were large but not always significant). While all of the LI groups (resolving, emerging, persisting) have standard scores in the average range at t5, it is important to note that their scores are at least 10 standard score points below that of the TL group.
Finally, it is instructive to consider how many of the children within each group fulfilled criteria for 'dyslexia' at t5. In the broader Wellcome sample we have defined dyslexia using a criterion of 1 SD below the control mean on a composite measure comprising word reading and spelling scores; this corresponds to a standard score of 88. Using this criterion, 14% of the TL, 8% of the resolving (one child), 48% of the emerging and 41% of the persisting group are dyslexic.

Discussion
This study tracked children from age 3½ through 5½ to 8 years, and considered their LI status across these three time points in order to classify LI trajectories. Of those with an LI at age 3½, 22% of cases had resolved by age 8 while 78% persisted. These figures differ from those reported by Bishop and Adams (1990) who found that 44% of LIs at age 4 resolved by age 8½ years but it is important to note that the present study recruited children at family risk of dyslexia as well as children with preschool LI. Also, as shown above, the cut-off criterion for LI influences rates of language impairment such that when a more conservative criterion is used, fewer children show the pattern of resolving impairment although this has less effect on those classified with persisting language difficulties. For those children in our sample who demonstrated LI at some point, there were more boys in both of the groups who had LI at t1 and boys were significantly more likely to show the persisting trajectory. Importantly, there were more children at family risk of dyslexia in the emerging group, a finding which confirms the observation of Zambrana et al. (2014). Furthermore, boys and girls were represented equally in this group and the children were not socially disadvantaged. Together these findings are suggestive of a different etiology from that associated with preschool language impairment, possibly of genetic origin.
Our first aim was to characterize the language profiles of these different LI trajectories. Children with persisting LI always performed significantly below the TL group: they had marked, pervasive and sustained language difficulties, relative to their unaffected peers. Children with resolving LI, who were, by definition, impaired on all language measures at age 3½, reached age-expected levels at age 8, but it is noteworthy that their scores on language tasks were still lower than those of unaffected peers, particularly for expressive vocabulary. This replicates Bishop and Adams (1990) in demonstrating residual but subclinical effects of preschool LI in middle childhood; moreover, it highlights the possibility of 'illusory recovery' and the likelihood that some children with early LI may relapse in adolescence as language and literacy demands increase (cf. Snowling et al., 2000). In terms of differentiating between pre-school LIs that resolved versus persisted, children with resolving LI had significantly better comprehension and vocabulary knowledge at age 3½, but poorer grammar, than those with persisting LI. Furthermore, the resolving group had higher nonverbal IQ. This is compatible with other studies in showing that early LIs are more likely to resolve if they are less severe, and if they are accompanied by average nonverbal abilities (e.g., Bishop & Edmundson, 1987; Cole et al., 1995).
Turning to children with emerging LI, although they performed significantly below the TL children at age 3½, their standard scores across the language measures were in the average range. By age 8, their language scores were in the low-average to below-average range, falling significantly below those of the TL group and equaling those of the persisting group. It seems therefore that LIs that do not emerge until middle childhood will be difficult to detect (if relying on standard scores), confirming findings from preschool development (e.g., Dale et al., 2003; Henrichs et al., 2011); however, our findings suggest that a family risk of dyslexia (characterizing 76% of this group) might be a useful risk indicator.
Our second aim was to evaluate literacy outcomes for each LI trajectory. Consistent with the findings of Bishop et al., (op cit), the resolving LI group performed at the same level as the TL group on all literacy-related measures and only one child in the group was identified as dyslexic at t5. Consistent with the critical age hypothesis, the language difficulties of children in this group were within the average range at t3, around the time of beginning formal reading instruction, and their emergent literacy was on course, although they still had relatively poor vocabulary and word recognition was somewhat delayed.
The emerging and persisting LI groups performed significantly worse than the TL group on all literacy-related measures at ages 5½ and 8, consistent with previous findings that school-age children with LI have concurrent literacy difficulties (e.g., Stothard et al., 1998). The emerging profile is particularly notable in that the late onset of their language difficulties observed at t3 coincides with the point of literacy instruction when literacy development was significantly delayed (average reading standard score of 82). While this profile raises the possibility of a 'Matthew Effect' for language and for nonverbal ability (after Stanovich, 1986), it is also consistent with mapping theories (e.g., Chiat, 2001) that propose that phonological processing impairments in preschool (typically considered risk factors for dyslexia) can have repercussions for the development of language, perhaps most especially vocabulary development.
Together our findings are sobering: regardless of whether an LI emerges early or late, if it is present in the early school years it has a negative impact on learning to read: 48% of the emerging and 41% of the persisting groups were identified as dyslexic at age 8. However, it is also important to be aware of possible differences in these two pathways to poor literacy. Our findings suggest that a broad range of adversities are associated with the persisting trajectory, including low SES (see Hoff, 2006 for discussion), and parent ratings not reported here suggest that a high proportion of these children have significant attention problems. In contrast, the emerging trajectory is primarily associated with family risk of dyslexia and the rates of social disadvantage and of attention problems are lower. The male liability is also lower for the emerging than for the persisting profiles. Future research should be directed toward understanding the differential predictors of emerging and persisting LIs.
Before concluding, we acknowledge limitations of this work. In particular, although we have attempted to improve reliability of measurement by using composite language measures, we cannot circumvent the weaknesses of using arbitrary cut-offs to create categorical variables (language groups) from continuous data. Moreover, the cut-offs used to define LI affect the proportions of children classified as following the different trajectories. Further, it is possible that our findings depend upon the tests used; if we had included phonological processing skills in the diagnostic criteria, some of the emerging group may have fulfilled criteria for LI at an earlier stage. Nonetheless, it remains common practice for children to be diagnosed with LI according to arbitrary cut-points on language tests, and exploring between-group differences can inform theoretical and clinical issues of importance. While we acknowledge the argument that resolution of LI over time could be due to regression to the mean (Tomblin et al., 2003), we highlight the case of emerging LI in which language scores regress away from the mean over time.

Conclusion
In this study, we characterized three trajectories of language impairment, and evaluated their literacy outcomes. Persisting LI, identified in preschool, was characterized as severe and pervasive, with relatively poor literacy outcomes. However, when preschool LI has resolved around the time of formal reading instruction, there is a generally good outcome for language and literacy, although with residual (subclinical) weaknesses in vocabulary which could still affect later outcomes. Possible protective factors are relative strengths in nonverbal ability and less extensive vocabulary difficulties at preschool (when compared with the persisting LI group). Finally, a third group of children had late-emerging problems which were identified in middle childhood but not evident in preschool. Many of these children were at family risk of dyslexia and their language and literacy outcomes were as poor as those with persisting LI. Together these findings suggest there are at least two pathways to poor reading: one is the outcome of preschool LI, the other more specifically associated with family risk of dyslexia and associated with late-emerging LI.