White matter plasticity and reading instruction: Widespread anatomical changes track the learning process

White matter tissue properties correlate with children’s performance across domains ranging from reading, to math, to executive function. We use a longitudinal intervention design to examine experience-dependent growth in reading skills and white matter in a group of grade school aged, struggling readers. Diffusion MRI data were collected at regular intervals during an 8-week, intensive reading intervention. These measurements reveal large-scale changes throughout a collection of white matter tracts, in concert with growth in reading skill. Additionally, we identify tracts whose properties predict reading skill but remain fixed throughout the intervention, suggesting that some anatomical properties may stably predict the ease with which a child learns to read, while others dynamically reflect the effects of experience. These results underscore the importance of considering recent experience when interpreting cross-sectional anatomy-behavior correlations. Widespread changes throughout the white matter may be a hallmark of rapid plasticity associated with an intensive learning experience.

Skilled reading requires orchestration of a large cortical network, and individual differences in reading performance have been linked to the properties of white matter tracts connecting portions of this network specialized for processing visual, acoustic, and semantic features [1][2][3][4][5][6][7][8][9] . Although individual differences in white matter are thought to reflect the joint influence of genetics and experience [10][11][12] , white matter properties are often held to underlie variation in performance and to causally influence individual learning trajectories [13][14][15][16] . A number of recent studies, working within this framework, have identified features of the white matter that predict reading outcomes in dyslexia 17 , and reading-related skills, like phonological awareness, in prereading children 14,18,19 . The implication of these observations is that underlying anatomical differences may predestine certain individuals to struggle with learning to read. In this view, differences in white matter properties could be considered a reflection of intrinsic deficits, which might be relatively resistant to remediation, but which could plausibly be used for early identification of individuals in need of extra educational support 20 .
Successfully relating anatomical differences with behavioral outcomes requires an understanding of the timescale over which white matter tissue properties exhibit experience-dependent change, and the anatomical specificity of these effects. White matter plasticity, including activity-dependent myelination and oligodendrocyte proliferation, has been observed in animal models over the time-scale of days to weeks [21][22][23][24] , and these effects coincide with changes in tissue properties measured non-invasively using diffusion-weighted magnetic resonance imaging (dMRI) in animals 25,26 . It has further been suggested that myelination may play a causal role in skill learning, since blocking the production of new myelinating oligodendrocytes inhibits motor skill development in mice 27 , implying that changes in white matter are critical to the learning process, rather than epiphenomenal. It is not clear whether similar effects occur in the context of human learning, particularly for a complex skill like reading, which is typically acquired with many hours of practice over a large developmental window. However, the studies cited above strongly suggest that learning should be accompanied by rapid, measurable changes in white matter. Further, a number of recent studies highlight the surprising malleability of human white matter in response to short-term training [28][29][30][31] , including training of reading and related skills [32][33][34] . This opens the possibility that correlations between white matter properties and behavior arise as temporary states within a highly plastic system that flexibly adapts to environmental demands. In this case, observed relationships between anatomy and behavior might be less stable than often presumed, given an appropriate change to the educational environment.
Here, we test whether controlled changes to a child's educational environment induce changes in white matter tissue properties over the time-scale of weeks. Using a longitudinal intervention design, we track improvements in reading skills, and accompanying changes in white matter, in a group of grade-school aged, struggling readers during 8 weeks of intensive (4 hours each day, 5 days a week), one-on-one training in reading skills. We first examine learning effects within three tracts thought to carry signals critical for skilled reading [1][2][3][4][5][6][7][8][9]14,15,18,[35][36][37][38][39][40] : the left arcuate fasciculus (AF), left inferior longitudinal fasciculus (ILF), and posterior callosal connections (CC). These pathways connect canonical reading-related regions within the ventral occipitotemporal (including the visual word form area, or VWFA [41][42][43][44][45][46][47]), superior temporal 48,49 , and inferior frontal cortex 50 , and hence, these tracts are considered to be part of the core circuitry for reading 37,46,47 . We find that the AF and ILF exhibit experience-dependent change within weeks of the intervention onset, while tissue properties within the posterior CC remain fixed. Moreover, we illustrate the ambiguity of brain-behavior correlations measured in a dynamic system: As training rapidly alters an individual's white matter and behavior, cross-sectional correlations between white matter properties and reading skills change substantially between measurement sessions. Meanwhile, CC white matter properties, which do not change during training, remain correlated with reading skill throughout the intervention. We therefore suggest that some anatomical properties may be stable predictors of the ease with which a child learns to read, while others dynamically reflect the effects of experience. These effects likely arise from distinct mechanisms that cannot be distinguished by cross-sectional studies.
Finally, we test the hypothesis that experience-dependent plasticity is anatomically localized to specific tracts. Contrary to this anatomical-specificity hypothesis, we find that educational experience alters a widespread system of white matter tracts in concert with reading skills. This system includes, but is not limited to, the core reading circuitry.

Tracts connecting the core reading circuitry correlate with pre-intervention reading skill
We began by replicating previously reported correlations between reading skill and properties of the white matter tracts connecting key components of the reading circuitry [1][2][3][4][5][6][7][8][9]14,15,18,[35][36][37][38][39][40] . To summarize individual differences in reading, we report Reading Skill, a composite score that incorporates our full battery of reading tests from the Woodcock-Johnson 51 and TOWRE 52 standardized assessments (see Methods for details, and Supplementary Figure 1A). To characterize the cross-sectional relationship between white matter and reading, we calculated simple, bivariate correlations between Reading Skill and each diffusion metric at the pre-intervention baseline session. As shown in Figure 1, pre-intervention (Session 1) measurements replicate previously reported correlations between reading scores and diffusion properties in the left arcuate, left ILF, and the CC: Correlations between MD and Reading Skill are positive both in the intervention group, and in the full sample, containing intervention and control subjects (Figure 1). The Reading Skill composite is a weighted sum of the individual reading tests, and similar effects are observed when examining correlations with the Woodcock-Johnson (insert stats) and TOWRE (insert tests). Mirroring these effects, correlations between FA and reading are negative (Supplementary Figure 2). While several previous studies report a negative relationship between FA and reading in these pathways 3,4,8,38 , others report a positive relationship between FA and reading 1,2,7,35,53,54 . Thus, while properties of these pathways have consistently been shown to correlate with reading skill, the direction of this relationship is not consistent across studies, or tracts (see 55 ).
These inconsistencies may depend on factors like age, education, or SES, or may reflect the inherent ambiguity of dMRI metrics like FA, which can be influenced by a number of underlying biological phenomena with distinct, and potentially opposing, relationships to reading 38 .

Figure 1. Pathways connecting the core reading circuitry correlate with pre-intervention Reading Skill. Tract average mean diffusivity (MD) is plotted as a function of pre-intervention (Session 1) reading skill. Best-fit lines plotted in gray give estimates for the full data set, while colored lines show fits for the intervention subjects, alone. Correlations for the intervention subjects are given in colored text below the value estimated for the full data set (in black).
In addition to the tracts chosen a priori for analysis, we examined several other tracts previously shown to correlate with reading scores, albeit less consistently across studies 9,56-58 , in a subsequent exploratory analysis. As shown in Supplementary Table 1, the left inferior frontal occipital fasciculus (IFOF) was also significantly correlated with reading skill (Bonferroni corrected p < 0.05), and a number of other tracts showed moderate, non-significant correlations. Finally, to test whether correlations were specific to Reading Skill, as opposed to general academic ability, we calculated correlations with math scores (Woodcock-Johnson Calculation and Math Facts Fluency) and found that none of the tracts that significantly correlated with reading (including the AF, ILF and callosal connections) correlated with math skills. Indeed, neither MD nor FA showed a significant relationship to math skills in any of the tracts chosen for analysis.

Reading skills improved substantially during the 8-week intervention period. Standard scores on the Woodcock-Johnson Basic Reading Composite, an untimed measure of reading accuracy, improved significantly over the course of the intervention (F(1,77) = 59.75, p < 10^-10, linear mixed effect model with a fixed effect of intervention time, in hours, and a random effect of subject). After 8 weeks, the intervention-group mean was within one standard deviation of the population norm (100 +/- 15; pre- vs. post-intervention scores were 80.00 +/- 3.50 vs. 92.94 +/- 2.50). In line with these results, scores on the TOWRE Index, a timed measure of reading, improved substantially (F(1,77) = 53.69, p < 10^-9), as did scores on the Woodcock-Johnson Reading Fluency subtest (F(1,76) = 36.042, p < 10^-7).
In contrast, we found no evidence for change in math skills during the intervention (Woodcock-Johnson Calculation Score, F(1,63) = 2.54, p = 0.12; Woodcock-Johnson Math Fact Fluency: F(68) = 1.87, p = 0.18), confirming that the intervention specifically affected reading skills. Additional details of these analyses are given in Supplementary Figure 1.
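For readers who prefer to see the model written out, the growth analyses above can be sketched as a random-intercept model. This is a minimal formulation under stated assumptions: the text specifies only "a fixed effect of intervention time, in hours, and a random effect of subject", so the random-intercept form, the Gaussian error terms, and all symbols below are our notation rather than a reported specification.

```latex
% y_{ij}: score (or diffusion metric) for subject i at session j
% t_{ij}: cumulative intervention hours for subject i at session j
% u_i:    subject-specific random intercept (assumed form of the random effect)
y_{ij} = \beta_0 + \beta_1\, t_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this reading, the reported F statistics test the null hypothesis \beta_1 = 0, that is, no change in the outcome per hour of intervention once stable between-subject differences are absorbed by u_i.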

Intensive intervention changes reading skills and white matter tissue properties
Growth in reading skill was specific to the intervention group, as indicated by a significant group (intervention versus control) by time (days) interaction for all reading, but not math, measures using a reading-skill-matched subgroup of the control subjects (n = 9). For this analysis, we substituted 'days' for 'intervention hours', to provide a meaningful index of time for both the intervention and control groups. For intervention subjects, 'days' were highly correlated with 'intervention hours', since testing sessions were scheduled at regularly spaced intervals (r(78) = 0.95, p < 0.001). For WJ Basic Reading Skills, we saw no significant effect of group (F(1,94) = 0.16, p = 0.68) or time (F(1,94) = 0.19, p = 0.67), but a significant group-by-time interaction (F(1,94) = 4.22, p = 0.042), indicating that growth in reading skills during the intervention period was specific to the intervention subjects. Similarly, for the TOWRE Index, we saw no significant effect of group (F(1,94) = 1.12, p = 0.29) or time (F(1,94) = 0.24, p = 0.63), but a significant group-by-time interaction (F(1,94) = 4.069, p = 0.047). For the WJ Calculation test, we saw a significant main effect of group (F(1,94) = 4.10, p = 0.046) but not time (F(1,94) = 0.31, p = 0.58), and no significant group-by-time interaction (F(1,94) = 1.13, p = 0.29), consistent with stability of this measure in both groups. Results for the full control sample (including both good and poor readers, n = 19) are given in Supplementary Table 2; this analysis shows that amongst the skilled reading control subjects, performance improved with repeated testing for the timed measures (TOWRE and Reading Fluency). In all control subjects, untimed measures (WJ Basic Reading) were stable, showing no change over 8 weeks.
In other words, skilled readers benefitted slightly from repeated practice with the timed reading tests, while poor readers did not show any improvements with practice, and only showed an improvement in performance as a result of the intervention program.
To test whether changes in reading skill were accompanied by measurable changes in white matter structure, we first examined MD and FA as a function of intervention time (hours) within the set of white matter tracts considered to be crucial for skilled reading [1][2][3][4][5][6][7][8][9]14,15,18,[35][36][37][38][39] , and which showed significant relationships with pre-intervention reading skill in the current sample: the left arcuate fasciculus (AF), left inferior longitudinal fasciculus (ILF), and posterior callosal connections (CC). Intervention-driven tissue changes were evident within the AF and ILF, but not the CC: Specifically, mean diffusivity (MD) decreased as a function of intervention hours in the intervention group within the left AF (F(1,77) = 8.46, p = 0.0047, linear mixed effect model with a fixed effect of intervention time, in hours, and a random effect of subject) and the left ILF (F(1,77) = 7.28, p = 0.0086), but not the CC (F(1,77) = 2.37, p = 0.13). Subject motion did not change over time (Supplementary Figure 3), and including subject motion as a covariate in the model did not

Like the reading outcomes reported above, intervention-driven changes in MD were specific to the intervention group, as indicated by a significant group (intervention versus control) by time (days) interaction. As above, we substitute 'days' for 'intervention hours', to give a meaningful predictor for both the intervention and control subjects. In the left AF, we found a significant

Table 3, the group-by-time interaction approached significance for the quadratic term for FA in the left AF and ILF, but not for MD in the AF or ILF, or for either parameter in the CC.
Given the observed non-linearity of intervention-driven effects in FA, we opted to use 'session number' as a categorical predictor in the analysis to follow, since this approach summarizes session-to-session differences from baseline, without imposing a shape on the trajectory of change. Sessions were systematically spaced over time, and this timing was consistent across subjects; hence 'session' was highly correlated with 'days' (r(127) = 0.97, p < 0.001). As shown in Figure ). An exploratory analysis of this same session-by-group interaction for all available tracts is given in Supplementary Figure 10. Finally, to ensure that the interaction was not driven by differences in the stability of our measurements in good vs. poor readers, given that the control group included both typical readers and subjects with dyslexia, we repeated the above analysis with baseline Reading Skill included as a covariate in the model.

61 for additional details of this analysis). For visualization purposes, the middle 80 nodes are plotted. Each line represents the group average mean diffusivity (MD) across subjects, measured at four time-points: pre-intervention (Session 1), after ~2.5 weeks of intervention (Session 2), after ~5 weeks of intervention (Session 3), and after 8 weeks of intervention (Session 4). Shaded error bars give ±1 standard error of the mean. Color values indicate session, ranging from darkest (Session 1) to brightest (Session 4) for each tract. The x-axis shows the location where each tract was clipped prior to analysis (corresponding to black boundary lines in renderings, above). Tract renderings are shown for an example subject. The middle 60% (bounded by black lines) of each tract was analyzed in (B-C), to avoid partial volume effects that occur at endpoints of the tract, where it enters cortex. Both the AF and ILF, but not the posterior callosal connections, show a systematic decrease in MD over the course of intervention.
(B-C) Bars show model predicted change (coefficients and standard errors from mixed effects model) in MD (B) and FA (C) for each session. Bar heights represent the magnitude of change observed in that session, relative to Session 1 (pre-intervention) baseline, as determined by the mixed effects model. As described in the main text, both the arcuate fasciculus (AF) and inferior longitudinal fasciculus (ILF) showed significant change between sessions for the intervention group (filled bars), but not the control group (unfilled bars). Asterisks indicate a significant decrease in MD (B) or increase in FA (C) for each session relative to the pre-intervention baseline at a Bonferroni corrected p < 0.05 (*) and p < 0.01 (**).
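The clipping step described in the caption (analyzing only the middle 60% of each tract to limit partial volume effects near cortex) can be sketched as follows. This is an illustrative sketch rather than the pipeline used here: the 100-node profile length, the function names, and the example values are all assumptions made for the example.

```python
def clip_middle(profile, fraction=0.6):
    """Return the central `fraction` of a tract profile (a list of per-node values)."""
    if not profile:
        raise ValueError("profile must contain at least one node")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(profile)
    keep = max(1, round(n * fraction))   # number of nodes retained
    start = (n - keep) // 2              # drop equal amounts from both ends
    return profile[start:start + keep]


def tract_mean(profile, fraction=0.6):
    """Average a diffusion metric over the clipped, central portion of the tract."""
    clipped = clip_middle(profile, fraction)
    return sum(clipped) / len(clipped)


# Hypothetical 100-node MD profile with inflated values near both endpoints,
# mimicking partial voluming where the tract enters cortex.
profile = [1.0] * 20 + [0.7] * 60 + [1.0] * 20
print(len(clip_middle(profile)), tract_mean(profile))
```

In this toy profile the endpoint nodes would bias a whole-tract average upward, whereas the clipped mean reflects only the central nodes.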

Reading intervention does not 'normalize' differences in the white matter
One possible interpretation of group differences in MD and FA between good and poor readers is that these differences reflect abnormal tissue properties in poor readers. In that case, one could expect that remediation of reading difficulties might be associated with a "normalization" of deficits in white matter structure. However, we find that intervention-driven changes in white matter do not follow the trajectory predicted by a normalization account. Figure 3 shows changes in MD and reading scores for the intervention group, relative to the subset of non-intervention controls who had reading skills in the typical range. We defined 'Typical Readers' as Control Group subjects with timed (TOWRE Index) and untimed (WJ Basic Reading Score) reading accuracy within a standard deviation of the population mean (at or above 85 on both measures). For the intervention group, we plot changes in both WJ Basic Reading and the TOWRE Index (rather than composite Reading Skill), in order to situate the cross-sectional and intervention-driven effects relative to an age-normed, population mean. After completing the intervention, tissue properties in the intervention subjects were not more similar to the typical reading controls, despite a substantial improvement in reading skills. As diffusion properties such as MD are influenced by multiple biological sources, this finding indicates that short-term plasticity is likely to reflect a different biological mechanism than the group differences reported here and in other studies. Further, the short-term, experience-dependent changes in the white matter were larger (Cohen's d = 0.75 for the AF, and d = 0.66 for the ILF) than typical group differences reported in the literature 8,19,53 , and the group differences observed here (d = 0.53 for the AF and d = 0.59 for the ILF).
These results demonstrate that the effects of recent experience on measured tissue properties may equal or exceed effects due to intrinsic or long-term maturational factors, suggesting that group differences measured in cross-sectional studies may, in some cases, be driven by systematic differences in environmental influences between groups.

(1)(2)(3)(4) for the left arcuate, ILF, and posterior callosal tracts in the intervention group. The gray ellipse in each panel shows the mean and standard error for the subset of non-intervention control subjects with reading skills in the typical range (poor reading controls were excluded, leaving n = 10 typical reading controls). The dashed gray arrow shows the expected trajectory for MD values if the intervention group were to become more similar to the typical reading controls in terms of both reading skills and MD. In contrast, the true trajectory of change is plotted as a colored arrow in each panel. The intervention group includes some readers with only moderate reading impairments (and, therefore, higher MD values), and so the group difference in pre-intervention scores is less than would be expected for a group of good vs. poor readers.
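The effect sizes quoted above are Cohen's d values. As a point of reference, a minimal sketch of the conventional two-sample form (difference in means divided by the pooled standard deviation) is given below. The text does not state which variant of d was computed, in particular for the within-subject, pre- versus post-intervention comparison, so the pooled-SD form, the function name, and the toy numbers are assumptions.

```python
import math


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd


# Toy values chosen so the arithmetic is easy to check by hand:
# means 4 and 3, both variances 4, pooled SD 2, so d = 0.5.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Because d is expressed in pooled-SD units, it allows the longitudinal change and the cross-sectional group difference to be compared on a common scale, which is the comparison drawn in the text.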

Anatomy-behavior correlations depend on recent experience for the arcuate and ILF
Over the course of the intervention, only the posterior callosal connections retained their relationship to Reading Skill. In contrast, as MD values declined in the AF and ILF, the instantaneous, cross-sectional correlation between reading and MD changed between sessions, as indicated by a significant interaction between MD and session in predicting Reading Skill for the intervention group (linear mixed effects model predicting Reading Skill from MD, session, and their interaction, with a random effect of subject, see . This is consistent with the stability of diffusion properties in this group, and supports the notion that the significant interaction for the intervention subjects did not arise due to differences in measurement noise over time. Finally, to rule out the possibility that systematic differences in head motion might influence anatomy-behavior correlations (e.g., children with lower reading scores might move more in the scanner than children with higher reading scores), we calculated the correlation between head motion and Reading Skill. Motion and Reading Skill were unrelated (r(97) = 0.13, p = 0.19).

White matter properties of the AF and ILF change in concert and track individual learning
The AF and ILF connect distinct components of the reading circuitry and are thought to carry signals that contribute uniquely to the reading process 37,47,55,62 . Therefore, a reading intervention might affect these tracts differently, prompting changes that reflect independent biological processes unfolding with different time-courses, and reflecting different aspects of learning. To address this possibility, we asked whether changes in the AF and ILF occur in synchrony in the intervention group. If wholly independent mechanisms were driving growth in both tracts, we would not expect to see similar time-courses of growth for the AF and ILF within subjects. Alternatively, if changes within the AF and ILF reflect a common biological mechanism operating over a large anatomical scale, then time-courses of growth should be correlated within subjects.
To address these questions, we fit a linear mixed effects model to all intervention subjects' mean-centered diffusion measurements over all time points. This allowed us to quantify the similarity between AF and ILF longitudinal growth trajectories while excluding between-subject differences in baseline diffusion properties 63 . Results for a complementary analysis, examining diffusion measurements relative to a pre-intervention baseline, are given in Supplementary Table 5. As shown in Figure 5, the time-courses of change in the AF and ILF were highly correlated for both MD and FA (MD: r = 0.86, p < 0.001; FA: r = 0.50, p = 0.021), implying that, within each individual, white matter growth trajectories were tightly coupled for these two tracts. We then fit the same model for time-lagged versions of each tract's time-course, to test whether these regions changed in synchrony. If growth in one tract were to precede growth in the other, this would imply a distinct and more gradual process occurring in the second tract, or a possible causal relationship. In that case, the time-courses should be better predicted by time-lagged versions of each other. However, we failed to detect a significant correlation at any non-zero lag, suggesting that these tracts change in concert as a function of experience in the reading intervention program.
Finally, to examine the relative timing of white matter changes in relation to learning, we performed the same cross-correlation analysis with the reading scores: Each intervention subject's reading scores were mean-centered to remove inter-subject differences in baseline reading ability, and a linear mixed effects model was fit to shifted (lag = -1 and lag = 1) and un-shifted (lag = 0) versions of the time-courses. Time-courses of MD, but not FA, were significantly correlated with time-courses of Reading Skill only at lag = 0 (MD: r = -0.30 Arcuate, p = 0.0069; r = -0.30 ILF, p = 0.0061), demonstrating that, within a subject, the time-course of white matter plasticity tracked the time-course of learning. For MD, we again found that the growth trajectories were best fit by the un-shifted time-courses, suggesting that white matter changes are coupled to reading experience and, therefore, track improvements in Reading Skill. In the control group, no tracts showed a significant relationship to reading skill at any lag (shown for lag = 0 in Supplementary Table 6), consistent with the stability of both reading and white matter properties in control subjects. The scatter plots in panels B and D also make it clear that the time-course of plasticity is more tightly coupled across tracts than it is to behavior. Hence, even though there is a statistically significant relationship between the time-course of white matter and behavioral changes, there is also unexplained variance that is likely to be related to aspects of the intervention environment that do not directly impact behavior.
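The logic of the mean-centering and lag analysis can be illustrated with a simplified stand-in. The analysis reported above used a linear mixed effects model; the sketch below instead pools within-subject, mean-centered values across subjects and computes a plain Pearson correlation at each lag, which conveys the idea but is not equivalent. All function names and the two-subject data set are hypothetical.

```python
import math


def center(series):
    """Subtract a subject's own mean, removing between-subject baseline differences."""
    m = sum(series) / len(series)
    return [x - m for x in series]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def lagged_pairs(x, y, lag):
    """Pair x[t] with y[t + lag] within a single subject's time-course."""
    if lag >= 0:
        return list(zip(x[:len(x) - lag], y[lag:]))
    return list(zip(x[-lag:], y[:len(y) + lag]))


def pooled_lagged_correlation(xs, ys, lag=0):
    """Correlate mean-centered time-courses, pooling session pairs across subjects."""
    px, py = [], []
    for subject in xs:
        cx, cy = center(xs[subject]), center(ys[subject])
        for a, b in lagged_pairs(cx, cy, lag):
            px.append(a)
            py.append(b)
    return pearson(px, py)


# Hypothetical data: four sessions per subject (MD in arbitrary units, reading scores).
md = {"s1": [0.80, 0.78, 0.77, 0.75], "s2": [0.90, 0.87, 0.86, 0.85]}
reading = {"s1": [80, 86, 88, 92], "s2": [70, 75, 79, 80]}
for lag in (-1, 0, 1):
    print(lag, pooled_lagged_correlation(md, reading, lag))
```

Centering each subject's series before pooling is what restricts the correlation to within-subject change over time; without it, stable baseline differences between subjects would dominate the estimate.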
Since a substantial proportion of the total changes in MD occurred during the first 2 weeks of intervention, we also examined the relationship between reading and white matter changes during this interval by correlating Session 2 vs. Session 1 difference scores. Individual differences in the magnitude of session 2-1 MD change were not significantly correlated with the magnitude of reading score change. As shown in Supplementary Table 7, we observed a trend for both raw and standardized reading scores. Since this analysis only includes half of the data, we cannot ascertain whether the result represents the absence of a relationship at this short timescale, or the lack of statistical power.

Reading intervention prompts distributed changes in white matter
White-matter growth rates were highly correlated for two tracts considered to be critical for skilled reading, meaning that a subject showing rapid, intervention-driven growth in the AF also shows considerable growth in the ILF. However, changes in mean diffusivity and fractional anisotropy were not limited to the connections of the core reading circuitry; instead, we observed significant change throughout a collection of tracts, extending beyond our a priori hypothesis. Figure 6a models growth in MD as a linear function of the number of intervention hours, and we use a conservative Bonferroni correction in this exploratory analysis. In the intervention group, 12 out of 18 tracts showed significant (Bonferroni corrected) change. None of the 18 tracts showed significant change in either MD or FA in the control group. Further, as shown in Table 1, multiple tracts showed a significant relationship to changes in reading skill, including, but not limited to, the core circuitry for reading. (See Supplementary Table 8 for a complementary analysis relating FA and Reading Skill.) Therefore, learning effects are not specific to tracts that are considered to be the core circuitry for reading, and intervention-driven changes are evident in an extensive collection of white matter tracts. Given that intervention effects appear to be spatially widespread, and that changes within two key tracts, the AF and ILF, are tightly coupled, we next examined the correlation structure across the full collection of tracts showing intervention-driven growth. Specifically, we tested whether growth rates are solely coupled within the AF and ILF, versus a larger collection of tracts. To that end, we fit linear growth rates (change in MD or FA as a function of hours of intervention) to each subject's data for the 18 tracts and then computed the correlation between growth rates across each pair of tracts. 
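The two-step procedure just described (fit a linear growth rate per subject and tract, then correlate growth rates across subjects for each pair of tracts) can be sketched as below. This is a minimal illustration using ordinary least-squares slopes; the function names, the session timing in hours, and the three-subject data set are hypothetical and are not values from the study.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y on x (change in the metric per unit of x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def growth_rates(hours, values_by_subject):
    """One linear growth rate per subject: slope of the metric on intervention hours."""
    return {s: ols_slope(hours[s], v) for s, v in values_by_subject.items()}


def tract_correlation(rates_a, rates_b):
    """Correlate growth rates of two tracts across the subjects they share."""
    subjects = sorted(set(rates_a) & set(rates_b))
    return pearson([rates_a[s] for s in subjects], [rates_b[s] for s in subjects])


# Hypothetical session timing (cumulative intervention hours) and MD values.
hours = {s: [0, 50, 100, 160] for s in ("s1", "s2", "s3")}
af = {"s1": [0.80, 0.78, 0.76, 0.74], "s2": [0.82, 0.81, 0.80, 0.79], "s3": [0.79, 0.76, 0.73, 0.70]}
ilf = {"s1": [0.85, 0.83, 0.82, 0.80], "s2": [0.86, 0.855, 0.85, 0.845], "s3": [0.84, 0.81, 0.79, 0.76]}

r = tract_correlation(growth_rates(hours, af), growth_rates(hours, ilf))
print(r)
```

Repeating `tract_correlation` over every pair of tracts would yield the kind of tract-by-tract correlation matrix that a hierarchical clustering can then be applied to.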
To assess the suitability of a linear model, we used the Bayesian Information Criterion (BIC) 59,60 to evaluate the linear model relative to two non-linear models, one with a quadratic and one with an additional cubic component. In all tracts with significant intervention-driven effects, the linear model outperformed both the quadratic and cubic models. Figure 6b shows the correlation between linear growth rates of pairs of tracts across individuals. The ordering of the tracts was determined according to a hierarchical clustering of these correlation coefficients. This analysis revealed that many tracts show highly correlated intervention-driven changes (r > 0.7) and identified a cluster containing many of the cortical association tracts (the left and right ILF, SLF, IFOF, and arcuate, as well as the left uncinate and left corticospinal tracts) which all changed in concert. In addition, we identified a separate cluster of tracts whose properties change during the intervention, but with independent growth rates. For example, highly significant growth rates are observed bilaterally in the thalamic radiation, but these growth rates are not correlated with growth measured in the left arcuate (Figure 6c). Accordingly, these tracts are assigned to distinct clusters. We suggest that changes within these distinct clusters may reflect distinct biological mechanisms. A complementary analysis of FA is provided in Supplementary Figure 2, and identifies a consistent clustering of the tracts.
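For reference, the model comparison above can be written out explicitly. The BIC definition is standard; the polynomial forms of the three candidate models are our reading of "linear", "quadratic", and "additional cubic component", and the random-intercept term u_i is an assumption carried over from the description of the mixed models, not a reported specification.

```latex
% k: number of estimated parameters, n: number of observations,
% \hat{L}: maximized likelihood of the fitted model
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}

% Candidate growth models for subject i at session j (t_{ij}: intervention hours)
\text{linear:}    \quad y_{ij} = \beta_0 + \beta_1 t_{ij} + u_i + \varepsilon_{ij}
\text{quadratic:} \quad y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + u_i + \varepsilon_{ij}
\text{cubic:}     \quad y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + u_i + \varepsilon_{ij}
```

The model with the lowest BIC is preferred. Because each added polynomial term increases k by one, a higher-order model is favored only if it improves the log-likelihood by more than (ln n)/2 per added parameter.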

Discussion
Intensive reading training causes rapid changes in tissue properties within the left arcuate and inferior longitudinal fasciculus, two tracts considered crucial for skilled reading. However, the effects of intervention are not limited to these regions. Instead, we find widespread change throughout multiple cortical association and projection tracts. Importantly, within individuals, intervention-driven effects are tightly coupled across this collection of tracts. Further, tissue properties and reading skills change in concert: An individual's time-course of white matter changes tracks their time-course of changes in reading skill. This suggests that the white matter rapidly adapts to the changing environmental demands posed by the intervention. The extent of plasticity in the white matter has important implications for the interpretation of correlations between white matter tissue properties and academic skills: As cross-sectional correlations change week-to-week, correlations measured at any single time point offer an incomplete, and potentially misleading, view of the underlying relationships between anatomy, behavior and experience.
Intervention leads to rapid changes that are distributed across cortical association and projection tracts, including, but not limited to, the left arcuate and left inferior longitudinal fasciculus. These tracts connect distinct components of the reading circuitry, and are generally considered to support separable aspects of reading. For example, the AF has been linked specifically to phonological awareness 8,35 , while the ILF, which projects to the 'visual word form area' (VWFA) 47 , may be especially involved in visual word recognition. Typically, over years of development, growth rates for these two tracts are independent from each other 38 . We therefore hypothesized that the learning process might differentially affect tissue properties within these tracts. Further, given the diversity of behavioral profiles seen in people with dyslexia, subjects could show differing spatial profiles of change. For example, a subject with strong intervention-driven effects within the AF might show smaller effects within the ILF, while another subject might show the opposite pattern. However, our results support an alternative view. Longitudinal changes in the AF and ILF are tightly coupled within subjects and also correlated to changes in many other white matter tracts, suggesting that these effects arise from a common biological mechanism operating over a large anatomical scale.
Typically, dMRI studies of the white matter seek to identify a single critical structure that is related to a specific cognitive skill. Our measurements offer a different view on white matter plasticity and learning: Anatomically widespread effects may be a hallmark of rapid, short-term plasticity associated with intensive training of reading skills. Since reading depends on the coordination of a large cortical network, training of reading skills may prompt particularly widespread effects across the white matter. Functional changes measured with fMRI after reading training appear to be widespread 64 , affecting multiple sites within the cortical and subcortical reading network. However, a relatively small and focal change in anatomy could theoretically produce widespread functional changes, and therefore these effects need not be accompanied by large-scale anatomical remodeling. Indeed, a small number of past studies in human subjects have reported focal changes in white matter after training of reading skills 32,65 , but past work has not employed the intensive training paradigm used here (see also 33 ). Alternatively, the widespread effects may reflect general mechanisms of learning during an intensive educational experience, and therefore may not be specific to the curriculum of this reading intervention.
It is important to note that the tracts identified in this analysis, including the left hemisphere ILF, SLF, and AF, carry signals that are relevant for a number of cognitive functions 66 , not only reading [67][68][69][70][71] . Interestingly, individual differences in plasticity within the left AF have recently been linked to individual gains in math skills following math intervention 72 , even though the left AF is conventionally associated with language-related skills. It should be noted, however, that in 72 , math skills training did not produce a significant change in the arcuate at a group level, and therefore the previous set of findings differs from ours. Given the relatively coarse (mm) scale of dMRI, it is possible that distinct types of intervention (e.g., training in reading versus math skills) affect distinct subpopulations of fibers with distinct cortical terminations and functional roles. However, an alternative interpretation also emerges from the current study: Intensive training may lead to plasticity within regions that are not necessarily critical for performing the trained task, and thus intervention-driven effects in the left AF might reflect general mechanisms that are common to learning both reading and math. Despite the lack of a group-level intervention effect in the left arcuate in 72 , it remains possible that a sufficiently intense math intervention might prompt changes not only within the left arcuate, but within many of the same tracts identified here. Indeed, our effects may reflect the intensity and quality of the learning environment, rather than the specific trained skills. Moreover, since it would not be feasible to enroll skilled readers in a highly intensive reading intervention program, it is unclear whether the observed effects are specific to individuals with reading difficulties.
Future work examining the generalizability of these effects in other domains, such as math, would allow an examination of general learning effects in a broader population and should help clarify the role of domain-specific deficits.
What biological mechanism might underlie the observed effects? Changes in the diffusion signal can arise from multiple sources, including use-related swelling and branching of glial cells 30,31,73 , changes in vasculature, myelination of unmyelinated axons, myelin remodeling, and/or growth of new myelinating oligodendrocytes (reviewed in 74 ). Oligodendrocyte precursor cells are present throughout the white matter, and large-scale proliferation of oligodendrocytes has been shown in mice within hours of optogenetically-stimulated activity in adult motor cortex 23 . Mature oligodendrocytes, in turn, participate in myelin maintenance and remodeling throughout the lifespan. Thus, a particularly intriguing possibility is that an initial pattern of widespread change in diffusion properties reported here might reflect rapid growth of myelinating oligodendrocytes, of which only a fraction will ultimately contribute to new myelin sheaths within focal, task relevant regions. In that case, it should be possible to differentiate diffusion signal changes related to rapid growth of oligodendrocytes from signal changes related to longer-term changes in myelin after the training period has ended. In particular, we might expect a rapid initial large decrease in MD, since diffusion would be hindered by new oligodendrocytes. Subsequent changes in myelin might emerge as relatively smaller, persistent changes in other quantitative MRI parameters 62,75-77 .
Intervention-driven changes in white matter do not follow the trajectory predicted by a normalization account, in which remediation of reading difficulties could be expected to eliminate differences in the white matter between children with dyslexia and typical readers. Instead, we find the opposite: Short-term learning-induced changes are large relative to baseline group differences, and they deviate from the normalization prediction, as shown in Figure 3. Previously reported group differences may therefore be driven in large part by environmental differences between groups, since systematic differences in the environment (e.g., differences in the quality or intensity of recent educational experiences for dyslexic versus control subjects) could be expected to exert a large influence on diffusion measurements, and to potentially counteract or change pre-existing anatomy-behavior relationships. This offers an explanation for why some studies find a positive correlation between FA and reading skills 14,19 , while other studies find a negative correlation between FA and reading skills 8,78 within the exact same tracts.
In contrast to the widespread changes described above, we find that the posterior callosal connections are remarkably stable over the course of intervention, and also show stable correlations with reading skills. Although we interpret this null result cautiously, one possibility is that differences in MD within posterior callosal connections reflect relatively stable anatomical variation, which predicts reading skill, but does not change during short-term, intensive training. Indeed, the structure of the posterior corpus callosum differs in both children and adults with dyslexia, and the positive correlation between diffusion properties in this pathway and reading skills has been reported by many other studies 3,4,79-81 . These connections are known to mature relatively early; therefore, the subjects in our study may already be outside the sensitive period in which experience shapes these connections. In that case, training at an earlier age might prompt changes in the CC alongside acquisition of reading skills.
In summary, our results show that altering a child's educational environment through a targeted intervention program induces rapid, large-scale changes in white matter tissue properties. We observe changes in both MD and FA that occur over the timescale of weeks, that track changes in an individual's reading skills, and are tightly coupled across tracts connecting distinct parts of the neural circuitry for reading.

Participants
A total of 93 behavioral and MRI sessions were conducted with a group of 24 children (11 female), ranging in age from 7 to 12 years, who participated in an intensive summer reading intervention program. Members of the intervention group were recruited based on parent report of reading difficulties and/or a clinical diagnosis of dyslexia. An additional 52 behavioral and MRI sessions were conducted with 19 participants, who were matched for age but not reading level. These subjects were recruited as a control group to assess the stability of our measurements over the repeated sessions. Control subjects participated in the same experimental sessions, but did not receive the reading intervention. Ten of these subjects had typical reading skills (4 female), defined as a score of 85 or greater on the Woodcock Johnson Basic Reading composite and the TOWRE Index. Nine had reading difficulties (3 female), defined as a score below 85 on either the Woodcock Johnson Basic Reading composite or the TOWRE Index. Reading assessments were carried out at the start of the intervention period to confirm parent reports and establish a baseline for subsequent estimates of growth in reading skill. Demographics and initial test scores are summarized in Table 2.
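The reading-level criteria used to split the control group can be written as a small rule. The sketch below is only an illustration of the definition given above; the function name, argument names, and label strings are hypothetical and are not taken from the study's code.

```python
from typing import Literal

# Standard-score cutoff stated in the text.
CUTOFF = 85


def classify_reader(wj_brs: float, towre: float) -> Literal["typical", "reading difficulties"]:
    """Label one control subject from baseline scores (hypothetical helper).

    "typical": 85 or greater on BOTH the Woodcock Johnson Basic Reading
    composite and the TOWRE Index.
    "reading difficulties": below 85 on EITHER measure.
    """
    if wj_brs >= CUTOFF and towre >= CUTOFF:
        return "typical"
    return "reading difficulties"
```

A score of exactly 85 on both measures counts as typical, since the definition is "85 or greater".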
All participants were native English speakers with normal or corrected-to-normal vision and no history of neurological damage or psychiatric disorder. We obtained written consent from parents, and verbal assent from all child participants. All procedures, including recruitment, consent, and testing, followed the guidelines of the University of Washington Human Subjects Division and were reviewed and approved by the UW Institutional Review Board.

Reading intervention
Intervention subjects were enrolled in 8 weeks of the Seeing Stars: Symbol Imagery for Fluency, Orthography, Sight Words, and Spelling 82 program at three different Lindamood-Bell Learning Centers in the Seattle area. The intervention program consists of directed, one-on-one training in phonological and orthographic processing skills, lasting four hours each day, five days a week. The curriculum uses an incremental approach, building from letters and syllables to words and connected texts, emphasizing phonological decoding skills as a foundation for spelling and comprehension. A hallmark of this intervention program is the intensity of the training protocol (4 hours a day, 5 days a week) and the personalized approach that comes with one-on-one instruction.

Experimental Sessions
Subjects participated in four experimental sessions separated by roughly 2.5-week intervals.
For the intervention group, sessions were scheduled to occur before the intervention (baseline), after 2.5 weeks of intervention, after 5 weeks of intervention, and at the end of the 8-week intervention period. For the control group, sessions followed the same schedule while the subjects attended school as usual. This allowed us to control for changes that would occur due to typical development and learning during the school year. Twenty-one intervention subjects completed all four experimental sessions; 3 subjects completed only 3 sessions, which fell at the start, middle and end of the intervention. In the control group, 7 subjects completed all 4 sessions; 12 subjects completed at least 3 sessions; 14 subjects completed at least 2 sessions; 19 subjects completed at least one session.
In addition to MRI measurements, described in greater detail below, we administered a battery of behavioral tests in each experimental session. These included sub-tests from the Wechsler Abbreviated Scales of Intelligence (WASI), Comprehensive Test of Phonological Processing (CTOPP-2), Test of Word Reading Efficiency (TOWRE-2) and the Woodcock Johnson IV Tests of Achievement (WJ-IV). Rather than analyzing each subtest individually, we created a general reading skills index by conducting a principal component analysis on subtests from the latter two batteries (TOWRE and WJ-IV) and taking scores from the first principal component, which accounted for 83.76% of the total variance in reading performance (Supplementary Figure 1). We used this measure for all subsequent analyses in order to avoid issues that arise from multiple comparisons, and to increase the reliability of our reading skill index. Our Reading Skills index was highly correlated with both the WJ-BRS composite (r(97) = 0.95, p < 0.001) and the TOWRE composite (r(97) = 0.96, p < 0.001).
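To make the composite index concrete, the following is a minimal sketch of extracting first-principal-component scores from a subjects-by-subtests score table. It is not the authors' implementation (their analysis was written in MATLAB): it assumes each subtest is z-scored before the decomposition, which the text does not state, and it uses plain power iteration so that it runs with the Python standard library only. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def first_pc_scores(data: Sequence[Sequence[float]],
                    n_iter: int = 500) -> Tuple[List[float], float]:
    """Return (scores on the first PC, fraction of variance it explains).

    data: one row per session/subject, one column per subtest.
    Assumes every column has non-zero variance (z-scoring divides by the SD).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]

    # Correlation matrix of the subtests (covariance of z-scored columns).
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]

    # Power iteration: converges to the leading eigenvector as long as the
    # uniform start vector is not orthogonal to it, which holds when the
    # subtests are positively correlated with one another.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    eigval = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                 for a in range(p))
    explained = eigval / sum(corr[a][a] for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in z]
    return scores, explained
```

The returned scores play the role of the reading skills index, and the second return value corresponds to the proportion of variance attributed to the first component.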

MRI Acquisition and Processing
All imaging data were acquired with a 3T Philips Achieva scanner (Philips, Eindhoven, Netherlands) at the University of Washington Diagnostic Imaging Sciences Center (DISC) using a 32-channel head coil. An inflatable cap was used to minimize head motion, and participants were continuously monitored through a closed-circuit camera system. Prior to the first MRI session, all subjects completed a session in an MRI simulator, which helped them to practice holding still, with experimenter feedback. This practice session also allowed subjects to experience the noise and confinement of the scanner prior to the actual imaging sessions, to help them feel comfortable and relaxed during data collection.
Diffusion-weighted magnetic resonance imaging (dMRI) data were acquired with isotropic 2.0 mm³ spatial resolution and full brain coverage. Each session consisted of 2 DWI scans, one with 32 non-collinear directions (b-value = 800 s/mm²), and a second with 64 non-collinear directions (b-value = 2,000 s/mm²). The gradient directions were optimized to provide uniform coverage 83 . Each of the DWI scans also contained 4 volumes without diffusion weighting (b-value = 0). In addition, we collected one scan with 6 non-diffusion-weighted volumes and a reversed phase encoding direction (posterior-anterior) to correct for EPI distortions due to inhomogeneities in the magnetic field. Distortion correction was performed using FSL's topup tool 84 . Additional pre-processing was carried out using tools in FSL for motion and eddy current correction 85 , and diffusion metrics were fit using the diffusion kurtosis model 86 as implemented in DIPY 87 . Data were manually checked for imaging artifacts and excessive dropped volumes. Given that subject motion can be especially problematic for the interpretation of group differences in DWI data 88 , data sets with mean slice-by-slice RMS displacement > 0.7 mm were excluded from all further analyses. Datasets in which more than 10% of volumes were either dropped or contained visible artifact were also excluded from further analysis. In total, these criteria removed 13 out of 93 total intervention datasets, and 3 out of 52 control datasets.
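The two exclusion criteria can be expressed as a simple quality-control filter. The sketch below is illustrative only: the `ScanQC` record and its field names are hypothetical, and in the study the artifact judgment was made by manual inspection rather than computed automatically.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MAX_MEAN_RMS_MM = 0.7      # mean slice-by-slice RMS displacement
MAX_BAD_FRACTION = 0.10    # dropped volumes or volumes with visible artifact


@dataclass
class ScanQC:
    """Per-dataset quality summary (hypothetical structure)."""
    mean_rms_mm: float
    n_volumes: int
    n_bad_volumes: int  # dropped or flagged as containing visible artifact


def keep_dataset(qc: ScanQC) -> bool:
    """Return True if the dataset passes both exclusion criteria."""
    bad_fraction = qc.n_bad_volumes / qc.n_volumes
    # Exclusion applies strictly above each threshold ("> 0.7 mm", "more than 10%").
    return qc.mean_rms_mm <= MAX_MEAN_RMS_MM and bad_fraction <= MAX_BAD_FRACTION
```

A dataset sitting exactly at either threshold is retained under this reading of the criteria.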
To further quantify potential effects of motion, we tested for differences in motion across sessions and subject groups (intervention vs. control; see Supplementary Figure 3), after excluding datasets based on the criteria listed above. We observed no difference in motion as a function of session (F(3,121) = 0.090, p = 0.97) or group (F(1,121) = 2.54, p = 0.11), and no group-by-session interaction (F(3,121) = 0.30, p = 0.83). Thus, we do not attribute the between-session changes in white matter within the intervention group to systematic differences in motion. Further, including motion as a covariate in our analysis did not change our results, as described below.

White Matter Tract Identification
Fiber tracts were identified for each subject using the Automated Fiber Quantification (AFQ) software package 38 , after initial generation of a whole-brain connectome using probabilistic tractography (MRtrix 3.0) 89 . Fiber tracking was carried out on an aligned, distortion corrected, concatenated dataset including all four of the 64-direction (b-value = 2,000 s/mm²) datasets collected across sessions for each subject. This allowed us to ensure that estimates of diffusivity and diffusion anisotropy across sessions were mapped to the same anatomical location for each subject, since slight differences in diffusion properties over the course of intervention can influence the region of interest that is identified by the tractography algorithm. We also replicated our main results using tractography derived separately for each session and subject (see Supplementary Figure 4).
We focused our initial analysis on 3 tracts that are thought to connect the core reading circuitry 37,38

Quantifying White Matter Tissue Properties
To detect intervention-driven changes in the white matter, we fit the diffusion kurtosis model 86 as implemented in DIPY 87 to the diffusion data collected in each session. The diffusion kurtosis model is an extension of the diffusion tensor model that accounts for the non-Gaussian behavior of water in heterogeneous tissue containing multiple barriers to diffusion (cell membranes, myelin sheaths, etc.). After model fitting, diffusion metrics were projected onto the segmented fiber tracts generated by AFQ. Selected tracts were sampled into 100 evenly spaced nodes, spanning termination points at the gray-white matter boundary, and then diffusion properties (mean, radial, and axial diffusivity (MD, RD, AD) and fractional anisotropy (FA)) were mapped onto each tract to create a "Tract Profile".
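For readers less familiar with the four scalar metrics, the sketch below shows the standard definitions of MD, AD, RD, and FA in terms of the three eigenvalues of a diffusion tensor. It is a generic illustration of these textbook formulas, not the study's fitting code; the function name is hypothetical, and in a kurtosis fit the eigenvalues would come from the tensor component of the fitted model.

```python
import math
from typing import Dict


def tensor_metrics(ev1: float, ev2: float, ev3: float) -> Dict[str, float]:
    """Compute MD, AD, RD, and FA from diffusion tensor eigenvalues."""
    l1, l2, l3 = sorted((ev1, ev2, ev3), reverse=True)
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity: largest eigenvalue
    rd = (l2 + l3) / 2.0               # radial diffusivity: mean of the other two
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||, defined as 0 for a zero tensor.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"MD": md, "AD": ad, "RD": rd, "FA": fa}
```

FA is 0 for perfectly isotropic diffusion (three equal eigenvalues) and approaches 1 when diffusion occurs along a single axis.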

Code and Data Availability
All code and data required to reproduce reported findings are available at [URL available upon publication].

Statistical Analysis
Data analysis was carried out using software written in MATLAB. To assess change over the course of intervention, we first averaged the middle 60% of each tract to create a single estimate of diffusion properties for each subject and tract. We selected the middle portion to eliminate the influence of crossing fibers near cortical terminations, and to avoid potential partial volume effects at the white matter / gray matter border. Mean tract values were then entered into a linear mixed effects model, with fixed effects of intervention time (either hours of training, or session entered as a categorical variable) and a random effect of subject. We modeled the relationship between white-matter properties and behavior in a similar fashion, predicting Reading Skill from mean tract values and session, with subjects treated as a random effect.
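The tract-averaging step can be sketched as follows. This is an illustration rather than the authors' MATLAB code: it assumes that the "middle 60%" of a 100-node Tract Profile means the central 60 nodes, with 20 nodes trimmed from each end, and the function name is hypothetical.

```python
from typing import Sequence


def middle_mean(profile: Sequence[float], fraction: float = 0.6) -> float:
    """Average the central `fraction` of a tract profile.

    For a 100-node profile and fraction=0.6 this keeps nodes 21-80
    (zero-based indices 20..79), discarding the ends of the tract near
    its cortical terminations.
    """
    n = len(profile)
    n_keep = round(n * fraction)
    start = (n - n_keep) // 2
    core = profile[start:start + n_keep]
    return sum(core) / len(core)
```

The resulting single value per subject, tract, and session is what would then be entered into the mixed-effects model.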
We further examined the time-course of change in white matter and reading skills by (1) performing a cross-correlation analysis on individual longitudinal trajectories and (2) calculating individual linear growth rates, which allowed us to directly model relationships between behavioral and white-matter growth rates across subjects.
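The growth-rate analysis can be illustrated with a minimal sketch: fit an ordinary least-squares slope to each subject's longitudinal measurements, then correlate the white-matter slopes with the reading-skill slopes across subjects. The helper names are hypothetical, the use of a plain Pearson correlation is an assumption, and the numbers are made up purely for illustration; they are not study data.

```python
import math
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (the linear growth rate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Made-up example: approximate hours of training at each of four sessions,
# and one MD value and one reading score per session for three subjects.
hours = [0.0, 50.0, 100.0, 160.0]
md = {"s1": [0.80, 0.79, 0.78, 0.77],
      "s2": [0.82, 0.80, 0.78, 0.76],
      "s3": [0.79, 0.79, 0.79, 0.79]}
reading = {"s1": [80.0, 83.0, 86.0, 90.0],
           "s2": [75.0, 82.0, 88.0, 95.0],
           "s3": [85.0, 85.0, 86.0, 85.0]}

subjects = sorted(md)
md_rates = [ols_slope(hours, md[s]) for s in subjects]
reading_rates = [ols_slope(hours, reading[s]) for s in subjects]
rate_correlation = pearson_r(md_rates, reading_rates)
```

In this toy example, subjects whose MD declines faster also gain reading skill faster, so the correlation between the two sets of growth rates is negative.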
Finally, to examine the anatomical specificity of intervention-driven changes, we fit a mixed linear model to the growth trajectories of a large collection of white matter tracts. We then performed hierarchical clustering on the correlations between linear growth-rates, using a complete-linkage clustering algorithm implemented in MATLAB, to test for correlated growth trajectories across a large collection of cortical association tracts.
Tracts showing significant change (Bonferroni corrected p < 0.05) are highlighted in red. Tracts with significant change before correction are indicated with a single (uncorrected p < 0.05) or double (uncorrected p < 0.01) asterisk.
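As a rough illustration of the clustering step described in the Statistical Analysis section, the sketch below implements agglomerative complete-linkage clustering on a precomputed distance matrix. It is not the MATLAB routine used in the study. It assumes that the between-tract correlation r of growth rates is converted to a distance as 1 - r, which the text does not specify, and the function name is hypothetical.

```python
from typing import List, Sequence


def complete_linkage(dist: Sequence[Sequence[float]],
                     n_clusters: int) -> List[List[int]]:
    """Merge items until `n_clusters` remain, using complete linkage.

    dist: symmetric matrix of pairwise distances between tracts,
          for example 1 - r for the correlation r between growth rates.
    Returns clusters as lists of item indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest distance between any pair of their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

Tracts whose growth rates are strongly correlated across subjects have small distances and are merged early, so tightly coupled tracts end up in the same cluster.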