Academic mobility in U.S. public schools: Evidence from nearly 3 million students

We use administrative panel data from seven states covering nearly 3 million students to document and explore variation in "academic mobility," a term we use to describe the extent to which students' ranks in the distribution of academic performance change during their public schooling careers. We find that student ranks are highly persistent during elementary and secondary education; that is, academic mobility is limited in U.S. schools on the whole. Still, there is non-negligible variation in the degree of upward mobility across some student subgroups as well as individual school districts. On average, districts that exhibit the greatest upward academic mobility serve more socioeconomically advantaged populations and have higher value-added to student achievement.


Introduction
An effective and equitable education system can be viewed as a form of social insurance against a poor birth endowment: even in the face of considerable obstacles, access to effective schools can provide a pathway to success. However, the performance of the U.S. education system in this regard leaves much to be desired. Students from different socioeconomic backgrounds enter K-12 schools already exhibiting large achievement gaps and these gaps persist, or even widen, as they progress through school (Haskins and Rouse, 2005; Jang and Reardon, 2019; Reardon, 2011). It would be a mistake to conclude U.S. public schools do not contribute to social equity (counterfactual equity conditions would almost surely be worse in their absence), but the inability of schools to narrow achievement gaps during elementary and secondary education is an ongoing policy concern.
In this article, we introduce the concept of "academic mobility" to study the persistence of student placements in the distribution of academic performance during elementary and secondary education. An education system with high academic mobility is one where students' early-grade ranks are less predictive of their later-grade ranks. We estimate academic mobility using administrative panel data from seven states covering almost 3 million students.
Our estimation procedures borrow from tools developed in a related literature on economic mobility including Chetty, Hendren, Kline, and Saez (2014) and Chetty, Hendren, Jones, and Porter (2018). 1 We assess students' initial performance levels using test scores in the third grade, which is the earliest grade for which we have universal data on test performance in public schools. Then, we use four long-term outcomes to estimate academic mobility during K-12 education: eighth-grade test performance, high-school test performance, on-time high school graduation, and high school graduation within one year of on-time. For most of our analysis we focus on upward mobility among initially low-achieving students. Again following the recent literature on economic mobility, we define "absolute upward mobility" as academic mobility measured at the 25th percentile of the distribution of initial performance ranks.
We find that students' ranks in the distribution of academic performance are highly persistent during K-12 education. It follows that absolute upward mobility is low. For example, on average across our seven states, a student at the 25th percentile of the academic performance distribution in the third grade can be expected to perform at roughly the 30th percentile by high school. Moreover, conditional on beginning with a low rank, students from more advantaged backgrounds generally have greater upward mobility than their less advantaged peers. Asian students in particular have very high upward mobility. These results buttress existing research on the persistence, and even widening, of achievement gaps during K-12 schooling.
We also explore variation in academic mobility across school districts. Despite finding academic mobility is low on average, we document statistically and economically significant variance in upward mobility across districts. We decompose the variance into two components: "baseline mobility" and "relative mobility." Districts with high baseline mobility promote gains throughout the performance distribution; initial low achievers are caught in a rising tide that lifts all boats in these districts. In districts with high relative mobility, initial low achievers gain on their higher-achieving peers as they progress through school; i.e., the within-district achievement gap narrows over time. 2 Variation across districts in both components contributes to the total variance in absolute upward mobility, but we find that most of the variance in upward mobility is driven by cross-district differences in baseline mobility. While our mobility metrics are descriptive and should not be interpreted causally, these results are informative about the ways in which districts are likely to help low-achieving students improve. The low variation in relative mobility suggests limited differences in success across districts at reducing within-district achievement gaps.
We also explore the correlates of absolute upward mobility at the district level and show mobility is largest in districts serving more socioeconomically advantaged students. For instance, mobility is higher in districts where local-area incomes, education levels, and residential stability are higher, and where more Asian and White families live. Independent of these attributes, district value-added to student achievement is also a strong predictor of high upward mobility. When we estimate district-level academic mobility separately for Black, Hispanic, and low-income students to allow for heterogeneity in the correlates of mobility, we generally find the same factors predict upward academic mobility for all students.

Data
We use state administrative panel data from public schools in seven states: Georgia, Massachusetts, Michigan, Missouri, Oregon, Texas, and Washington. We assemble cohorts of all students with standardized test scores in math and English language arts (ELA) in the third grade and follow them through high school.
Table 1 reports descriptive information for the third-grade cohorts in each state, as well as for K-12 students in the entire U.S. for comparison. We track academic mobility for two to four cohorts of students in the sample states between 2005-06 and 2008-09 (hereafter, including in Table 1, we identify school years by the spring year; e.g., 2006 for "2005-06"). 3 Our analysis includes about 2.9 million students. The sample states exhibit substantial heterogeneity in their populations. For example, the shares of Black and Hispanic students across states range from 3.0 to 38.1 percent and 4.0 to 47.7 percent, respectively. There is also considerable variation across states in the shares of students who receive free or reduced-price lunch (FRL), are identified for an Individualized Education Program (IEP), or are geographically mobile. 4 In addition, the structure of the education system differs across states in terms of the shares of schools located in urban/suburban/rural areas, and the numbers of districts and schools, both in absolute and per-capita terms. While our sample is not representative of the United States, the seven states are diverse and provide substantively different evaluation contexts. 5
Under the No Child Left Behind and Every Student Succeeds Acts, all students in public schools are tested in math and English Language Arts (ELA) annually in grades 3-8. As a result, our analysis of academic mobility between grades 3 and 8 is fairly uniform across states (although each state administers its own tests). At the high school level, students must be tested at least once but there is flexibility over the grade and subject. To assess academic mobility based on high-school achievement, in each state we identify the exam with the highest coverage rate administered in a common grade. These tests are shown in Table 2. 6 With the exception of Michigan, which has a universal ACT/SAT policy, the common-grade requirement is such that the test subject is ELA-based. This is because the high school English curriculum is more rigid than in other subjects. The focal tests are administered mostly in the tenth and eleventh grades (except in Georgia), have high coverage rates, and are overwhelmingly taken in a common grade. In Oregon there is no test given overwhelmingly in a common grade in high school, so we omit Oregon from the portion of the analysis focused on high school test ranking.
In addition to assessing test-based academic mobility, we also assess mobility in terms of the likelihood of high school graduation. We consider both on-time graduation and graduation within one year of on-time.

Overview
Our methodological approach follows the framework developed by Chetty, Hendren, Kline, and Saez (CHKS, 2014) and Chetty, Hendren, Jones, and Porter (CHJP, 2018) to study intergenerational economic mobility. The test-based mobility metrics we use link percentile rankings in the test distribution at different points in the schooling career. Like CHKS and CHJP, we have sufficiently rich data to describe the joint distribution of early- and late-career student performance nonparametrically in the form of 100x100 percentile matrices for each outcome and state. However, a key insight from CHKS that permits a more parsimonious presentation is that the rank-rank relationship between intergenerational economic outcomes is functionally linear. This is also true in our application, allowing us to summarize academic mobility with just the slope and intercept parameters from a linear regression.

2 Both a student's absolute position in the performance distribution and a student's relative position within a class, school, or district are important outcomes of interest. A student's absolute position is important given causal evidence on the link between test scores and later life outcomes (Goldhaber and Özek, 2019). There is also increasing evidence that a student's relative rank has independent effects on student behaviors and outcomes, as social comparisons help to shape ability beliefs. See, for instance, Cicala et al. (2018), Denning et al. (2020), Elsner and Isphording (2017a, 2017b), Elsner et al. (2019), and Murphy and Weinhardt (2020).
3 The earliest cohort is from 2006 because this is the first year of consistent testing in grades 3-8 in most states, and the latest cohort is from 2009 because this is the oldest cohort for whom we can track graduation outcomes (within one year of on-time graduation) using our data panels.
4 Geographic mobility is defined by students who are enrolled in more than one school during the year in which they took the third grade test. States differ in terms of the frequency of collecting school enrollment information, which may account for some of the heterogeneity across states in this variable. The FRL data used for these cohorts pre-date the option of schools to use the Community Eligibility Provision (Koedel and Parsons, 2021).
5 Appendix Table A1 further shows that enrollment shares in charter schools in our third-grade cohorts are small in all states, ranging from 0 to 7.7 percent, with a median value of 1.8 percent.
6 The requirement of a common grade limits concerns about the confounding effect of test timing on our cross-district measures of academic mobility, which has come up most often with respect to studies of Algebra-I end-of-course exam performance (Clotfelter, Ladd, and Vigdor, 2015; Domina et al., 2015; Parsons et al., 2015).

W. Austin et al.
We illustrate the linearity of the rank-rank relationships using binned scatterplots of students' entry and late-outcome test ranks in each state.Fig. 1 shows scatterplots from Georgia as an example.The top two graphs are for test scores.The entry ranks on the horizontal axis are the average rank in math and ELA in the third grade.The outcome ranks on the vertical axes are: (1) the average rank on math and English Language Arts (ELA) tests in the eighth grade and (2) the rank on the high school test listed in Table 2. Similar graphs for all sample states are provided in Appendix Figure A1.
The test-based rank-rank relationships are linear, at least to a close approximation, in all states for all tests (we discuss the scatterplots for graduation outcomes below). Given this, the mapping between students' early- and late-career outcomes can be summarized by equation (1):

$$O_i = \alpha + \beta R_i + \varepsilon_i \tag{1}$$

where $O_i$ is a late-career outcome rank for student $i$, $R_i$ is the initial rank in the third grade, and $\varepsilon_i$ is the regression residual.
When the rank-rank relationship is estimated on the entire population (in our case, an entire state), α and β are mechanically linked.To see why, note the estimated regression line must pass through the mean of the data, which in a percentiles-on-percentiles model means it must pass through (50, 50).As a result, the mobility relationship is fully captured by the slope coefficient, β, which also defines the y-intercept, α.
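The mechanical link can be written out in one step. The derivation below uses only the OLS property cited above; the sample-mean notation $\bar{O}$ and $\bar{R}$ is introduced here for illustration and is not the authors' notation.

```latex
\begin{align*}
\hat{\alpha} &= \bar{O} - \hat{\beta}\,\bar{R}
  && \text{(the fitted line passes through the sample means)} \\
             &= 50 - 50\,\hat{\beta} \;=\; 50\,(1 - \hat{\beta})
  && \text{(statewide percentiles: } \bar{O} = \bar{R} = 50\text{)}
\end{align*}
```

As a purely hypothetical illustration, a statewide slope of 0.80 would imply an intercept of 10 and a predicted outcome rank of 10 + 0.80 × 25 = 30 for a student who starts at the 25th percentile.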
When we disaggregate the data below the state level (i.e., for subpopulations of students within a state or for individual school districts), the parameters α and β are separately identified and provide unique information about baseline and relative mobility, respectively. This is because the rank-rank regression line need not pass through the point (50, 50) for each subpopulation. Consider the following versions of equation (1) that permit subgroup analyses:

$$O_{is} = \alpha_s + \beta_s R_{is} + \varepsilon_{is} \tag{2}$$

$$O_{id} = \alpha_d + \beta_d R_{id} + \varepsilon_{id} \tag{3}$$

In equation (2), the subscript s indicates group membership for student i. We define groups s by race/ethnicity, FRL eligibility, and the urbanicity of the school attended in the third grade (urban, suburban, or rural). In equation (3), the subscript d identifies students who attend district d in the third grade. As long as the dependent and independent variables in equations (2) and (3), which are in percentiles, continue to be calculated from the full statewide distributions of test scores, the intercepts and slopes for the groups indexed by s and d are separately identified and provide unique information about the nature of academic mobility.
Total academic mobility at initial percentile p, inclusive of baseline and relative mobility, can be expressed for district d as follows:

$$\bar{O}_{pd} = \alpha_d + \beta_d \, p \tag{4}$$

Similarly, $\bar{O}_{ps}$ gives the student-subgroup analog. As in CHKS, we focus on total mobility of students at the 25th percentile of the initial performance distribution to produce measures of absolute upward mobility for initially low-achieving students, denoted by Ō25. From equation (4), Ō25 for students in district d is estimated by $\alpha_d + \beta_d \times 25$.
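The district-level calculation can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the authors' code: the function names, the four hypothetical districts, and the data-generating parameters are all assumptions, and the sketch omits the measurement-error corrections discussed later. The one feature it is meant to show is that ranks are computed on the pooled statewide distribution while the regression is fit district by district.

```python
import numpy as np

def percentile_ranks(scores):
    """Percentile rank in (0, 100] of each score within the full (statewide) sample."""
    ranks = scores.argsort().argsort() + 1      # 1 = lowest score
    return 100.0 * ranks / len(scores)

def district_mobility(entry_rank, outcome_rank, district, p=25.0):
    """OLS of outcome rank on entry rank, fit separately for each district.

    Returns {district: (alpha_d, beta_d, O_pd)} with O_pd = alpha_d + beta_d * p.
    """
    out = {}
    for d in np.unique(district):
        m = district == d
        beta_d, alpha_d = np.polyfit(entry_rank[m], outcome_rank[m], 1)
        out[int(d)] = (alpha_d, beta_d, alpha_d + beta_d * p)
    return out

# Synthetic "state" with four hypothetical districts (illustration only).
rng = np.random.default_rng(0)
n = 20_000
district = rng.integers(0, 4, size=n)
grade3 = rng.normal(size=n) + 0.2 * district
grade8 = 0.85 * grade3 + 0.10 * district + rng.normal(scale=0.5, size=n)

# Ranks come from the pooled statewide distribution, not from within-district ranks.
entry = percentile_ranks(grade3)
outcome = percentile_ranks(grade8)

for d, (a, b, o25) in district_mobility(entry, outcome, district).items():
    print(f"district {d}: alpha={a:5.1f}  beta={b:4.2f}  O25={o25:5.1f}")
```

Because the ranks are statewide, each district's fitted line is free to sit above or below the point (50, 50), which is what allows the intercept and slope to carry separate information.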
It is straightforward to interpret a higher value of baseline mobility (α) as a positive attribute, but the same is not true of relative mobility (β). For instance, consider two districts where initially low-achieving students perform similarly. The district with higher relative mobility will be the one where initial high achievers perform relatively worse. Many researchers and education systems use the district achievement gap as a measure of performance, but this is an insufficient measure due to the potential tradeoff between inequality and the outcome level for initial low achievers. A rightward shift in the entire achievement distribution that is more pronounced at higher achievement percentiles would increase both the achievement gap and the expected outcome percentiles for initial low achievers, while a leftward shift in the entire distribution that is again more pronounced at the upper percentiles would reduce both the achievement gap and the expected outcome percentiles for initial low achievers.
Finally, we turn to the application of this framework to analyze graduation outcomes. Although graduation is a binary outcome, the academic mobility parameters are conceptually similar in the graduation models (CHKS, 2014). For example, Ō25d for on-time graduation indicates the likelihood of high-school graduation for a student at the 25th percentile of the third grade performance distribution from district d. This likelihood can be compared to the likelihood of graduation for a student in the 25th percentile in district c to compare mobility across districts measured by graduation.

Notes (Table 2): In Washington, a test change led to the change in the grade in which the third-grade cohorts took their high school exit exams (from grade 10 to 11), as shown in the table. Michigan transitioned from the ACT to the SAT in the 2016-17 school year. The first two analysis cohorts took the ACT in 11th grade; the second two cohorts took the SAT in 11th grade. In Oregon, there is no single high school test given to more than 90 percent of students in a fixed grade to support our analysis of mobility using HS test achievement.
The bottom graphs in Fig. 1 show binned scatterplots mapping students' entry percentiles to their graduation outcomes in Georgia and Appendix Figure A1 shows similar scatterplots for the other states.The plots are roughly linear throughout most of the initial rank distribution (about the upper 80 percent).The nonlinearity at lower entry percentiles is due to a combination of strong floor effects and the fact that graduation is a binary outcome.

Measurement error in students' initial test scores.
The initial percentile ranks are based on students' third grade scores on high-stakes state tests. These tests meet the highest standards of test publishers in terms of their reliability, but they are not error-free. 7 Measurement error in these tests comes from two broad sources (Boyd et al., 2013; Lockwood and McCaffrey, 2014): (1) the tests rely on a finite number of questions to assess student knowledge, making student scores subject to test-item sampling variance, and (2) scores are affected by idiosyncratic factors associated with student or test circumstances on the day of the test (e.g., the proverbial dog barking in the parking lot).
Measurement error in students' initial scores will result in mean reversion. If left unaccounted for, this will lead us to overstate academic mobility. To illustrate, consider an extreme scenario where initial scores are composed entirely of error. Under the standard assumption that the error is uncorrelated with the outcome, the expected value of β would be zero, implying perfect academic mobility. More generally, measurement error in students' initial scores will attenuate our estimates of β and correspondingly inflate our estimates of α. 8
Within the context of a latent-ability framework, we use two approaches to address the measurement error problem. Both leverage the fact that we observe two different measures of skill in the third grade from the math and ELA tests. Each measure can be written as a function of general skill, an orthogonal subject-specific skill, and an error. Formally, we write:

$$S^M_i = S_i + M_i + e^M_i \tag{5}$$

$$S^E_i = S_i + E_i + e^E_i \tag{6}$$

where $S^M_i$ and $S^E_i$ are observed test scores for student $i$ in math (M) and ELA (E), respectively, $S_i$ is general skill, $M_i$ and $E_i$ are subject-specific skills constructed to be orthogonal to $S_i$, and $e^M_i$ and $e^E_i$ are test-specific random measurement errors. $e^M_i$ and $e^E_i$ are assumed to be mean-zero and independent.
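The attenuation argument can be checked with a small simulation. The sketch below is illustrative only: it works with raw standardized scores rather than percentile ranks, and the true slope of 0.9 and the noise levels are arbitrary assumptions chosen to make the pattern visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

skill = rng.normal(size=n)                              # latent third-grade skill
outcome = 0.9 * skill + rng.normal(scale=0.3, size=n)   # later outcome (assumed slope 0.9)

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slopes = {}
for noise_sd in (0.0, 0.5, 1.0):
    # Observed initial score = latent skill + classical measurement error.
    observed = skill + rng.normal(scale=noise_sd, size=n)
    slopes[noise_sd] = ols_slope(observed, outcome)

for noise_sd, slope in slopes.items():
    print(f"error sd = {noise_sd:.1f}: estimated slope = {slope:.3f}")
```

With classical error, the estimated slope shrinks by the factor var(skill) / (var(skill) + var(error)), so noisier initial scores mechanically look like more mobility.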
Our preferred error correction is a two-step procedure where we first average the third-grade ranks in math and ELA to set the initial rank. By averaging the two noisy measures, the error variance is reduced. Then, we make an additional correction to remove error deriving from the testing instruments themselves (i.e., error due to sampling variance in the items that appear on the tests). Our correction is based on test reliability ratios reported by test publishers, which we incorporate into our models using a standard errors-in-variables (EIV) regression framework. The EIV models disattenuate β by the expected value of the attenuation bias caused by the measurement error (and correspondingly shrink α). Procedurally, the error variance is subtracted from the total variance in the initial ranks to calculate the error-adjusted parameter β (Fuller, 1987; Lockwood and McCaffrey, 2014).
A complication is that we use the average of the entry ranks as the initial rank variable, but the reliability ratios from test publishers are for the individual tests. Define $r_m$ and $r_e$ as the reliability ratios for the third grade math and ELA tests individually, and $\theta_{m,e}$ as the correlation of performance on the two tests. Following Wang and Stanley (1970), the reliability of average performance across the two tests is given by:

$$r_c = \frac{0.25\,r_m + 0.25\,r_e + 0.50\,\theta_{m,e}}{0.50 + 0.50\,\theta_{m,e}}$$

We use reliabilities based on this equation in our EIV models for each state. 9, 10
Our approach to adjust for measurement error has strengths and weaknesses. A key strength is that it is more efficient than available alternatives, namely instrumental variables (IV), which we discuss below, resulting in more precise estimation. This is especially important when we estimate mobility parameters for each district individually. However, it is not a comprehensive correction and has two notable limitations. First, the averaging strategy, while conceptually appealing, is incomplete since we observe just two scores (Ashenfelter and Krueger, 1994). And while the EIV correction helps, it only addresses measurement error associated with the testing instruments and ignores other sources of error. Therefore, we do not expect our estimates of β to be fully disattenuated; rather, they are lower-bound estimates. This means the EIV specifications will overstate relative academic mobility to some degree (recall that a lower value of β corresponds to more academic mobility). The second limitation is that the EIV approach does not allow for subgroup heterogeneity in the magnitude of measurement error (the publisher-reported test reliabilities are averages across all students and not available for student subgroups). For instance, below we compare academic mobility between FRL and non-FRL students. If the magnitude of measurement error is larger for one of these groups, it could confound the comparison. A similar problem exists for other comparisons, including across school districts.
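The composite-reliability formula and the disattenuation step can be sketched as follows. This is a simplified, single-regressor illustration under stated assumptions, not the authors' estimation code: the function names and the example reliabilities (0.90, 0.88) and correlation (0.75) are hypothetical, and the correction divides the uncorrected slope by the reliability, which is the textbook single-regressor form of an errors-in-variables adjustment when the reliability refers to the estimation sample.

```python
def composite_reliability(r_m, r_e, theta):
    """Reliability of the equally weighted average of two standardized tests.

    r_m, r_e : reliability ratios of the math and ELA tests
    theta    : correlation between performance on the two tests
    """
    return (0.25 * r_m + 0.25 * r_e + 0.50 * theta) / (0.50 + 0.50 * theta)

def eiv_correct(alpha_naive, beta_naive, reliability, mean_entry_rank):
    """Single-regressor errors-in-variables adjustment (simplified sketch).

    The slope is scaled up by 1 / reliability; the intercept is shifted so the
    corrected line still passes through the sample means.
    """
    beta = beta_naive / reliability
    alpha = alpha_naive + (beta_naive - beta) * mean_entry_rank
    return alpha, beta

# Hypothetical inputs, for illustration only.
r_c = composite_reliability(r_m=0.90, r_e=0.88, theta=0.75)
alpha, beta = eiv_correct(alpha_naive=11.0, beta_naive=0.78,
                          reliability=r_c, mean_entry_rank=50.0)
print(f"composite reliability = {r_c:.3f}")
print(f"corrected alpha = {alpha:.2f}, corrected beta = {beta:.3f}")
```

Two sanity checks on the formula: with perfectly reliable and perfectly correlated tests it returns 1, and when both tests have reliability r and their correlation equals r it reduces to the Spearman-Brown value 2r / (1 + r).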
We assess the severity of these concerns by replicating our entire analysis using instrumental variables to correct for measurement error.We estimate our IV models by first specifying the math score as the primary independent variable, with the ELA score serving as the instrument.Next, we duplicate the dataset and flip the positions of the third-grade tests so the ELA score is the independent variable and the math score is the instrument.Then we stack the duplicated dataset and estimate two-stage-least-squares models on the stacked dataset, with standard errors clustered by student to account for the duplication.This procedure yields first-stage error correction coefficients that are averages of the coefficients obtained when each test is individually specified as the independent variable and instrument, respectively.
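The stacking procedure can be sketched on simulated data. The example below is an assumption-laden illustration rather than the authors' implementation: it uses raw scores instead of percentile ranks, the variances and the true slope of 0.9 on general skill are arbitrary, the exclusion restriction holds by construction, and it reports point estimates only (clustering standard errors by student, as described above, is omitted).

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x, one instrument z, and a constant."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]         # first stage
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    alpha, beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]   # second stage
    return alpha, beta

rng = np.random.default_rng(0)
n = 100_000

# Latent-skill setup: general skill + subject-specific skill + test error.
skill = rng.normal(size=n)
math_score = skill + rng.normal(scale=0.2, size=n) + rng.normal(scale=0.5, size=n)
ela_score = skill + rng.normal(scale=0.2, size=n) + rng.normal(scale=0.5, size=n)
outcome = 0.9 * skill + rng.normal(scale=0.3, size=n)

# Stack two copies of the data: math instrumented by ELA, then ELA by math.
y = np.concatenate([outcome, outcome])
x = np.concatenate([math_score, ela_score])
z = np.concatenate([ela_score, math_score])

_, beta_iv = two_stage_least_squares(y, x, z)
beta_ols = np.polyfit(math_score, outcome, 1)[0]   # uncorrected, single test

print(f"uncorrected OLS slope (math only): {beta_ols:.3f}")
print(f"stacked IV slope:                  {beta_iv:.3f}")
```

In this setup the IV slope recovers the coefficient on general skill, while the uncorrected slope is attenuated by both the test error and the subject-specific component.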
The IV approach addresses the two main limitations of the EIV approach. First, the IV error correction is not confined to addressing a specific type of measurement error, making it more comprehensive than the EIV correction based on the test reliability ratios. Second, the IV approach allows for subgroup heterogeneity in measurement error; for instance, if FRL students have more error in their test scores than non-FRL students, the IV models for FRL students will make a stronger correction.
But while these theoretical benefits make the IV approach appealing, it also has limitations. The exclusion restriction in the specification where the 3rd grade ELA score is an instrument for the 3rd grade math score relies on the assumption that the 3rd grade ELA score is only related to the outcome through its relationship with the general skill component of the 3rd grade math score. However, if the subject-specific ELA skill in 3rd grade ($E_i$ in Equation (6)) also has a direct relationship with the outcome, the exclusion restriction would be violated. This would likely inflate the estimates of β because the subject-specific skill in third grade would be positively related to skill in the future; the same holds in specifications where the 3rd grade math test serves as the instrument for the 3rd grade ELA test. In addition, the IV models are substantially less efficient. 11
The IV analogs to all results that follow in the main text are reported in Appendix B. At a high level, two themes emerge in comparing the EIV and IV results. First, as expected, the EIV estimates of β are smaller, on average, than the IV estimates, though they are at least 0.80 for the eighth grade and high school tests in all states. Thus, despite providing lower-bound estimates of β, the EIV coefficients show limited mobility. Second, while the mobility parameters differ to some degree across methods, our comparative findings are upheld substantively using either approach. That is, the gaps in academic mobility by student characteristics, and the variance in academic mobility across school districts, are similar. This suggests that while conceptually concerning, in practice the potential for measurement-error heterogeneity to confound our comparisons is limited.
Finally, for completeness, we also estimate models where we set the initial rank using the average of the third-grade math and ELA ranks but do not use the reliability ratios to correct for test measurement error.As expected, our estimates of β are consistently smallest using this approach, although even in these models our comparative findings remain similar.Results using this third approach are shown in Appendix B in tandem with the IV results.

Geographic mobility.
Some students exit the public school system and/or leave their home states before we observe their later-grade outcomes. 12 Table 3 shows system exiters are negatively selected; i.e., the average entry percentiles of students with missing late-grade outcomes are almost always below those of students with observed outcomes. 13 The sample attrition raises two concerns. The first is reference bias and applies to our analysis of test percentiles, which are normed against the population of test takers. The departure of negatively selected students, if left unaddressed, would lead us to understate upward mobility even if there is no unobserved selection (note the reference bias issue is not relevant to our analysis of graduation outcomes because these outcomes are not normed in the distribution). The second concern is the potential for unobserved selection into system exit conditional on the initial rank. This concern applies to both our test-based and graduation-based mobility metrics and may be problematic for subgroup analyses. For example, suppose system exiters are negatively selected conditional on their initial ranks and district A has a higher proportion of exiters than district B. The differential attrition between districts will cause a compositional difference in their comparison and lead to an overstatement of outcome variance.
We address the reference-bias concern by including students with missing outcomes in our analysis via imputation. Our imputation procedure uses all available test information prior to the missing outcome, up to the seventh grade, to impute test percentiles in eighth grade and high school, and both graduation outcomes. 14 The imputed values allow us to preserve the full entry-cohort distributions in each state, mitigating the concern about reference bias.
To address the concern about unobserved selection, we build hypothetical selection scenarios into the imputation framework. The baseline selection scenario, which we maintain throughout our primary analysis, is that students with missing outcomes are negatively selected on unobservables to the same degree as within-state, cross-district movers. We produce imputed values for students with missing outcomes that embody this condition by relying on observed outcomes for district movers within each state to estimate a "mobility selection parameter." Using this scenario as an anchor, we consider the sensitivity of our findings to four scenarios where the degree of selection into exit is reparameterized relative to baseline: (1) 25 percent more negative than baseline, (2) 10 percent more negative than baseline, (3) 10 percent less negative than baseline, and (4) 25 percent less negative than baseline. With the selection-adjusted imputed values, we reestimate our academic mobility models to determine the sensitivity of our findings to different assumptions about the direction and magnitude of unobserved selection into system exit, above and beyond selection into district mobility within the public school system of a state. Full details regarding our imputation procedure are in Appendix C.
This sensitivity analysis shows that none of our findings are substantively affected by the different unobserved-selection conditions we test. This is the combined result of several aspects of outcome missingness in our data: (1) even in the most extreme unobserved selection scenario, and noting that we already capture observed selection via early-grade performance, the degree of parameterized negative selection into exit is modest (based on within-state district movers), (2) although the likelihood of outcome missingness is not evenly distributed across student subgroups or districts, the divergence across subgroups and districts is not extreme, and (3) most students do not exit, limiting the scope for attrition to impact our findings.
Finally, we turn to the issue of geographic mobility within states. We assign students to their third-grade districts, which means our estimates of cross-district variability take on an interpretation akin to "intent-to-treat" parameters. Although most students remain in the same district during grades 3-12, many change districts as well. Some of the changes are structural (e.g., a district that ends after the eighth grade) or opportunistic (e.g., our data cover a period of growth in the charter sector in many states), although moves surely occur for many other reasons as well. 15 Disentangling the reasons for student mobility across districts, and the implications, is a substantial undertaking and natural extension of this work, but here we focus on understanding differences in academic mobility across districts defined by the district attended in third grade.

Findings
For presentational convenience we focus the discussion of our findings primarily on simple averages of the state-level results. 16 However, we conduct our entire analysis separately for each state and present many of the state-by-state results alongside the state averages. State-by-state results that are suppressed in the main text are reported in Appendix A.

Broad patterns of academic mobility at the state level
Table 4 reports estimates of β and Ō25 from equation (1); recall that α is redundant in the statewide models. Panel A shows the cross-state averages of β and Ō25 using each of our three estimation methods (EIV, IV, and uncorrected). Panel B provides state-by-state results using our preferred EIV approach (Appendix Tables B2a and B2b show state-by-state results using the other methods).
Consistent with prior evidence that early measures of achievement are highly predictive of later outcomes, a student's rank in the test distribution in the third grade is a strong indicator of later-grade achievement.Using the EIV approach, the cross-state average estimates of β in the eighth-grade and high-school test models are 0.84 and 0.82, respectively (these averages reflect relatively homogeneous estimates across the seven states, as shown in Panel B).As high as these estimates are, they likely understate β because of the incomplete correction for measurement error.The analogous IV estimates, which we interpret as bounding β from above, are 0.90 and 0.92 on average across states.Put plainly, where a student starts in the distribution when tested in the third grade is highly predictive of where they are in the distribution in eighth grade and high school.
Turning to the graduation models, the estimates of β are lower and more variable across states. The simple-average values of β for on-time and lagged graduation are 0.35 and 0.27, respectively, in the EIV models. The IV estimates are again larger, but still reflect a much weaker gradient between initial percentile ranks and the likelihood of graduating. The weaker gradient is visually apparent in the scatterplots in Fig. 1 and Appendix Figure A1, and driven by the fact that graduation rates are high over most of the entry-rank distribution. Put another way, because high school graduation is a fairly indiscriminate outcome, early-career performance is a weaker predictor of success.
Our estimates of Ō25 for the test outcomes are similar across states and tests, ranging from 28.2 to 33.2, with average values around 30 for each test. The IV analogs in Appendix B are lower, ranging from 27.2 to 29.5. The upper and lower bound estimates of Ō25 thus form a fairly tight window. The graduation-based Ō25 values, which capture on-time and delayed graduation likelihoods for the average 25th percentile student, are 75.8 and 80.6, respectively, on average across the sample states, and again exhibit more state-to-state variability than their test-based analogs. As with the test-based estimates, the IV estimates for graduation-based Ō25 in Appendix B are similar to the EIV estimates but slightly smaller.
Notes: Sample sizes and entry percentiles are based on the average of the grade 3 math and ELA percentiles (i.e., percentiles at entry). For the test outcomes, the mean of each rank distribution should be 50, but in several states it deviates (very) slightly because of lumpiness in the underlying test-score distributions. For graduation outcomes, we report the percent of students who graduate among stayers because percentiles are not informative.

Table 4
Statewide estimates of β and Ō25 for each outcome.
Panel A. Average statewide estimates of β and Ō25 across the seven states, using different measurement error approaches.
Columns: Grade-8 Test; HS Test; Grad; Grad + 1.
The similarity across states in the test-based mobility parameters is partly the result of the distributions of test ranks being forced into alignment by the percentiles conversion. This does not happen with graduation outcomes. Thus, one source of differences in the graduation models is differences in statewide graduation rates. Unsurprisingly, states with higher graduation rates have higher graduation-based Ō25 values. This highlights a source of ambiguity in interpreting our findings with respect to graduation. One interpretation of a high Ō25 value is that it reflects a state's success in pushing initially low-performing students to graduate. But an alternative interpretation is that it reflects low graduation standards (Costrell, 1994). Unfortunately, our data are ill-suited to distinguish between these interpretations, though when we get to the district-level analysis below, we show districts' test-based and graduation-based mobility metrics are positively correlated (ρ ≈ 0.3-0.4). This provides some support for the more optimistic interpretation of graduation-based mobility, at least measured at the district level.

Academic mobility for student subgroups within states
In Tables 5, 6, and 7 we report results from versions of Equation (2) where we define student subgroups (s) by third-grade racial/ethnic designation, FRL designation, and school urbanicity (urban, suburban, rural). The entry and outcome percentile ranks continue to be normed against the full state distributions. This allows for separate identification of αs and βs, with the tradeoff that we may overstate the academic mobility of higher performing subgroups relative to lower performing subgroups in the presence of uncorrected measurement error (Hanushek and Rivkin, 2009). Consequently, we place greater emphasis on the IV estimates in this section because of the more comprehensive treatment of measurement error; any failures of the exclusion restriction will tend to inflate the estimates of β, but we are focusing on the differences between subgroups.
Table 5 and Fig. 2 show results by race/ethnicity. We compare mobility for Asian, Black, Hispanic, and White students.17 Focusing on Ō25s (marked by a vertical line at the 25th percentile of the entry distribution in each graph in Fig. 2), we find that initially low-performing Asian students have much higher upward mobility than all other racial-ethnic groups: the average Ō25s value for the eighth grade test is 39.0 for Asians, 27.1 for Blacks, 29.8 for Hispanics, and 30.7 for Whites. The other panels in Table 5 reveal a similar pattern for the other outcomes. Fig. 2 shows an Asian student advantage in outcomes throughout the distribution of initial ranks via higher baseline mobility (i.e., Asian students have a high value of αs). For test scores this translates to an outcome-rank advantage throughout the entry-rank distribution; for graduation, outcomes converge at higher entry percentiles for all racial-ethnic groups because the graduation likelihood approaches 1.0 for students with high entry percentiles.
Using the IV measurement error correction, the racial-ethnic gaps are smaller but substantively similar (Appendix Table B3a). For example, the average values of Ō25s for the eighth grade test are 37.5 for Asians, 26.2 for Blacks, 29.1 for Hispanics, and 28.9 for Whites. The gap between Asian and other students remains large despite the more comprehensive measurement-error correction, indicating much higher upward mobility for Asian students.
The results in Table 5 (and Appendix B3a) exemplify the outsized influence of baseline mobility (α) in driving variation in absolute upward mobility (Ō25) across racial-ethnic groups. The differences in relative mobility (β) are modest in comparison. For instance, consider the gap on the eighth-grade test between the group with the highest absolute upward mobility (Asian students) and the group with the lowest upward mobility (Black students). Based on either the EIV or IV results, 90-plus percent of the Asian-Black Ō25 gap is accounted for by the gap in α between Asian and Black students, with only a small fraction of the gap remaining to be explained by the gap in β (which is multiplied by a factor of 25 to map to Ō25). The value of α is mechanically overstated relative to β by focusing at a point in the distribution below the 50th percentile; still, even evaluated at Ō50, α is the dominant explanatory factor. The primary influence of baseline mobility is a recurring theme throughout our investigation of the variance in absolute upward mobility across student subgroups and school districts.
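The decomposition described above follows directly from the identity O25 = α + 25β: a gap in O25 between two groups equals the gap in α plus 25 times the gap in β. The sketch below illustrates the arithmetic. The α and β inputs are hypothetical values chosen only so that the implied O25 values match the reported eighth-grade EIV averages of 39.0 and 27.1; they are not the estimates in Table 5.

```python
def decompose_gap(alpha_a, beta_a, alpha_b, beta_b, p=25.0):
    """Split the gap in expected outcomes at entry percentile p into alpha and beta parts."""
    alpha_part = alpha_a - alpha_b
    beta_part = p * (beta_a - beta_b)
    total = alpha_part + beta_part
    return {
        "total_gap": total,
        "alpha_share": alpha_part / total,
        "beta_share": beta_part / total,
    }

# Hypothetical parameters (assumed for illustration only):
# group A: 18.5 + 25 * 0.82 = 39.0; group B: 7.1 + 25 * 0.80 = 27.1
result = decompose_gap(alpha_a=18.5, beta_a=0.82, alpha_b=7.1, beta_b=0.80)
print(result)
```

Changing `p` to 50 gives the analogous decomposition at the median of the entry distribution, where the slope term receives twice the weight.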
Our finding of negative Black-White mobility gaps aligns with evidence on the widening of Black-White outcome gaps during K-12 education documented previously (Clotfelter, Ladd, & Vigdor, 2009; McDonough, 2015; Todd & Wolpin, 2007). Our mixed findings for Hispanic-White differences (across outcomes and the EIV and IV models) contribute to a mixed literature. For example, Clotfelter, Ladd, and Vigdor (2009) find the Hispanic-White achievement gap narrows during grades 3-8 in North Carolina. Alternatively, Reardon and Galindo (2009) find the gap is flat from grades 1-5 using a nationally representative sample, and Todd and Wolpin (2007) find it remains flat or widens modestly.18
Next, Table 6 and Appendix Table B4a follow the structure of Table 5 but divide students by FRL status. Compared to FRL students, non-FRL students have greater absolute upward mobility. For test scores in eighth grade and high school, the average Ō25 gaps by FRL status in Table 6 are 4.5 and 6.7 percentage points, respectively. The average on-time and lagged graduation gaps are 12.5 and 11.0 percentage points. Again, the more comprehensive treatment of measurement error via IV reduces the magnitudes of these gaps to some degree, but they remain substantial (see Appendix Table B4a).
The last subgroup comparison is by school urbanicity in the third grade, shown in Table 7. Here there is less heterogeneity across groups. The most notable differences in Table 7 are for graduation outcomes: graduation rates for initially low-performing students who attend urban schools are significantly lower than graduation rates for their peers who attend suburban and rural schools (who have similar graduation rates to each other). These results are again replicated substantively in the IV models in Appendix Table B5a.

District-Level variation in mobility and Cross-Outcome, Cross-Cohort correlations
In this section we estimate the within-state, cross-district standard deviations of α, β, and Ō25 to explore the extent to which baseline, relative, and absolute upward mobility vary across school districts. For charter schools, we follow the coding conventions of the states to assign district status. In most cases, charter schools (or their networks in instances of multi-site charters) are coded as separate districts, although a small number of charters are integrated into larger districts, in which case they are coded as part of the larger district. Note, however, that charter enrollment shares in our cohorts are small; across states they range from 0 to 7.7 percent, with a median value of 1.8 percent (see Appendix Table A1).19
The raw variances of αd, βd, and Ô25d will overstate the true variances due to sampling variance. We estimate the sampling variance of each parameter by the average of the squared standard errors (Aaronson, Barrow, and Sander, 2007), which we subtract from the total variance to estimate the true variance.20 Error-corrected standard deviations of the mobility parameters are shown in Tables 8 and B6a for the EIV and IV estimates, respectively. The variances across districts for all parameters, all outcomes, and in all states are statistically significant. Fig. 3 shows the distributions of Ō25d in two example states, Missouri and Washington, for all outcomes. The distributions are consistently unimodal and smooth, ruling out odd patterns of heterogeneity across districts.
17 There is also an "other race/ethnicity" category in the data to capture all other students, but it is a small group and omitted from our focal comparisons.
18 A more nuanced explanation of Reardon and Galindo's (2009) findings is that their estimates imply a modest shrinking of the gap in math and a modest increase in reading.
19 Charter enrollment in the U.S. more than doubled during the timespan over which we track our focal third-grade cohorts (National Center for Education Statistics, 2022), so charter enrollment would be expected to account for a larger share of total enrollment in more recent cohorts.
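The sampling-variance adjustment described above can be sketched as follows: the variance of the district estimates, minus the average squared standard error, gives the estimated true (signal) variance. The sketch uses simulated districts and assumes unweighted moments and a sample-variance denominator of n − 1; those choices, the truncation at zero, and the name `signal_sd` are assumptions for illustration and are not details taken from our estimation code.

```python
import numpy as np

def signal_sd(estimates, std_errors):
    """Return (total variance, sampling variance, error-corrected standard deviation)."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    total_var = estimates.var(ddof=1)                # raw cross-district variance
    sampling_var = np.mean(std_errors ** 2)          # average squared standard error
    signal_var = max(total_var - sampling_var, 0.0)  # truncate at zero (assumption)
    return float(total_var), float(sampling_var), float(np.sqrt(signal_var))

# Simulated districts: true SD of 5 percentile points, standard error of 3 for each district.
rng = np.random.default_rng(1)
n_districts = 5000
true_o25 = 30.0 + rng.normal(scale=5.0, size=n_districts)
se = np.full(n_districts, 3.0)
estimated_o25 = true_o25 + rng.normal(scale=se)

total_var, sampling_var, corrected_sd = signal_sd(estimated_o25, se)
print(f"raw SD={np.sqrt(total_var):.2f}, corrected SD={corrected_sd:.2f}")
```

The raw standard deviation in the simulation exceeds the corrected one because it includes estimation noise, which is the overstatement the adjustment is meant to remove.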
On average across states, the EIV results in Table 8 indicate that one standard deviation in the distribution of absolute upward mobility (Ō25) corresponds to a change in student rank on the eighth-grade and high-school tests of about 4.8 percentile points. For on-time and delayed graduation, the analogous average standard deviations are 5.5 and 4.9 percentage points, respectively. The standard deviations of the IV-based estimates are similar. Adding context from Table 4, the estimates in Table 8 indicate that a third-grade entrant at the 25th percentile who attends a district with academic mobility one-standard-deviation above average would be expected to score at the 35.2 percentile in the state distribution of the high school test, compared to the 30.4 percentile at the average district. In terms of on-time graduation, a similar comparison yields a graduation likelihood at the high-mobility district of 81.3 percent, versus 75.8 percent at the average district.
It is also of interest to compare the importance of αd and βd in driving upward mobility. Although separable inference is challenging because αd and βd are negatively correlated within districts, on average, there is ample variation in the data to separately identify the magnitude of variation in both parameters across districts. The comparison is complicated because the importance of αd and βd varies by the initial rank: at low initial ranks, variation in αd will be a more important driver of upward mobility, but as the initial rank increases, βd becomes more important. This dynamic is illustrated in Appendix Figure A2. The key takeaway from the figure is that over most of the distribution of initial ranks, and certainly at lower-valued ranks, variation in α is the primary driver of variation in upward academic mobility. This only changes at very high levels of the initial outcome percentile; the crossover point is at approximately the 83rd percentile. This result is another example of the consistent theme of our analysis that there is more variation in academic-mobility intercepts than slopes, in this case across districts. This suggests the potential is greater for districts to improve the outcomes of initially low-performing students through overall improvement rather than by differentially impacting students at different points in the entry distribution. We are mindful in this interpretation that our estimates are not causal, but the ratio of the variances of αd and βd is suggestive of how districts are likely to affect the trajectories of low performers absent reforms to current practice.
Finally, Tables 9 and 10 show correlations between district estimates of absolute upward mobility across outcomes and cohorts. We adjust for estimation error in the correlation between any two sets of estimates by first estimating the ratio of the true variance to the total variance for each set of estimates. Then we multiply the raw correlation by the inverse of the square root of the product of the ratios, following Spearman (1904). As noted by Kraft (2017), this procedure generates what are best interpreted as upper-bound correlations because it assumes all estimation error is uncorrelated. We also show unadjusted correlations that provide complementary lower bounds. For ease of presentation, we focus on the adjusted correlations in our discussion, and for brevity we show the average values of the correlations across states in the tables. The state-by-state results are reported in Appendix Tables A2 and A3.
20 The standard errors of Ô25d are linear combinations of the standard errors of αd and βd.
W. Austin et al.
Notes to Table 5: These estimates are from mobility regressions estimated separately for each racial-ethnic student group in each state, as shown in equation (2). O25 is equal to α + 25*β. Oregon does not offer a high school test taken in a (near) universal grade, so Oregon is omitted from the HS test results. All β coefficients are statistically significant; standard errors and statistical significance information suppressed for brevity.
Notes to Table 6: These estimates are from mobility regressions estimated separately for each FRL student group in each state, as shown in equation (2). O25 is equal to α + 25*β. Oregon does not offer a high school test taken in a (near) universal grade, so Oregon is omitted from the HS test results. All β coefficients are statistically significant; standard errors and statistical significance information suppressed for brevity.

Table 7
Statewide academic mobility estimates by the urbanicity of the school district in the third grade.
Notes: These estimates are from mobility regressions estimated separately for each urbanicity student group in each state, as shown in equation (2). O25 is equal to α + 25*β. Oregon does not offer a high school test taken in a (near) universal grade, so Oregon is omitted from the HS test results. All β coefficients are statistically significant; standard errors and statistical significance information suppressed for brevity.

Fig. 2. Illustrations of the linearly estimated rank-rank relationships for 8th-grade test score and on-time graduation outcomes by race-ethnicity, corresponding to the results in Table 5, on average across states.
Notes: These illustrate the cross-state averages of the linearly-estimated mobility relationships by race-ethnicity between the 3rd grade test rank and (a) the 8th-grade test rank and (b) on-time high school graduation. The linear-model parameters for each race-ethnicity (and state) and outcome are shown in Table 5.
Table 9 shows that the mobility metrics are positively correlated across outcomes within districts, on average. The error-adjusted, upper-bound correlations within outcome mode are very high: for test outcomes the average correlation across states is 0.87, and for graduation outcomes it is 1.0. The adjusted correlations across outcome modes are positive but lower, ranging from 0.37 to 0.40.
Table 10 shows analogous correlations within districts and outcomes, but across cohorts. The states that contribute to the average in each cell depend on the cohorts included in the state samples (per Table 1). The contiguous-cohort adjusted correlations are between 0.59 and 0.73 on average, and somewhat larger for test-based mobility than graduation-based mobility. The adjusted correlations for cohorts two and three years removed are mostly smaller but still consistently positive, and no adjusted correlation across any cohorts for any outcome is below 0.42.
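The error adjustment applied to the correlations in Tables 9 and 10 can be sketched as follows: the raw correlation between two sets of district estimates is divided by the square root of the product of their signal-to-total variance ratios. The data are simulated, the function names are hypothetical, and the sketch assumes estimation errors are uncorrelated across the two sets of estimates, which is the assumption that makes the adjusted value an upper bound.

```python
import numpy as np

def variance_ratio(estimates, std_errors):
    """Ratio of estimated true variance to total variance for one set of estimates."""
    total_var = np.var(estimates, ddof=1)
    sampling_var = np.mean(np.square(std_errors))
    return (total_var - sampling_var) / total_var

def adjusted_correlation(x, se_x, y, se_y):
    """Return (raw correlation, error-adjusted correlation) in the spirit of Spearman (1904)."""
    raw = np.corrcoef(x, y)[0, 1]
    adjusted = raw / np.sqrt(variance_ratio(x, se_x) * variance_ratio(y, se_y))
    return float(raw), float(adjusted)

# Simulated true district effects with a correlation of 0.6, plus independent noise.
rng = np.random.default_rng(2)
n = 5000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
true_x, true_y = z1, 0.6 * z1 + 0.8 * z2
se_x = np.full(n, 0.7)
se_y = np.full(n, 0.7)
x = true_x + rng.normal(scale=se_x)
y = true_y + rng.normal(scale=se_y)

raw, adjusted = adjusted_correlation(x, se_x, y, se_y)
print(f"raw={raw:.2f}, adjusted={adjusted:.2f}")
```

Nothing in this sketch caps the adjusted value at 1.0, so in small samples it can exceed one; how such cases are handled is not specified here.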

Primary correlates
Next we explore links between academic mobility and the attributes of districts and their local areas. We assemble a database of district and local-area attributes from two sources: (1) our administrative education databases and (2) externally geocoded data from the National Center for Education Statistics (NCES). Using the administrative data, we construct variables for the percentages of students in each district who are (a) Black, (b) Hispanic, (c) FRL enrolled, (d) participants in an individualized education plan (IEP), and (e) geographically mobile. Following CHKS, we also construct a Theil index that captures within-district segregation by race-ethnicity (measured by the segregation of underrepresented minority students, whom we define as Black and Hispanic), and a parallel segregation index based on economic status (measured by FRL enrollment) motivated by recent research on economic connectedness (Chetty et al., 2022).21 All these variables are constructed using data from students in our cohorts in the third grade.
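As an illustration of the kind of segregation measure described above, the following sketch computes a two-group, entropy-based Theil index across schools within a single district. The two-group form, the use of schools as the within-district unit, and the function names are assumptions made for this example; the exact specification we use follows CHKS and may differ in its details.

```python
import numpy as np

def entropy(p):
    """Two-group entropy in bits, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    with np.errstate(divide="ignore", invalid="ignore"):
        part_p = np.where(p > 0, -p * np.log2(p), 0.0)
        part_q = np.where(q > 0, -q * np.log2(q), 0.0)
    return part_p + part_q

def theil_h(group_counts, total_counts):
    """Theil index: 0 = every school mirrors the district, 1 = complete segregation.

    Assumes every school has positive enrollment.
    """
    group_counts = np.asarray(group_counts, dtype=float)
    total_counts = np.asarray(total_counts, dtype=float)
    district_total = total_counts.sum()
    e_district = float(entropy(group_counts.sum() / district_total))
    if e_district == 0.0:
        return float("nan")  # undefined when the district has only one group
    e_schools = entropy(group_counts / total_counts)
    weights = total_counts / district_total
    return float(np.sum(weights * (e_district - e_schools) / e_district))

# Hypothetical district with three schools (group members, total enrollment).
print(theil_h([90, 50, 10], [100, 100, 100]))
```

The same function applies to the economic segregation index by passing FRL-enrolled counts in place of counts of underrepresented minority students.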
An additional district attribute we construct using our administrative data is value added to student test scores in math and ELA in grades 4-8. Our value-added estimates capture district contributions to student test score growth in both subjects. We estimate value added using data from the same time periods during which we follow the cohorts in each state but jackknife the estimates around our cohorts to remove any mechanical correlation between academic mobility and value added. We construct the value-added estimates to be uncorrelated with student characteristics following Parsons, Koedel, and Tan (2019). Finally, we estimate value-added separately for above- and below-median students based on lagged test scores. Appendix D provides estimation details for the value-added models.
We also correlate academic mobility with local-area attributes geocoded to districts' catchment areas based on data from the American Community Survey (ACS), made available by the Education Demographic and Geographic Estimates (EDGE) program of NCES. We include variables that capture local-area median household income and the poverty rate, along with the percent of families with school-aged children where the head of household is identified as (a) Black, (b) Hispanic, (c) a high school graduate, (d) a college graduate, (e) speaking a language other than English at home, (f) residentially stable, and (g) never married. These variables are from the population of parents of school-aged children in districts' catchment areas. Finally, we correlate academic mobility with district per-pupil expenditures taken from the District Finance Survey, also from the NCES.22
Fig. 4 shows coefficients from univariate regressions of Ō25d, estimated for each of the four long-term outcomes, on the district and local-area attributes. We report average coefficients across the seven states for presentational convenience. The independent variables are standardized within each state to have a mean of zero and variance of one; therefore, the coefficient averages reflect the predicted change in academic mobility associated with a one-standard-deviation move in the distribution of the independent variable, on average across states.23 Detailed state-by-state regression output underlying Fig. 4 is available in Appendix Table A4. Results using IV-based estimates of Ō25d as dependent variables in place of the EIV-based estimates are broadly similar and reported in Appendix Table B9a.24
The preceding analysis offers some predictions about our findings.
are especially noisy in some cases due to the reduced efficiency of the IV models, and (2) it is only feasible to estimate the district-level mobility regressions using IV for a subsample of larger districts. See the discussion in Appendix B for more information.
Notes to Table 8: These standard deviations are for the parameters estimated from equation (3) for each district in each state, adjusted for estimation error variance using the average of the squared standard errors of the mobility parameters. Oregon does not offer a high school test taken in a (near) universal grade, so Oregon is omitted from the HS test results.
challenging working conditions, and inefficient resource use.
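The univariate regressions summarized in Fig. 4 can be sketched as follows: within each state, a district attribute is standardized to mean zero and variance one, district-level O25 is regressed on it, and the resulting slopes are averaged across states. The data below are simulated for two hypothetical states, the regressions are unweighted, and the function names are introduced only for this illustration; none of these choices should be read as a description of our exact specification.

```python
import numpy as np

def standardized_slope(attribute, o25):
    """Slope from regressing district O25 on a within-state standardized attribute."""
    attribute = np.asarray(attribute, dtype=float)
    z = (attribute - attribute.mean()) / attribute.std()
    slope, _intercept = np.polyfit(z, np.asarray(o25, dtype=float), deg=1)
    return float(slope)

def average_across_states(data_by_state):
    """Simple (unweighted) average of the state-specific slopes."""
    slopes = {s: standardized_slope(x, y) for s, (x, y) in data_by_state.items()}
    return slopes, float(np.mean(list(slopes.values())))

# Simulated districts in two hypothetical states; the attribute is on a different
# scale in each state, but O25 falls by 2 points per standard deviation in both.
rng = np.random.default_rng(3)
data = {}
for state, (center, spread) in {"State A": (40.0, 15.0), "State B": (55.0, 8.0)}.items():
    attribute = rng.normal(loc=center, scale=spread, size=500)
    o25_d = 30.0 - 2.0 * (attribute - center) / spread + rng.normal(scale=2.0, size=500)
    data[state] = (attribute, o25_d)

slopes, avg_slope = average_across_states(data)
print(slopes, avg_slope)
```

Because the attribute is standardized within each state, the two slopes are directly comparable even though the raw attribute scales differ.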

Extensions
We conduct three extensions of the analysis of correlates. First, we replicate the correlational analysis but use as dependent variables student-type-specific estimates of absolute upward mobility, estimated separately in each district for students who are Black, Hispanic, and FRL-enrolled. This is to assess whether the predictors of higher academic mobility overall also predict higher academic mobility among students in these at-risk groups. A limitation is that the subgroup-specific estimates of academic mobility are less precisely estimated, sometimes by a considerable margin depending on the composition of a district. This should not cause bias because Ō25d is the dependent variable, but it does reduce efficiency and lower the precision with which we can identify some relationships. Noting this limitation, we generally find the attributes that predict higher academic mobility overall also predict higher mobility for at-risk students. Figures analogous to Fig. 4 for each student type are in Appendix Figures A3-A5.
Second, we correlate district value-added separately with αd and βd, instead of Ō25d, to assess whether districts with high value added promote greater convergence in student outcomes. Table 11 shows these results. We do not find consistent evidence of a relationship between district value-added and βd (the association is small and inconsistently signed across states and outcomes). In contrast, the associations between district value-added and αd are overwhelmingly positive. This suggests the correlations between value added and Ō25d in Fig. 4 are driven primarily by variation across districts in αd, not βd, further reinforcing our findings on the relative importance of slopes and intercepts in driving variation in absolute upward mobility across districts.
Third, we aggregate our estimates of academic mobility up to the commuting-zone and county levels in order to correlate them to external estimates of intergenerational economic mobility from CHKS (2014) and Chetty and Hendren (2018), respectively. Details for this portion of our analysis, including discussion of a number of challenges associated with linking our estimates of intragenerational academic mobility to their estimates of intergenerational economic mobility, are provided in Appendix E. A high-level summary of the results is as follows. There is insufficient cross-commuting-zone variance in academic mobility to account for observed variance in economic mobility at this same level of geography (also see Rothstein, 2019).26 A key factor contributing to this result is that most of the variance in academic mobility across school districts occurs within, and not between, commuting zones.27 Between counties there is more variation in academic mobility because counties typically cover much smaller geographic areas. In Appendix E, we show our estimates of academic mobility are positively correlated with Chetty and Hendren's economic mobility estimates at the county level. While we are hesitant to draw strong conclusions from the correlations, they at least allow for the possibility of a substantive link between academic and economic mobility.
Fig. 4. Correlates of Ō25d for each outcome, on average across states.
Notes: The bars represent cross-state averages of coefficient estimates from univariate regressions of Ō25d for each outcome on a wide range of district and local-area attributes. All independent variables are standardized so that the interpretation of each coefficient is in standard-deviation units. Simple average values of the coefficients across states are reported. The top vertical panel shows correlates constructed using the state administrative education datasets. The bottom vertical panel shows correlates taken from the NCES (either from the EDGE program based on data for parents with school-aged children, or the District Finance Survey, as described in the text). The state-by-state results underlying this figure are reported in Appendix Table A4, which includes information on the statistical significance of individual coefficients in individual states.
Notes to Table 11: See Appendix D for information about our procedure for estimating value added for each district. Oregon does not offer a high school test taken in a (near) universal grade, so Oregon is omitted from the HS test results.

Conclusion
We introduce the concept of "academic mobility" and use it to study the distributional stickiness of student performance during K-12 schooling. On the whole, we find that academic mobility in the education system is limited: students' ranks in the academic performance distribution in the third grade are highly predictive of their ranks in higher grades. However, we also estimate statistically significant and educationally meaningful differences in academic mobility across school districts. Initially low-performing students who attend districts one standard deviation higher in the academic mobility distribution perform about 5 percentile points higher on tests in the eighth grade and high school relative to their peers who attend districts with average mobility. They are also 5-6 percentage points more likely to graduate from high school.
Our analysis of academic mobility across student groups divided by race-ethnicity, eligibility for free and reduced-price lunch, and district urbanicity produces patterns that are largely as expected based on existing research. Still, some results stand out. One is the large and consistent upward mobility advantage among Asian students relative to all other racial/ethnic groups. Another is that initially low-performing students in rural districts have broadly similar upward mobility to their suburban peers, which is at odds with the prevailing theme of the "rural schools problem" in education research (Burton et al., 2013).
When we decompose total academic mobility into its components and examine cross-district heterogeneity, we find cross-district differences in baseline mobility are the primary driver of cross-district variance in total academic mobility. This suggests low-performing students experience the largest performance gains when attending districts where students generally excel. It also casts doubt on the narrative that districts vary substantially in the degree to which they narrow within-district achievement gaps as students progress through school.
We correlate absolute upward mobility with a wide array of district and local-area characteristics. We find that absolute upward mobility is largest in socioeconomically advantaged areas as measured along a variety of dimensions. We also show districts with high value added to student test scores have significantly higher upward mobility (as measured by test and non-test outcomes).
Finally, we briefly consider the potential for differences in academic mobility to explain geographic variation in economic mobility across commuting zones and counties. Variation in academic mobility cannot explain a meaningful fraction of the variance in economic mobility across commuting zones documented by CHKS (2014), corroborating related findings from Rothstein (2019). There is much more variation in academic mobility at the county level, and we find that county-level estimates of academic and economic mobility are positively correlated.28
It bears repeating that our academic mobility metrics do not carry a causal interpretation. We do not know if our estimates reflect the true impacts of the local areas we define by school districts, or something else like the selection of families (Bruhn, 2020; Chyn and Katz, 2021). Moreover, if we overcome this hurdle and can recover causal estimates of these areas (an objective we intend to pursue in future research), it will still be difficult to assess what it is about them that drives the findings (inclusive of factors inside and outside of schools). These are problems endemic to the burgeoning field of place-based research (Chetty et al., 2020; Harding et al., 2021; Kaestner, 2020). Noting this important caveat, our findings illuminate broad patterns in academic mobility and suggest directions for future research that will create a body of evidence to guide the development of policies that support academic mobility.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Binned scatter plots with percentile ranks on the 3rd grade test on the horizontal axis (averaged across math and ELA), and either test-outcome percentiles or graduation rates on the vertical axis, in Georgia.
Notes: This figure shows binned scatterplots of the raw (binned) entry and outcome ranks in Georgia. Appendix A shows similar scatterplots for all other states and all outcomes.

Fig. 3. Distributions of estimated Ō25d in two states, Missouri and Washington, for all outcomes. These distributions are visual complements to the results in Table 8.
Notes: Distributions of estimated O25 values. The distributions are reported as estimated directly without any adjustments for sampling variance. Each graph reports the total variance and signal variance, where the latter is the total variance minus the sampling variance, to indicate the portion of the variation driven by sampling error.

Table 1
Definition of the analytic sample and descriptive statistics at panel entry for each state.

Notes: "Cohort Years" refers to the years of panel entry for the cohorts included in the analytic sample; i.e., the years in which the students were in the third grade. The spring year is used to indicate the academic year (e.g., 2009 = 2008-09 school year). Students who took both the Math and ELA third-grade state tests are included in the core sample. For Washington and Massachusetts, in earlier years of data, enrollment surveys were not conducted frequently, which likely contributes to the low reported mobility rates in those two states. In more recent data, the mobility rates in Massachusetts and Washington are around 5 and 8-9 percent, respectively. Note that the numbers of schools and districts indicate the numbers of unique schools and districts included in the analysis in each state. Data for the "Entire U.S." are reported in the bottom row of the table for context and taken from the 2008 Common Core of Data and are for students in public K-12 elementary and secondary grades. Note that we do not report a mobility percentage because a comparable variable is not available in the Common Core of Data.

Table 2
High school exams by state.

Table 3
Documentation of sample attrition in each state and for each late-grade outcome.
EIV = Errors in Variables Regression, IV = Instrumental Variables Regression, Uncorrected = Uncorrected Linear Regression. The top row of Panel B repeats the average values reported in Panel A using EIV. In these statewide regressions corresponding to equation (1), α and β are not separately identified. O25 is equal to α + 25*β. Oregon does not offer a high school test taken in a (near) universal grade, so Oregon is omitted from the HS test results. All β coefficients are statistically significant; standard errors and statistical significance information suppressed for brevity.

Table 5
Statewide academic mobility estimates by race/ethnicity.

Table 6
Statewide academic mobility estimates by FRL status.

Table 8
Estimates of the within-state, cross-district standard deviations of Ō25d, αd, and βd.

Table 11
Correlations of district-level value added to student achievement with αd and βd.