Virtual Learning in Kindergarten Through Grade 12 During the COVID-19 Pandemic and Chronic Absenteeism

Key Points

Question: What is the association between the use of virtual learning in kindergarten through grade 12 education during the 2020-2021 school year and chronic absenteeism?

Findings: In this cross-sectional study, data from 11 017 school districts from the 2018-2019 and 2021-2022 school years, analyzed within a difference-in-difference framework, show that districts with more virtual school days in 2020-2021 had higher rates of chronic absenteeism during the 2021-2022 school year. The higher rates were concentrated in districts with high poverty levels.

Meaning: Key future questions include understanding whether this result is causal and why lower district income was associated with worse outcomes.

boundaries and converted to real 2022 values using the Consumer Price Index for all items, total for the United States.3 Finally, we merge information about learning modes during the 2020/21 SY from the COVID-19 School Data Hub.4 The authors of the web page used data from state agencies to classify the percent of school days at the LEA level that were in-person, hybrid, or virtual during the 2020/21 SY. The web page does not have data from Iowa, Montana, and Oklahoma.
In all, we have four data sets (student counts, demographics within the district, chronic absenteeism rates, and learning modes during COVID) from three different sources (CCD, Census, and Data Hub); as a result, merging data across these sources generates some non-matches. We lose data from the three states not covered by the COVID-19 School Data Hub.
The mapping of ACS respondents generally does not include charter schools. Of the 4,293 LEAs listed as charter schools in the CCD, only 30 are in the ACS data.
LEAs can combine over time, some LEAs close, and others are started up; therefore, some districts cannot be merged over time. The CCD reports 32 separate LEAs for different components of New York City Public Schools, which we aggregate into one district. We also delete 13 LEAs that have a chronic absenteeism rate greater than 100% during the 2021/22 SY.
There are 18,609 LEAs in the 2018/19 CCD. Merging data from all sources using the NCES LEA ID, we produce an analysis sample of 11,466 LEAs for which we have two years' worth of data each. We refer to this dataset as our balanced panel of districts.

In the analysis section, we add measures of vaccination rates and COVID case rates at the county level to the basic model. We download county-level population vaccination rate data as of the end of December 2021 and calculate the fraction of residents that had at least one vaccination by that date.5 We also calculate average weekly COVID per-person infection rates at the county level from August 1, 2021 through May 31, 2022.6 These two datasets are merged based on the county where the district is located, which is identified from an NCES data file.7 If the district spans counties, a simple average is taken across counties.

The estimating equation is

$$Y_{it} = X_{it}\beta_1 + \text{Hybrid}_{i,t-1}\,\delta_1 + \text{Virtual}_{i,t-1}\,\theta_1 + \mu_{1i} + \lambda_{1t} + \varepsilon_{1it} \qquad (1)$$

where the key outcome of interest ($Y_{it}$) is the percent of students that are chronically absent in district $i$ in school year $t$ ($t$ = 2018/19 or $t$ = 2021/22). We control for a set of time-varying characteristics of residents that live within district boundaries ($X_{it}$). Given that we have multiple observations per district, we can add a set of mutually exclusive district dummy variables (fixed effects, $\mu_{1i}$) that capture permanent characteristics of districts that do not vary over time, such as the urbanicity of the district or the relative size of the district. As much of the variation in outcomes like chronic absenteeism is between districts rather than within districts over time, the set of district fixed effects captures a large fraction of the variation in the outcome of interest. We also control for a year fixed effect ($\lambda_{1t}$) that captures factors common to all districts but that vary over time. The final term, $\varepsilon_{1it}$, is a random error term. In all models, we weight observations by total district school enrollment.
The key covariates are the percent of days that students spent in either hybrid or virtual instruction in the previous academic year, measured by $\text{Hybrid}_{i,t-1}$ and $\text{Virtual}_{i,t-1}$, respectively.
As hybrid and virtual instruction were virtually nonexistent in K-12 education prior to COVID, the values of $\text{Hybrid}_{i,t-1}$ and $\text{Virtual}_{i,t-1}$ are set to zero when $t-1$ is 2017/18. The fixed-effects model is often called a within-group estimator because it attributes to the key coefficients $\theta_1$ and $\delta_1$ the within-panel co-movements in $Y$ and alternative instruction. In this case, districts that moved to hybrid and virtual instruction are "treated" with differing intensities of alternative instruction.
Districts that had no hybrid or virtual instruction serve as a comparison sample. Because they had no change in hybrid and virtual instruction over time, the time-series movements in their chronic absenteeism represent the secular trend in this outcome that is common to all districts. In this way, our results can be thought of as a difference-in-difference estimate within this two-way fixed-effect framework.
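With only two school years per district, a two-way fixed-effects model of this kind can be estimated by regressing the change in the outcome on the change in the regressors plus an intercept, where the intercept absorbs the year effect and differencing removes the district effect. The sketch below illustrates that equivalence on simulated data. All numbers, variable names, and the data-generating process are hypothetical, and the sketch omits the control variables and enrollment weights; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                       # hypothetical number of districts
theta, delta = 0.07, 0.02      # assumed effects per percentage point of days (illustrative only)
year_effect = 10.0             # assumed change common to all districts

# Permanent district effects; virtual instruction is made to correlate with them,
# which is the situation fixed effects are meant to handle.
mu = rng.normal(25.0, 8.0, n)
virtual = np.clip(30 + 2.0 * (mu - 25) + rng.normal(0, 20, n), 0, 100)
hybrid = rng.uniform(0, 1, n) * (100 - virtual)

# Pre-period: lagged hybrid and virtual shares are zero by construction.
y_pre = mu + rng.normal(0, 1, n)
y_post = mu + year_effect + delta * hybrid + theta * virtual + rng.normal(0, 1, n)

# First differences remove mu; the intercept estimates the year effect.
dy = y_post - y_pre
D = np.column_stack([np.ones(n), hybrid, virtual])
(lambda_hat, delta_hat, theta_hat), *_ = np.linalg.lstsq(D, dy, rcond=None)

# A cross-sectional regression on the post period alone does not remove mu.
naive_theta = np.linalg.lstsq(D, y_post, rcond=None)[0][2]

print(f"within estimate of theta: {theta_hat:.4f}")
print(f"cross-section estimate:   {naive_theta:.4f}")
```

In this simulation the differenced regression recovers the assumed coefficients, while the cross-sectional estimate is pulled upward by the correlation between virtual instruction and the permanent district effect.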
The model above has some limitations. First, because the dependent variable is bounded on the 0-100 interval, the model may predict values outside this interval. Second, given the structured nature of the dependent variable, the errors are, by construction, heteroskedastic.
Third, it has been noted that these difference-in-difference models have high Type I error rates, primarily because of within-group autocorrelation in errors across observations.8 The last two problems can be handled by using the "clustered" standard errors suggested by Liang and Zeger,9 which generalize the procedure of White.10 This method allows for within-panel correlation in errors and for arbitrary forms of heteroskedasticity, while reducing the Type I error rate outlined above.
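The relationship between the clustered and White estimators can be made concrete with a short sketch. The function below computes an unweighted cluster-robust "sandwich" covariance without any finite-sample correction; the function name, the simulated data, and those simplifications are assumptions for illustration and do not reproduce the weighted models in the paper.

```python
import numpy as np

def ols_with_cluster_se(X, y, groups):
    """OLS coefficients with cluster-robust standard errors (no small-sample correction)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]      # sum of x_i * e_i within the cluster
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Hypothetical panel: 200 districts observed twice, with a shared district-level shock.
rng = np.random.default_rng(1)
district = np.repeat(np.arange(200), 2)
x = rng.normal(size=400)
shock = np.repeat(rng.normal(size=200), 2)
y = 1.0 + 0.5 * x + shock + rng.normal(size=400)
X = np.column_stack([np.ones(400), x])

beta, se_cluster = ols_with_cluster_se(X, y, district)
# Treating every observation as its own cluster gives White's heteroskedasticity-robust errors.
_, se_white = ols_with_cluster_se(X, y, np.arange(len(y)))
```

When each observation forms its own cluster, the sum over clusters collapses to a sum over observations, which is the sense in which the clustered estimator generalizes White's.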
An alternative to our linear model is to explicitly model the restricted nature of the dependent variable. A standard approach is to estimate a log-odds regression. Define $p_{it}$ to be the fraction of students that are chronically absent in district $i$ in year $t$, a variable on the unit interval from 0 to 1. The log-odds version of equation (1) would then be

$$\ln\!\left(\frac{p_{it}}{1-p_{it}}\right) = X_{it}\beta_2 + \text{Hybrid}_{i,t-1}\,\delta_2 + \text{Virtual}_{i,t-1}\,\theta_2 + \mu_{2i} + \lambda_{2t} + \varepsilon_{2it} \qquad (2)$$

where the terms are defined similarly to above. The predicted value from this regression would then map back into a predicted $p_{it}$ that falls on the unit interval. A shortcoming of this model is that the dependent variable is not defined when $p_{it}$ equals 1 or 0. In our case, $p_{it}$ equals 0 in a dozen cases; therefore, we would either lose those observations or need an ad hoc procedure to keep them. In the next section, we summarize estimates for equation (2) where we set $p_{it}=0.0001$ for the observations where the true value is zero to generate comparable estimates.

© 2024 Evans WN et al. JAMA Network Open.
A competitor to model (1) is to estimate the equation assuming that the district effects ($\mu_{1i}$) are random effects, which yields a more efficient estimator than fixed effects. We did not use this model because the random-effects model assumes that the permanent differences across districts are random draws and hence uncorrelated with the control variables included in $X_{it}$. Under the null hypothesis that the effects are indeed random, fixed effects should provide statistically similar but less efficient estimates. If the district effects are correlated with the underlying $X$'s, the random-effects estimates will be inconsistent. We can test this hypothesis with a Hausman test.11 We summarize those results in the next section and explain why we use the fixed-effects model.
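For a single coefficient, the Hausman statistic reduces to the squared difference between the two estimates divided by the difference in their variances, compared with a chi-squared distribution with 1 degree of freedom. The sketch below applies that scalar form to the unweighted, unclustered estimates for virtual days reported in this supplement (fixed effects 0.077 [SE 0.0039]; random effects 0.098 [SE 0.0035]). It is a simplified illustration: the test the authors ran may use the full coefficient vector, so the statistic here is not claimed to match theirs, and the function name is hypothetical.

```python
import math

def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Single-coefficient Hausman statistic and its chi-squared(1) p-value."""
    var_diff = se_fe**2 - se_re**2
    if var_diff <= 0:
        raise ValueError("fixed-effects variance must exceed random-effects variance")
    stat = (b_fe - b_re) ** 2 / var_diff
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Estimates for the coefficient on virtual days as reported in the supplement.
stat, p_value = hausman_scalar(b_fe=0.077, se_fe=0.0039, b_re=0.098, se_re=0.0035)
print(f"Hausman statistic: {stat:.1f}, p-value: {p_value:.2e}")
```

Under these inputs the p-value is far below 0.001, in line with the rejection reported for the random-effects assumption.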
In Table 3, we estimate models that allow the coefficients on virtual and hybrid education to vary based on district characteristics in the 2018/19 school year. In these models, we take a district characteristic, such as the poverty rate in the 2018/19 SY, and place districts into quintiles. We then estimate separate models for each quintile.
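The quintile assignment can be sketched as follows. Whether the authors' cut points are weighted by enrollment is not stated, so this example assumes simple unweighted quintiles; the function name and the toy poverty rates are hypothetical.

```python
import numpy as np

def assign_quintiles(values):
    """Return a quintile label (1 = lowest, 5 = highest) for each value, using unweighted cut points."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, values, side="right") + 1

# Toy baseline (2018/19) poverty rates for ten hypothetical districts.
poverty_2018 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
quintile = assign_quintiles(poverty_2018)

# The model would then be re-estimated on each subsample in turn.
for q in range(1, 6):
    in_q = quintile == q
    print(q, int(in_q.sum()), "districts")
```

Because the grouping uses the baseline-year characteristic, each district stays in the same quintile in both years of the panel.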

eMethods 3. Supplementary Results
The COVID-19 pandemic disrupted many aspects of daily life and health, and economic consequences of the pandemic were not uniform in the population.12,13,14 The coefficient on virtual days could potentially be capturing a fundamental change in the impact of the control variables on chronic absenteeism brought about by the events of COVID rather than the effect of the method of schooling. To test this, we allow the coefficients on the controls to vary over time by adding interactions between the variables in our $X$ vector and a dummy variable for the 2021/22 SY. We first add the interaction for each of the 10 covariates one at a time, then add all 10 interactions at once. These results are reported in eTable 1 below.
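Constructing these interaction terms amounts to multiplying selected control columns by the 2021/22 indicator and appending them to the design matrix. The helper below is a minimal sketch with hypothetical names and toy values, not the authors' code.

```python
import numpy as np

def add_year_interactions(X, post, cols=None):
    """Append X[:, cols] * post to X, where post is 1 for 2021/22 rows and 0 otherwise."""
    X = np.asarray(X, dtype=float)
    post = np.asarray(post, dtype=float).reshape(-1, 1)
    if cols is None:
        cols = list(range(X.shape[1]))     # interact every control
    return np.hstack([X, X[:, list(cols)] * post])

# Two hypothetical districts observed in 2018/19 (post = 0) and 2021/22 (post = 1),
# with three toy control variables.
X = np.array([[0.10, 0.30, 5.0],
              [0.12, 0.31, 5.2],
              [0.40, 0.05, 3.0],
              [0.41, 0.06, 3.1]])
post = np.array([0, 1, 0, 1])

one_at_a_time = add_year_interactions(X, post, cols=[1])   # interact a single control
all_at_once = add_year_interactions(X, post)               # interact all controls
```

The interaction columns are zero in the 2018/19 rows, so their coefficients measure how the relationship between each control and the outcome shifted in 2021/22.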
In the first row of the table, we reproduce the coefficients for % days hybrid and % days virtual from the basic model found in Table 2 in the main body of the paper. In the next 10 rows, we add an interaction between one variable and the 2021/22 year dummy. In the last row, we report the model when all interactions are added. When interactions are added one at a time, the coefficients change somewhat but the results are always statistically significant. The largest change occurs when we add the interaction between percent Hispanic and the year dummy. We do not believe our results are being driven by a fundamental change in the relationship between the fraction of Hispanic students and chronic absenteeism over time, as we run a separate model that deletes any district with greater than or equal to 5% Hispanic in both years of the panel. This leaves 4,003 districts and 8,006 observations in the regression. In this much smaller sample, the coefficient (95% CI) on % days in virtual instruction is 0.096 (0.061, 0.131). If we restrict the sample further to districts with <1% Hispanic in both years, we have 1,048 districts and the coefficient (95% CI) on % days in virtual instruction is 0.117 (0.043, 0.192).
As we noted above, an alternative model is to explicitly model the limited nature of the dependent variable. One way to do this is through a log-odds regression where the dependent variable is the natural log of the odds of chronic absence, $\ln[p_{it}/(1-p_{it})]$, as in equation (2). In our sample, we have a dozen observations where $p_{it}=0$ and the dependent variable is undefined. To keep these observations in the regression, we redefine $p_{it}=0.0001$ for those observations and re-estimate the model. In this case, we obtain a coefficient ($\theta_2$) on virtual days of 0.0023 (95% CI of 0.0012, 0.0013). It can be shown that the gradient is $\partial p_{it}/\partial(\text{virtual days}) = \theta_2\, p_{it}(1-p_{it})$. Letting $p_{it}$ be the mean of the sample in the 2021/22 SY (0.294), we obtain $\partial p_{it}/\partial(\text{virtual days}) = 0.00047$, which means that schools with 100% virtual instruction in the 2020/21 SY experienced an increase in chronic absenteeism of 0.047, or 4.7 percentage points, which is similar to the results in Table 2.
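The transformation and the gradient calculation above can be checked numerically. The sketch below uses the floor of 0.0001 for zero observations, the reported coefficient of 0.0023, and the reported sample mean of 0.294; the function names are hypothetical.

```python
import numpy as np

FLOOR = 0.0001   # value substituted for observations with p = 0, as described in the text

def log_odds(p, floor=FLOOR):
    """Natural log of the odds p / (1 - p); zeros are replaced by `floor` first.
    Values of exactly 1 are not handled here."""
    p = np.asarray(p, dtype=float)
    p = np.where(p == 0, floor, p)
    return np.log(p / (1 - p))

def inv_log_odds(z):
    """Map a log-odds value back to the unit interval."""
    return 1 / (1 + np.exp(-np.asarray(z, dtype=float)))

theta2 = 0.0023   # reported coefficient on % days virtual in the log-odds model
p_bar = 0.294     # reported mean chronic absenteeism rate in 2021/22

gradient = theta2 * p_bar * (1 - p_bar)    # change in p per percentage point of virtual days
effect_100_days = 100 * gradient           # linear extrapolation to 100% virtual instruction

print(f"gradient per percentage point: {gradient:.5f}")
print(f"implied change at 100% virtual: {100 * effect_100_days:.1f} percentage points")
```

The implied change is a first-order approximation evaluated at the sample mean, so it extrapolates the local slope across the full 0 to 100 range of virtual days.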
In Table 3 in the main body of the paper, we report the heterogeneity of results based on an indicator of students at risk for chronic absenteeism: poverty. In this section, we repeat the analysis from Table 3 with three additional measures: the fraction of adults aged 25 and above with a college degree, the fraction of households with single parents, and the real median household income for households with their own children under 18. These results are reported in eTables 2, 3, and 4, respectively, with the structure of each table mirroring that of Table 3.
eTable 2 reports heterogeneity in the results by the percent of adults with a college degree. The basic patterns are similar to Table 3: in rows (1) through (3), we see that virtual instruction was used more frequently in the districts with the lowest percent of college-educated adults, and in row (4), we see that this same group has the highest chronic absenteeism rates prior to COVID. In row (5), in the top quintile group, there is a small and statistically imprecise impact of virtual days on absenteeism. In the four lowest quintiles, all effects are large and statistically significant, and the results increase in size as the fraction of adults with college degrees declines. The results in the lowest quintile suggest that 100% of days in virtual learning during the 2020/21 SY would have produced a rise in the chronic absenteeism rate of 14.6 percentage points (95% CI of 11.1 to 18.1).
The results when heterogeneity is based on percent single parents (eTable 3) and median household income (eTable 4) produce similar patterns. In rows (1) through (3) of the tables, we see that virtual instruction is used most in the highest quintile of single parents and the lowest quintile of median household income. These are the same groups that have the highest pre-COVID chronic absenteeism rates (row (4) of the tables). In row (5) of the tables, in the lowest quintile of single parents and the highest quintile of median family income, the coefficient on days in virtual instruction is statistically insignificant. However, in the highest quintile of single parents, students that spent 100% of days in virtual instruction had an 8.4 percentage point higher chronic absenteeism rate (95% CI of 4.6 to 12.3). Likewise, in the lowest quintile of median family income, those that spent 100% of days in virtual instruction experienced a 9 percentage point increase in chronic absenteeism (95% CI of 5.1 to 12.9).
The results in Table 3 and eTables 2 through 4 demonstrate definitively that the use of virtual instruction in the 2020/21 SY was related to underlying characteristics of the school district. In row (3) of these four tables, we see that virtual instruction was decidedly higher in districts with high rates of poverty and single parenthood and lower levels of income and college-educated adults. In row (4) of those tables, we also see that the districts with these characteristics had much higher levels of underlying chronic absenteeism prior to COVID.
This suggests that the primary assumption of the more efficient random effects model may be in error.
To verify this point, we estimate a version of equation (1) under random and fixed effects, not using district weights and not clustering the standard errors at the district level. The estimate (95% confidence interval) [standard error] of the coefficient on virtual days in the random-effects model is 0.098 (0.091, 0.105) [0.0035], while the estimate from the fixed-effects model is 0.077 (0.069, 0.084) [0.0039]. As expected, the random-effects model is more efficient, but a Hausman test easily rejects the null hypothesis of equality of the two coefficients (p-value < 0.001), which suggests the assumptions of the random-effects model are incorrect. It is also the case that the standard error in this fixed-effects model is very small, which is consistent with the notion that fixed-effects models have high Type I error rates,8 so the clustering procedure of Liang and Zeger is necessary.9

The balanced panel of districts represents roughly 91% of all K-12 students in the United States in the 2018/19 SY, as well as 93% of students in the 47 states and the District of Columbia in the COVID School Data Hub.

eMethods 2. Detailed Description of the Statistical Model

Given the panel nature of the data, we estimate a model of the form given in equation (1).