Predicting virologically confirmed influenza using school absences in Allegheny County, Pennsylvania, USA during the 2007‐2015 influenza seasons

Abstract Background Children are important in community‐level influenza transmission. School‐based monitoring may inform influenza surveillance. Methods We analyzed reported weekly confirmed influenza in Allegheny County during the 2007 and 2010‐2015 influenza seasons, using Pennsylvania's Allegheny County Health Department all‐age influenza cases from health facilities, and all‐cause and influenza‐like illness (ILI)‐specific absences from nine county school districts. Negative binomial regression predicted influenza cases from all‐cause and illness‐specific absence rates, calendar week, average weekly temperature, and relative humidity, using four cross‐validation approaches. Results School districts reported 2 184 220 all‐cause absences (2010‐2015). Three one‐season studies reported 19 577 all‐cause and 3012 ILI‐related absences (2007, 2012, 2015). Over seven seasons, 11 946 confirmed influenza cases were reported. Absences improved seasonal model fits and predictions. Multivariate models using elementary school absences outperformed middle and high school models (relative mean absolute error (relMAE) = 0.94, 0.98, 0.99). K‐5 grade‐specific absence models had the lowest mean absolute errors (MAE) in cross‐validations. ILI‐specific absences performed marginally better than all‐cause absences in two years, adjusting for other covariates, but markedly worse in one year. Conclusions Our findings suggest seasonal models including K‐5th grade absences predict all‐age confirmed influenza and may serve as a useful surveillance tool.


| INTRODUCTION
Influenza surveillance utilizes multiple data sources, including syndromic indicators, laboratory-confirmed cases, and deaths. 1 Nonclinical sources have the potential to complement clinical and laboratory data and improve influenza prediction efforts. 2 Student absenteeism is a real-time, school-based indicator that can serve as an influenza surveillance tool. It is widely available, has minimal reporting delays, and is relatively low cost. It is also a reasonable proxy for influenza infection, since school-age children (5- to 17-year-olds) experience higher infection rates than other age groups 3 and contribute to household- and community-level transmission. 4 Previous studies used school-based surveillance (ie, absence duration 5 or causes 6,7 ) to identify patterns correlated with influenza- or ILI-related cases, primarily at a city level, but the usefulness of school absenteeism as a surveillance indicator in these studies has been mixed. ILI-specific absence duration predicted 2005-2008 outbreaks in Japan with high sensitivity and specificity, 5 but a similar approach using city-level all-cause absences from 2005 to 2009 had low predictive ability for outbreaks in New York City. 6 Absence patterns correlated well with sentinel surveillance in Hong Kong, showing similar peaks in absenteeism, ILI consultations, and influenza detection rates, but ILI-specific absences had low specificity. 8 The varied conclusions of these studies could stem from differences in the school and absence types captured and from short surveillance periods; other types of absence data could still have utility.
Grade-specific differences in absences have not been explored as a predictor of influenza but may correlate better with high-risk infection groups. Given the variation in infection burden and in the proportion of illness-related absences by age, particular school levels and grades may serve as proxies for these high-risk infection groups. School absenteeism may also be useful for detecting underlying changes in viral transmission. Unusual patterns of school absences across different time periods have been correlated with changes in influenza A and B viruses 9 and have been credited with detecting the re-emergence of an influenza B/Victoria antigenic group. 10 The varied findings of prior studies suggest further assessment of school absenteeism is needed.
Here, we evaluated how school absence models predicted weekly confirmed influenza cases in Allegheny County, Pennsylvania over multiple influenza seasons. We compared predictions from all-cause absence models for the 2010-2015 influenza seasons at varying administrative levels. We also compared predictions for individual influenza seasons (2007-2008, 2012-2013, and 2015-2016) from models including all-cause and ILI-related absences from three school-based cohort studies.

| Ethics
Our analyses used only de-identified data. We obtained Institutional Review Board approval from the University of Pittsburgh (PRO13100580), Johns Hopkins Bloomberg School of Public Health (IRB #5474), the Centers for Disease Control and Prevention (IRB #00000319), and the Allegheny County Health Department. Greater Pittsburgh area daily minimum and maximum temperature and relative humidity data came from the National Oceanographic and Atmospheric Administration's National Climatic Data Center. 12 We used temperature and relative humidity (a proxy for absolute humidity) given their effects on influenza transmission (ie, viral dispersal and survival). 13 All-cause absences were defined as a full or partial school day missed for any reason. Cause-specific absences were a full or partial school day missed due to influenza-like illness (ie, fever (>37°C) and either cough, sore throat, runny nose, or congestion).

| Data
We restricted school absences to periods overlapping the influenza seasons to examine absence patterns during influenza circulation, and excluded weekends, observed federal holidays, and school breaks. Weekly school absences were the total absences reported in one school week (ie, five days if there were no observed holidays). Weekly absence rates were the total absences in a week divided by the total students enrolled times the number of school days in that week.
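As a minimal illustration, the weekly absence rate defined above can be computed as follows; the district size and absence counts are hypothetical, not study data.

```python
def weekly_absence_rate(total_absences, enrolled, school_days):
    """Weekly absence rate: total absences in a school week divided by
    (students enrolled x school days in that week)."""
    if enrolled <= 0 or school_days <= 0:
        raise ValueError("enrollment and school days must be positive")
    return total_absences / (enrolled * school_days)

# A four-day school week (one observed holiday) in a hypothetical
# district of 2,500 enrolled students reporting 450 absences:
rate = weekly_absence_rate(total_absences=450, enrolled=2500, school_days=4)
print(rate)  # 450 / (2500 * 4) = 0.045
```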
We used the daily minimum and maximum temperatures to obtain the average daily temperature. Average daily temperature and relative humidity were each aggregated to the week level to obtain the weekly average temperature and relative humidity for each influenza season.
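The weekly aggregation step can be sketched with pandas; the daily values below are invented for illustration, and the study's NOAA source data are not reproduced here.

```python
import pandas as pd

# Illustrative daily records standing in for greater Pittsburgh weather data.
daily = pd.DataFrame({
    "date": pd.date_range("2012-01-02", periods=14, freq="D"),
    "tmin": [-5, -3, -4, 0, 1, 2, -1, -2, -6, -4, 0, 1, 3, 2],
    "tmax": [3, 4, 2, 6, 8, 9, 5, 4, 1, 3, 7, 8, 10, 9],
    "rh":   [70, 65, 80, 75, 60, 55, 68, 72, 78, 74, 66, 62, 58, 61],
})
daily["tavg"] = (daily["tmin"] + daily["tmax"]) / 2  # average daily temperature

# Aggregate daily averages to weekly means for use as regression covariates.
weekly = daily.set_index("date").resample("W")[["tavg", "rh"]].mean()
print(weekly.round(2))
```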

| Statistical analysis
We predicted weekly influenza cases over seven influenza seasons using negative binomial regression models. Continuous predictor variables were weekly absence rates (lagged by one week), calendar week, average weekly temperature, and relative humidity. Sensitivity analyses examined absence duration, lagged influenza cases, and kindergarten-specific absences. We used one-day and two-day-or-longer absences to assess the impact of absence duration on weekly influenza predictions from 2010 to 2015.
Models used one-day absences and absences of two days or longer individually, together, and in models containing average temperature, relative humidity, and calendar week. We also assessed weekly influenza predictions from models including one-week-lagged influenza cases, and county-level and kindergarten-specific all-cause absences.
We compared nested and non-nested models using Akaike's Information Criterion corrected for small sample sizes (AICc).
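AICc adds a small-sample penalty to the ordinary AIC. A minimal sketch of such a comparison, with hypothetical AIC values and parameter counts:

```python
def aicc(aic, k, n):
    """Small-sample corrected AIC: AICc = AIC + 2k(k+1) / (n - k - 1)."""
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical values: a seasonal-only model (intercept + 3 covariates)
# versus one adding lagged absences, each fit to n = 52 weekly observations.
seasonal = aicc(aic=410.0, k=4, n=52)
with_absences = aicc(aic=406.0, k=5, n=52)
print(round(with_absences - seasonal, 2))  # negative => absence model fits better
```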

| Model validation and predictions
We validated our models using training (ie, in-sample) and testing (ie, out-of-sample) data generated from the following four independent sampling approaches: (i) randomly sample 80% of weeks without replacement; (ii) leave out 52 non-contiguous randomly sampled weeks; (iii) leave out 20% of randomly sampled schools; and (iv) leave one influenza season out (ie, model training used all but one season and the out-of-sample season was used for model testing) to account for influenza's seasonal variation. R 2 was estimated from linear regressions of out-of-sample observed influenza cases (outcome) on predicted cases (independent variable). Prediction metrics were mean absolute error (MAE) and relative mean absolute error (relMAE). Mean absolute error was defined as the mean of the absolute value of model prediction errors. 18 Relative MAE is the ratio of a model's MAE to a reference MAE (ie, from a model including calendar week, average weekly temperature, and relative humidity), where a relMAE of 1.0 indicates equal prediction error for the two models. We visually compared observed and predicted cumulative distributions and time-series of influenza cases.
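The MAE and relMAE metrics described above can be sketched directly; the observed and predicted series here are hypothetical.

```python
import numpy as np

def mae(observed, predicted):
    """Mean absolute error of model predictions."""
    return float(np.mean(np.abs(np.asarray(observed) - np.asarray(predicted))))

def rel_mae(observed, pred_model, pred_reference):
    """MAE of a candidate model divided by MAE of the reference
    (seasonal-only) model; values below 1.0 favor the candidate."""
    return mae(observed, pred_model) / mae(observed, pred_reference)

obs = [10, 20, 35, 50, 30]            # hypothetical observed weekly cases
pred_absence = [12, 18, 33, 47, 32]   # hypothetical absence-augmented model
pred_seasonal = [15, 14, 30, 42, 38]  # hypothetical seasonal-only reference
print(mae(obs, pred_absence))                               # 2.2
print(round(rel_mae(obs, pred_absence, pred_seasonal), 2))  # 0.34
```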

| Influenza predictions using county-level absences
We evaluated negative binomial models of seasonal variables (ie, calendar week, average weekly temperature, and relative humidity) alone, and with weekly all-cause county-level school absences at one-, two-, and three-week lags. One- and three-week-lagged absences had similar model performance (Supplemental Table 2); therefore, we used one-week-lagged absences in all models to better reflect influenza's infectious period (ie, 1-week spread). 25 Compared to seasonal models, AICcs of in-sample models including calendar week, average weekly temperature, average weekly relative humidity, and one-week-lagged weekly county-level all-cause absences either stayed the same or slightly worsened (∆AICc = 2, 1, and 0, Table 1), whereas models of calendar week, average weekly temperature, and one-week-lagged weekly absences had slightly improved fits (∆AICc = −4, −4, −4, Table 1). For prediction performance, MAEs either stayed the same or decreased when one-week-lagged weekly absences were added to models of calendar week, average weekly temperature, and relative humidity, relative to seasonal-only models (relMAE = 0.95, 1.0, and 0.95, Table 1).
For individual influenza seasons, multivariate models with weekly lagged county-level absences predicted low-severity seasons 26 (ie, 2010-2011, 2011-2012) poorly, but predicted moderately severe seasons more accurately (Supplemental Table 5). Given the consistently low MAEs of the model including calendar week, average weekly temperature, average weekly relative humidity, and school absences, we present results from this model.

FIGURE 1 Weekly reported virologically confirmed influenza cases, and all-cause and influenza-like illness (ILI)-specific absences in Allegheny County, Pennsylvania, USA, during influenza seasons from 2007 to 2015. Surveillance of influenza cases during each influenza season in Allegheny County occurred from the 40th week of one year to the 20th week of the following year (solid black lines). All-cause absences were collected for the entire school year for each school district, and data were restricted to their respective influenza seasons. Nine school districts within Allegheny County contributed to weekly counts of all-cause absences. Additionally, all-cause and ILI-specific absences were collected during independent influenza seasons for three separate studies.

| Influenza predictions using school-type and grade-specific absences
We compared the performance of different school types (elementary, middle, and high school) and grade-specific absences in seasonal models. Elementary school models had lower relMAEs compared to middle and high school models across validations (Supplemental Table 6). Given varied model performance across school types, we also considered one-week-lagged grade-specific all-cause absences in seasonal models to assess heterogeneity in predictions by grades.
Univariate analyses found that K, 1st, 2nd, 3rd, 4th, and 5th grade absence models had lower MAEs than individual middle and high school grade-specific absence models, particularly in the leave-20%-of-schools-out validation (Figure 3A). Multivariate grade-specific absence models also had lower MAEs relative to seasonal models across three cross-validations (Figure 3B). We observed consistently lower relMAEs for kindergarten-specific absences (relMAE: 0.91, 0.98, 0.92 in three validations). Overall, middle and high school grade-specific absence models did not decrease MAEs relative to seasonal models, although 8th, 9th, and 10th grade models in the leave-20%-of-weeks-out validation and 6th grade models in the leave-20%-of-schools-out validation had lower MAEs.
We also investigated whether absenteeism could produce more accurate predictions of virologically confirmed influenza in school-aged children alone, building models of virologically confirmed influenza in those 5-17 years old rather than in all ages. We found modest improvements in two of three validations when including absences in models that incorporated week of year, relative humidity, and temperature. Predictions were more accurate for virologically confirmed influenza in children than for all ages.

| Influenza predictions comparing all-cause and influenza-like illness-specific absences from cohort data
Using school-based cohort studies, we compared the performance of all-cause absences to ILI-specific absences, a better proxy for influenza infection. Because the cohorts had short time-series (ie, one influenza season), we were unable to examine models containing all seasonal variables or to include average temperature in some models. Multivariate ILI absence models had higher R 2 estimates and lower relMAEs than all-cause absence models in analyses using PIPP, 2012-2013 SMART, and pooled absence data (Table 2). From the 2015-2016 SMART 2 data, the all-cause absence model had a lower relMAE (relMAE: 0.59) than the ILI-specific absence model.

TABLE 1 Fit and performance of negative binomial models of seasonal variables including and excluding one-week-lagged county-level all-cause school absence rates to predict weekly confirmed influenza cases in Allegheny County, Pennsylvania during the 2010-2015 seasons

| Sensitivity analyses
In sensitivity analyses, we found that using absence duration did not improve model predictions. One-day absence models, and models including both one-day absences and absences of two days or more, had lower relMAEs than models containing only absences of two days or more (Supplemental Table 9), but predictions from the three models did not substantially vary (Supplemental Figure 2). Evaluation of models including one-week-lagged influenza cases found little improvement in model prediction and performance compared to seasonal models. Higher MAEs were observed for one-week-lagged influenza models and for combined one-week-lagged influenza and absence models, though R 2 estimates were similar (Supplemental Table 8). One exception was the one-week-lagged influenza model from the leave-20%-of-schools-out validation, which had a lower relMAE (relMAE: 0.97) (Supplemental Table 8). Combined one-week-lagged influenza and kindergarten absence models performed similarly to one-week-lagged influenza models in three cross-validations, except in the leave-52-weeks-out validation (relMAE: 0.97).

FIGURE 2 Four model predictions of confirmed influenza in Allegheny County using leave-one-season-out validations for the 2010 to 2014 influenza seasons. Model predictions of four negative binomial models (calendar week, average weekly temperature, and average weekly relative humidity (red); one-week-lagged county-level all-cause absences, temperature, and week (yellow); one-week-lagged county-level all-cause absences, relative humidity, and week (blue); and one-week-lagged county-level all-cause absences, temperature, relative humidity, and week (purple)) using leave-one-season-out validation approaches, compared to observed virologically confirmed influenza cases (black line) in Allegheny County, Pennsylvania, USA: (A) weekly counts during each of the 2010-2011 to 2014-2015 influenza seasons, (B) the change in predicted cases using models including absences compared to a seasonal model excluding absences (red), and (C) the cumulative proportions of predicted and observed influenza cases for each season. R 2 was obtained using a linear regression, where the observed cases from the left-out season were the dependent variable and the independent variable was predicted cases from a negative binomial model of week-lagged county-level all-cause absences, relative humidity, temperature, and calendar week.

| DISCUSSION
We found that including school absences in seasonal models improved community-level confirmed influenza predictions over multiple seasons within Allegheny County. All-school absence models subtly improved predictions, reducing MAE by 5% across multiple validations, but school- and grade-specific absence models had better predictions, reflecting underlying age-specific differences in infections.

FIGURE 3 Mean and relative mean absolute errors for predictions using grade-specific absence models to predict influenza in Allegheny County, Pennsylvania from the 2010 to 2014 influenza seasons. Mean absolute errors were estimated from univariate grade-specific weekly absence models (A), and the relative mean absolute error compared models of grade-specific weekly absences, week of the year, average weekly relative humidity, and average weekly temperature to models of calendar week, average weekly relative humidity, and average weekly temperature (B). Colors reflect the three school types: red is elementary school, green is middle school, and blue is high school. The solid black line marks a relMAE of 1, where mean absolute errors of the grade-specific absence models and models excluding absences are the same.

TABLE 2 All-cause and cause-specific absence model performance using three school-based cohorts' data to predict confirmed influenza cases in Allegheny County. The week-only model included only week of the year, and absence models included lagged absence rates from the previous week, week of the year, and average temperature. b SMART models included weekly lagged absence rates and week of the year. c Cross-validation used leave 20% of schools out.

Elementary school absences may predict well because they are a proxy for the younger age groups that experience higher infection rates and increased susceptibility. 5,25,27 In contrast, middle and high school absences were noisier prediction signals, possibly because older students had more non-influenza-related absences (consistent with the overall higher absenteeism rates observed in these schools over time). Lower relMAEs from lower individual grade (K-5th grade) absence models across multiple validations further support our findings. Hence, elementary school absences could be useful for influenza surveillance.
ILI-specific absences predicted influenza better than all-cause absences when evaluating predictions from weekly all-cause and ILI-specific absence models (using school-based cohort studies), based on lower MAEs and higher R 2 estimates for specific seasons and when pooled.
Other studies also found ILI-specific absences were a proxy for influenza when evaluating vaccine impacts, 28 suggesting ILI-specific absences likely capture actual influenza infections. We could not conduct cause-specific absence surveillance for more than one influenza season per study, nor could we perform school-type and grade-specific comparisons of all-cause and ILI-specific absences due to the short time period, but these may also be important predictors of influenza incidence.
Our study has some limitations. We did not evaluate our pre…

ACKNOWLEDGMENTS
We thank … for providing their data; and the schools, staff, students, and parents who participated in PIPP, SMART, and SMART 2. We also thank Rahaan Gangat of the National Weather Service Pittsburgh for his assistance with climate data. We thank Scott Zeger for helpful discussions regarding analyses.

CONFLICT OF INTEREST
The authors declare that they have no conflicts.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section.