Accurate regional influenza epidemics tracking using Internet search data

Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users’ online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.

Supplementary Figures S14-S23 plot the coverage of the 95% confidence intervals constructed by ARGO2, in comparison with the CDC's actual reported %ILI (the prediction target).
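The empirical coverage behind such plots is simply the fraction of weeks whose finalized %ILI falls inside the interval. A minimal sketch, with made-up numbers rather than the actual ARGO2 intervals:

```python
import numpy as np

def interval_coverage(lower, upper, truth):
    """Fraction of weeks whose realized %ILI falls inside [lower, upper]."""
    lower, upper, truth = map(np.asarray, (lower, upper, truth))
    return float(np.mean((lower <= truth) & (truth <= upper)))

# toy example: four weeks of intervals and finalized %ILI
truth = np.array([1.2, 2.0, 3.5, 2.8])
lower = np.array([1.0, 1.8, 3.6, 2.5])
upper = np.array([1.5, 2.3, 4.0, 3.1])
print(interval_coverage(lower, upper, truth))  # 0.75: week 3 falls below its interval
```

A well-calibrated 95% interval should give empirical coverage close to 0.95 over a long evaluation period.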

Search query terms used in ARGO2
For the estimation before May 22, 2010, we use 70 query terms (listed in Supplementary

Results from CDC's Epidemic Prediction Initiative
We show the results of the participants in the 2015-2016 CDC Epidemic Prediction Initiative (FluSight) for nowcasting CDC's (weighted) %ILI in the ten US HHS regions in Supplementary Table S13. The data are publicly available at https://github.com/cdcepi/FluSight-forecasts under the Creative Commons Attribution 4.0 license. The true %ILI, i.e., the estimation target, is the finalized %ILI in the CDC's report (revealed weeks after the estimation period). In the table, we report each participant's relative MSE to the naive method, i.e., the ratio between the MSE of that method and the MSE of the naive method. We report the overall relative errors by averaging over the ten regions, as well as the relative errors for each region. The naive estimate uses the previous week's %ILI from the CDC's latest unrevised flu report available in the week of estimation (the CDC's report is subject to later revision) as the estimate for the current week. The methods submitted to the challenge include: 4Sight; ARETE; two methods from Columbia University (CU1, CU2, http://cpid.iri.columbia.edu); Delphi-Archefilter, Delphi-Epicast, and Delphi-Stat (http://delphi.midas.cs.cmu.edu); a method from Iowa State University (ISU); JL; a method from Knowledge Based Systems, Inc. (KBSI1); Kernel of Truth (KOT, no regional estimates, http://reichlab.io); a method from MOBS-Lab (NEU, http://www.mobs-lab.org); a method from Predictive Science, Inc. (PSI, http://www.predsci.com/portal/home.php); and a method from HumNat Lab (UMN, http://www.tc.umn.edu/~matteoc/Welcome.html). We also compare the results with ARGO2 and VAR. Notably, ARGO2 is the only method that uniformly outperforms the naive method across all ten regions.
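The relative-MSE comparison above can be sketched as follows; the numbers are illustrative, not actual FluSight data:

```python
import numpy as np

def relative_mse(estimates, truth, naive):
    """MSE of a method divided by the MSE of the naive method (ratio < 1
    means the method beats the naive nowcast)."""
    estimates, truth, naive = map(np.asarray, (estimates, truth, naive))
    return float(np.mean((estimates - truth) ** 2) / np.mean((naive - truth) ** 2))

# The naive nowcast for week t is the previous week's %ILI as published in
# the latest (unrevised) CDC report available at week t; toy numbers here.
finalized = np.array([2.0, 2.4, 3.0, 2.6])       # finalized %ILI (the target)
unrevised_prev = np.array([1.8, 2.1, 2.9, 3.1])  # naive: last week's unrevised %ILI
method = np.array([2.1, 2.3, 2.8, 2.7])          # some method's nowcasts
print(relative_mse(method, finalized, unrevised_prev))
```

Averaging such per-region ratios over the ten HHS regions gives the overall relative error reported in the table.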

Relative Efficiency
Supplementary Table S14 reports the Relative Efficiency of ARGO2 to other benchmark methods with the 95% confidence intervals. ARGO2 significantly outperforms all benchmark models considered in this study, as the 95% confidence intervals are all strictly above 1.
The Relative Efficiency from method (a) to method (b) is based on the average MSE over the ten US HHS regions. It is estimated as
$$\widehat{\mathrm{RE}}\big(\tilde{y}^{(a)}, \tilde{y}^{(b)}\big) = \frac{\sum_{r=1}^{10} \mathrm{MSE}\big(\tilde{y}^{(b)}_r, y_r\big)}{\sum_{r=1}^{10} \mathrm{MSE}\big(\tilde{y}^{(a)}_r, y_r\big)},$$
where $\tilde{y}^{(a)}_r$ and $\tilde{y}^{(b)}_r$ are the %ILI estimators for region $r$ by the two methods, respectively, and the MSE of an estimator $\tilde{y}$ for the target $y$ is $\mathrm{MSE}(\tilde{y}, y) = \frac{1}{n}\sum_{t=1}^{n} (\tilde{y}_t - y_t)^2$. The 95% confidence interval (CI) is obtained by vector stationary bootstrap on the time index with mean block size 5 (equivalent to 1 month of data) [1]. We first obtain the basic bootstrap CI for the logarithm of the Relative Efficiency and then recover the original scale by exponentiation. The nonparametric vector stationary bootstrap accounts for cross-region spatial correlation and for cross-time autocorrelation of the error residuals. The bootstrap procedure is robust to the choice of mean block size.
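The CI construction described above can be sketched as follows. This is a minimal illustration of the stationary bootstrap (geometric block lengths with mean 5, circular wrap-around) and the basic bootstrap interval on the log scale; the function and variable names are our own, not from the ARGO2 code:

```python
import numpy as np

rng = np.random.default_rng(0)

def stationary_bootstrap_indices(n, mean_block, rng):
    """One stationary-bootstrap resample of indices 0..n-1: blocks start
    uniformly at random, have geometric length with the given mean, and
    wrap around the series circularly."""
    p = 1.0 / mean_block
    idx = np.empty(n, dtype=int)
    t = 0
    while t < n:
        start = rng.integers(n)
        length = rng.geometric(p)
        for j in range(length):
            if t == n:
                break
            idx[t] = (start + j) % n
            t += 1
    return idx

def relative_efficiency(err_a, err_b):
    """Ratio of summed squared errors over regions; err_* are
    (n_weeks, n_regions) arrays of estimation errors."""
    return np.sum(err_b ** 2) / np.sum(err_a ** 2)

def re_confidence_interval(err_a, err_b, mean_block=5, B=2000, rng=rng):
    """Basic bootstrap CI for the log Relative Efficiency, mapped back by
    exponentiation; resampling whole weeks (rows) jointly across regions
    preserves the cross-region correlation."""
    n = err_a.shape[0]
    log_re_hat = np.log(relative_efficiency(err_a, err_b))
    boot = np.empty(B)
    for b in range(B):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        boot[b] = np.log(relative_efficiency(err_a[idx], err_b[idx]))
    lo, hi = np.quantile(boot, [0.025, 0.975])
    return np.exp(2 * log_re_hat - hi), np.exp(2 * log_re_hat - lo)
```

Resampling the same week indices for every region is what makes this a "vector" bootstrap: spatial correlation across regions is kept intact within each resample.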
Motivation of the ARGO2 model
ARGO2 can be thought of as motivated by a hidden Markov structure on three predictors: (i) the change in the %ILI, $\Delta y_t$, evolving according to a time series model (e.g., autoregression); (ii) the regional estimate $\hat{y}^{\mathrm{reg}}_t$ based on the regional Google search data; and (iii) the national estimate $\hat{y}^{\mathrm{nat}}_t$ based on the national Google search data. Estimates (ii) and (iii) are produced separately to estimate the %ILI $y_t$ from data at two different resolutions. The following diagram illustrates their relationship.

By modeling the joint covariance matrix of the three predictors and using their best linear predictor, we achieve better estimation efficiency than partial models, such as a separate linear regression for each individual region, because we take full advantage of the correlation structure of the data. In most conventional regression models, this correlation structure is ignored and each region's regression is trained separately. In addition, jointly modeling the covariance on the original scale $y_t$ of %ILI, as opposed to its logit-transformed version, ensures that the best linear predictor is optimal in mean squared error for estimating $y_t$.
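To make the best-linear-predictor step concrete: given a joint mean and covariance for the target and the predictor vector, the minimum-MSE linear predictor is the standard conditional-Gaussian formula. A generic sketch with toy numbers, not the actual ARGO2 covariance model:

```python
import numpy as np

def best_linear_predictor(mu, Sigma, x_new):
    """Best linear predictor of the first coordinate given the rest, from a
    joint mean vector mu and joint covariance matrix Sigma:
    y_hat = mu_y + Sigma_yx @ Sigma_xx^{-1} @ (x - mu_x)."""
    mu_y, mu_x = mu[0], mu[1:]
    S_yx = Sigma[0, 1:]
    S_xx = Sigma[1:, 1:]
    w = np.linalg.solve(S_xx, S_yx)  # BLP weights
    return mu_y + w @ (np.asarray(x_new) - mu_x)

# toy joint distribution of (y, x1, x2) with correlated predictors
mu = np.array([2.0, 1.0, 1.5])
Sigma = np.array([[1.0, 0.6, 0.5],
                  [0.6, 1.0, 0.3],
                  [0.5, 0.3, 1.0]])
print(best_linear_predictor(mu, Sigma, [1.4, 1.2]))
```

Because the weights come from the full joint covariance, correlated predictors are automatically discounted for their overlap, which is exactly the efficiency gain over fitting each region in isolation.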
Our assumed covariance structure is validated by statistical testing and empirical evidence. Finally, our inclusion of ridge-regression-type shrinkage on the $40 \times 40$ joint covariance matrix of the stacked regional vectors is motivated by the estimation improvement (in terms of mean squared error) that ridge regression achieves over ordinary linear regression.
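The link between covariance shrinkage and ridge regression can be verified numerically: with centered data, adding lam/n times the identity to the empirical predictor covariance before forming the best linear predictor reproduces the ridge coefficients exactly. A sketch under these assumptions, with synthetic data and our own notation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 200, 5, 3.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.5 * rng.standard_normal(n)
X -= X.mean(axis=0)  # center, so X.T @ X / n is the empirical covariance
y -= y.mean()

# ordinary ridge regression: (X'X + lam I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# same thing via shrinkage of the covariance blocks
S_xx = X.T @ X / n        # empirical predictor covariance
s_xy = X.T @ y / n        # empirical predictor-target covariance
beta_shrunk = np.linalg.solve(S_xx + (lam / n) * np.eye(p), s_xy)

print(np.allclose(beta_ridge, beta_shrunk))  # True
```

This identity is why ridge-type shrinkage of the joint covariance matrix inherits the mean-squared-error improvement of ridge regression over ordinary least squares.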

Additional evaluation metrics
We also compared ARGO2 with the benchmark methods using additional evaluation metrics: the mean squared error on overestimation (MSE+), the mean squared error on underestimation (MSE-), and the bias (Bias). Suppose $\tilde{y}_t$ is the estimator for the target ILI activity level $y_t$ at time $t$. Bias is the average error $\frac{1}{n}\sum_{t=1}^{n}(\tilde{y}_t - y_t)$; MSE+ is the mean squared error restricted to the weeks of overestimation ($\tilde{y}_t > y_t$); and MSE- is the mean squared error restricted to the weeks of underestimation ($\tilde{y}_t < y_t$). We report the overall results in the Supplementary Tables. We also report in Supplementary Table S16 the overall relative MSE, MAE, and MAPE to the naive method, as a supplement to Table 1 (which is in the original error metrics).
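Under one natural reading of these metrics (MSE+ and MSE- averaging the squared error over the weeks of over- and underestimation, respectively), they can be computed as in this sketch; the exact normalization used in the paper may differ:

```python
import numpy as np

def error_metrics(est, truth):
    """Bias, plus MSE restricted to weeks of over- and underestimation
    (one possible formalization of MSE+ / MSE-)."""
    e = np.asarray(est) - np.asarray(truth)
    over, under = e > 0, e < 0
    return {
        "Bias": float(e.mean()),
        "MSE+": float(np.mean(e[over] ** 2)) if over.any() else 0.0,
        "MSE-": float(np.mean(e[under] ** 2)) if under.any() else 0.0,
    }

# toy example: one overestimated week, one underestimated, one overestimated
print(error_metrics([2.1, 1.8, 3.0], [2.0, 2.0, 2.5]))
```

Splitting the squared error this way shows whether a method errs mostly by overshooting (e.g., during epidemic onsets) or by undershooting (e.g., after peaks), which an aggregate MSE hides.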

Supplementary Figure S1. National Google Trends data and regional Google Trends (GT) data. The thick blue horizontal line separates the national data from the regional data; the thin black horizontal lines separate the data of different regions. Each block consists of 129 query terms whose GT values across time are plotted as a heat map, with 0 being white and 100 being red. As shown in the figure, the sparsity of the regional GT data is much higher than that of the national GT data, indicating that the regional GT data are of much lower quality than their national counterpart.

Supplementary Figure. The evaluation is conducted over multiple periods and with multiple metrics. The relative MSE, MAE, and MAPE to the naive method (i.e., the ratio of each measure between the evaluated method and the naive method) and the correlation are reported, with the best performance (for each metric in each period) in boldface and the original error metrics for the naive method in parentheses.

Supplementary Table. The evaluation is based on the average over the ten US HHS regions in multiple periods and with multiple metrics.