Estimating the Population Impact of a New Pediatric Influenza Vaccination Program in England Using Social Media Content.

Background: A new childhood live attenuated influenza vaccine program was launched in England in 2013, consisting of a national campaign for all 2- and 3-year-olds and several pilot locations offering the vaccine to primary school-age children (4-11 years of age) during the influenza season. In the 2014/2015 influenza season, the national program was extended to include additional pilot regions, some of which also offered the vaccine to secondary school children (11-13 years of age).
Objective: We utilized social media content to obtain a complementary assessment of the population impact of the programs that were launched in England during the 2013/2014 and 2014/2015 flu seasons. The overall community-wide impact on transmission in pilot areas was estimated for the different age groups that were targeted for vaccination.
Methods: A previously developed statistical framework was applied, which consisted of a nonlinear regression model that was trained to infer influenza-like illness (ILI) rates from Twitter posts originating in pilot (school-age vaccinated) and control (unvaccinated) areas. The control areas were then used to estimate the ILI rates that would have occurred in pilot areas had the intervention not taken place. These predictions were compared with their corresponding Twitter-based ILI estimates.
Results: Results suggest a reduction in ILI rates of 14% (1-25%) and 17% (2-30%) across all ages in only the primary school-age vaccine pilot areas during the 2013/2014 and 2014/2015 influenza seasons, respectively. No significant impact was observed in areas where two age cohorts of secondary school children were vaccinated.
Conclusions: These findings corroborate independent assessments from traditional surveillance data, thereby supporting the ongoing rollout of the program to primary school-age children and providing evidence of the value of social media content as an additional syndromic surveillance tool.


Background
In 2012 the Joint Committee on Vaccination and Immunisation recommended the extension of the annual influenza vaccination campaign to include all healthy children aged 2 to 16 years of age in England [1]. This decision was informed by influenza transmission modeling done using an evidence-synthesis approach, showing that vaccination could not only protect the children themselves from infection, but also decrease influenza transmission in the general population. This finding included the indirect protection of at-risk groups, such as people over 65 years of age or those with underlying clinical risk factors [2]. The phased rollout of the live attenuated influenza vaccine (LAIV) program began during the 2013/2014 influenza season. In the first season, the program offered vaccinations to all 2 and 3-year-olds throughout England. A number of geographically distinct pilot regions also offered vaccinations to primary school age children (4-11 years of age) to determine the optimal model of delivery to school-age children. For the 2014/2015 influenza season, the program was extended nationally to offer vaccinations to all 2 to 4-year-olds. Pilot locations were added that offered vaccinations to children either (1) of primary school age (Primary school; 4-11 years), (2) the first two years of secondary school age (Secondary school, 11-13 years), or (3) both (Primary and Secondary school; 4-13 years) to determine optimal models of delivery.

Motivation
Public Health England (PHE) has been using a variety of surveillance systems to assess the overall population impact of the school-age childhood influenza campaign on influenza epidemiology, and thereby to validate the direct and indirect effects of vaccinating this age group. The pilot locations for 2014/2015 are of particular interest, as the variation in target groups may offer further insights into the optimal strategies for the national rollout. During the 2014/2015 campaign, most influenza indicators from traditional surveillance systems, in both targeted and nontargeted age groups, demonstrated a significant reduction in pilot areas that offered the vaccine to primary school age children. However, there was little impact in pilot areas where only two age cohorts of secondary school age children were vaccinated [3]. These surveillance indicators were based on health systems ranging from General Practitioners' consultation rates to excess mortality.
Whilst such results are important in estimating the intervention's effects on health care services, online user-generated information offers a complementary data source that can provide additional insights into the impact of such campaigns on the wider community, including those persons that do not consult the health care system. Our study also highlights the potential value of user-generated information in the absence of routine evaluation systems. Internet-based surveillance systems are being viewed as novel logistically and economically viable developments that offer great potential as an extension of traditional surveillance systems [4]. Recent research efforts have shown that in combination with machine learning techniques, data from social media or search engines can be used to accurately estimate disease-related indicators such as influenza-like illness (ILI) rates [5][6][7][8][9]. These technologies provide health monitoring systems with additional, publicly available, and potentially more timely sources of data for syndromic surveillance. Furthermore, compared to traditional surveillance systems, user-generated content may offer insights about a wider range of the population, including the bottom part of the disease population pyramid (ie, those that do not seek medical attention) [10].
For the 2013/2014 pilot areas, in order to provide further evidence of the community-wide effects of vaccinating children with influenza vaccine, Lampos et al made use of online user-generated content in combination with statistical natural language processing techniques to estimate ILI rates in the population [9]. By matching nonvaccinated control areas with pilot areas and using flu-related Twitter posts or Bing search queries from these locations, the impact of the campaign within the Primary school age pilot areas was estimated, showing a significant decrease (22% to 33% reduction) in influenza transmission in the general population in these pilot areas compared to corresponding control areas [9]. PHE's estimates also showed evidence of a reduction in influenza transmission in targeted and nontargeted age groups in pilot areas compared to nonpilot areas, based on a variety of influenza indicators during a season dominated by circulation of influenza A(H1N1)pdm09 [11].

Aim
The work in this paper applies the same statistical framework as Lampos et al [9] (with a slightly improved supervised learning approach) to Twitter data for the influenza season of 2014/2015. We aim to assess the impact of influenza vaccine pilot trials in school age children on influenza transmission in those pilot areas. The 2014/2015 season was dominated by circulation of influenza A(H3N2) and influenza B. In addition, we examined the impact of vaccinating different target populations, specifically primary and/or secondary school-age children, on influenza rates in the general population. This analysis provides further insights into the most effective strategies for reducing community-wide influenza transmission. This work also aims to reevaluate the hypothesis that a statistical framework based on online user-generated content can form a valid source for more fine-grained influenza surveillance tasks, such as estimating the impact of a targeted intervention. We repeated the analysis for the 2013/2014 LAIV campaign that was previously studied in Lampos et al [9], but with revised pilot and control areas, for consistency with our study for the 2014/2015 season.

Data Sources
Two data sources were used for the experiments: geo-located Twitter posts related to ILI and official ILI rates provided by the Royal College of General Practitioners (RCGP) [12], the latter defining the ground truth. In addition, boundary data and population estimates from the Office for National Statistics (ONS) [13,14] were used to map the vaccine pilot and control areas.

Twitter Data
The Twitter data consisted of all exactly geo-located Twitter posts in England from August 29, 2011 to August 30, 2015, which comprise approximately 1% of all tweets made by users in England. This number is a rough estimate based on approximately 20% of the United Kingdom population using Twitter, with 33% of active users assumed to be posting 5 tweets per day [15]. Our dataset consists of 350,000 geo-located tweets per day on average. As in Lampos et al [9], the same initial list of 36 n-grams (phrases with n words) related to ILI was created manually. Then, based on frequent cooccurrence with this list in the Twitter time series data, a set of 217 n-grams was extracted (n<5; see Multimedia Appendix 1).
The RCGP ILI rates used for model learning were only available on a weekly basis, so frequency rates of this set of n-grams for a period of 7 days prior to any given day were computed, and formed the explanatory variables. To estimate the impact on the pilot areas, n-gram frequencies of tweets geo-located in the chosen pilot and control areas during the intervention period were used.
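As a rough illustration of how such explanatory variables might be computed, the sketch below derives n-gram frequency rates over a trailing 7-day window. The function names, the whitespace tokenization, the definition of a rate as n-gram occurrences divided by the number of tweets in the window, and the choice to exclude the given day itself are all assumptions made for illustration; they are not taken from the study's pipeline.

```python
from collections import Counter
from datetime import date, timedelta


def count_ngram(tokens, gram):
    """Count contiguous occurrences of the token tuple `gram` in `tokens`."""
    n = len(gram)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == gram)


def ngram_rates(tweets, ngrams, day, window_days=7):
    """Frequency rate of each n-gram over the `window_days` days before `day`.

    `tweets` is an iterable of (posting_date, text) pairs.
    Assumption: rate = occurrences of the n-gram / number of tweets in the window.
    """
    start = day - timedelta(days=window_days)
    window = [text for posted, text in tweets if start <= posted < day]
    if not window:
        return {g: 0.0 for g in ngrams}
    counts = Counter()
    for text in window:
        tokens = text.lower().split()  # naive tokenization (assumption)
        for g in ngrams:
            counts[g] += count_ngram(tokens, tuple(g.split()))
    return {g: counts[g] / len(window) for g in ngrams}


# Toy example with made-up tweets.
example_tweets = [
    (date(2015, 1, 5), "I have the flu"),
    (date(2015, 1, 6), "flu flu everywhere"),
    (date(2015, 1, 7), "sunny day"),
    (date(2014, 12, 1), "flu last month"),  # outside the 7-day window
]
example_rates = ngram_rates(example_tweets, ["flu", "have the flu"], date(2015, 1, 8))
```

Each such vector of rates, computed for the end of a week, would correspond to one row of the observation matrix used by the regression model.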

Official Health Reports
Weekly ILI estimates were provided by the RCGP, a sentinel network of approximately 100 practices in England, which covers a registered population of approximately 1 million persons [12]. These ILI estimates represent the weekly incidence rate of ILI cases/consultations per 100,000 patients registered with eligible practices during that week [12]. The data used cover the period from August 29, 2011 to August 30, 2015 for England.

Pilot and Control Areas
A total of 140 local authorities implemented vaccinations as part of the pilot program. To create a suitable list of pilot areas for the impact assessment, these areas were combined on a county level, where possible. This list included a large number of Secondary school pilot areas (37), so only the most populated ones were considered, whilst ensuring an even geographical distribution throughout the country. The geographical distribution and the areas' population sizes were defined using ONS boundary data and population estimates of England, respectively [13,14]. Of the 7 Primary and Secondary school pilot areas, 3 were eliminated due to small size or because they were enclosed within another pilot area. Pilot areas involving special schools were ignored, as these included only a small number of schools and were thus unlikely to provide any significant community-wide benefits. This preprocessing resulted in 6 Primary school, 4 Primary and Secondary school, and 7 Secondary school pilot areas.
A list of eligible control locations was chosen according to the following criteria: appropriate distance from pilot areas, a moderate population size, and a plausible geographical spread. These criteria resulted in a list of 16 control areas. Nonoverlapping boundary rectangles represented by their North-East and South-West corners were created around the chosen pilot and control areas. The geographical distribution of the pilot and control areas is shown in Figure 1. Table 1 lists the pilot areas considered for this study. For a full list of control and pilot areas, see Multimedia Appendix 2.
Figure 1. Geographical distribution of the pilot and control areas chosen for the study with their corresponding boundary boxes. Control areas with red boxes have a distance of at least 10 km to any pilot area. The "Secondary" and "Primary and Secondary" pilot areas that were excluded from the study are shown without boundary boxes and in a lighter shade of blue and green, respectively. Contains National Statistics and OS data, Crown copyright and database right.
Table 1. Pilot areas considered for this study during the 2014/2015 LAIV program with their respective population size [14] and geographical boundary rectangle corner coordinates. Pilot areas that were also used or have partial overlap with the ones used in the 2013/2014 LAIV program are highlighted in italics.
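The use of boundary rectangles to attribute geo-located tweets to areas can be sketched as follows. This is a minimal illustration only: the `Box` type, the function names, the area labels, and the coordinates are hypothetical and do not correspond to the rectangles in Table 1.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Box(NamedTuple):
    name: str
    sw: Tuple[float, float]  # (latitude, longitude) of the South-West corner
    ne: Tuple[float, float]  # (latitude, longitude) of the North-East corner


def contains(box: Box, lat: float, lon: float) -> bool:
    """True if the point lies inside the rectangle (edges included)."""
    return box.sw[0] <= lat <= box.ne[0] and box.sw[1] <= lon <= box.ne[1]


def assign_area(boxes: Sequence[Box], lat: float, lon: float) -> Optional[str]:
    """Return the name of the first rectangle containing the point, else None.

    Because the rectangles are nonoverlapping, at most one can match.
    """
    for box in boxes:
        if contains(box, lat, lon):
            return box.name
    return None


# Hypothetical rectangles, for illustration only.
example_boxes = [
    Box("pilot_A", sw=(53.0, -2.5), ne=(53.5, -2.0)),
    Box("control_B", sw=(51.0, 0.0), ne=(51.5, 0.5)),
]
```

Plain coordinate comparisons suffice here because England does not span the antimeridian, so longitudes within a rectangle never wrap around.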

Statistical Framework
The following sections provide a brief outline of the statistical framework that was implemented. Apart from a slightly improved supervised learning approach, this framework is based on the work by Lampos et al [9], in which it is described and validated in more detail. The method consists of first learning a nonlinear regression model to estimate ILI rates from n-grams based on user-generated content (tweets in this case). Thereafter, by making use of inferred ILI rates in matched pilot and control regions, a linear modeling approach was applied to assess the potential impact of the intervention in the pilot areas.

Estimating Disease Rates Using a Gaussian Process
The majority of techniques used to acquire infectious disease estimates from user-generated data involve the use of linear regression models [16][17][18]. Lampos et al showed that nonlinear methods can improve model performance, especially when working with a smaller feature space consisting of varying n-gram sizes [8]. The authors proposed the use of Gaussian Processes (GPs) to model ILI rates and successfully applied these to Twitter, Google, and Bing data [8,9]. See below for details of the GP model used in this study.
Let $X \in \mathbb{R}^{N \times M}$ be the observation matrix with $N$ weeks and $M$ frequency rates of n-gram features. Then given inputs $x, x' \in \mathbb{R}^{M}$ (representing rows of $X$), a GP can be defined as a statistical distribution for which any finite linear combination of samples is normally distributed and is written as:
$$f(x) \sim \mathcal{GP}\left(\mu(x), k(x, x')\right)$$
Here $\mu(x)$ and $k(x, x')$ represent the mean and covariance function (or kernel), respectively [19]. By assuming that $\mu(x_i) = 0 \;\forall\; i = 1, \dots, N$, the distribution is entirely determined by its covariance function. As our core kernel, the sum of two differently parameterized Matérn functions ($k_M$) [20], with degrees of freedom $\nu = 3/2$, was found to be the most suitable for estimating ILI rates from Twitter data:
$$k_M(x, x') = \sigma_m^2 \left(1 + \frac{\sqrt{3}\, r}{l_m}\right) \exp\left(-\frac{\sqrt{3}\, r}{l_m}\right), \quad r = \lVert x - x' \rVert_2$$
where $\sigma_m$ represents the overall level of variance and $l_m$ a characteristic length scale. Assuming that different n-gram sizes may vary in their usage and are likely to have a more concise semantic interpretation with an increasing n, we model them with different kernels. The fact that the sum of covariance functions forms a valid covariance function in itself allows for this and we have:
$$k(x, x') = \sum_{n=1}^{C} k_M(g_n, g_n') + k_{SE}(x, x') + k_N(x, x')$$
where $g_n$ represents the features that belong to each n-gram category and $C = 3$ is the number of n-gram categories (3-grams and 4-grams are merged in this particular model). To model noise, we use the sum of a squared exponential:
$$k_{SE}(x, x') = \sigma_s^2 \exp\left(-\frac{\lVert x - x' \rVert_2^2}{2 l_s^2}\right)$$
and a noise function:
$$k_N(x, x') = \sigma_n^2\, \delta(x, x')$$
($\delta$ is a Kronecker delta function), as defined in [19].
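To make the structure of such a composite kernel concrete, the sketch below evaluates one Matérn ($\nu = 3/2$) term per n-gram category and adds a squared exponential term and a white-noise term. It is an illustration under stated assumptions rather than the kernel implementation used in the study: the function names, the parameter layout, the use of a single Matérn term per category, and all hyperparameter values are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def matern32(a, b, sigma, length):
    """Matern kernel with nu = 3/2."""
    s = math.sqrt(3.0) * dist(a, b) / length
    return sigma ** 2 * (1.0 + s) * math.exp(-s)


def sq_exp(a, b, sigma, length):
    """Squared exponential kernel."""
    return sigma ** 2 * math.exp(-dist(a, b) ** 2 / (2.0 * length ** 2))


def composite_kernel(x, x2, groups, params, same_point):
    """Sum of per-category Matern terms plus squared exponential and noise.

    groups:     one list of feature indices per n-gram category
    params:     {"matern": [(sigma, length), ...], "se": (sigma, length), "noise": sigma}
    same_point: True when x and x2 are the same observation (Kronecker delta = 1)
    """
    total = 0.0
    for idx, (sigma, length) in zip(groups, params["matern"]):
        total += matern32([x[i] for i in idx], [x2[i] for i in idx], sigma, length)
    total += sq_exp(x, x2, *params["se"])
    if same_point:
        total += params["noise"] ** 2
    return total


# Hypothetical setup: 5 features split into 1-grams, 2-grams, and merged 3/4-grams.
example_groups = [[0, 1], [2, 3], [4]]
example_params = {
    "matern": [(1.0, 1.0), (0.5, 2.0), (0.2, 1.0)],
    "se": (0.3, 1.0),
    "noise": 0.1,
}
```

Evaluating the kernel for an observation against itself returns the sum of all variance terms, which is a quick sanity check on any implementation of this kind.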
GP regression involves minimizing the negative log-marginal likelihood function:
$$-\log p(y \mid X) = \tfrac{1}{2} (y - \mu)^{\top} K^{-1} (y - \mu) + \tfrac{1}{2} \log \lvert K \rvert + \tfrac{N}{2} \log 2\pi$$
where $y$ denotes the ILI rates time-series, $(K)_{ij} = k(x_i, x_j)$ and $\mu = (\mu(x_1), \dots, \mu(x_N))$. Once the model is learnt, newly observed feature frequency rates $x_*$ result in new ILI rate estimates $y_*$ by computing $E[y_* \mid y, \Omega, x_*]$, the mean of the posterior predictive distribution. The performance of the model was measured using a 10-fold cross validation (random temporal splits) on the training set, using the average Pearson correlation ($r$) and the mean absolute error (MAE).
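For readers who want to see the two quantities above in executable form, the following sketch computes the negative log-marginal likelihood and the posterior predictive mean of a zero-mean GP from a precomputed covariance matrix. It uses a small pure-Python Cholesky factorization to stay self-contained; the function names are hypothetical, the covariance matrix is assumed to already include the noise term, and hyperparameter optimization is deliberately left out.

```python
import math


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_chol(L, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def neg_log_marginal_likelihood(K, y):
    """0.5 y^T K^-1 y + 0.5 log|K| + (N/2) log(2 pi), assuming a zero mean."""
    L = cholesky(K)
    alpha = solve_chol(L, y)
    n = len(y)
    data_fit = 0.5 * sum(a * b for a, b in zip(y, alpha))
    half_log_det = sum(math.log(L[i][i]) for i in range(n))  # equals 0.5 * log|K|
    return data_fit + half_log_det + 0.5 * n * math.log(2.0 * math.pi)


def posterior_mean(K, k_star, y):
    """Predictive mean k_*^T K^-1 y, where k_star[i] = k(x_*, x_i)."""
    alpha = solve_chol(cholesky(K), y)
    return sum(k * a for k, a in zip(k_star, alpha))
```

In practice the kernel hyperparameters would be chosen by passing `neg_log_marginal_likelihood` to a numerical optimizer; that step depends on the chosen library and is not shown.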

Estimating the Impact of the LAIV Program
Once the GP model was trained, the impact of the LAIV campaign in pilot areas could be estimated using the methodology outlined in Lampos et al, Section 3.3 [9], which we briefly describe here as well.
Given a set of pilot and control areas, n-gram frequencies of Twitter posts geo-located in those areas are extracted for a period before and during the intervention. ILI rate estimates can then be computed for all areas and supersets of areas using a pretrained GP model and we denote these with $q_v$ and $q_c$ for pilot and control areas, respectively. By looking at these ILI estimates for a number of weeks, $\tau = \{t_1, \dots, t_N\}$, prior to the intervention, control and pilot locations with similar influenza activity can be matched based on a strong Pearson correlation, $r(q_v^{\tau}, q_c^{\tau})$. Assuming a linear relationship in ILI rates between locations with similar influenza activity, a linear regression model can be learnt using $q_c^{\tau}$ and $q_v^{\tau}$ (ie, the ILI estimates prior to the intervention in the various matched area pairs):
$$q_v^{t_i} = \omega\, q_c^{t_i} + \beta + \varepsilon_i, \quad i = 1, \dots, N$$
where $\omega$, $\beta$, $\varepsilon_i$ denote the regression's weight and intercept, and independent, zero-centered noise, respectively. Using $q_c$, the ILI estimates in the control areas during the intervention, this linear model can then predict the hypothetical ILI rates in pilot locations during the intervention had the intervention not taken place:
$$q_v^{*} = \omega\, q_c + b$$
where $b \in \mathbb{R}^{N}$ with $(b)_k = \beta \;\forall\; k = 1, \dots, N$.
Comparing these hypothetical ILI rates to the ILI rates estimated by the GP model during the intervention allows the impact of the campaign to be estimated. The following measures were applied:
$$\delta_v = \bar{q}_v - \bar{q}_v^{*}, \qquad \theta_v = \frac{\bar{q}_v - \bar{q}_v^{*}}{\bar{q}_v^{*}}$$
where $\bar{q}$ denotes the mean value of $q$. Thus, $\delta_v$ and $\theta_v$ measure the absolute and relative mean impact of the intervention, respectively. Confidence intervals for these measures are produced using bootstrap sampling [21]. This calculation involves sampling with replacement the residuals $\varepsilon_i$ of the linear regression, adding them to the fitted values, and then running the linear model for these, which produces estimates for $\beta$ and $\omega$. These values are then applied to a sampled (with replacement) set of $q_v$ and $q_c$. Repeating this procedure 100,000 times creates sets of estimates for $\delta_v$ and $\theta_v$ from which we can derive confidence intervals using the 0.025 and 0.975 quantiles, provided that their distributions are unimodal and symmetric. Results are considered statistically significant if absolute values are higher than two standard deviations of the bootstrap estimates [9,22].
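The counterfactual projection and the bootstrap procedure described in this section can be sketched as below. This is one possible reading of the text rather than the study's code: the function names are hypothetical, the regression is an ordinary least-squares fit on a single control series, and the resampling of the intervention-period estimates is assumed to draw the same weeks for pilot and control areas.

```python
import random
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares weight and intercept for y ~ w * x + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    w = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return w, my - w * mx


def impact(q_v, q_c, w, b):
    """Absolute (delta) and relative (theta) mean impact during the intervention."""
    q_star = [w * c + b for c in q_c]  # projected rates had there been no intervention
    delta = mean(q_v) - mean(q_star)
    return delta, delta / mean(q_star)


def bootstrap_impact(pre_c, pre_v, q_c, q_v, n_boot=2000, seed=0):
    """Bootstrap distributions of delta and theta via residual resampling."""
    rng = random.Random(seed)
    w, b = fit_line(pre_c, pre_v)
    fitted = [w * c + b for c in pre_c]
    resid = [v - f for v, f in zip(pre_v, fitted)]
    deltas, thetas = [], []
    for _ in range(n_boot):
        # Refit the line on fitted values plus resampled residuals.
        y_boot = [f + rng.choice(resid) for f in fitted]
        w_b, b_b = fit_line(pre_c, y_boot)
        # Assumption: intervention weeks are resampled jointly for pilot and control.
        idx = [rng.randrange(len(q_c)) for _ in q_c]
        d, t = impact([q_v[i] for i in idx], [q_c[i] for i in idx], w_b, b_b)
        deltas.append(d)
        thetas.append(t)
    return deltas, thetas


def percentile_ci(samples, lo=0.025, hi=0.975):
    """Simple empirical quantiles of the bootstrap estimates."""
    s = sorted(samples)
    return s[int(lo * (len(s) - 1))], s[int(hi * (len(s) - 1))]


def is_significant(samples):
    """Absolute mean estimate exceeds two bootstrap standard deviations."""
    return abs(mean(samples)) > 2 * stdev(samples)
```

A negative `theta` corresponds to a relative reduction in the mean ILI rate in the pilot area. The default of 2000 replicates is chosen only to keep the sketch fast; the study reports 100,000.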

Results
We present an assessment of the impact of the childhood LAIV campaign during the 2013/2014 and 2014/2015 influenza seasons based on the previously described methodology. The GP model was trained on RCGP ILI rates in England; Figure 2 shows the RCGP ILI rates used, with the preintervention correlation period and the two impact assessment periods highlighted.

Performance of the Supervised Model for Estimating ILI Rates
A GP regression model was trained using weekly Twitter data geo-located in England from August 29, 2011 to August 30, 2015 and the corresponding RCGP ILI rates. Based on a 10-fold cross validation, an average Pearson correlation r=0.84 with a standard deviation of 0.08 and average MAE of 2.42 (weekly ILI rate per 100,000 people) with a standard deviation of 0.52 were measured. This performance is in line with that of the GP model used in the previous impact assessment [9].

Impact Estimates of the LAIV Program
Using the GP model trained on a national level (England), ILI rates for the chosen pilot locations were estimated. This was done for individual pilot locations, the set of all pilot locations, and sets of pilot locations in which the same cohorts were vaccinated (ie, Primary school, Secondary school). An exhaustive search of all possible combinations of control areas was performed. These combinations of control locations were matched to the sets of pilot locations during a period prior to the start of the LAIV campaign (August 29, 2011 to September 1, 2013) based on similar influenza activity, as measured by Pearson correlation. The 2013/2014 influenza season is not included in this correlation phase, as this involved the vaccination of 2 and 3-year-olds nationally and a number of primary school age pilot areas, which could change the linear relationship between certain control and pilot locations. For each pilot area and set of pilot areas, the most highly correlated combination of control areas was used to then estimate the impact of the LAIV campaign for the 2014/2015 influenza season. There is some overlap with the pilot areas of the previous influenza season, so the same analysis was redone for the 2013/2014 season (in this case with a different set of control areas) so results could be compared to previous studies [9,11]. Table 2 and Table 3 show the results for individual pilot locations, and sets of them for the 2014/2015 and 2013/2014 influenza season, respectively. For each area, the tables include the Pearson correlation $r$, the mean and 95% confidence intervals of 100,000 bootstrap estimates of the absolute and relative mean impact $\delta_v$ and $\theta_v$ during the intervention period, the number of control areas chosen n(c), and the size of the population targeted in the pilot Pop(v) and matched collection of control Pop(c) areas. The distribution of the bootstrap estimates was assessed graphically and seemed unimodal.
Thus, statistically significant results are based on absolute values being higher than two standard deviations of the bootstrap estimates and are highlighted in italics. In addition, a significant preintervention correlation was necessary for reliable impact estimates, which we defined as being a Pearson correlation >0.60, as was done in the previous study [9].
For the individual locations, Gateshead and South Tyneside did show significant results, but their precampaign correlations were 0.59 and 0.34, respectively; both were less than the predefined threshold of 0.60, which makes their impact estimates possibly less reliable.
The correlations for the 2013/2014 influenza season ranged from 0.32 to 0.82, and whilst none of the individual locations demonstrated significant results, the set of all pilot areas combined yielded a statistically significant impact estimate of a 13.77% (95% CI 1.45-25.01) reduction in the mean ILI rate during that season. Note that for the 2013/2014 season, the primary school-age vaccination was the only program implemented across all pilot areas.
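The exhaustive search over combinations of control areas described in this section can be sketched as follows. With 16 control areas there are 2^16 - 1 = 65,535 nonempty combinations, so enumerating all of them is feasible. The sketch makes a simplifying assumption that is not taken from the study: the ILI series of a combination is approximated by the unweighted mean of the individual control series, whereas the framework described here infers rates for supersets of areas directly from their pooled tweets. All names and numbers are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_control_set(pilot, controls):
    """Return (r, combo) for the control combination best correlated with `pilot`.

    pilot:    preintervention ILI estimates for a pilot area (or set of areas)
    controls: dict mapping control area name -> preintervention ILI estimates
    Assumption: a combination's series is the unweighted mean of its members.
    """
    names = sorted(controls)
    best_r, best_combo = -2.0, ()
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            series = [
                sum(controls[c][t] for c in combo) / size
                for t in range(len(pilot))
            ]
            r = pearson(pilot, series)
            if r > best_r:
                best_r, best_combo = r, combo
    return best_r, best_combo


# Toy preintervention series (made up).
example_pilot = [1.0, 3.0, 2.0, 5.0, 4.0]
example_controls = {
    "c1": [2.0, 6.0, 4.0, 10.0, 8.0],
    "c2": [5.0, 1.0, 4.0, 2.0, 3.0],
    "c3": [1.0, 1.0, 2.0, 1.0, 1.0],
}
example_r, example_combo = best_control_set(example_pilot, example_controls)
```

The selected combination would then be retained only if its correlation exceeds the predefined threshold of 0.60.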

Principal Results
By using social media content to assess the impact of the childhood influenza pilot program in England in 2013/2014 and 2014/2015, statistically significant results suggest a reduction in the mean ILI rate of approximately 17% (Table 2, row 2, column 4) across all ages in Primary school age pilot areas only during the 2014/2015 influenza season and 14% (Table 3,

Comparison With Prior Work
Both impact estimates are in line with results from independent studies by PHE that used traditional surveillance systems [3,11]. For the 2014/2015 season, however, the impact results are generally lower than expected with only a few statistically significant results. For example, it was expected that the Primary and Secondary school or the combined set of Primary school and Primary and Secondary school pilot locations would yield significant impacts, as they included a similar program to that in the Primary school pilot areas. Looking at the boundary boxes in more detail (Figure 1) shows that of the 4 Primary and Secondary school pilot areas, Leicestershire and Salford both include substantial parts of nonpilot areas, which is likely to have biased their results and led to underestimated effect sizes. The lack of statistically significant results across all individual locations is possibly due to the sparsity of the Twitter data available. For example, the individual Primary school pilot areas did not yield statistically significant impact estimates (with the exception of Gateshead and South Tyneside, which did show significant results, but their preintervention correlations were below the 0.60 threshold), whilst the aggregation of all Primary school areas did.
The previous study by Lampos et al implemented a similar approach using Twitter and Bing data to assess the impact of the LAIV pilots during the 2013/2014 influenza season [9]. That study estimated the impact to be approximately 33% for the aggregation of all pilot locations based on Twitter data, which is more than double what was found in the present study. The discrepancy between these results is most likely due to two factors. First, the pilot areas used for the 2013/2014 season in the present study are slightly larger than those in the previous one, as some of the reused pilot areas have been expanded. This issue particularly applies to the boundary boxes for Leicestershire and Essex, as the previous study only included parts of these areas. Second, apart from one control area (Liverpool), most of the previous control areas were part of the 2014/2015 pilot program, and thus not reusable. New control areas were therefore selected, which may explain the discrepancy in impact estimates. Nevertheless, given that both studies exhibited a significant impact, the methodology produces qualitatively consistent results for the same influenza season, even when using a different set of control and pilot areas.

Conclusions
There is a strong indication that the primary school age vaccination program has the potential to be an effective strategy in reducing influenza transmission in the general population. This finding supports the ongoing rollout of the campaign for primary school children. For a secondary school-only vaccination program offering the vaccine to just two age cohorts (and not to all children of secondary school age), there is no clear evidence of any population-wide effect. Both of these conclusions are in line with findings from previous studies and complement traditional surveillance sources in exhibiting community-wide effects of the LAIV pilot campaign [3,9,11,23].
Most current influenza surveillance schemes rely on established health systems. Although these schemes provide important information on health care-related burden of disease and potential reductions due to vaccine impact, several provide less direct insight into community-wide transmission. User-generated content from social media offers rapid access to a larger range of the population, which has the potential of including a wider community (ie, including those that do not seek medical attention) and thus offers a valuable complementary source for the surveillance and evaluation of public health programs.

Limitations
There are several potential limitations in this study. Work is still needed to refine the methods used to deal with issues such as noise, model and data biases, and the fact that estimates from user-generated content are not directly based on actual ILI cases. More advanced natural language processing techniques may deliver more accurate results [24]. The choice of control areas requires further refinement; we are seeking an even geographical distribution as well as an adequate distance from pilot areas to avoid regional biases, and to isolate the potential impact observed in pilot areas, respectively. Furthermore, the methodology is highly dependent on the quantity and type of user-generated data that is available, as this determines the accuracy and interpretation of the ILI rate estimates. The majority of Twitter users, for example, are between the ages of 15-44 years with a higher proportion situated in urban/suburban areas [25]. This factor may skew results towards illness in certain demographic groups. The current framework conducts ILI rate modeling by training on syndromic surveillance data (from RCGP), such that biases that are found there are also passed on to the models. Furthermore, even if these biases can be avoided, the issue remains that no definitive ground truth exists to allow for a proper verification.

Future Work
Future work could aim at moving towards unsupervised models that do not depend on traditional surveillance sources for training purposes. These models could produce their own, independent ILI indicators based solely on user-generated content with the potential of being able to tap into the bottom part of the disease population pyramid [10]. Inference of the demographics of users, such as age [26], socioeconomic status [27,28], or severity of disease [29] could be another focus of forthcoming work. Pebody et al showed that for both influenza seasons the impact of the pilot program was lower as influenza end-points of infection became more severe, which is an insight that the current modeling framework is unable to pick up on [3,11]. With suitable data access in the future, this framework has the potential of assessing the impact of intervention programs whose uptake is variable. The applicability of this framework extends beyond influenza to a number of health interventions, thereby offering a timely and potentially cost-effective complement to the collection of traditional surveillance data.