A meteorological-based forecasting model for predicting minimal infection rates in Culex pipiens-restuans complex using Québec’s West Nile virus integrated surveillance system

Background The ministère de la Santé et des Services sociaux (MSSS) du Québec (Québec’s health authority) has expressed an interest in the development of an early warning tool to identify seasonal human outbreaks of West Nile virus infection in order to modulate public health interventions. The objective of this study was to determine if a user-friendly meteorological-based forecasting tool could be used to predict minimal infection rates for the Culex pipiens-restuans complex—a proxy of human risk—ahead of mosquito season. Methods Annual minimal infection rate (number of positive pools/number of mosquitoes) was calculated for 856 mosquito traps set from 2003 to 2006 and 2013 to 2018 throughout the south of Québec’s. Coefficient of determination (R2) were estimated using the validation dataset (one third of the database by random selection) with generalized estimation equations, which were prior fitted backwards with polynomial terms using the training dataset (two thirds of the database), in order to minimize the Bayesian information criteria. Mean temperatures and precipitation were grouped at five temporal scales (by month, by season and by 4, 6 and 10-months groupings). Results Mean temperatures and cumulative precipitation from the previous months of March (R2=0.37), May (R2=0.36), December (R2=0.35) and the autumn season (R2=0.38) accounted for ~40% of Cx. pipiens-restuans annual minimal infection rates variations. Including the “year of sampling” variable in all regression models increased the predictive abilities (R2 between 0.42 and 0.57). Conclusion All regression models explored have too weak predictive abilities to be useful as a public health tool. Other factors implicated in the epidemiology of the West Nile virus need to be incorporated in a meteorological-based early warning model for it to be useful to the provincial health authorities.


Introduction
West Nile virus (WNV) has been the most important mosquito-borne infection in Québec for the past two decades (1).A provincial surveillance program specific for this arbovirus has been put in place since 2000 (2), one year after its first detection in North America following an outbreak of neuro-invasive diseases in the state of New York, United States (US) (3).It was composed of an enhanced passive surveillance of humans, of sick or dead wild birds (which act as reservoirs or accidental hosts) and active mosquito surveillance (4).The surveillance system confirmed WNV's introduction in Québec in 2002 with the first WNV-positive birds and locally-acquired human cases (5,6).Neurological cases of WNV-infections typically represent a very small proportion (~1%) of infected individuals since 70% to 80% of infections remain asymptomatic, while the rest have unspecific symptoms (i.e.influenza-like illness) (3).Between 2003 and 2018, a total of 541 cases (n=24 deaths) were reported in Québec with the following clinical presentation: neurological impairment (70%); non-neurological (23%); asymptomatic (6%); and unknown (1%) (1).Since then, two peaks of reported cases have been observed in 2012 (n=134; 30% of cases from across Canada) and 2018 (n=200; 46% of cases from across Canada) (1,7).Yearly fluctuations in the number of human cases are observed in other regions (8,9) and appeared to be associated with phenomena occurring at different geographic and temporal scales (10) (i.e. according to local interactions between mosquitoes, birds and humans, environmental factors and large-scale climatic fluctuations such as the El Niño-Southern Oscillation) (11)(12)(13)(14).
Scientists are still working to understand and identify factors influencing outbreaks of WNV infections, and to develop models that would be able to predict when and where these outbreaks could potentially occur.The provincial health authority (Ministère de la Santé et des Services sociaux) expressed an interest in the development of an early warning tool to identify outbreak years in advance, and to use this information to mitigate risk and decrease human exposure by modulating their interventions.Such tools are generally based on meteorological data in order to predict a potential outbreak with a delay ranging from a few weeks to a few months before the surveillance system confirms that WNV is circulating (15).Temperature and precipitation are the most commonly used WNV predictors (16).The main goal of this project is to determine if a meteorological-based early warning tool could predict WNV infection rates in mosquitoes.

Methods
The history of Québec's WNV integrated (using human, animal and entomological components) surveillance system is fully described elsewhere (17).Background screening of the scientific literature was used to build a directed acyclic graph (DAG) (Figure A1) to conceptualize relationships between potential drivers of WNV and identify important variables to include in models, using the DAGitty tool (18).Data from 13,830 mosquito pools of the Culex pipiens-restuans complex, these two species espèces (Cx.pipiens and Cx.restuans) cannot be differentiate morphologically (19), were extracted from the Système intégrée des données de vigie sanitaire du virus du Nil occidental (20) to calculate annual minimal infection rate (MIR) (number of positive pools/number of mosquitoes in each pool) (Figure 1).The MIRs were calculated for each of the 856 mosquito pools, which were grouped for each trap deployed per year (i.e.mosquito trap-year) and distanced by greater or equal to 1 km 2 radius, since it represents the average travel distance of Cx. pipiens-restuans (21) and the spatial resolution of the meteorological data (1 km 2 ).Mosquito surveillance was carried out inconsistently in 12 out of 18 Public Health Units (PHU), with a greater number of mosquito pools tested for Montreal and Montérégie (Table 1), except since 2017 where 49 traps are deployed annually in the same seven PHU (Montréal=5, Laval=4, Montérégie=5, Outaouais=7, Lanaudière=7, Mauricie-Centre-du-Québec=7 et Capitale-Nationale=14).
Explanatory variables included the mean daily temperatures (T mean =T maximal +T minimal /2) (oC) and daily precipitation (mm) extracted from the Oak Ridge National Laboratory Distributed Active Archive Center (http://daymet.ornl.gov/)according to the Global Positioning System (GPS) coordinates of each trap-year (latitude and longitude).Mean daily temperature and sum of precipitation were grouped according to five different temporal scales: 1) previous months; 2) previous seasons: summer (June and July), spring (March to May), winter (December to February) and autumn (September to November); 3) previous four-months (November of the year before to February of the same year); 4) previous six-months (September of the year before to February of the same year); and 5) previous ten-months (September of the previous year to July of the current year) groupings, based on the literature suggesting that meteorological conditions impact WNV transmission months ahead (11,22).Meteorological data from the closest neighboring traps were assigned to eight mosquito traps with missing data, with half of these traps being separated by more than one kilometer from their matched trap.To consider variations in the geographical and temporal mosquito sampling, two new "sampling strengths" variables were created for each  PHU (Figure 2) based on the number of tested mosquito pool: 1) annually; and 2) during the 2003-2018 period (Table 1).
Bivariate analysis between Cx. pipiens-restuans annual MIR (outcome) and explanatory variables (PHU, year and month) were performed using a linear regression model; an auto-regressive working covariance matrix with mosquito traps in the same city were nested within each PHU.Then multivariate regression models evaluated the predictive ability of each temperature and precipitation temporal scale to estimate MIRs, with observations randomly separated into training (n=2/3) and validation (n=1/3) datasets.A backward variable selection with a maximum of four polynomial terms to obtain the lowest Bayesian information criteria using the training dataset was fitted back into a Poisson regression model using the validation dataset (23).MIRs were obtained using the number of positive Cx.pipiens-restuans pools on the logarithm of the number of mosquitoes tested, while mosquito traps in the same city were nested within each PHU to account for correlated data using an auto-regressive working covariance matrix.The methodology of sensitivity and post hoc analyses are described in the Annex material.The SAS macro (%RsquareV) was used to calculate the coefficient of determination (R 2 ), reflecting the proportion of the variance explained by the predictors (24).Only R 2 ≥0.30 are highlighted in the text.The 95% confidence intervals (CI) and p-values are presented, and all postulates associated with the regression models were visually inspected (25).Statistical analysis were performed using version 9.4 of SAS/STAT TM software (SAS Institute, Cary, North Carolina, US).

Results
Since 2003, 172,094 Cx. pipiens-restuans mosquitoes were analyzed by polymerase chain reaction (PCR), resulting in a mean MIR of 3.1 per 1,000 mosquitoes (95% CI; 2.2-4.0), when data were grouped under the 856 mosquito trap-years, which varied according to the sampling year (χ 2 =28.5; p=0.0008) and month (χ 2 =5.6; p<0.0001) but not across the PHU (χ 2 =9.9; p=0.5) (Table 1).For the studied period, mean monthly temperatures near mosquito traps varied from 23.5°C (July) to −19.7°C Abbreviations: CI, confidence interval; N/A, not applicable; -, no information is available a Minimal infection rates using mosquitoes trapped during the month of August and September was 5.0 (95% CI; 2.6-7.4)b Using a linear regression model with a repeated measure on the mosquito trap name c Monthly minimal infection rates were calculated using the dataset containing the 13,830 mosquito pools while the mean monthly temperatures and precipitation were calculated using the 856 mosquito pools grouped by unique trap-year d These numbers are lower than the fixed number of 49 mosquito traps that have been deployed since 2017, because entries from traps within a 1 km 2 radius were merged together e Not available because of multicollinearity (VIF>2.5) in the models  IMPLEMENTATION SCIENCE (January) while monthly cumulative precipitation varied from 0.3 mm (November) to 7.7 mm (September).
The predictive ability of the previous mean temperatures and precipitation to predict the seasonal MIRs was the highest using the previous months of March (R 2 =0.37),May (R 2 =0.36) and December (R 2 =0.35) as well as the previous autumn season (R 2 =0.38), with all other coefficients of determination below 0.30 (Table 2).The results of all the sensitivity and post hoc analysis are fully described in the Annex material.Briefly, including the year of sampling variable in the main regression models increased all R 2 (range: 0.42 to 0.57) while the increased was less The repeated measure was limited to each mosquito pool to allow the model to converge as it was not possible to nest mosquito traps from the same city within each Public Health Unit to account for their proximity substantial when including the PHU variable (range: 0.18 to 0.46) (Table A1).

Discussion
This is the first time that a meteorological-based early warning tool to predict Cx. pipiens-restuans annual MIRs in Québec has been investigated using the province's own WNV surveillance data.The ability of mean temperatures and precipitation to predict MIRs was highest when the data were grouped under the previous months of March, December and May, and previous autumn season (September to November of the previous year).However, the predictive capacity of the model is probably too weak to be useful for an early warning system.The fact that only ~40% of the variance in Cx. pipiens-restuans annual MIRs can be explained by mean temperature and precipitation suggests that others factors are implicated in the epidemiology of WNV and that these factors need to be incorporated into this kind of predictive tool (15).
Many early warning or forecasting systems that are primarily or solely based on meteorological data have been developed to predict other mosquito-borne diseases (i.e.chikungunya, dengue, malaria, yellow fever, Zika) to anticipate vector control responses (24).Our results suggest that presently these models yield insufficient predicting abilities to be useful in real life for Quebec's public health authority.The epidemiology of WNV is known to be complex and adding variables such as environmental drivers (habitat suitability or distribution of vectors and reservoirs, vegetation index, land use) and data on avian hosts (abundance, migrations) usually improve predictions (15).However, the necessity of integrating data from the ongoing surveillance system and refining the geographical scale leads to less user-friendly tools, for public health authorities (26).

Strengths and limitations
One of the strengths of this project is that the geographical proximity of mosquito traps was accounted for in most of the regression models, as this can bias results.Additionally, multiple sensitivity and post hoc analyses brought robustness to our results.The polynomial equations accounted for nonlinear associations between meteorological variables and MIRs, allowing for a better representation of their relationship and a better model fit, while the backward selection of variables proved to be as effective as the visual inspection of generalized additive models.Because MIRs are low numerical values, an attempt to increase the statistical power was made by calculating annual MIRs using mosquitoes trapped during August and September.Though the mean annual MIRs increased for most positive mosquito pools, the coefficient of determination for most regression models decreased unexpectedly; possibly as a consequence of a smaller sample size when including only August and September, since 13 mosquito traps had no mosquitoes captured, which decreased statistical power.
Based on our conceptual framework, it is unknown if an over adjustment bias occurred when adjusting for "year of sampling" in the regression models (27).Despite multiple sensitivity analysis with potential confounding variables, a residual confounding effect cannot be discarded (28).Because mosquito pools were grouped for each trap-year, it was not possible to account for the different types of traps that were employed.This limitation has probably a marginal effect, since 85.2% of mosquitoes were captured by the CDC light traps.It was not possible to account for the use of larvicides during the 2003-2005 and 2013-2014 seasons, which might have influenced the estimates.Because mosquito PCR testing is done independently from mosquito trapping, if a measurement bias was to be present, a decrease in the association would also be suspected.
A selection bias could influence our results because of the geographical distribution of traps which is linked to 1) past human WNV incidence and 2) the influence of weather on human physical activity and exposure to mosquitoes (29,30).Additionally, the fixed location of mosquitoes traps since 2017 comes at the expense of detecting WNV emergence in the less densely populated PHU situated at more northern latitudes.
The limited number of mosquito traps on such a huge territory and the absence of mosquito surveillance during the colder months reduces the statistical power.Hence, a higher number of traps distributed over a wider geographical area would provide a wider range of meteorological data and increase statistical power.Since WNV vectors vary according to regions, and transmission pathways seem to be influenced environmental and geographical variables at a finer scale then the PHU results of this project may not be transferable to other situations.Despite these limitations, Québec's WNV surveillance data will be explored further.The next steps will be to directly predict human WNV incidence using meteorological data and bypass the gap that occurred in entomological data between 2007 and 2012 and to explore the development of an early detection system tool.

Conclusion
All regression models explored have too weak predictive abilities to be useful as a public health tool.Other factors implicated in the epidemiology of the West Nile virus need to be incorporated in a meteorological-based early warning model for it to be useful to the provincial health authorities.

Methods
Because human cases of West Nile virus (WNV) infection are usually reported at the beginning of August (1) and that the objective is to develop an early alert tool able to predict these human cases a couple of months before they occur, temperatures and precipitation during August have not been included in our analysis.
Background screening of the scientific literature was used to build a directed acyclic graph (DAG) (Figure A1) to conceptualize relationships between potential drivers of WNV and identify the variables that need to be adjusted for in the model using the DAGitty tool.

Sensitivity and post hoc analysis
Seven sensitivity analyses were carried out as follows: 1) regression models were fitted using visual inspection of graphs generated by a generalized additive model with a maximum of four degree of smoothing performed by generalized cross validation; 2) the main models were reanalyzed without the data from the four mosquito traps (LAV 034, LAV 904, SHA 002 and WEN 001) that inherited meteorological data of their closest neighbour with a radius superior to 1 km 2 ; 3) by including the Public Health Units (PHU) 2003-2018 sampling strength variable; 4) by including the annual sampling strength by PHU; 5) by attributing the minimal infection rate of mosquitoes trapped only during the month of August and September instead of during the whole summer; 6) by adding the year of sampling in the model; and 7) by adding the PHU in the model instead of in the repeated measure.Following results from the 6 th sensitivity analysis, year of sampling was used as a proxy for unmeasured confounders.An post hoc Poisson analyses was performed by modeling directly the MIR using the year of sampling (categorical) as the explanatory variable.To measure if the meteorological variables were confounders of this association, the year of sampling was included as a continuous variable to facilitate the calculation and interpretation of the difference and ratio between both estimates (β Year of sampling ).
Confounding was present based on a difference and/or ratio of estimates superior to 0.1 decimal or 10% (31).Four additional bivariate analysis were carried out with the year of sampling as an explanatory continuous variable and respectively, the mean annual temperatures, mean annual precipitation, annual sampling strength and PHU sampling strength as the main outcome.Specifying a normal distribution, an "identity" link and an auto-regressive working covariance matrix with mosquito traps in the same city nested within each PHU, yielded estimates representing differences in minimal infection rates (MIRs).

Sensitivity analysis
Fitting regression models after a visual inspection of the generalized additive model graphs led to similar results of the main analysis, except for the month of February (R 2 increased to 0.22) and August (R 2 increased to 0.24) as well as the ten-month grouped (R 2 increased to 0.18), while all other comparisons had difference within five decimal units (Table A1).Excluding the four entomological traps with meteorological data borrowed from the closest neighboring trap influenced the results for the months of May (R 2 decreased to 0.19) and September (R 2 increased to 0.38), summer (R 2 increased to 0.29) and winter (R 2 decreased to 0.16) (Table A1).When including the PHU sampling strength variable, all R 2 were within five decimals of the main results (Table A1).
Including the variable reflecting the annual strength of sampling, increased the predictive abilities of at least five decimals for nine temporal groupings (June, April, February, November, August; summer and spring; the six and ten-months groupings).Using the minimal infection rates of Cx. pipiens-restuans trapped in August and September instead of the whole summer, influenced the R 2 for 14 temporal groupings (decreased for May, April, December, November, September, summer, winter, autumn, four and six-months grouping and increased for June, October, August and spring).When including the year of sampling in the main regression models, all coefficient of determination increased between 0.42 and 0.57.Including the PHU variable in the main regression models increased the predictive ability of at least five decimals for all temporal groupings except July, May and autumn.

Post hoc analysis
When modelling directly the MIR using only the "year of sampling" variable, the R 2 was 0.53.The estimate associated with the year of sampling on a continuous linear scale and as the only explanatory variable of Cx. pipiens-restuans MIRs was 1.09 (95%

Annex
Figure A1: Conceptual framework using a directed acyclic graph between meteorological variables and the main outcomes of Québec's West Nile virus integrated surveillance system Notes: The darker arrows represent the associations evaluated in this project.To estimate the associations between mean temperatures and precipitations and Cx.pipiens-restuans complex minimal infection rates the model needs to be minimally adjusted for the "sampling year" variable.However, since the database already represents the seasonal minimal infection rate for each mosquito trap-year, this variable has not been directly included in the regression models of the principal analysis.Its inclusion was rather evaluated using a sensitivity analysis

Figure 1 :
Figure 1: Flow chart of the entomological data extracted from Québec's Système intégrée des données de vigie sanitaire du virus du Nil occidental

Figure 2 :
Figure 2: Map of the 18 Public Health Units in the province of Québec according to the number of mosquito pools tested between 2003-2006 and 2013-2018

Table 1 :
Descriptions of Culex pipiens-restuans minimal infection rates and meteorological data at different geographical and temporal scales

Table 1 :
Description of Culex pipiens-restuans minimal infection rates and meteorological data at different geographical and temporal scales (continued)

Table 2 :
The ability of meteorological variables to predict Culex pipiens-restuans minimal infection rates, at different temporal grouping scales for the meteorological data A polynomial equation with three terms was used as the model did not converge with four b Summer represents the months of June and July; Spring represents the months of March to May; Winter represents the months of December to February; Autumn represents the months of September to October c a

Table A1 :
Penalized coefficients of determination (Penalized R 2 for all seven sensitivity analyses) Arrows indicate the direction of change above and below five decimal units compared to results of the main analysis in Table2; 1.06-1.11;p<0.0001)whileyearlyfluctuationsare observed when treated on the categorical scale (results not shown).The results of the estimates obtained in all the main regression models using the meteorological variables and year of sampling varied between 1.06 and 1.15, with very low confounding effects on the difference and ratio scales (TableA2).When comparing MIRs between years and using 2018 as the reference, all anterior sampling years had lower values except for 2016 since the upper 95% CI of the estimate overlaps the null value (TableA3).When the year of sampling is treated on a continuous scale, the number of mosquito traps decreased linearly from 2003 to 2018 (β=−1.13[−1.45-−0.81;p<0.0001]).Therefore, the number of mosquito traps and MIRs are inversely correlated between 2003 and 2018 with an increase of 2% in MIRs for each less mosquito trap in time (β=0.98 [0.98-0.99;p<0.0001]).At the level of each mosquito trap location in Québec, mean annual temperatures increased (β=0.04 [0.02-0.06;p=0.0003]) while mean annual precipitation decreased (β=−0.008[−0.02-0.006;p=0.07]) from 2003 to 2018.
b A polynomial equation with three terms was used as the model did not converge with four c Summer represents the months of June and July; Spring represents the months of March to May; Winter represents the months of December to February; Autumn represents the months of September to October d The repeated measure was limited to each mosquito pool to allow the model to converge as it was not possible to nest mosquito traps from the same city within each Public Health Unit e No repeated measure was employed for this model to converge IMPLEMENTATION SCIENCE CI

Table A2 :
Differences and ratios between estimates for sampling year in the model, with and without meteorological data using the 2003-2018 entomological dataset β without meteorological data equivalent to 1.09 (95% CI; 1.06-1.12;p-value <0.0001)β with meteorological data b β without meteorological data equivalent to 1.09 (95% CI; 1.06-1.12;p-value <0.0001) / β with meteorological data c A polynomial equation with three terms was used as the model did not converge with four d Summer represents the months of June and July; Spring represents the months of March to May; Winter represents the months of December to February; Autumn represents the months of September to October e The repeated measure was limited to each mosquito pool to allow the model to converge as it was not possible to nest mosquito traps from the same city within each Public Health Unit a f No repeated measure was employed for models to converge