Ecologic Factors Associated with West Nile Virus Transmission, Northeastern United States

Risk for disease was 4 times greater in the least forested counties.


West Nile virus (WNV) disease arrived in the United States in 1999 in New York City, yet how the disease became established and details concerning the nature of the transmission cycle in the United States remain unclear. Experience in the northeastern United States suggests an urban concentration of human WNV disease cases (1,2); however, environmental factors, such as urbanization, that underlie the patterns of transmission to humans have not been explicitly evaluated. We used human surveillance data to describe and quantify the spread of WNV cases in the northeastern United States and empirically tested the hypothesis that human WNV disease is linked to the urban environment independent of human population density.
In the northeastern United States, a mainly urban cycle of WNV transmission is supported by the role of bird and mosquito species. This enzootic cycle occurs in urban bird species; human cases occur in late summer (2-7). Culex pipiens Linnaeus is the most commonly implicated mosquito vector in the maintenance of WNV in birds (1,2,8,9). In the northeastern United States, this species feeds on birds found in urban areas, such as the American robin (Turdus migratorius), house sparrow (Passer domesticus), and European starling (Sturnus vulgaris) (2,10). The role of Cx. pipiens mosquitoes as primary WNV vector is supported by consistent isolations of WNV from mosquitoes captured in surveillance traps (8,11-14) and by associations between virus-infected mosquitoes and dead-bird reports (15).
A more contentious issue is the role of different mosquito species in transmitting, or bridging, WNV between birds and other vertebrates, including humans. Cx. pipiens mosquitoes are known to breed in the organically rich water of artificial containers frequently found in urban areas (16-18). Habitat modeling of potential WNV vectors in the northeastern United States indicates an urban focus for Cx. pipiens mosquitoes (19). However, the tendency of this species to feed mostly on birds makes it an unlikely bridge vector, although other researchers have suggested that it exhibits late-season host switching to humans (5). Aedes vexans and Cx. salinarius mosquitoes have been implicated as bridge vectors in this region (1-3) because of their abundance and more nonspecific feeding patterns (20). Although both are present in urban areas, other land uses have been found to be more predictive of their distribution (19). These other studies do not indicate whether human incidence would be linked to the same ecological factors driving enzootic transmission.
In this study, we explicitly tested whether both enzootic and bridge transmission occur in urban areas by evaluating human WNV disease and degree of urbanization within counties. We estimated the initial spatial spread, measured as time to first case, relative to Queens, New York, the site of first WNV detection (21), from 1999 through 2006. We also examined the trend for increasing incidence with decreasing forest cover while attempting to control for surveillance efforts and removing the effect of spatial proximity. The methods provide an example of how surveillance data with low spatial resolution can be used to quantify risk.

Methods
The study focused on 8 northeastern states (Connecticut, Delaware, Massachusetts, Maryland, New Jersey, New York, Pennsylvania, and Rhode Island) where the same mosquito species are likely to act as primary vectors. States to the north of the study area have had limited numbers of cases and may involve different mosquito species. States farther south and west are likely to involve different species of mosquitoes; hybridization between Cx. pipiens and Cx. quinquefasciatus is more common in southern latitudes (16).

Human Incidence Data
We used annual numbers of human WNV cases reported to the Centers for Disease Control and Prevention (CDC) from 1999 through 2006. Human case data were acquired through multiple sources but met the CDC case definition, which includes clinical disease with laboratory confirmation. Data for 1999 were extracted from the Morbidity and Mortality Weekly Report (22), and data for 2000 were downloaded from the National Atlas website (http://nationalatlas.gov; 23). Human case data for 2001 through 2006 were downloaded from the US Geological Survey maps page (http://nationalatlas.gov/printable/wnv.html; 24). To protect anonymity, human data from these sources are compiled at the county level. All other data were aggregated by county to match this resolution.

Geographic Data
County boundaries for the United States and 2000 census data were downloaded from the National Atlas website (http://nationalatlas.gov/boundaries and http://nationalatlas.gov/people), and county centroids were identified to facilitate the calculation of distances between counties. Land-use data were downloaded by state from the US Geological Survey National Land Cover Institute (http://landcover.usgs.gov/natllandcover.php; 24). Percentage of land cover class by county was extracted by using Fragstats Software (25). Land uses classified as low-intensity residential, high-intensity residential, commercial/industrial/transportation, and urban/recreational grasses were grouped into a class called urban. Land uses classified as deciduous, evergreen, and mixed forest were grouped into a class called forest. These 2 land use types were considered biologically relevant to the study question.

Statistical Analyses
To document evidence for the temporal and spatial spread of WNV disease, we generated cumulative incidence curves by state and by year and examined the distance between counties with cases. Time-to-first-case detection (in years) was used as the outcome predicted by distance to the origin, which was Queens, New York. For distance calculations, we ignored counties reporting no WNV disease cases because the first case is theoretically still to be determined. To visualize WNV disease spread, we plotted the mean incidence by year, using the spatial statistics tools of ArcGIS (26).
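The distance-to-origin step can be illustrated with a minimal Python sketch. It is not the authors' ArcGIS workflow: the county names, centroid coordinates, and first-case years below are hypothetical, the Queens coordinate is an approximate assumed value, and great-circle (haversine) distance is one plausible choice of distance measure among several.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def ols_fit(x, y):
    """Simple linear regression of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy ** 2) / (sxx * syy)

# Assumed approximate centroid for the origin county (Queens, New York).
ORIGIN = (40.7, -73.8)

# Hypothetical counties: (name, centroid lat, centroid lon, year of first case or None).
counties = [
    ("A", 41.0, -73.9, 1999),
    ("B", 40.2, -74.8, 2000),
    ("C", 39.9, -75.8, 2001),
    ("D", 42.6, -73.9, 2001),
    ("E", 40.4, -78.0, 2002),
    ("F", 44.0, -75.5, None),  # no case reported, so excluded from the fit
]

# Counties without cases are dropped, as described in the text.
with_cases = [c for c in counties if c[3] is not None]
distances = [haversine_km(ORIGIN[0], ORIGIN[1], c[1], c[2]) for c in with_cases]
years_to_first = [c[3] - 1999 for c in with_cases]

slope, intercept, r2 = ols_fit(distances, years_to_first)
print(f"years per 100 km: {100 * slope:.2f}, r^2: {r2:.2f}")
```

The r² from such a fit corresponds to the share of variance in time to first case that distance from the origin explains.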
Distance measures were then used to adjust for the effect of spatial proximity in the regression analyses (27). Incorporating measures of spatial proximity in a regression model removes the effect of spatial structure that might otherwise result in overestimation of the strength of the association between the outcome, WNV incidence, and the explanatory environmental variables (28,29).
Logistic regression modeling was initially used to identify the relevant predictors and to quantify their relative effects by calculation of odds ratios (ORs). Number of cases per county was standardized by using the 1990 US Census population density. Cumulative WNV disease incidence data from 1999 through 2006 were dichotomized at their median to provide 2 categories of high and low risk. Predictor variables (percent urban, percent forested, county area, and per capita county income) were stratified by quartiles. Logistic models were tested by using the Hosmer-Lemeshow goodness-of-fit test. The best model was selected based on the Akaike information criterion (AIC), which is a measure of fit that accounts for the number of parameters in the model. Models within 2 AIC units are considered comparable; models within 7 AIC units have less support but are still comparable; and models with differences >10 AIC units are not comparable (30). The relationship between increasing cases and decreasing percentage of forested land was tested by using generalized least-square regression in STATA (31).
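The data preparation described here (median dichotomization of the outcome and quartile stratification of a predictor) can be sketched in Python as follows. The forest-cover and incidence values are hypothetical, and the odds ratio shown is a crude, unadjusted 2x2 estimate with a Woolf-type 95% confidence interval; it is a simplification and not the adjusted logistic regression used in the study.

```python
import math
import statistics

def dichotomize_at_median(values):
    """Label each value 1 (high, above the median) or 0 (low, at or below it)."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]

def quartile_bins(values):
    """Assign each value to quartile 1..4 using the sample quartile cut points."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return [1 + (v > q1) + (v > q2) + (v > q3) for v in values]

def odds_ratio(a, b, c, d):
    """Crude OR with Woolf 95% CI for a 2x2 table.

    a = exposed & high, b = exposed & low, c = unexposed & high, d = unexposed & low.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - 1.96 * se), math.exp(math.log(or_) + 1.96 * se)

# Hypothetical county data: percent forest cover and cumulative incidence.
forest = [5, 12, 20, 28, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
incidence = [9.0, 7.5, 1.0, 8.0, 6.0, 5.5, 2.0, 6.5, 3.0, 4.5, 1.5, 5.0, 0.5, 7.0, 2.5, 0.8]

high = dichotomize_at_median(incidence)
bins = quartile_bins(forest)

# Compare the least forested quartile (exposed) with the most forested (unexposed).
a = sum(1 for h, q in zip(high, bins) if q == 1 and h == 1)
b = sum(1 for h, q in zip(high, bins) if q == 1 and h == 0)
c = sum(1 for h, q in zip(high, bins) if q == 4 and h == 1)
d = sum(1 for h, q in zip(high, bins) if q == 4 and h == 0)

or_, ci_low, ci_high = odds_ratio(a, b, c, d)
print(f"crude OR = {or_:.1f} (95% CI {ci_low:.1f}-{ci_high:.1f})")
```

With so few hypothetical counties the interval is very wide; the point of the sketch is the sequence of steps rather than the numbers.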
A risk model of total incidence was developed by using log (count +1) transformed incidence as the response variable and the variables identified as important in the logistic regression analyses as predictors. To obtain a better fit, predictor variables were entered as continuous values for this regression. The κ statistic was used to assess agreement greater than chance between the median dichotomized original incidence and the predicted incidence, for which <0.21 is considered slight to poor and >0.61 is considered substantial to almost perfect (32).
All models were initially run using only the land-use predictors, and the Moran I test was used to assess whether closer observations were more similar than those farther apart. A finding of association based on spatial location could indicate that proximity, rather than environmental factors, explains the distribution of disease incidence. Distance variables control for this potential spatial proximity effect and reflect the presumed biological relationships within the data.
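As an illustration of what the Moran I statistic measures, the following Python sketch computes it for values observed at point locations. The inverse-distance weighting scheme and the toy coordinates are assumptions made for this example; the text does not state which weights were used, and the significance test (the Z score) is not implemented here.

```python
import math

def morans_i(values, coords):
    """Moran's I with inverse-distance weights w_ij = 1 / d_ij and w_ii = 0.

    Values near +1 indicate that nearby observations are similar, values near -1
    that they are dissimilar; under no spatial structure the expectation is
    -1 / (n - 1).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weighted_cross = 0.0
    weight_total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])
            weighted_cross += w * dev[i] * dev[j]
            weight_total += w
    return (n / weight_total) * (weighted_cross / sum(d * d for d in dev))

# Hypothetical observations (for example, model residuals) along a transect.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
clustered = [1, 1, 1, 9, 9, 9]      # similar values sit next to each other
alternating = [1, 9, 1, 9, 1, 9]    # neighbors are dissimilar

print(round(morans_i(clustered, coords), 3))
print(round(morans_i(alternating, coords), 3))
```

Applied to regression residuals, a value close to the null expectation suggests that the distance variables have absorbed the spatial structure.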
The models were also adjusted for surveillance effort. Human disease surveillance data must be interpreted with knowledge of the biases inherent to its collection (33). County per capita income was used as a measure of potential investment in surveillance and laboratory testing, as has been used in prior studies of surveillance for animal rabies (34).

Associations Based on Spatial Proximity
A cursory examination of the epidemic curve of WNV cases reported from each state during the 8-year study indicated that peak incidence was broadly overlapping in all states (Figure 2, panel A). However, cumulative distribution functions of total WNV cases (Figure 2, panel B) by year indicated that New York experienced its median case earlier in the regional epidemic than did other states (Massachusetts, New Jersey, and Connecticut), which suggests a spatiotemporal spread of WNV. Because a spatial component to spread was evident, we evaluated distance between counties to assess the spatial relationship between counties and to control for the effect of spatial proximity. The spatial component alone explained 15% of the variance in time to first case when Queens, New York, was used as the origin (n = 123 counties with cases reported, p = 0.001). After 2004, no new counties reported WNV cases, and the incidence centroids of cases in 2005 and 2006 were close to one another and had shifted back toward the origin, which suggests that the disease may have reached endemicity in the region (Figure 3).
Figure caption (fragment): The box plot provides the median, lower, and upper quartiles; the standard deviation; and any data outliers. This plot excludes those counties that did not report cases. The outliers tend to be the few cases that occurred in areas with low populations.
Risk (high or low) for WNV cases was significantly associated (by county quartile) with measures of urbanization and with percentage of forested or urban land. Because these 2 measures were highly correlated, we used only a single measure in the final analysis (Table 2). Total county area and other demographic indices (age) were not significant predictors and are not shown.
To adjust for surveillance bias and the spatial relationship among proximal counties, we included the variables of county-based per capita income and distance from Queens, New York, respectively (Table 2). Both forested (χ² = 36.67, df = 11, p<0.001) and urban (χ² = 33.55, df = 11, p<0.001) predictors were significantly associated with WNV incidence and provided a good fit (forested: Pearson χ² = 209.27, df = 192, p = 0.19; urban: Pearson χ² = 202.78, df = 192, p = 0.28). As before, no effect of spatial proximity was found in the residuals (forested: Moran I = -0.007, Z = -0.38, p = 0.35; urban: Moran I = 0.001, Z = 0.93, p = 0.18). Although all models were significant and fit the data, the latter model was preferred on the basis of AIC (not controlling for spatial proximity AIC forested = 270.7, AIC urban = 281.2; controlling for spatial proximity AIC forested = 264.1, AIC urban = 267.3) and included biologically relevant controls for the effect that spatial proximity might have in estimating the association between the outcome, disease incidence, and environmental variables of interest. A general, dose-dependent trend indicated increasing incidence as measures of urbanization increased (higher incidence with decreasing percentage of pixels classified as forest in each county: χ² = 9.47, df = 1, p<0.01; goodness of fit χ² = 3.50, df = 2, p = 0.17; higher incidence with increasing percentage urban land: χ² = 7.13, df = 1, p<0.01; goodness of fit χ² = 1.98, df = 2, p = 0.37). The logistic regression model of dichotomized total incidence for the 8 years of the study, controlling for income (categorical variable by quartile) and for the effect of spatial proximity (distance variables), also showed a distinct trend of increasing incidence with decreasing percentage of forest cover; counties with <38% forest cover were 4.4× more likely (95% confidence interval 1.4-13.2, p = 0.01) to have high WNV incidence than were counties with >70% forest cover (Table 2).

Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 14, No. 10, October 2008
Figure caption (fragment): Cumulative proportion of total cases for the 8 years, also highlighting the 2003 regional peak but suggesting a spatial spread in which cases started to rise earlier in NY than in states such as DE that were more distant from the epicenter.

Predictive Model
We used the predictors identified in the logistic regression analysis to develop a linear regression model to predict total incidence (log count + 1 transformed for a normal distribution), using the quartile percent forested land by county. Per capita income (as a continuous variable) was used to control for surveillance effort. This model explains 9.7% of the variance in the total incidence (log count + 1) (p<0.001); however, the residuals indicated an effect due to spatial proximity (Moran I = 0.0349, Z = 5.925, p<0.001). Controlling for this spatial effect and surveillance effort resulted in a better model (r² = 0.20, p<0.001; Moran I = -0.003, Z = 0.26, p = 0.40). The κ statistic indicated good agreement (κ = 0.343, SE = 0.066, Z = 5.22, p<0.001, agreement = 67.16%) between the predicted and the observed outcomes when the binomial categorization of incidence was used and resulted in 51 county incidence entries being correctly identified as being below the median and 86 being correctly identified as being above the median. Errors were primarily in the direction of predicting the incidence above the median. When surveillance and spatial proximity were controlled for, the risk for WNV disease increased by 0.25% for every 1% decrease in forest cover. For more direct comparison with the logistic regression outcome, moving from the highest category of forest cover (>69.59%) to the lowest (<38.29%) resulted in a 6.16% increased risk for WNV disease.
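The reported agreement statistics can be checked with a short Python sketch of Cohen's κ. Only the two diagonal counts (51 and 86) and the 67.16% agreement come from the text; the total of 204 counties, the even 102/102 split of observed incidence at the median, and therefore the off-diagonal counts (51 and 16) are inferences made for this example, not values stated by the authors.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table where table[i][j] counts items with
    observed class i and predicted class j.

    Returns (observed agreement, chance-expected agreement, kappa).
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / total
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    return p_obs, p_exp, (p_obs - p_exp) / (1 - p_exp)

# Rows: observed (below median, above median); columns: predicted (below, above).
# Diagonal counts are from the text; off-diagonal counts are inferred (see lead-in).
table = [
    [51, 51],  # observed below: 51 predicted below, 51 predicted above (inferred)
    [16, 86],  # observed above: 16 predicted below (inferred), 86 predicted above
]

p_obs, p_exp, kappa = cohens_kappa(table)
print(f"agreement = {100 * p_obs:.2f}%, kappa = {kappa:.3f}")
```

Under these assumptions the sketch reproduces the reported agreement and κ, and the inferred misclassifications fall mostly on the side of predicting incidence above the median, consistent with the text.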

Discussion
This study documents the concentration of WNV cases within urban areas of the northeastern United States and provides a quantitative estimate of the effect of varying degrees of urbanization on the risk for WNV infection at the county level. Land-use data were used to ascribe degree of urbanization as a predictor for WNV disease risk; incidence models were generated, controlling for human population density, environment-based spatial associations in the predictors, and potential biases in WNV incidence reporting resulting from the unequal resource bases among counties.
Beginning in 1999, human WNV cases were reported in counties distant from Queens, New York, the presumed origin of infection. Although the epidemic initially appeared to spread in a west/southwesterly direction in the 8-state region examined, by 2005 the initial epidemic appeared to wane, and reports of disease among newly affected counties dropped to zero. The resulting incidence maps suggest a WNV disease-endemic situation in the northeastern United States. The initial spread was not continuous along neighboring counties; rather, greater incidence was seen in urban counties after controlling for human population density, surveillance bias, and the effect of spatial proximity. The best model indicates 4× the risk for disease in the counties that fall in the lowest quartile of forested land compared with the highest. The predictive nature of the data is also explored with the caveat that additional predictor variables are needed; nonetheless, it indicates increasing risk for WNV disease with decreasing forested lands.
The association between urban land use and human cases indicates that urban/suburban land use enhances environmental conditions for both enzootic and bridge transmission, at least at the county level. The spatial resolution of human surveillance data did not allow for finer evaluation of within-urban associations. Brownstein et al. linked human WNV cases to greenness indices in urban areas and found an optimal vegetation index associated with higher human cases (35). Brown et al. found an environmental separation of bridge and enzootic vectors; bridge vectors occurred in areas with vegetation that might be associated with residential areas within a city (36). Finer spatial resolution human data would allow for within-county analyses that might provide better estimations of where the cases (urban, periurban) are occurring. This would improve the predictive power of land use in the models, and the better association between land use and cases might help further elucidate which mosquito species are involved as bridge vectors.
Because of the type and resolution of the data, a sample predictive model, and not a predictive map, is provided. Nonetheless, the data and analysis provided are insightful as potentially predictive models. Additional data, such as bird abundance and perhaps also mammal abundance, are needed (37). Because of the often strict host and habitat preferences of mosquito species, mosquito surveillance data could also improve the predictive power and validity of the model. Our best predictive model explains only 20% of the variance; additional variables such as these might improve the model because the abundance of hosts and mosquito species will have a considerable effect on WNV transmission.
Despite the reluctance to use human surveillance data for models of disease transmission (33), such data can provide information about spatial associations in vector-borne disease as shown here and by others (34,38,39). This type of human surveillance modeling provides some useful insight into the distribution of human WNV cases and supports the current understanding of the transmission cycle.
To predict WNV disease requires understanding of the factors driving both enzootic transmission and bridging to humans. Different data availability and scales are involved in studying these 2 processes. We took advantage of the national coverage of the human incidence dataset to examine the spatiotemporal spread of WNV in this region and to generate a risk model based on land use, adjusted for the effect from spatial proximity. We show that human surveillance data at the county level are consistent with the urban nature of this disease system, as has been found in studies of enzootic transmission, indicating that the 2 processes occur in or near urban areas.