Linking Lyme disease ecology and epidemiology: reservoir host identity, not richness, determines tick infection and human disease in California

Understanding the community ecology of vector-borne and zoonotic diseases, and how it may shift transmission risk as it responds to environmental change, has become a central focus in disease ecology. Yet, it has been challenging to link the ecology of disease with reported human incidence. Here, we bridge the gap between local-scale community ecology and large-scale disease epidemiology, drawing from a priori knowledge of tick-pathogen-host ecology to model spatially-explicit Lyme disease (LD) risk, and human Lyme disease incidence (LDI) in California. We first use a species distribution modeling approach to model disease risk with variables capturing climate, vegetation, and ecology of key reservoir host species, and host species richness. We then use our modeled disease risk to predict human disease incidence at the zip code level across California. Our results suggest the ecology of key reservoir hosts—particularly dusky-footed woodrats—is central to disease risk posed by ticks, but that host community richness is not strongly associated with tick infection. Predicted disease risk, which is most strongly influenced by the ecology of dusky-footed woodrats, in turn is a strong predictor of human LDI. This relationship holds in the Wildland-Urban Interface, but not in open access public lands, and is stronger in northern California than in the state as a whole. This suggests peridomestic exposure to infected ticks may be more important to LD epidemiology in California than recreational exposure, and underlines the importance of the community ecology of LD in determining human transmission risk throughout this LD endemic region of far western North America. More targeted tick and pathogen surveillance, coupled with studies of human and tick behavior could improve understanding of key risk factors and inform public health interventions. Moreover, longitudinal surveillance data could further improve forecasts of disease risk in response to global environmental change.


Introduction
Vector-borne and zoonotic diseases represent a substantial and increasing threat to public health. Many such diseases are caused by pathogens maintained in complex natural transmission cycles by communities of interacting hosts and vectors, the composition of which can influence pathogen amplification and spillover transmission risk [1][2][3]. For example, targeted field and laboratory studies have identified key reservoir hosts of the Lyme disease (LD) agent, Borrelia burgdorferi sensu stricto (hereafter B. burgdorferi), in the eastern [3,4] and western [5][6][7] United States (US). In California, there are multiple reservoir host species that are thought to be important in maintaining B. burgdorferi including the western gray squirrel (Sciurus griseus), dusky-footed woodrat (Neotoma fuscipes), and California kangaroo rat (Dipodomys californicus) [5,[7][8][9]. As such, maintenance of B. burgdorferi in the western US may involve distinct combinations of vector and host species in different regions that may influence whether the pathogen is maintained locally, as well as varying prevalence in sylvatic cycles [6]. Moreover, the relative abundance of these hosts [10,11], their interaction with predators and competitors [12,13], and the rate at which vector ticks feed on these species [14,15] may strongly influence rates of infection [16], and abundance of ticks-key metrics often used to quantify spillover transmission risk.
While the community ecology of Lyme and other zoonotic diseases has important implications for human risk of infection, it is often difficult to link this ecology with actual human transmission or disease incidence. For example, it is a real challenge to replicate detailed investigations of tick-pathogenhost community ecology across large spatial or temporal scales. It is also difficult to link field-level ecological data with human case reporting data that is often aggregated at various geographic levels (e.g. counties) for the purpose of reporting. To bridge this gap in the spatial and temporal scales of field ecology and human case reporting, earlier studies have often relied on proxies of entomologic risk. For example, to capture host community composition and its purported relationship with entomologic risk factors like density of infected nymphal (DIN) ticks, earlier studies have used measures of forest cover or fragmentation due to its effects on abundance of key reservoir host species [17][18][19]. While many studies in the eastern US have linked increased entomologic risk of LD with more highly fragmented forests [17,18,[20][21][22], evidence linking forest fragmentation to human disease incidence is mixed, with null or negative effects often reported [18,19,[23][24][25][26][27][28]. For example, Brownstein et al find lower rates of human LD in more highly fragmented forest habitats, but higher density and infection prevalence of ticks [18]. These contradictory findings suggest that entomologic risk itself-and its ecological proxies like forest fragmentation-may not reliably predict LD transmission in the eastern US [19]. These ambiguous results may also indicate that these proxies, meant to capture key differences in host communities or infection risk (e.g. changes in DIN across landscapes), are confounded by differences in how people are interacting with high risk landscapes [19,29,30]. In the eastern US, a region endemic for LD, it is conceivable that elevated awareness of LD and behavioral avoidance of high risk landscapes may decouple entomologic risk from human disease incidence [19,[30][31][32]. In the far western US where LD is much less common-and public awareness likely much lower [33]-the community ecology and entomologic risk of LD may more reliably predict human disease incidence.
Here, we link the community ecology of reservoir hosts of B. burgdorferi in California-uncovered through targeted, small-scale field studies-with tick infection and human disease incidence to bridge the gap between small-scale community ecology and large-scale LD epidemiology. To do so, we first model the environmental suitability for infected Ixodes pacificus ticks, the primary human vector of B. burgdorferi in California, using a flexible machine learning approach. Along with abiotic and habitat predictors, we use habitat suitability of key reservoir host species, and host community species richness, to both determine the contribution of host species to observed vector infection, and to predict spatially explicit entomologic risk. We then use this modeled distribution of entomologic risk to predict human Lyme disease incidence (LDI) at the zip code level in California. Through this approach, we find: (a) key reservoir host species have the strongest association with infection in ticks; (b) higher infection probability is associated with lower host species richness, but this variable itself is very weakly associated with infection probability; and (c) illustrate that our modeled distribution of entomologic risk is a strong predictor of human LDI in California.

Infected tick suitability: predicted entomologic risk
All species distribution models performed well, with area under the receiver operating characteristic curve (AUC) > 0.91 for all model combinations (table S1). The final model of infected tick suitability selected for further analysis (figure S1; table S2) included climate and vegetation variables, the competent host diversity layer, and individual host species habitat suitability layers for the dusky-footed woodrat, western gray squirrel, and California kangaroo rat (figure 1). We chose this model with a more limited set of host predictor variables to limit complexity of the model and improve interpretation of variable relationships, and because one of our primary goals was to investigate the contribution of key reservoir host species to the predicted distribution of infectious ticks, and to epidemiological patterns of LD. The model yielded similar suitability predictions to other models with a larger number of host species and diversity layers, while having greater model fit and model parsimony.
The fit of this model was high (AUC = 0.94) with the greatest percent contribution coming from the dusky-footed woodrat (27.3%), followed by vegetation cover (20.8%) and minimum temperature in the coldest quarter (11.1%) (table S2). The variables with greatest permutation importance were climatic water The (d) competent host diversity metric was composed of all competent host species layers (defined in table S1), with individual host presence defined as when the species was predicted to be present across all three host models. deficit (29.4%), minimum temperature in the coldest quarter (28.8%), and precipitation in the coldest quarter (14.3%) (table S2).
Marginal and individual response curves indicate that as habitat suitability for dusky-footed woodrats increases, the probability of encountering an infected tick increases substantially (figure S2). While the percent contribution of western gray squirrels (8.8%) and California kangaroo rats (1.2%) was comparatively low, response curves show a similar relationship to that observed for dusky-footed woodrats; namely that as habitat suitability for these host species increases, suitability for infected ticks increases (figure S2). Competent host diversity also had a relatively low percent contribution (6.8%), however, response curves indicate no clear relationship between competent host diversity and infected tick suitability. The individual response curves indicate low suitability for infected ticks when no competent hosts are present, but suitability does not differ systematically from low to high levels of diversity ( figure  S2). Finally, the response curves indicate that the highest probability of encountering an infected tick is in oak woodland ( figure S2).
In alternative model specifications that included overall host community richness (table S1, 'Environmental Competent Host Layers & Competent & Small Mammal & Predator Diversity'), results suggest that higher probability of tick infection is associated with lower host community richness, perhaps suggestive of a dilution effect (figure 2). However, host richness is only very weakly associated with infection probability (percent contribution = 3.4%, permutation importance = 0.9%), suggesting host richness is not a key determinant of tick infection. Due to minimal improvements to overall model fit, and low contribution to model performance, this variable-among other diversity metrics-was dropped from the final model.
Model predictions for infected tick suitability largely fall within the predicted distribution of I. pacificus (figure 3(a)) from earlier studies [34][35][36][37]. Suitability is high in northwestern California, the western foothills of the Sierra Nevada Mountains, as well as (to a lesser extent) coastal central and southern California (figures 3(a) and S1). There is some disagreement between overall predicted suitability for this tick species and infected tick suitability, with some regions of northern California predicted to be suitable for infected ticks, but not for the overall tick distribution model (figure 3(a)). Moreover, predicted suitability for infected ticks is much more restricted in coastal central and southern California and the southern Sierra Nevada foothills than it is for the tick in general (figure 3(a)), whereas overall and infected tick suitability in northern California are comparatively more similar (figure 3(b)).

Contribution of entomologic risk to human disease incidence
To investigate the relationship between entomologic risk for LD and human disease incidence in California, we summarize infected tick suitability by zip code ( figure 4) and, along with sociodemographic and environmental controls, use interval regression to predict zip code level LDI. The results of our main analysis, investigating the effect of entomologic risk as defined by modeled infected tick suitability on human LDI across all of California (tables S3 and S4, figure 5(a)), indicate a significant positive effect of infected tick suitability on zip code level LDI (β = 2.38, P = 0.00048 * * * ). This estimate suggests that for a 10% increase in mean infected tick suitability in a given zip code, we would expect ∼0.238 additional cases of LD per 100 000 population.
When considering infected tick suitability within lands designated as Wildland Urban Interface (WUI) (figure S3) within zip codes, we estimate a similar, though smaller, significant positive effect on human LDI (β = 1.93, P = 0.016 * * ). When we instead consider infected tick suitability within public access lands (figure S3) within zip codes, we estimate a smaller, non-significant positive effect on human LDI (β = 1.00, P = 0.126).
When we run the same models on just the northern sample of zip codes (table S5, figure 5(b)), we estimate very similar effects, with the coefficient estimates slightly larger than in the full state model for the full zip code and WUI models, but smaller for the public lands model. Similarly, when we run the models on the sample of zip codes with at least some predicted suitability for I. pacificus (table   (table S3). S6, figure 5(c)), we find very consistent trends with the full zip code and WUI model estimates falling between those for the full state and northern zip code samples, and the public lands model estimate slightly larger than those from the full state and northern zip code samples, though still non-significant. Moreover, we see very similar patterns in model results when we drop zip codes with missing data for any of the covariates (tables S7-S9), and consistent, significant positive coefficient estimates on infected tick suitability in models that use alternative distributions (table S10).

Discussion
The ecology of vector-borne and zoonotic diseases is rapidly changing as species communities and vector-pathogen-host interactions shift in response to climate and land use change [2,[38][39][40]. Understanding how these ecological changes might influence disease transmission has become a central focus in disease ecology and epidemiology, especially as concerns of novel disease spillover events are elevated in response to the COVID-19 pandemic [41]. Many tick-and other vector-borne pathogens are zoonotic,  (table  S6), each giving coefficient estimates (black) and the 90% (wide bar) and 95% (narrow bar) confidence interval of the estimate for the infected tick suitability (ITS) by zip code, Wildland Urban Interface (WUI), and public lands models. meaning understanding their distribution, emergence, and human risk of transmission will depend on understanding the natural transmission cycles that maintain them [42]. Achieving such understanding requires both deep knowledge of the community ecology of disease, and an ability to quantitatively link that community ecology with human disease incidence. Drawing such a link has been complicated and hotly debated in the context of LD in North America [3,[43][44][45][46], where entomologic risk predicted by community ecology is itself often a poor predictor of variation in human LDI [18, 19, 23-28, 31, 47]. Human behavior and risk avoidance may confound this relationship in the eastern US, where LD awareness is high [19,[29][30][31][32]. However, in contrast we found the community ecology of LD in the far western US-namely the role of key reservoir host species like dusky-footed woodrats-is strongly associated with infection probability in vector ticks, and that this in turn is strongly associated with human disease incidence at the zip code level. Interestingly, our results suggest that while key reservoir host species appear to have an outsized role in determining tick infection, species richness plays, at best, an exceedingly small role in determining LD risk in this region. So, while the relationship between host community richness and tick infection is suggestive of a dilution effect-with lower species richness associated with higher infection probabilityhost richness itself does not meaningfully contribute to our models of entomologic risk of Lyme disease.
The results of our species distribution modelings (SDMs) indicate that habitat suitability for duskyfooted woodrats was the variable most strongly associated with tick infection. Interestingly, the predictions from this model reasonably match those for overall suitability for I. pacificus presented in MacDonald et al [34,35] in northern California. In contrast, infected tick suitability is predicted to be much lower and geographically limited in southern California than overall tick suitability, suggesting lower rates of enzootic transmission in this region [48,49]. While these predictions could be influenced by spatial bias in tick-borne disease surveillance and reported infection locations (figure S4), the more limited suitability predicted for southern California maps onto the reported phylogeography of Borrelia [48][49][50]. Rose et al screened ticks from surveillance efforts across the state of California, yielding B. burgdorferi infections in northern California, but other Borrelia infections in southern California [50]. Thus, the agreement between California Department of Public Health (CDPH) surveillance (e.g. dearth of B. burgdorferi-infected ticks in southern California despite robust tick populations) and our modeled tick population and infected tick suitability suggests we are capturing real differences in entomologic risk for LD that might be associated with the ecology of key reservoir host species.
An additional confounding factor that may also be associated with lower human LDI in southern California is observed differences in host-seeking behavior of juvenile I. pacificus [48,51]. Juvenile ticks, in contrast to northern California [52], are very challenging to collect via dragging or flagging in southern California and may quest for hosts below the surface of the leaf litter [48]. This difference in host-seeking behavior could reduce human-vector encounter rates, particularly for the epidemiologically important nymphal stage [52]. Teasing the contribution of host-seeking behavior apart from predicted suitability for infected ticks in this study is a real challenge, and this could in part explain lower rates of LD transmission in southern California. However, the reported phylogeography of Borrelia in California [50] and our modeled infected tick suitability suggest a low probability of transmission in this region, regardless of differences in juvenile tick host-seeking behavior.
We find our predicted suitability for infected ticks is a significant predictor of human LDI. While our coefficient estimate suggests a relatively small increase in LDI for a 10% increase in infected tick suitability (<1 additional case per 100 000 population), given the low rates of LD reported throughout California, this is not insubstantial [53]. With suitability for I. pacificus populations projected to increase in many regions of California under projected climate and land use change [35,36,54], this could lead to meaningfully higher rates of LD transmission under some future scenarios. Interestingly, while infected tick suitability in the WUI had a similar positive and significant association with LDI-suggesting peridomestic exposure to infected ticks is important in Californiasuitability in public access lands and possible recreational exposure was not significantly associated with LDI. Recent community science surveillance [55] suggests the opposite, that most exposure to hostseeking I. pacificus in California is in recreational settings. This discrepancy may be resolved with more information about rates of recreational use of these open access lands. For example, if recreation activity is high in low-risk public access lands and low in high-risk public access lands, this could explain our result. It may also be the case that those recreationally exposed to host-seeking ticks have higher LD awareness and are more likely to perform tick checks. This may lead to higher rates of transmission in peridomestic settings despite higher exposure to ticks in recreational settings. Finally, it may also be the case that populations living in high risk peridomestic settings also have high recreational exposure in those same zip codes. Higher resolution (e.g. individual level) data on risk exposure would be necessary to tease these possible scenarios apart.
While our results outline a parsimonious path from host community ecology to spatially explicit entomologic risk and broad-scale human LD epidemiology, there are important limitations to our approach. Namely, we were limited by available data to producing a single prediction of infected tick suitability, which lags available LDI data, and to crosssectional analyses of LDI. First, our model of entomologic risk relied on a relatively small number of spatially clustered observations of infected ticks-56 total (after dropping highly spatially clustered observations from a total of 90) across the years 2001-2015. Despite model validation suggesting that our model performed extremely well with very limited prediction of false positives or negatives (AUC > 0.91 across all models), additional observations, particularly in regions with sparse data (e.g. southern California), would no doubt improve the model predictions and inference about important variables associated with entomologic risk. However, despite decades of statewide surveillance and testing of thousands of ticks by the CDPH [50], only a single tick has tested positive for B. burgdorferi in southern California.
This surveillance, and its agreement with patterns of human disease in the state [50], suggest that our infected tick observations are accurately capturing the geographic distribution and spatial clustering of entomologic risk of LD in this region.
Moreover, the finding that suitability for duskyfooted woodrats was the most important variable in predicting the distribution of entomologic risk aligns with decades of field-and laboratory-based investigations [5,56], which have identified the dusky-footed woodrat as a key reservoir host in the western US. While other species have been shown to be competent hosts for B. burgdorferi [6][7][8][9]57] in this region, many of them have more restricted geographic distributions, or behavior that may preclude interactions with questing ticks in southern California (e.g. the primarily arboreal western gray squirrel). Therefore, while some of these other species may play important roles in enzootic transmission cycles in specific regions of the western US (e.g. northwestern California)-which may be uncovered through more targeted regional analyses-from our analysis it is plausible that the dusky-footed woodrat is playing an important role in limiting the geographic distribution of entomologic risk of LD broadly in California.
Second, there were not sufficient infected tick observations from multiple distinct time periods to model changes in infected tick suitability over time. We were instead constrained to pooling our observations to produce a single model of entomologic risk. This model reasonably matches the temporal scale of the WUI and human population data (2010), but lags our high resolution estimates of LDI at the zip code level (averaged across 1993-2005). This temporal mismatch between our measures of entomologic risk and distribution and sociodemographic characteristics of human population on the landscape on the one hand, and LDI on the other, may confound our results. However, from routine CDPH surveillance [58], the distribution and burden of human LD at the county level has not changed meaningfully over time so LDI may also be reasonably consistent over time at the zip code level. Longitudinal, or 'panel,' data when available is very powerful in that it would allow us to investigate how changes in entomologic risk over time are associated with changes in LDI over time. In this context, we would be able to compare zip codes to themselves over time-effectively controlling for confounding variation unique to zip codes, rather than comparing zip codes to each other-which no doubt differ in important unmeasured ways that could confound the relationship between entomologic risk and human disease [59]. Here we include socio-demographic control variables to capture some of this confounding variation, as well as county dummy variables to control for unobserved/unmeasured factors that vary regionally and might influence the relationship between infected tick suitability and LDI (e.g. tick host-seeking behavior). Additionally, spatial scale of analysis can also strongly influence the results of epidemiological analyses, for example highly spatially aggregated data may result in an ecological fallacy [60,61]. However, due to uncertainties surrounding where exposure to infected ticks led to LD transmission for reported cases, some aggregation (e.g. to zip codes) is necessary, as bias may also be a problem with individual level data in this contextfor example, if location of infected tick exposure is unknown, yet the resulting case is reported at the individual household level.

Conclusion
Linking the community ecology of vector-borne and zoonotic diseases with human disease incidence is fundamentally important as the ecology of infectious diseases shifts rapidly with ongoing global change [12,38,62]. Yet, doing so has been a real challenge in many systems, including LD [23,[25][26][27]31] due to challenges of scale and confounding by human behavior [19,[29][30][31][32]60], among others. Here we show, in a LD endemic but low incidence region of western North America, that entomologic risk is strongly influenced by key reservoir hosts, as identified through targeted field and laboratory studies of the community ecology of LD in Californiabut not host richness. Importantly, in contrast with the results of many studies from the eastern US, we show that this modeled spatially explicit entomologic risk is a strong predictor of human disease incidence in California at the spatially refined zip code level. This linking of the community ecology of LD with human disease incidence in this region holds promise for future prediction; as the distribution of the tick vector and vertebrate hosts shift in response to climate and land use change, these ecological changes may yield predictable changes in future LD transmission in California and western North America. With more targeted and standardized tick and pathogen surveillance, along with studies of human and tick behavior, particularly in low incidence regions of the western US, understanding of key risk factors could be improved to help target public health interventions. Moreover, this increasingly available longitudinal surveillance data-coupled with Earth observation data, climate forecasts and models of future vertebrate host distributions-could further improve understanding and more rapid forecasts of changing risk and transmission in response to global environmental change.

Modeling suitability for infected ticks
To model suitability for B. burgdorferi-infected I. pacificus in California, and capture spatially explicit entomologic risk, we employ a SDM approach. We use the maximum entropy (MaxEnt) algorithm, as it handles presence only data well [63][64][65][66]. True absences of infection in tick populations are challenging to identify, as it takes many ticks to be screened for infection from a single location to rule out the possibility of local infection. This is especially true in California where infection rates tend to be low [48,49,67,68], even where enzootic transmission is considered to be endemic [7,[69][70][71]. SDMs for infected tick suitability were built by extending the models of I. pacificus suitability presented in [34,35]. In addition to climate and habitat layers used in those models [34,35], we tested how inclusion of host distribution and diversity metrics influenced predicted suitability for infected I. pacificus ticks.
Infected tick observations were primarily drawn from CDPH surveillance [50]. These data were supplemented with other published studies identified from a literature search to better capture the geographic breadth of I. pacificus in California [48,57,68,69,[72][73][74][75][76]. While B. burgdorferi may be maintained in enzootic cycles involving other tick species [5,49,77], we chose to model B. burgdorferi-infected I. pacificus to more directly capture human transmission risk. A total of 90 infected I. pacificus presence locations were identified ( figure S4).
Climate data layers were sourced from the California Basin Characterization Model [78]. The environmental variables previously determined to be the most ecologically relevant for tick suitability [34,35] were used in our analysis: actual evapotranspiration, climatic water deficit, precipitation, winter precipitation, maximum temperature, maximum temperature in the hottest quarter, and minimum temperature in the coldest quarter. Vegetation cover was derived from the California Department of Forestry and Fire Protection raster layer compiling vegetation data from 1990-2014 (http://frap.fire.ca.gov/ mapping/gis-data/). This vegetation data does not capture ecological disturbances, such as wildfires, but rather the long-term average conditions over this period, as does the climate data. All layers were resampled or aggregated to a 90 m resolution.
To account for the role that host communities play in maintenance of B. burgdorferi, we included data on the range and predicted habitat of key host species in the SDMs. Geographic layers of species range and predicted habitat were retrieved from the United States Geological Survey (USGS) gap analysis program (GAP) Analysis (https://gapanalysis.usgs. gov/apps/species-data-download/). Additional habitat data was sourced from the California Wildlife Habitat Relationships (CWHR) database (https:// wildlife.ca.gov/data/cwhr). Data was downloaded for potential hosts and other associated species (table S11). The GAP and CWHR spatial files were processed for individual species and for various diversity metrics, using the resolution (90 m) and dimensions of the environmental variables (SI Methods) [34,35]. Individual host distribution layers include the western gray squirrel (S. griseus), dusky-footed woodrat (N. fuscipes), California kangaroo rat (D. californicus), and other relevant species (SI Methods). In addition, three diversity layers were calculated from host species binary distribution layers: competent host diversity (competent hosts for B. burgdorferi in California), predator diversity (large and mesocarnivore species), and small mammal diversity (small mammal hosts). Individual species included in each diversity layer are listed in table S11. We used our infected tick presence locations-56 in the final sample, obtained after subsetting the full dataset to include only one observation per 90 m pixel to avoid biasing the model [34]-and 17 710 randomly selected background points to extract all environmental and host spatial data.
We constructed models for various combinations of the above variables (table S1). Our SDM approach followed methodology reported in [34,35], developing our infected tick SDMs using the 'dismo' package [79] to run MaxEnt in R version 3.6.3 (RStudio Team 2020) (SI Methods). The final model selected for further analysis included all climate and vegetation layers, the competent host diversity layer and individual species layers for the western gray squirrel, duskyfooted woodrat, and California kangaroo rat as categorical layers (Environmental Competent Host Layers Diversity model; table S1). We selected this model based on model performance, including maximizing the AUC [80], a combination of a priori ecological understanding of this system (SI Methods) [5,7,8,49,56,81] and minimizing the degree of autocorrelation in the set of environmental variables included for the sake of interpretation of model output [82].

Metrics of entomologic risk
Binary suitability maps masked by land cover were created from the final model predictions to summarize average suitability for infected ticks by zip code and other spatial units (SI Methods). We calculated mean suitability within zip codes by subsetting these binary suitability layers (infected tick and overall tick suitability-SI Methods) by zip code boundaries for California in 2010 [83]. We used 2010 spatial boundaries to reasonably match our infected tick occurrence records (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) and LDI reported in Eisen et al (mean annual incidence of endemic LD for the period 1993-2005, details below) [53]. We also calculated the total area of each zip code and the 2010 population size of each zip code as reported by the census.
We were further interested in infected and overall tick suitability within the WUI areas of each zip code, where peridomestic exposure to ticks may be elevated [19,29]. The WUI represents a transitional area between wildland and human development defined as intermix, where wildlands are interspersed with human development, or interface, where human development is adjacent to large areas of wildland [84]. WUI polygon layers for 2010 were downloaded from the Silvis lab at the University of Wisconsin (http://silvis.forest.wisc.edu/data/wuichange/). We used the binary suitability maps created above along with the zip code and WUI vector layers to calculate the average suitability within areas classified as WUI within each zip code, as well as area and population metrics (SI Methods).
Finally, we were interested in infected tick suitability within areas designated as open access public lands within each zip code, as a measure of potential recreational exposure to infected ticks [36,55]. A recent community science project found that in California, ticks were most commonly encountered while recreating in natural areas, followed by peridomestic and then man-made environments [55]. While participation among the public in this project may have been biased toward more affluent and educated communities, particularly in the San Francisco Bay Area, the results suggest recreational exposure may be at least as important as peridomestic exposure in California [55]. We summarized infected and overall tick suitability in potential recreational settings defined as 'open access' public lands in the 2020 California Protected Areas Database shapefile (www. calands.org/). We used the same approach to calculate risk in open access public lands as for WUI summarizations, above.
Sociodemographic information for each California zip code was retrieved from the California Office of Environmental Health Hazard Assessment, CalEnviroscreen 3.0 data (https://oehha.ca. gov/calenviroscreen/report/calenviroscreen-30). This dataset includes demographic information (e.g. population by age and ethnic group) by census tract in California, as well as other relevant sociodemographic and health indicators (e.g. rates of cardiovascular disease, education, poverty).
Zip code level LDI was calculated by Eisen et al 2006b as the mean annual incidence of endemic LD for the period 1993-2005, dropping cases within that date range where the likely county of exposure did not match county of residence, or for which residential zip code information was missing. Eisen et al reported LDI by zip code in the following categories: 0, 0.01-1, 1.01-5, 5.01-10, 10.01-20, 20.01-50, and >50 per 100 000 population [53].
The final dataset, summarized by zip code, includes the following variables: (a) mean infected tick and overall tick suitability by zip code, by WUI interface and intermix areas within zip codes, and by California open access public lands within zip codes, (b) total zip code area and population, and the proportion of area and population within the WUI and within public access lands, (c) available sociodemographic information, and (d) categorical LDI.

Statistical analysis of Lyme disease incidence
Since our outcome variable of interest, zip code level LDI, is interval censored-where we know the ordered category into which each observation falls, but not the exact value-we chose to model LDI using interval regression. Interval regression is a generalization of censored regression that can handle various types of interval censoring (e.g. interval, left-or rightcensored) in the outcome variable [85].
We are interested in investigating the role of the community ecology of LD in California in determining human LDI. To do so, we use our modeled infected tick suitability variable-encapsulating information about the habitat suitability of key reservoir host species, climate, and habitat conditions. In addition to these ecological factors, human activity, behavior, and settlement patterns are also key to potential exposure to infected ticks, so we include variables describing sociodemographic characteristics [5,38,[86][87][88][89][90][91], wildland-urban interface [29,34], and open access public lands [36,55] to capture these factors (SI methods). Finally, in the context of this crosssectional analysis, regional variation in long-term average climate conditions and tick host-seeking behavior may also be important to tick abundance and activity, as well as vector-host contact, and thus potential human exposure to infected ticks [51,69]. As such, we include county dummy variables to capture this regional variation, and other roughly constant characteristics of counties that might influence LDI. We calculate robust standard errors clustered at the county level to account for heteroskedasticity and within-county correlation of the errors. All interval regression models were run using the survival package [92] in R version 4.0.2 (RStudio Team 2020), using a Gaussian distribution. We ran multiple models with different specifications (e.g. using different data subsets) to assess the robustness of our results (SI Methods).

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.

Acknowledgments
We thank three anonymous reviewers and members of the Larsen lab at UCSB for their thoughtful feedback on this study, as well as the California Department of Public Health (CDPH) and CDPH biologists for the Ixodes pacificus, Borrelia burgdorferi and Lyme disease surveillance efforts on which this work relies.

Funding
A J M was supported by the National Science Foundation and Fogarty International Center (DEB-2011147). S M was supported in part by the Four Year Fellowship program at the University of British Columbia, Vancouver. S S was supported by the Training Grant Program of the Pacific Southwest Regional Center of Excellence for Vector-Borne Disease funded by the U.S. Centers for Disease Control and Prevention (Cooperative Agreement 1U01CK000516).