Heterogeneous local dynamics revealed by classification analysis of spatially disaggregated time series data

Time series data provide a crucial window into ecological dynamics, yet their utility is often limited by the spatially aggregated form in which they are presented. When working with time series data, violating the implicit assumption of homogeneous dynamics below the scale of aggregation could bias inferences about underlying processes. We tested this assumption in the context of the 2015-2016 Zika epidemic in Colombia, where time series of weekly case reports were available at national, departmental, and municipal scales. First, we performed a descriptive analysis, which showed that the timing of departmental-level epidemic peaks varied by three months and that departmental-level estimates of the time-varying reproduction number, R(t), showed patterns that were distinct from a national-level estimate. Second, we applied a classification algorithm to six features of cumulative incidence curves, which showed that variability in epidemic duration, the length of the epidemic tail, and consistency with a normal distribution function made the greatest contributions to distinguishing groups. Third, we applied this classification algorithm to data simulated with a stochastic transmission model, which showed that group assignments were consistent with simulated differences in the basic reproduction number, R0. This result, along with associations between spatial drivers of transmission and group assignments based on observed data, suggests that the classification algorithm is capable of detecting differences in temporal patterns that are associated with differences in underlying ecological drivers. Overall, this diversity of temporal patterns at local scales underscores the value of spatially disaggregated time series data.


INTRODUCTION
Time series have been used for many years to make inferences about processes that shape the dynamics of a wide range of ecological systems (Turchin & Taylor 1992). This long history has resulted in appreciation of a number of common challenges for time series analysis (Hastings 2010). One such challenge is disentangling the effects of multiple interacting forces, which can include both extrinsic forces, such as weather, and intrinsic forces, such as density-dependent feedbacks (Bjørnstad & Grenfell 2001;Koelle & Pascual 2004). An even more fundamental challenge lies in defining the time series in the first place, especially with respect to space (Levin 1992). The question is, at what spatial scale should ecological data be aggregated for time series analysis?
In practice, the spatial scale at which data are aggregated to form a time series is more often dictated by the scale at which data are available than by the scale that is optimal for inference or prediction. For example, during the recent invasions of chikungunya virus (CHIKV) and then Zika virus (ZIKV) across the Americas, the Pan American Health Organization published weekly case reports aggregated nationally. Despite an abundance of evidence that chikungunya and dengue virusesanother virus transmitted by Aedes aegypti mosquitoesare characterized by spatially focal transmission (Salje et al. , 2017, applications ranging from estimation of time-varying reproduction numbers (Ferguson et al. 2016) to forecasting (Escobar et al. 2016, Del Valle et al. 2018 have utilized data aggregated at national scales for countries as vast and spatially heterogeneous as Brazil and Mexico. Unlike most other countries in the Americas, routine surveillance of Zika in Colombia was reported on a weekly basis in each of its 1,123 municipalities during the 2015-2016 epidemic (INS 2017). Although such case reports are underestimates of the true extent of transmission of many infectious diseases, particularly those with high proportions of asymptomatic infections, they still provide a uniquely valuable resource given the paucity of publicly available data at similar scales in most countries (Chretien et al. 2016). Such data are particularly valuable for Zika, given that a range of spatial scales are relevant for activities related to its prevention and control. On the one hand, vector control activities are planned and budgeted on multiple administrative levels but must be targeted on a very local level. On the other hand, communications, surveillance, and possible vaccination programs are generally planned and implemented only on larger administrative scales.
Our goal in this study was to utilize this unique data set on the ZIKV invasion of Colombia to perform a case study on the characteristics of temporal patterns at different spatial scales in the context of an emerging infectious disease. To do so, we took a three-part approach. First, we performed a descriptive analysis of time series of weekly case reports at three distinct scales in Colombia: national, departmental, and municipal. Second, we performed a classification analysis of proportional cumulative incidence curves at departmental and municipal scales to identify distinct patterns of temporal dynamics at each of these scales. Third, we repeated the classification analysis for data simulated with a mechanistic model of ZIKV transmission to determine the extent to which distinct temporal patterns may reflect distinct ecological drivers. All data and code used in this study are available at https://github.com/TAlexPerkins/TimeSeriesSpatialScale.

Data
The focal point of our analysis was a collection of municipal-level time series of weekly Zika case reports at the municipal level in Colombia spanning August 2015 through September 2016. The primary source of these data was the Colombian National Institute of Health (Instituto Nacional de Salud, INS), which made official weekly reports of the cumulative numbers of suspected and confirmed Zika cases available in real time during the epidemic (Boletín 2018). The version of these data that we used in this analysis were processed in a manner that addressed inconsistencies between data reported at municipal and departmental scales, as described by Siraj et al. (2018). Specifically, to correct for the fact that the total of municipal-level data from 2015 (3,875 cases) was less than the total of national-level data from 2015 (11,712), we imputed the 7,837 missing cases at the municipal level for 2015 by multiplying each municipality's weekly incidence in 2015 by a factor required to achieve better known cumulative totals for each municipality as of the first week of 2016.

Descriptive analysis of weekly case reports
We performed two preliminary analyses of differences in weekly case report patterns at different scales of spatial aggregation. First, we generated a bar plot of national case reports color-coded by which of 33 departments those national cases arose from. Likewise, for each of those departments, we generated a bar plot of departmental case reports color-coded by which of its municipalities those departmental cases arose from. Second, we made estimates of the timevarying effective reproduction number, R(t), for each time series. Following Ferguson et al. (2016), we used the EstimateR function from the EpiEstim library (Cori 2013) in R to estimate R(t) for each time series based on the method introduced by Cori et al. (2013). In brief, this method is based on an assumed distribution of the serial interval (i.e., the timing between onset of primary and secondary cases) that can be used to estimate the number of cases in the previous generation that gave rise to those observed in the present generation, thereby enabling estimation of R(t).

Classification analysis of cumulative incidence curves
We focused our analysis on cumulative, rather than raw, incidence because of the extreme variability in raw incidence patterns in this data set. With raw incidence, time series with a small number of cases appear extremely jagged, and temporal patterns would be difficult to extract. With proportional cumulative incidence, vastly different temporal patterns are more readily comparable, because they all begin at 0 and end at 1 but arrive there by different paths. Others (King et al. 2015) have criticized the use of cumulative incidence data from epidemics, although these criticisms mostly pertain to parameter estimation and forecasting, neither of which we do here. Rather, our goal was to perform a descriptive analysis of diversity in the temporal patterns of an epidemic as viewed from different perspectives spatially.
The cumulative incidence curves that we examined were proportional such that they all reached 1 at the time the last case was reported in a given area. Mathematically, for weekly reported Zika incidence Ii,t in location i in week t, we calculated proportional cumulative incidence as We excluded 2/33 departments and 307/1,123 municipalities from our analysis that reported no Zika cases.
As a basis for classifying cumulative incidence curves, we defined six features of these curves that we hypothesized represent dimensions in which curves from different areas vary: 1.
: Weeks between first and last non-zero Ci,t; 6. 0 : Weeks with Ci,t = 0 between first and last non-zero Ci,t. Four of these features were defined in reference to cumulative normal density curves, ̂( ), that we fitted to each Ci,t. This involved estimating mean and standard deviation parameters of ̂( ) for each Ci,t on the basis of least squares using the optim function in R. We chose these features because they provided a way to quantify the duration of local epidemics (small , short = short epidemic), to capture whether epidemics appeared strongly locally driven (low 2 , large 0 = sporadic transmission fueled by importation), and to characterize shapes that deviated substantially from those predicted by simple epidemic models ( 5% and 95% near zero = "SIRlike" epidemic). Although these idealized scenarios motivated the selection of these features, the fact that all six features were calculated for each Ci,t meant that we were able to capture a wide range of patterns in between these extremes.
We explored variation in Ci,t at both departmental and municipal scales. To describe how variation in Ci,t curves at those scales was distributed across the six-dimensional feature space, we performed a partitioning around medoids (PAM) clustering analysis (Reynolds et al. 2006) on centered and scaled values of the features using the pam function in the cluster library (Maechler et al. 2017) in R. This algorithm identifies medoids of k groups that minimize the sum of distances between each medoid and all group members. We performed this analysis for values of k ranging 2-10 and compared groupings for different values of k on the basis of their average silhouette values. A silhouette value describes how much more dissimilar one point is from points in the next most similar group compared to points in its own group (Rousseeuw 1987). An ideal classification would be indicated by silhouette values for data points in all groupings close to 1. Silhouette values nearer to or below 0 indicate that points do not cluster well with the group to which they are assigned.

Elucidation of driving processes
To aid in the interpretation of the classification analysis of empirical patterns of temporal incidence, we performed identical analyses of simulated patterns of temporal incidence. The value of doing so is that it provides a form of validation of the classification analysis: i.e., demonstrating that it is capable of identifying groups that correspond to known differences in underlying ecological processes. For this analysis, we defined groups of municipalities on the basis of whether the simulated R0 value for a municipality was above or below 1, given the significance of this threshold for determining invasion outcomes. We performed classification analyses on 100 data sets simulated with a stochastic model of ZIKV transmission developed by Ferguson et al. (2016) and tailored to Colombia as described in Appendix S1. This particular model was chosen with the goal of simulating cumulative incidence curves at the municipality level that would be plausible with respect to the six features used in the classification analysis. As a result, the model has limitations that mean that it may not be appropriate for analyses of other, more complicated aspects of spatiotemporal transmission dynamics.
Although the analysis of simulated data provides a test of the algorithm, it does not facilitate inference of whether there truly are differences in ecological processes underlying the empirical data. Doing so convincingly would require more comprehensive analyses, ideally involving data about variables assumed to play an intermediary role in a hypothesized causal pathway between environmental variables and disease incidence (Metcalf et al. 2017). To explore whether there might at least be perceptible associations between ecological factors and groups identified by the classification analysis, we performed a series of one-way analyses of variance at both departmental and municipal scales. Specifically, our objective was to examine whether mean values of relevant environmental variables differed across these groups. Variables that we examined included R0 values derived from Perkins et al. (2016) as described in Appendix S1, and seven variables compiled for municipalities and departments in Colombia by Siraj et al. (2018): Ae. aegypti occurrence probability, two measures of normalized difference vegetation index (NDVI), mean temperature, percent urban land cover, human population, and the gross cell product (GCP), a spatially disaggregated version of the gross domestic product economic index.

Descriptive analysis of weekly case reports
As a whole, the temporal pattern at the national level was consistent with what could be construed as a typical epidemic trajectory, marked by an increase over approximately five months, a peak around the beginning of February 2016, and a steady decline thereafter over a period of approximately eight months (Fig. 1A). Under a standard set of assumptions about epidemic dynamics, this pattern can be used to estimate the temporal trajectory of the effective reproduction number, R(t) (Cori et al. 2013). Applying this technique at the national level yielded estimates of R(t) that began high (range: 1.5-3.5 for the first four months) and gradually declined below 1 by the time the epidemic concluded ( Fig. 1A), all of which is consistent with standard expectations for an epidemic of an immunizing pathogen in an immunologically naive host population. Examination of temporal incidence patterns for each of the four largest departments in terms of total incidence (Valle del Cauca, Norte de Santander, Santander, Tolima) showed that patterns at the departmental level were quite different than those at the national level. First, the timing of peak incidence in the departments in Fig. 1B-1E varied by around three months. Second, the shapes of the incidence patterns in those departments varied, with Valle del Cauca and Santander (Fig. 1B & 1D) showing high incidence sustained over a period of several months and Norte de Santander and Tolima (Fig. 1C & 1E) showing sharper peaks trailed by relatively low incidence for several months after.
This high degree of variability in temporal incidence patterns had substantial impacts on estimates of R(t). At the national level, R(t) estimates never exceeded 3.5, whereas in Santander R(t) was estimated to exceed 5 (Fig. 1D) and in Valle del Cauca it was estimated to exceed 10 ( Fig. 1B), due in both cases to more rapid increases in incidence at the departmental level than the national level. In Norte de Santander, R(t) appeared to twice fall well below 1 but then quickly rise back above 1 (Fig. 1C).
Examination of temporal patterns at the municipal scale revealed even more variability in temporal patterns than at the department level. In the department of Norte de Santander (Fig.  1C), for example, it was clear that one municipality dominated the departmental pattern (Fig.  1F). The municipalities with the second and third highest incidence both experienced short, unimodal patterns of incidence during the first two months, but incidence patterns thereafter were mostly low and erratic ( Fig. 1G & 1H). Other municipalities in the department had only low, erratic incidence with no sign of a distinct epidemic (e.g., Fig. 1J). With the exception of the first few weeks of transmission, estimates of R(t) at the municipal level were characterized by erratic fluctuations and much larger uncertainty than was apparent at the departmental or national level.

Classification analysis of cumulative incidence curves
At the departmental level, there was only modest clustering overall, with the highest average silhouette value corresponding to two groups (0.256), a slightly lower value for three groups (0.254), and falling no lower than 0.201 for up to ten groups (Fig. S1). and 95% were the features that were most important for distinguishing two groups (Fig. S2), and contributed further to distinguishing three groups (Fig. S3). Differences in were associated with a difference of approximately two months in the time elapsed between the attainment of 5% and 80% of cumulative incidence (Fig. 2, top left: blue longer than red), and differences in 95% were associated with a difference of approximately two months in the time elapsed between the attainment of 80% and 99% of cumulative incidence, but for different groups (Fig. 2, top left: red longer than blue). Overall, this meant that the time elapsed between attainment of 5% and 99% of cumulative incidence for both groups was similar, but with one group experiencing epidemics that were fast initially but slow to finish and another group experiencing epidemics that were slower initially but finished more quickly. These patterns were clearest for the curves associated with the medoid of each group (Fig. 2, top) but were generally apparent for the curves associated with the groups as a whole (Fig. S4). Spatially, groups tended to cluster along northern, central, and southern strata (Fig. 3, left), with incidence-weighted cartographs showing that the epidemic was mostly dominated by distinct northern and central strata (Fig. 3, top right).   (left), population (center), and incidence (right). Department assignments to two (top) and three (bottom) groups are indicated by color, with transparency inversely proportional to silhouette value. The one department (Bogotá) with zero incidence is indicated in black and given a weight equivalent to 1/5 of a case to allow for its inclusion in the right column.
There was somewhat stronger clustering at the municipal level, with the highest average silhouette value corresponding to three groups (0.352), somewhat lower values for five and six groups (0.334, 0.326), and no lower than 0.297 for up to ten groups (Fig. S5). and were the features that were most important in distinguishing two groups (Fig. S6), 95% made additional contributions to distinguishing three groups (Fig. S7), and 2 contributed to distinguishing four groups (Fig. S8). Proportional cumulative incidence curves for the group with short and small were the most visually distinct group and remained relatively consistent regardless of the number of groups (Fig. 2, bottom). Some differences among the other groups were also apparent in the proportional cumulative incidence curves, with some having a long tail (Fig. 2, bottom middle: green) or two discrete jumps (Fig. 2, bottom middle: blue). The timing of discrete jumps varied across municipalities, but curves within a group otherwise resembled the curve associated with the medoid for that group (compare Fig. 2 bottom with Fig. S9). Spatially, departments generally consisted of a mixture of municipalities from different groups, and the prominence of some groups in the cartograms varied depending on whether the cartograms were weighted by area, population, or incidence (Fig. 4). The cartograms weighted by population showed that a sizeable portion of the population lives in cities that had no reported cases, such as Medellín and Bogotá (Fig. 4, black in the center column). Among municipalities that did have reported cases, the cartograms weighted by incidence showed that a relatively large proportion of reported cases came from municipal-level epidemics characterized by large and (Fig. 4, right column).

Figure 4.
Cartograms at the municipal level weighted by area (left), population (center), and incidence (right). Municipality assignments to two (top), three (middle), and four (bottom) groups are indicated by color, with transparency inversely proportional to silhouette value. Municipalities with zero incidence are indicated in black and were given a weight equivalent to 1/5 of a case to allow for their inclusion in the right column.

Elucidation of driving processes
We focused our analysis of simulated data at the municipal level given that the simulation model was not equipped to simulate transmission between municipalities, which is likely important for recreating departmental-level patterns. Overall, our model parameterization assumed that R0 > 1 in 34.6% of municipalities. A total of 12.6% (range: 10.4-14.1%) of municipalities had zero simulated cases, with 99.0% (range: 97.0-100.0%) of those having R0 < 1.
Out of 100 simulated datasets, the classification algorithm selected two groups eight times, three groups 80 times, and five and six groups four times each. Average silhouette value was 0.313 (range: 0.288-0.347) when there were two groups and 0.327 (range: 0.291-0.352) when there were three groups (see Fig. S10 for a representative silhouette plot from a randomly selected simulated dataset). Although this indicates a modest preference of the algorithm for three groups, we focused subsequent analyses on the two-group classification due to our desire to evaluate the correspondence between groups selected by the classification analysis and groups defined by R0 above or below 1.
With the two-group classification, 99.1% (range: 90.3-100.0%) of municipalities with R0 > 1 were placed into the group characterized by larger and . Of the municipalities with R0 < 1, 74.0% (range: 36.3-80.5%) were also placed into that group, with the others placed into the group with smaller and (see Fig. S11 for an example from a randomly selected simulated dataset). When municipalities were classified into three groups, a new group characterized by moderately low and and negative 95% contained 18.8% (range: 0.2-36.1%) of municipalities with R0 > 1 and 44.7% (range: 23.0-56.5%) with R0 < 1 (see Fig. S12 for an example from a randomly selected simulated dataset). In the presence of this third group, 79.9% (range: 63.4-89.7%) of municipalities with R0 > 1 and 32.1% (range: 22.8-38.8%) with R0 < 1 were placed into the group characterized by larger and . Visual inspection of five simulated datasets showed that the proportional cumulative incidence curves of municipalities placed in the group characterized by large and generally resembled the curves of municipalities with R0 > 1 (Fig. 5, red). In contrast, proportional cumulative incidence curves of municipalities with R0 < 1 were more diverse than those placed in the group characterized by low and (Fig. 5, blue). A similar pattern was apparent spatially, with municipalities placed in the group characterized by large and generally overlapping with municipalities with R0 > 1, but municipalities with R0 < 1 frequently placed in the group characterized by large and (Fig. 6).

Figure 5.
Proportional cumulative incidence curves at the municipal level from five randomly selected simulated datasets. The left two columns show two different groups classified by the curve classification algorithm, and the right two columns show two different groups defined by whether those municipalities have a R0 above or below 1.

Figure 6.
Cartograms at the municipal level weighted by area based on five randomly selected simulated datasets. Each municipality's status as having R0 > 1 (red) or R0 < 1 (blue) is indicated in the top left panel. In each of five simulated datasets shown in the other panels, municipality assignments to two groups are indicated by color, with transparency inversely proportional to silhouette value.

DISCUSSION
Temporal incidence patterns play a vital role in inferring ecological dynamics and drivers thereof. By analyzing data from the 2015-2016 Zika epidemic in Colombia, we showed that temporal patterns can appear very different depending on the spatial scale at which data are aggregated. Whereas national-level dynamics appeared to follow a unimodal pattern consistent with behavior of standard epidemic models, departmental-level dynamics were somewhat more varied and municipal-level dynamics were the most varied. Combining these observations with a formal classification of temporal incidence patterns and a model-based exploration of mechanisms capable of generating those patterns, we deduced that there is distinct variation in temporal patterns subnationally and that much of that variation may be driven by spatial variation in local conditions. Associations between group assignments and relevant environmental variables were most apparent at the municipal scale, consistent with the hypothesis that linkages between temporal dynamics and underlying ecological processes are strongest at fine spatial scales.
Similar to our findings of differing dynamics at municipal and departmental scales, theoretical analyses of a range of ecological models have proposed that dynamics approach deterministic behavior as spatial scales grow larger and data become increasingly more aggregated (Rand & Wilson 1995). Methods based on long-term dynamics have been proposed for identifying the scales at which behavior transitions from stochastic to deterministic in models of plant competition and predator-prey interactions (Keeling et al. 1997;Pascual & Levin 1999). Epidemics, however, are inherently transient in nature, leaving open the question of how best to define characteristic spatial scales in that context. It is certainly the case that the data from Colombia that we examined displayed greater stochasticity at finer spatial scales. At the same time, the greater variability in temporal patterns that we observed at finer scales suggests that models that aspire to a deterministic representation of behavior at coarser scales must account for spatial structure at finer scales. Indeed, a recent attempt to fit a national-scale transmission model to national-scale time series of Zika case reports from Colombia showed that ignoring subnational spatial structure inhibited that model's fit to the data (Shutt et al. 2017). A theoretical exploration of similar issues concluded that the scale at which spatial structure must be modeled explicitly is expected to vary by pathogen and geographic context, with less mobile pathogens requiring explicit spatial representation at finer scales (Mills & Riley 2014).
Both stochasticity and spatial interaction are expected to contribute to variability in temporal dynamics at local scales (Durrett & Levin 1994). For some municipalities, temporal incidence patterns appeared to be dominated by stochasticity (e.g., those with discrete jumps). For others, there were implications for a role of spatial interaction (e.g., those with two sharp increases or a long tail). Whereas our simulation model was realistic with respect to demography and the inclusion of spatiotemporal variability in local transmission, it made the very simplistic assumption about spatial interaction that importation patterns have identical timing and magnitude in all municipalities. This may have caused municipalities with R0 < 1, particularly those with larger populations, to display patterns that simply reflected the national trend used to drive importation. Analyses of subnational spatiotemporal dynamics in a range of contexts show that importation patterns vary substantially over time and as a function of regional connectivity or being positioned on an international border (Grenfell et al. 2001;Cummings et al. 2004;Dalziel et al. 2013;Rodriguez-Morales et al. 2016). Future work that includes more realistic spatial interaction among subnational units would be helpful for resolving the hypothesis proposed here about the importance of spatial interaction in shaping temporal patterns at each of the spatial scales that we considered.
Our analysis identified intriguing differences in temporal patterns across spatial scales, but at the same time there are important limitations to acknowledge. First, although our conclusions are not dependent on the magnitude of transmission, they do require that patterns in case report data reflect patterns in underlying transmission. With a high rate of asymptomatic infection and the likelihood of extensive variability in reporting rates ), particularly at the municipal level, some caution is due. Second, our ability to ascribe meaning to the groups identified by our classification algorithm was limited by the simplicity of our simulation model, particularly with respect to spatial interaction. Consequently, while this analysis identified important relationships between spatial scale and epidemic characteristics, it does not provide a complete or comprehensive understanding of the spatial transmission dynamics of ZIKV in Colombia. Third, our model relied on a simplified description of seasonal transmission, when in fact patterns of seasonality could vary spatially and interact strongly with introduction timing (Huber et al. 2017).
Previous analyses of Zika (Ferguson et al. 2016;Shutt et al. 2017), as well as chikungunya (Perkins et al. 2015;Escobar et al. 2016;Del Valle et al. 2018), have drawn inferences and made forecasts on the basis of nationally aggregated time series data. These efforts depend on the implicit assumption that spatially disaggregated temporal patterns are homogeneous and consistent with spatially aggregated temporal patterns. Our analysis showed that while national-level patterns may be somewhat reflective of departmental-level patterns, municipal-level patterns of cumulative incidence are diverse and not well approximated by national-level patterns. Although our analysis was limited in its ability to explain the mechanisms that drove these diverse patterns, applying our classification algorithm to simulated data in which driving mechanisms were known showed that spatial differences in driving mechanisms can be associated with perceptible differences in temporal patterns. The initial wave of the Zika epidemic appears to have subsided, but understanding of spatial variation in transmission dynamics remains imperative for time-sensitive applications such as site selection for vaccine trials (Perkins 2017;Asher et al. 2017) and anticipating future epidemics (Ferguson et al. 2016).

Appendix S1. Description of simulation model of Zika virus transmission in Colombia.
We simulated data sets comparable to the observed data using an R implementation of the ZIKV transmission model described by Ferguson et al. (2016) parameterized to match the municipallevel R0 values derived from Perkins et al. (2016). The model by Ferguson et al. had a number of attractive features, including plausible values of a number of parameters common to ZIKV transmission models, realistic accounting of the timing of transmission-relevant processes in mosquitoes and humans, seasonal variation in transmission, and the ability to capture multiple forms of stochasticity associated with transmission and surveillance. In brief, the model assumes that humans transition from a susceptible compartment into a recovered and immune compartment following a period of incubation and infectiousness and that mosquitoes become infectious and remain so following bites of infectious humans and a seasonally variable incubation period. Mosquito population density is also seasonally variable, driven by seasonal variation in larval carrying capacity and adult mortality. A full description of the model can be found in the paper by Ferguson et al. (2016).
To drive the model, we based estimates of the basic reproduction number, R0, on a set of ZIKV epidemic size projections for Latin America made early in the epidemic using relationships between environmental variables and transmission metrics (Perkins et al. 2016). To obtain a single value of R0 for each municipality, we took a weighted sum of the R0 raster at 5 km x 5 km resolution weighted by a raster layer of human population projections (Sorichetta et al. 2015) aggregated to that scale by Siraj et al. (2018). We calibrated these R0 estimates to observed dynamics in Colombia by scaling municipal values of R0 from Perkins et al. (2016) by a constant (2.72) such that the value for the municipality of Girardot, Colombia, matched an estimate of 4.61 derived from an analysis of temporal incidence patterns there (Rojas et al. 2016). The environmental variables that drove spatial variation in these R0 values include temperature, Ae. aegypti occurrence probability, and the gross cell product economic index.
To apply this model to Colombia, we used municipal-level human population sizes derived from WorldPop (Sorichetta et al. 2015) and adjusted seasonally averaged mosquito densities such that seasonally averaged values of R0 matched our municipal-level R0 estimates. Another departure from the original model by Ferguson et al. (2016) that we made was to remove explicit spatial coupling, given the complexity of doing so realistically for all 1,123 municipalities in Colombia. Instead, we simulated imported infections (i.e., infections acquired outside a given municipality) to occur at a daily per capita rate that was proportional to a normal probability density function fitted to the temporal pattern of national-scale incidence (timing of national-scale incidence: mean = 32.57 weeks after the first reported case, standard deviation = 8.85 weeks). Although this approach was not able to capture differences in the timing of importation patterns across municipalities, none of the six features of the cumulative incidence curves that we analyzed depended on the timing of the epidemic in one municipality relative to another. To approximately match the national total of 85,353 suspected Zika cases, the timevarying ZIKV importation function that we used was scaled by a value of 1.55 x 10 -3 . This value was obtained by trial-and-error tuning of example simulations in which a reporting rate of 11.5% was assumed (Kucharski et al. 2016). Also, given that our interest was in short-term dynamics rather than long-term dynamics as in Ferguson et al. (2016), we removed human age stratification from the model.  Table S2. Summary of results from one-way analyses of variance at the departmental scale (n=31). For each relevant environmental variable (rows), we performed an analysis of variance to test for differences in the mean of that variable across two, three, or four groups identified by the classification analysis (columns). The F statistic and p value of each test is shown. Figure S1. Silhouette plots at the departmental level for groups numbering two to ten obtained by partitioning around medoids. Each bar corresponds to the silhouette value of a given department according to the group assignments indicated by different colors in each panel.

SUPPORTING FIGURES
Higher average silhouette values indicate stronger clustering.    Figure S5. Silhouette plots at the municipal level for groups numbering two to ten obtained by partitioning around medoids. Each bar corresponds to the silhouette value of a given municipality according to the group assignments indicated by different colors in each panel.
Higher average silhouette values indicate stronger clustering.     Figure S10. Silhouette plots at the municipal level based on a randomly selected simulated data set for groups numbering two to ten obtained by partitioning around medoids. Each bar corresponds to the silhouette value of a given municipality according to the group assignments indicated by different colors in each panel. Higher average silhouette values indicate stronger clustering. Figure S11. Pairwise plots of features of proportional cumulative incidence curves based on a randomly selected simulated data set, with colors distinguishing group assignment of the municipalities into one of two groups. Histograms show the marginal distributions of the features, and numbers in the upper right half indicate pairwise correlation coefficients between each pair of features. The transparency of each point is inversely proportional to silhouette value. Figure S12. Pairwise plots of features of proportional cumulative incidence curves based on a randomly selected simulated data set, with colors distinguishing group assignment of the municipalities into one of three groups. Histograms show the marginal distributions of the features, and numbers in the upper right half indicate pairwise correlation coefficients between each pair of features. The transparency of each point is inversely proportional to silhouette value.