An Introduced Crop Plant Is Driving Diversification of the Virulent Bacterial Pathogen Erwinia tracheiphila

Erwinia tracheiphila is a virulent phytopathogen that infects two genera of cucurbit crop plants, Cucurbita spp. (pumpkin and squash) and Cucumis spp. (muskmelon and cucumber). One of the unusual ecological traits of this pathogen is that it is limited to temperate eastern North America. Here, we complete the first large-scale sequencing of an E. tracheiphila isolate collection. From phylogenomic, comparative genomic, and empirical analyses, we find that introduced Cucumis spp. crop plants are driving the diversification of E. tracheiphila into multiple lineages. Together, the results from this study show that locally unique biotic (plant population) and abiotic (climate) conditions can drive the evolutionary trajectories of locally endemic pathogens in unexpected ways.

Despite its economic burden, nothing is known about the population structure of E. tracheiphila, the genetic basis of virulence against the two cucurbit genera that E. tracheiphila infects, or why E. tracheiphila only occurs in such a restricted geographic range. To address this knowledge gap, we collected and sequenced 88 isolates sampled from all susceptible host plants across the entire geographic range where E. tracheiphila is known to occur. Via analysis of the genomes of these isolates, we evaluated E. tracheiphila genetic diversity in relation to its plant host range and geographic distribution. We then tested for interactions between the abiotic environment (temperature) and host plant species on E. tracheiphila virulence. We find that these isolates group into three distinct clusters that differ in host plant associations, geographic ranges, and horizontally acquired virulence gene repertoires. Low genetic heterogeneity and an excess of rare alleles within each lineage are consistent with a recent bottleneck and expansion into a susceptible host population. In controlled inoculation experiments, E. tracheiphila is more virulent at temperate than subtropical summer temperatures. Further, we find that cucumber, a crop plant recently introduced into eastern North America, is the most susceptible to E. tracheiphila overall and the only plant species susceptible to infection by isolates from all three lineages. From this, we infer that both genetic factors (i.e., horizontal acquisition of virulence genes) and ecological factors (i.e., foreign crop plant introductions and low genetic diversity in agricultural populations) may have driven the recent emergence and epidemic persistence of E. tracheiphila into cucurbit agricultural populations in temperate eastern North America.

recombination and gene flow (46,47), revealed that E. tracheiphila is comprised of three distinct, coexisting phylogenetic clusters, designated Et-C1, Et-C2, and Et-melo (Fig. 2A; also see Fig. S1 in the supplemental material and see the text file at https://figshare.com/projects/Recent_emergence_of_a_virulent_phytopathogen/35108). Faint reticulations along the long branches connecting Et-C1 and Et-melo suggest some limited gene flow between these two groups. Et-C2 is on a nonreticulating long branch and shows no evidence of gene flow with either Et-C1 or Et-melo (Fig. 2A). We refer to these three distinct groups as phylogenetic "clusters" instead of "pathovars," as "pathovar" assignments are often inconsistent with phylogenetic group (48–51).
The three clusters are present at different frequencies, over different geographic ranges, and have distinct host plant association patterns ( Fig. 2B and Table 1; see also Table S1). The most frequently recovered E. tracheiphila isolates (55 isolates) belong to the Et-melo cluster and were collected exclusively from cucumber and muskmelon. Et-melo also has the largest geographic distribution, encompassing the known range of E. tracheiphila throughout the midwestern and northeastern United States (Fig. 2B). The 26 Et-C1 isolates were recovered from both introduced cucumber and native squash plants collected in the Mid-Atlantic and Northeast (Table 1). Of the 7 Et-C2 isolates, six were recovered from squash and one from cucumber (Table 1), and all Et-C2 isolates were found in the northeastern United States (Fig. 2B). Isolates from all three clusters were found in field-infected cucumber plants, while muskmelon was infected only by the Et-melo isolates, and squash was infected only by the Et-C1 and Et-C2 isolates ( Table 1). All three lineages are geographically restricted to temperate eastern North America (Fig. 2B). This is further north than where wild, undomesticated Cucurbita spp. naturally occur in the American tropics and subtropics (31,35,52).
All three Erwinia tracheiphila lineages have low genetic diversity. To investigate the recent population history of E. tracheiphila, genetic diversity was measured with the Watterson estimator (θ_W) and Tajima's D. These were calculated separately within each phylogenetic cluster and within each collection period (collection period one from 2008-2010, and collection period two in 2015). The core genes shared by all isolates within each lineage were assigned as putatively functional (Intact), or mobile DNA/putatively pseudogenized (Pseudogenized + Repetitive) using published, manually curated gene annotations from the BuffGH reference genome (formerly PSU-1) (20,30). There is low within-cluster nucleotide diversity (θ_W) in all three lineages (Table 2) despite clear between-cluster genetic divergence (Fig. 2A), which is consistent with small effective population sizes. Et-C2, which was observed only in the 2015 collection, has the fewest segregating sites, is represented by the fewest isolates in the smallest geographic range, and has isolates with the shortest branch lengths (Fig. 2A), which together suggest that Et-C2 may be the most recently emerged lineage (Tables 1 and 2; Fig. 2B). For both Et-C1 and Et-melo, θ_W increased over the 7-year period, although diversity increased 7-fold faster in Et-C1 than Et-melo. The low overall heterogeneity within each E. tracheiphila cluster is compatible with recent emergence from a small founder population and recent divergence into distinct genetic clusters.
In addition to the density of polymorphic sites (θ_W), the allele frequencies at these sites also contain information about recent population history. Tajima's D, which measures the degree to which the allele frequency spectrum is compatible with that of a neutral population of constant size, is negative for all three clusters (Table 2). This reflects an excess of rare variants and suggests that these three lineages are experiencing an ongoing population expansion after a bottleneck. The excess of rare alleles is consistent with the hypothesis that these three relatively monomorphic lineages are rapidly spreading within genetically homogenous host plant populations that are susceptible to infection by pathogen variants with the same virulence alleles. All three lineages show evidence of limited within-lineage recombination, although the large number of repetitive regions likely makes recombination estimates inexact (Table 2). While estimated rates of homologous recombination are relatively low for all three clusters, this process may be contributing to lack of within-cluster phylogenetic structure (Fig. 2A).
Fig. 2. (A) The phylogenetic network of 88 Erwinia tracheiphila isolates. The network was reconstructed from concatenated alignments of the core gene families identified with OrthoMCL in all 88 E. tracheiphila genomes. Three distinct clusters separated by long branches are named Et-melo, Et-C1, and Et-C2 based on the host plant that they were found to infect (Table 1): isolates from clusters Et-C1 and Et-C2 were found only on squash and cucumber plants, while strains from cluster Et-melo were found only on muskmelon and cucumber. Host plant, year of isolation, location, and assembly metadata for each isolate are listed in Table S1. Scale bar shows number of substitutions per site. Figure S1 shows individual isolate identifiers for each isolate in the network. (B) Geographic distribution of the three clusters. Each of the 88 isolates is plotted as a single circle on the map according to its collection site and colored according to the genetic cluster to which it belongs (see panel A). The isolate-specific locations and year of collection are listed in Table S1.
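To make the interpretation of Tajima's D above more concrete, the following minimal Python sketch computes D from three summary values: the number of sampled sequences, the number of segregating sites, and the mean number of pairwise differences. It uses the standard textbook formula; the function name and the example values are hypothetical and are not taken from this study's pipeline or from Table 2.

```python
import math


def tajimas_d(n_sequences, segregating_sites, mean_pairwise_diff):
    """Tajima's D from summary statistics (standard formula).

    Returns NaN when D is undefined (no segregating sites, or fewer than
    four sequences, where the variance term is zero).
    """
    n, s, pi = n_sequences, segregating_sites, mean_pairwise_diff
    if n < 4 or s == 0:
        return float("nan")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    theta_w = s / a1  # Watterson's estimate for the whole region
    variance = e1 * s + e2 * s * (s - 1)
    return (pi - theta_w) / math.sqrt(variance)


# Hypothetical toy values: four sequences, three singleton sites.
# Rare variants pull the pairwise diversity below theta_W, so D < 0.
d_rare = tajimas_d(4, 3, 1.5)

# Two sites split 2:2 among four sequences (intermediate frequency): D > 0.
d_intermediate = tajimas_d(4, 2, 4.0 / 3.0)
```

A negative value, as in the first toy case, is the pattern described above for all three clusters: more low-frequency variants than expected for a neutral population of constant size.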
Estimation of the Erwinia tracheiphila core genome, pangenome, and functional repertoire. The entire E. tracheiphila pangenome of the 88 strains sequenced here, encompassing all core, accessory, and unique genes, is 10,598 gene families (Fig. 3A). The pangenomes of geographically widespread microbes with environmental reservoirs such as Prochlorococcus or Escherichia coli have almost an order of magnitude more gene clusters (53,54). The relatively small E. tracheiphila pangenome size of ~10,600 genes is compatible with the hypotheses that E. tracheiphila is a host-restricted pathogen that recently emerged from a population bottleneck and/or is predominantly circulating in low-diversity agricultural host plant populations.
Of the 4,032 gene families present in at least 95% of sequenced genomes, 2,907 (72%) can be assigned to a functional category of the Clusters of Orthologous Groups (COG) database (55). These "core" gene families are enriched in almost all COG categories associated with cellular processes and metabolism (Fig. 3B and Table S1). This finding is consistent with these gene families being essential for survival and therefore ubiquitous in all isolates in the population. Only 699 out of 3,720 (18.8%) genes found in fewer than 5% of E. tracheiphila sequenced genomes are assignable to a COG functional category. This set of gene families that are "rare" in the population is enriched only in "Mobilome" (X), suggesting that most rare genes are accessory genes or mobile DNA and are not involved in cellular, metabolic, or information processes (Fig. 3B and Table S2).
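The core/rare thresholds and the enrichment test described here can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the input layout (a mapping from gene family to the set of genomes carrying it), and the choice of a one-sided test for overrepresentation are assumptions, since the text specifies only the 95% and 5% thresholds and that Fisher's exact test was used.

```python
from math import comb


def classify_gene_families(presence, n_genomes, core_frac=0.95, rare_frac=0.05):
    """Split gene families into 'core' (>= 95% of genomes) and 'rare' (< 5%).

    presence: dict mapping gene family ID -> set of genome IDs carrying it
    (hypothetical input layout).
    """
    core, rare = set(), set()
    for family, genomes in presence.items():
        fraction = len(genomes) / n_genomes
        if fraction >= core_frac:
            core.add(family)
        elif fraction < rare_frac:
            rare.add(family)
    return core, rare


def fisher_overrepresentation_p(set_in_cat, set_total, all_in_cat, all_total):
    """One-sided Fisher's exact P value for overrepresentation.

    Probability of seeing at least `set_in_cat` category members when drawing
    `set_total` gene families from `all_total`, of which `all_in_cat` belong
    to the category (hypergeometric upper tail). The tested set is assumed to
    be a subset of the background.
    """
    upper = min(set_total, all_in_cat)
    numerator = sum(
        comb(all_in_cat, x) * comb(all_total - all_in_cat, set_total - x)
        for x in range(set_in_cat, upper + 1)
    )
    return numerator / comb(all_total, set_total)


# Hypothetical toy example with 88 genomes.
genomes = [f"g{i}" for i in range(88)]
toy_presence = {
    "famA": set(genomes[:84]),  # 95.5% of genomes -> core
    "famB": set(genomes[:83]),  # 94.3% -> neither core nor rare
    "famC": set(genomes[:4]),   # 4.5% -> rare
    "famD": set(genomes[:5]),   # 5.7% -> not rare
}
toy_core, toy_rare = classify_gene_families(toy_presence, n_genomes=88)

# Hypothetical counts: 3 of 4 tested families fall in a category that holds
# 4 of the 8 families in the background.
toy_p = fisher_overrepresentation_p(3, 4, 4, 8)
```

In practice the test would be run once per COG category, so the resulting P values are per-category rather than a single global statistic.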
Erwinia tracheiphila clusters vary by hrpT3SS effector content. Many Gram-negative bacterial phytopathogens use a hypersensitive response and pathogenicity type III secretion system (hrpT3SS) to translocate effector proteins directly into the host plant cell. In the plant cell cytoplasm, T3SS effectors may reveal the presence of a pathogen and initiate a cascade of antipathogen defenses, often mediated through salicylic acid (56). Alternatively, effectors may promote pathogen virulence by suppressing induced plant defense responses. E. tracheiphila contains an hrpT3SS locus, and E. tracheiphila suppresses salicylic acid production in a wild gourd host (Cucurbita pepo subsp. texana) (20,44), suggesting that E. tracheiphila may use effectors for suppressing plant-induced defenses during disease development.
We found that the 88 E. tracheiphila isolates collectively carry at least 23 hrpT3SS effector genes (Fig. 4 and Fig. S2). Because differences in T3SS effector repertoire can drive host plant specificity, we also examined the distribution of effector genes among the three E. tracheiphila clusters. Cluster Et-melo has one unique effector gene, Eop3, which is homologous to the Eop3 gene in Erwinia amylovora (57), the uncharacterized Pseudomonas syringae pv. actinidiae effector HopBN1 (16), and the P. syringae effector HopX1 (58). Two other effector genes, NleD and AvrRpm1, are unique to the Et-C1 cluster. In the BuffGH reference genome, NleD is present in six copies, including in an intact phage region (20,30). The E. tracheiphila NleD genes have 99% amino acid identity to an NleD gene in an active phage region in the emerging mouse pathogen Citrobacter rodentium (59) (Fig. S2). The functional significance, if any, of E. tracheiphila having six NleD copies is not yet known. There are no effector genes that are unique to the Et-C2 cluster, but a gene for effector HopAM1 is present in Et-C2 and Et-C1 isolates, and a gene for HopAF1 is present in Et-C2 and Et-melo isolates (Fig. 4). In P. syringae, HopAM1 manipulates abscisic acid-mediated responses and water availability via stomatal closure (60), but how it affects the virulence phenotype for E. tracheiphila is unknown. In P. syringae, HopAF1 inhibits pathogen-associated molecular pattern (PAMP)-mediated increases in ethylene production, and homologs are widely distributed in many bacterial phytopathogens (61). All five cluster-specific effectors (HopAM1, NleD, AvrRpm1, Eop3, and HopAF1) are physically located far from the hrpT3SS locus, and their evolutionary histories are all consistent with horizontal acquisition (Fig. S2). Phytopathogen effectors are often determinants of host range, and the horizontal acquisition of these five effectors may underlie the split of E. tracheiphila into phylogenetic clusters with distinct virulence phenotypes and host plant association patterns.
Table S1 for notations). The y axis shows the percentage of the gene families within each COG category. The bar to the far right shows the overall percentage of the core and rare gene families that were not represented in COG. 'Mobile' (X) and the number of genes not assigned to a COG are shown with a 100% y axis, while the other categories are shown with a y axis scaled to 40%. Asterisks designate the functional categories that are significantly overrepresented compared to the distribution of all genes in that category (Fisher's exact test, P < 0.05; Table S2). The percentages of rare and core genes not in COG (far right) are shown for scale but were not included in the statistical tests.
Cucumber is the only host plant susceptible to all Erwinia tracheiphila lineages. Controlled cross-inoculation experiments were used to test whether the patterns of lineage-specific host plant associations observed in the field were due to random sampling patterns or were reflective of genetic differences. In the greenhouse, three isolates from Et-melo, three isolates from Et-C1, and one isolate from Et-C2 were all cross-inoculated into 2-week-old seedlings of squash, cucumber, and muskmelon. Isolates from Et-melo killed all experimental cucumber and muskmelon plants (Fig. 5). In squash, Et-melo isolates induced localized wilt symptoms, but all squash plants inoculated with Et-melo recovered (Fig. 5). Isolates from Et-C1 and Et-C2 were highly virulent against cucumber, killing 98% of experimental cucumber plants, but less virulent against squash and muskmelon (Fig. 5 and Table 3). The attenuation of Et-C1 and Et-C2 virulence on muskmelon compared to Et-melo in the greenhouse is likely ecologically important, as none of these strains have yet been isolated from field-infected muskmelon (Table 1). Squash showed variable susceptibility to isolates from Et-C1 and Et-C2, which is consistent with previous reports that this genus is moderately resistant to E. tracheiphila (Fig. 5 and Table 3) (25).
In summary, cucumber is the most susceptible of the three host plant species and is the only host plant susceptible to infection by isolates from all three E. tracheiphila clusters in both the field (Table 1 and Fig. 2A) and greenhouse (Fig. 5).
Subtropical temperatures inhibit Erwinia tracheiphila in vitro growth and in vivo virulence. We tested the effects of temperature on in vitro growth and in vivo virulence to determine whether the temperatures in temperate eastern North America, the only region in the world where E. tracheiphila is known (see "Confirmation of restricted Erwinia tracheiphila geographic range" in Materials and Methods), are more favorable for E. tracheiphila growth than subtropical temperatures. For isolates from all three clusters, we find that the final concentration, measured as optical density at 600 nm (OD600) after 40 h of in vitro growth, is suppressed at warmer 33°C and 37°C incubation temperatures, compared to incubation at cooler temperatures of 28°C or 30°C (P ≤ 0.001) (Fig. 6 and Table 4).
To test the effects of temperature on in vivo virulence, we isolated an E. tracheiphila strain from a field-infected cucumber and a second strain from a field-infected squash. Each isolate was then inoculated into the host species in which it was found. Half of the plants were incubated at average July temperatures measured in Massachusetts (27°C day/18°C night) to represent the temperature in the northeastern United States. This is the region where E. tracheiphila is an annual epidemic, all three E. tracheiphila lineages were found, and cultivated squash, cucumber, and muskmelon are present only due to human agriculture. The other half of the inoculated plants were incubated at average July temperatures measured in Texas (33°C day/23°C night) to represent the subtropical southwestern United States, where the wild squash progenitor (Cucurbita pepo subsp. texana) is native but E. tracheiphila has never been reported (35). At "southwestern U.S." temperatures, only three inoculated squash plants developed localized symptoms in the inoculated leaf, and these three plants recovered. At cooler "northeastern U.S." temperatures, half of the squash plants developed localized wilt symptoms, but only six of these plants developed systemic disease and died within the 25-day experiment ( Fig. 7

DISCUSSION
In our comprehensive study of Erwinia tracheiphila genomic diversity, host plant association patterns, and demographic history, we found that E. tracheiphila is comprised of three distinct, homogeneous phylogenetic lineages that have an excess of rare genetic variants. From this, we infer that these three clusters were recently founded by small populations and are currently experiencing rapid population expansions to fill new agroecological niches (3, 62-64, 134, 135). These inferences about E. tracheiphila demographic history correlate with recent anthropogenic changes to cucurbit agroecosystems in eastern North America. The recent introduction of all cucurbit crop plants into temperate eastern North America, one of the world's most agriculturally intensive regions, likely created a novel ecological niche (33,65,66). Cucumber is the most susceptible plant species in the greenhouse and field and the only plant species highly susceptible to infection by isolates from all three E. tracheiphila lineages. The high susceptibility of cucumber to isolates from all three clusters in both the field and greenhouse suggests that cucumbers could be functioning ecologically as a highly susceptible reservoir host. This presents the possibility that E. tracheiphila (which was already present in the midwestern United States by 1900 [67,68]) could not have emerged or persisted as an annual epidemic without the human-mediated introduction of cultivated Cucumis spp. into temperate North America in the early 1500s (33). E. tracheiphila has among the most dramatic structural genomic changes (including gene decay through pseudogenization, mobile element invasion and proliferation, and horizontal gene acquisitions) of any bacterial pathogen (20). These structural changes are consistent with a recent evolutionary transition from a progenitor with multiple environmental reservoirs and diverse metabolic capabilities to a pathogen with a narrow, host-specialized ecological niche.
However, the species identity, geographic origin, and host relationships of the direct E. tracheiphila progenitor are all unknown, limiting our ability to investigate the evolutionary transition from the E. tracheiphila direct progenitor (presumably a plant commensal or weak pathogen) to a virulent pathogen (62,69,70). The genomic evidence of the recent transition of E. tracheiphila to a virulent, host-restricted pathogen (20) highlights the continuing risk of nonpathogenic environmental microbes acquiring virulence genes via continual and naturally occurring mobile DNA invasion (71). Virulent pathogens are unlikely to persist in ecologically intact habitats with higher plant species diversity and higher diversity of pathogen resistance (R) genes (72–74). When pathogens evolve or acquire novel virulence genes, this acts as a selective pressure on host plant populations and causes a rise in frequency of plant resistance genes. However, repeatedly planting the same crop plant varieties in agricultural populations interferes with this coevolutionary dynamic by preventing a rise in frequency of effective host plant resistance alleles. Identifying cultivars or wild crop relatives with resistance genes, and crossing them into cultivated crop populations, is one method favored by plant breeders. However, the probability of success from this approach for controlling E. tracheiphila is likely to be low. Cucumber is the best characterized of all cucurbit crops, and this species was found to contain among the lowest genetic heterogeneity of any vegetable crop, with an estimated effective population size of only 500 individuals at the time of domestication (75,76). The E. tracheiphila-cucurbit association is evolutionarily novel (20), suggesting that genetic resistance to E. tracheiphila may not exist in any undomesticated cucurbit populations.
Even if the genetic basis of host resistance is identified in wild relatives or rare cultivars of cucumber, squash, or melon and successfully introduced into agricultural populations, E. tracheiphila is amenable to invasion by mobile DNA, including acquisition of virulence effector genes (20). This could function to quickly overcome potential host plant genetic resistance, especially if the same resistance gene(s) is broadly deployed in large, homogeneous crop plant populations (77,78). This potential to rapidly generate novel variants from a recombining source population(s), together with the ability to horizontally acquire virulence effectors, will be important to consider when attempting to design durable resistance strategies for agricultural systems (20,79).
Many, perhaps most, of the economically damaging plant pathogens and insect pests have emerged after the Neolithic Revolution (11,16,63,64,80–85). Yet, little effort has been put toward using ecological principles to plan genetic, physiological, and/or structural complexity into agricultural systems to mitigate susceptibility to outbreaks of insect pests or microbial pathogens (10). We hypothesize that local pathogen (or insect pest) emergence of the kind seen with E. tracheiphila is more common than currently understood. Further, we predict that these local emergence events can in some cases be followed by rapid dissemination through genetically homogeneous agricultural populations. Given the potential of such infections to threaten globalized crop populations, including staple crops that are vital for local and global food security, we urgently need to develop approaches for building sustainable agroecosystems that are rooted in ecological and evolutionary principles.

MATERIALS AND METHODS
Study system. Wild species in the gourd family, Cucurbitaceae, occur in tropical and subtropical regions worldwide, and cultivars from this family are among the world's most widely grown fruit and vegetable crops (34,86). Like many Cucurbitaceae, Cucurbita spp. and Cucumis spp. produce a class of secondary metabolites called cucurbitacins (87–89). Cucurbitacins are among the most bitter and toxic compounds ever characterized and function as highly effective herbivory deterrents for almost all insect and mammalian herbivores, including humans (89–92). The exceptions are a few genera of highly coevolved Luperini leaf beetles (Coleoptera: Chrysomelidae), and for these beetles, cucurbitacins function as arrestants and feeding stimulants (90,93,94). Acalymma is a strictly New World genus of highly specialized leaf beetles that has coevolved in Mesoamerica with Cucurbita. In natural settings, Acalymma spp. are obligately dependent on Cucurbita plants in all life stages (95–97). E. tracheiphila has no known environmental reservoirs and persists only within infected Cucurbita or Cucumis host plants or the digestive tracts of the highly specialized beetle vectors. Beetle vectors are the only documented winter reservoirs of E. tracheiphila (45,89,98). The Eastern striped cucumber beetle (Acalymma vittatum) is the only Acalymma species that has received substantial research attention because of its status as an important agricultural pest and plant pathogen vector in eastern North America (97). A. vittatum, which is the predominant insect vector of E. tracheiphila, occurs only in northeastern and midwestern North America. It is likely that A. vittatum only recently emerged into this geographic area following the domestication and range expansion of Cucurbita for agriculture, as was recently shown for the obligate pollinator of Cucurbita in eastern North America (66,99). In the Old World, Aulacophora species (Coleoptera: Chrysomelidae: Luperini) are obligate cucurbit specialists, although natural history information is absent for almost all species (100,101).
Confirmation of restricted Erwinia tracheiphila geographic range. Losses from E. tracheiphila are an annual epidemic in temperate eastern North America (22,25,26,29,41,87,98,102–106). No losses from E. tracheiphila have been reported anywhere else in the world. To evaluate whether the reported geographic restriction of E. tracheiphila to temperate eastern North America is a reflection of its actual geographic range or an artifact of this pathogen not being recognized outside this range, one of us (L.R.S.) undertook extensive scouting expeditions of wild and cultivated Cucurbita, Cucumis, Luffa, and Lagenaria populations in diverse areas of the world, including the entire southern United States from California to South Carolina; on the west coast of Mexico from Jalisco to Oaxaca; in Europe; and in Southeast Asia. There is one report of E. tracheiphila in New Mexico (107), but this isolate was said to be from a cultivated watermelon (which is not susceptible) and this isolate is not archived, nor do gene sequences from it exist, and we must therefore at this time consider it a single erroneous report.
No E. tracheiphila symptoms were observed in undomesticated populations of Cucurbita digitata in California and Arizona or in undomesticated or domesticated Cucurbita spp. or Cucumis spp. in California, Arizona, New Mexico, Texas, Louisiana, Mississippi, Alabama, Georgia, South Carolina, or Missouri. In Mexico, E. tracheiphila was not found in wild or cultivated cucurbits in the Mexican states of Jalisco, Guerrero, Michoacán, Oaxaca, Guanajuato, or Querétaro. Nor was E. tracheiphila observed in any cucurbits in commercial or academic farms in Thailand, Philippines, or Vietnam. In Europe, E. tracheiphila was never observed in cucumber or squash plants in Spain or Germany. These observations are consistent with the lack of reports of E. tracheiphila outside temperate northeastern and midwestern North America. E. tracheiphila has never been shown to survive outside a few agricultural species of cucurbit hosts and beetle vectors (45,98). Therefore, the isolates collected in this study ( Fig. 2A and B; see also Table S1 in the supplemental material) are hypothesized to cover the entire plant host and geographic range where Erwinia tracheiphila exists.
Collecting single isolates of Erwinia tracheiphila. Single E. tracheiphila isolates were obtained from symptomatic squash (Cucurbita pepo), muskmelon (Cucumis melo), and cucumber (Cucumis sativus) plants in agroecosystems from across the entire geographic range where economic losses from E. tracheiphila are reported (Fig. 2B; Table S1). In the field, infected plants were visually identified by characteristic wilting symptoms (Fig. 1A). All wilting, symptomatic plants in a given field were gathered to avoid collection bias. Symptomatic vines from infected plants were removed with a sterile knife, immediately placed in separate 1-gal plastic bags, and stored at 4°C for a maximum of 3 days prior to performing bacterial isolations. The reference BuffGH strain (formerly PSU-1) was isolated in 2007 from an undomesticated wild gourd C. pepo subsp. texana plant growing at the Rock Springs Experimental Station in Rock Springs, PA (30). These C. pepo subsp. texana seeds were originally collected from wild populations in New Mexico and Texas and were greenhouse cultivated and then field transplanted for academic research at Pennsylvania State University in University Park, PA (reviewed in reference 87). Isolates collected in 2007 to 2009 were acquired from the authors of reference 51, were collected according to the protocol described there, and are stored at Iowa State University in Ames, IA. E. tracheiphila isolates from 2015 were collected by first washing external dirt and debris from symptomatic vines with tap water and then surface sterilizing the cleaned vines with 70% ethanol. Sterilized vines were cut into 3- to 4-in. sections between nodes with sterile razor blades, and 1/2 in. of the vine sections was soaked in 3 ml of autoclaved Milli-Q water in 15-ml Falcon tubes until pure E. tracheiphila could be seen on the cut surface (Fig. 1B). Sterile loops were then used to transfer E. tracheiphila ooze (Fig. 1B) to King's B (KB) agar plates (1 liter: 20 g proteose peptone no. 3, 10 ml glycerol, 1.5 g MgSO4·7H2O, 1.5 g KH2PO4, 15 g Bacto agar). Single isolates were restreaked, and then single colonies from the restreaked plates were grown in shaken liquid KB broth at 25°C for 48 h and cryogenically preserved with 15% glycerol.
DNA extraction, library preparation and whole-genome sequencing. Single colonies from cryogenically preserved glycerol stock were grown on KB agar plates, and single colonies were grown in  Table S1 were generated using a Nextera DNA sample preparation kit (Illumina, San Diego, CA). The libraries were amplified for 8 cycles using the Kapa HiFi library amplification kit (Kapa Biosystems, Wilmington, MA), and the size selection was performed using AMPure XP beads (Agencourt Bioscience Corp., Beverly, MA). Library concentrations were measured using a Qubit DNA quantification kit (Life Technologies, Carlsbad, CA), and the fragment size range detection (100 to 400 bp) was performed using the TapeStation 2200 (Agilent Technologies, Santa Clara, CA). Libraries were pooled using Nextera index kits, and 150-bp paired-end reads were generated with an Illumina HiSeq 2500 sequencing system. Assembly metrics of all strains sequenced for this study were determined with QUAST, with standard settings that retain only contigs larger than 500 bp (108).
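As a rough illustration of the kind of assembly metrics reported by QUAST, the short Python sketch below drops contigs under a minimum length and then computes the contig count, total length, and N50. It is not QUAST itself: the function name and the toy contig lengths are hypothetical, and treating the 500-bp threshold as inclusive is an assumption.

```python
def assembly_stats(contig_lengths, min_contig=500):
    """Contig count, total length, and N50 after a minimum-length filter.

    N50 is the length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches half of the total.
    """
    kept = sorted((n for n in contig_lengths if n >= min_contig), reverse=True)
    total = sum(kept)
    n50 = 0
    running = 0
    for length in kept:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"n_contigs": len(kept), "total_length": total, "n50": n50}


# Hypothetical contig lengths (bp); the 400- and 300-bp contigs are dropped.
toy_stats = assembly_stats([2000, 1500, 1000, 600, 400, 300])
```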
Transformation of Erwinia tracheiphila with an mCherry-expressing plasmid. E. tracheiphila strain BuffGH was used for visualization of E. tracheiphila in the xylem of infected squash seedlings. Plasmid pMP7605 carrying a constitutively expressed mCherry gene was electroporated into competent E. tracheiphila cells. For this, we followed protocols described previously (109). Briefly, competent E. tracheiphila cells were prepared by growing E. tracheiphila in 200 ml KB to an OD600 of 0.02. Subsequently, cells were washed using decreasing volumes, once with chilled sterile Milli-Q water and twice with 10% glycerol, and finally resuspended in 2 ml of 10% glycerol. For electroporation, a 40-µl aliquot of competent cells was mixed with 4 µl of plasmid DNA, placed in a 0.2-cm cuvette, and electroporated at 2.5 kV for 5.2 to 5.8 ms. Electroporated cells were immediately transferred to 3 ml KB liquid and incubated at room temperature without shaking for 1 h. A cell pellet was obtained, resuspended in 100 µl of medium, and then plated in KB agar with ampicillin (100 µg/ml). Colonies of fluorescent E. tracheiphila were obtained after 5 days at room temperature.
Genome assembly and annotation. Adapter trimming and quality filtering of raw Illumina reads were performed using the FastX toolkit 0.0.13.2 (136), SeqTK 1.0 (https://github.com/lh3/seqtk/), and FastQC 0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Both mapping and de novo assemblies were then generated for each sequenced isolate. For the de novo assemblies, SPAdes 3.1.1 was used to assemble the quality-filtered, adapter-trimmed, paired-end reads using k-mer sizes of 21, 33, and 55 and the --careful parameter, with all other parameters at their defaults (110). For ab initio annotations of the assembled de novo whole-genome sequences, Prokka version 1.11 was used with default parameters (111). For the mapping-based assemblies, Mira 4.1 (112) was used to map quality-filtered, adapter-trimmed, paired-end reads from each isolate to the BuffGH PacBio reference strain (30). The functional annotations of all coding sequences (including pseudogenes) were transferred to each genome from the manually curated annotation of the reference BuffGH genome (20), using the RATT function in PAGIT 1.0 (113). We assumed that all pseudogenes are the same in all isolates, which will be confirmable only with long-read PacBio sequencing of these isolates followed by manual annotations.
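The de novo assembly settings stated above (k-mer sizes 21, 33, and 55 with the careful option) could be passed to SPAdes along the lines of the following Python sketch, which only builds the argument list. The helper name, read file names, and output directory are hypothetical; the flags `-1`, `-2`, `-k`, `--careful`, and `-o` are standard `spades.py` options, but the exact invocation used in this study is not given in the text.

```python
def spades_command(reads_forward, reads_reverse, out_dir):
    """Build a spades.py argument list for paired-end reads.

    Mirrors the settings stated in the text: k-mer sizes 21, 33, 55 and
    --careful; everything else is left at the program defaults.
    """
    return [
        "spades.py",
        "-1", reads_forward,   # forward reads (hypothetical file name)
        "-2", reads_reverse,   # reverse reads (hypothetical file name)
        "-k", "21,33,55",
        "--careful",
        "-o", out_dir,
    ]


cmd = spades_command("isolate_R1.fastq", "isolate_R2.fastq", "isolate_spades")
# To run it (requires SPAdes on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```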
Phylogenetic relationships of Erwinia tracheiphila isolates. Orthologous gene families present in all E. tracheiphila isolates were identified from the de novo assemblies with OrthoMCL (114) through an all-versus-all BLASTP 2.2.28+ search with an E value cutoff of 10⁻⁵. The orthologous genes were aligned using MAFFT 6.853 (115). The gene alignments were trimmed with trimAl version 1.2 using the "automated1" option (116). The individual gene alignments were concatenated into the core genome alignment using the publicly available script at https://doi.org/10.5281/zenodo.1318245 (last accessed 8 September 2014). The 237,634-amino-acid concatenated core genome alignment used to reconstruct the network analysis in Fig. 2A is included in the supplementary file at https://figshare.com/projects/Recent_emergence_of_a_virulent_phytopathogen/35108. The evolutionary relationships among the isolates were reconstructed and visualized in SplitsTree v 4.13.1 (47) using the core genome alignment as input.
Determination of within-cluster diversity. The genes from the manual annotations transferred to the mapped assemblies were used in an all-versus-all BLASTP 2.2.28+ (117) search with an E value cutoff of 10⁻⁵. OrthoMCL (114) was run separately for all the isolates within each lineage to identify the core orthologous gene families within each of Et-melo, Et-C1, and Et-C2. For population genetics analyses, the core genes shared by all isolates within each of the three lineages were designated either Intact, meaning that they are putatively functional based on the manually curated annotations in reference 20, or Pseudogenized/Repetitive, meaning that they either are predicted to be pseudogenes or were predicted to be mobile DNA (genes from bacteriophage, insertion sequences, plasmids, or transposases). The Pseudogenized/Repetitive genes from bacteriophage, insertion sequences, plasmids, or transposases were determined by domain assignments with PfamScan 1.5 (118), ISfinder (January 2015 update) (119), and PHAST (120) as described in reference 20. For Et-C1 and Et-melo clusters sampled at multiple time points, two groups were created: isolates collected from 2008 to 2010 and those collected in 2015. Genetic diversity was quantified for each cluster using Watterson's estimator θ_W per site (121), where θ_W estimates 2N_eμ, N_e is the effective population size, and μ is the mutation rate.
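For readers unfamiliar with Watterson's estimator, a minimal Python sketch follows. It uses the standard definition: the number of segregating sites S divided by the harmonic number a_n = 1 + 1/2 + ... + 1/(n - 1) for n sampled sequences, then divided by the alignment length to give a per-site value. The function name and the example numbers are illustrative assumptions and are not taken from the study's data or scripts.

```python
def watterson_theta_per_site(num_segregating_sites, num_sequences, alignment_length):
    """Watterson's estimator per site: S / (a_n * L).

    a_n is the (n - 1)th harmonic number for n sampled sequences.
    """
    if num_sequences < 2:
        raise ValueError("at least two sequences are required")
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating_sites / (a_n * alignment_length)


# Illustrative values only: 11 segregating sites among 4 sequences over 1,000 sites.
# Here a_n = 1 + 1/2 + 1/3 = 11/6, so theta is 6 for the locus and 0.006 per site.
theta = watterson_theta_per_site(11, 4, 1000)
print(f"{theta:.4f}")
```

Because a_n grows with sample size, the estimator corrects for the fact that larger samples are expected to reveal more segregating sites, which makes clusters with different numbers of isolates comparable.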
For recombination estimates, quality-filtered reads were mapped to the reference BuffGH sequence (30) with the Burrows-Wheeler alignment (BWA) tool 0.7.4 (122), a pileup was created with SAMtools 0.1.18 (123), and variants were called with VCFtools 0.1.9 if the Phred quality score of the variant site was greater than or equal to 60 (124). Single nucleotide polymorphisms (SNPs) were not called if (i) within 9 bp (three codons) of each other and (ii) with less than 10× coverage or (iii) with more than 150× coverage, since short Illumina reads cannot be accurately placed over repetitive regions. Recombination rates within each pathovar were estimated by using the VCF_to_FASTA.sh (see Text S1 in the supplemental material) script to create whole-genome alignments compatible with Gubbins 2.1.0 (125), which was run for a standard 10 iterations.
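The SNP filtering criteria can be sketched in Python as follows. This is one possible reading of the rules, not the authors' implementation: it assumes that the quality, coverage, and spacing criteria are each applied as independent exclusions, that "within 9 bp" means a position difference of 9 or less, that spacing is checked among the SNPs that already pass the quality and coverage filters, and that all positions lie on a single contig. The Variant record and the function name are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int      # 1-based position on the reference
    qual: float   # Phred-scaled quality of the variant site
    depth: int    # read coverage at the site


def filter_snps(variants, min_qual=60, min_depth=10, max_depth=150, window=9):
    """Keep SNPs passing quality and coverage limits that are not clustered."""
    passing = sorted(
        (v for v in variants
         if v.qual >= min_qual and min_depth <= v.depth <= max_depth),
        key=lambda v: v.pos,
    )
    kept = []
    for i, v in enumerate(passing):
        near_prev = i > 0 and v.pos - passing[i - 1].pos <= window
        near_next = i + 1 < len(passing) and passing[i + 1].pos - v.pos <= window
        if not (near_prev or near_next):
            kept.append(v)
    return kept


# Made-up candidate sites for illustration.
candidates = [
    Variant(100, 70, 50), Variant(105, 80, 60),   # clustered pair, both dropped
    Variant(300, 90, 40),                         # isolated and well covered
    Variant(500, 59, 40),                         # quality below 60
    Variant(700, 99, 5),                          # coverage below 10x
    Variant(900, 99, 200),                        # coverage above 150x
]
kept = filter_snps(candidates)
print([v.pos for v in kept])
```

Sorting by position first means each SNP only needs to be compared with its immediate neighbors, so the spacing check runs in a single pass.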
Pangenome identification. The Micropan package (126) in R 3.2 (127) was used to identify the core and pangenome of de novo E. tracheiphila isolate assemblies. De novo assemblies (see "Genome assembly and annotation" above) were used to ensure that the entire repertoire of genes present per isolate was included, and the pangenome estimates would not be biased with mapping assemblies based on what was present in the reference genome. The groups.txt output file from the OrthoMCL clustering of protein sequences of the de novo assemblies (see "Phylogenetic relationships of Erwinia tracheiphila isolates" above) and custom R scripts (127) were used to identify genes that were "rare" (present in fewer than 5% of isolates) or "core" (present in more than 95% of the sequenced isolates).
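To make the "rare" and "core" thresholds concrete, here is a small Python sketch that classifies gene families from OrthoMCL-style groups lines, where each line has the form "group_id: isolate|gene isolate|gene ...". The study used custom R scripts for this step; the Python version below, the function name, the "intermediate" label, and the isolate names are all illustrative assumptions.

```python
def classify_gene_families(group_lines, n_isolates, rare_below=0.05, core_above=0.95):
    """Label each ortholog group by the fraction of isolates that carry it."""
    labels = {}
    for line in group_lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        # Count each isolate once, even if it carries several paralogs.
        isolates = {member.split("|", 1)[0] for member in members.split()}
        fraction = len(isolates) / n_isolates
        if fraction < rare_below:
            labels[group_id] = "rare"
        elif fraction > core_above:
            labels[group_id] = "core"
        else:
            labels[group_id] = "intermediate"
    return labels


# Hypothetical input: 30 isolates named iso01 ... iso30.
names = [f"iso{i:02d}" for i in range(1, 31)]
example_lines = [
    "grpA: " + " ".join(f"{n}|geneA" for n in names),        # present in all 30
    "grpB: iso01|geneB1 iso01|geneB2",                       # paralogs in one isolate
    "grpC: " + " ".join(f"{n}|geneC" for n in names[:15]),   # present in half
]
labels = classify_gene_families(example_lines, n_isolates=len(names))
print(labels)
```

Counting distinct isolates rather than genes keeps a family with many paralogs in a single isolate from being mistaken for a widely shared one.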
Functional comparison of core and rare genes. The ab initio-predicted genes from each E. tracheiphila sequenced isolate were searched against the Clusters of Orthologous Groups (COG) database (2014 update) (55) using BLASTP 2.2.28+ (117). Only the top-scoring match (per gene) with an E value of <10⁻⁵ was kept. Each gene was assigned a COG category of the first functional category of the top-scoring match. Genes without significant matches to any sequence in the COG database were not assigned a functional category. A one-way Fisher exact test with corrections for multiple comparisons was used to identify the COG categories enriched in each cluster and graphed with ggplot2 in R (127).
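The enrichment test can be illustrated with a short, dependency-free Python sketch. It computes a one-sided Fisher exact P value as the upper tail of the hypergeometric distribution for a 2 x 2 table (genes in a set of interest versus a comparison set, with or without a given COG category) and then applies a Bonferroni correction. The text does not state which multiple-comparison correction was used, so Bonferroni is an assumption here, as are the function names and the example counts.

```python
from math import comb


def fisher_enrichment_p(set_with_cog, set_total, other_with_cog, other_total):
    """One-sided Fisher exact P value for enrichment of a category in a gene set."""
    population = set_total + other_total
    with_cog = set_with_cog + other_with_cog
    denominator = comb(population, set_total)
    upper = min(with_cog, set_total)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(
        comb(with_cog, k) * comb(population - with_cog, set_total - k)
        for k in range(set_with_cog, upper + 1)
    )
    return tail / denominator


def bonferroni(p_values):
    """Bonferroni-adjusted P values, capped at 1 (assumed correction method)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up counts: 3 of 4 genes in the set carry the category, versus 1 of 4 elsewhere.
p = fisher_enrichment_p(3, 4, 1, 4)
print(round(p, 4))
print(bonferroni([p, 0.5]))
```

math.comb returns zero when the second argument exceeds the first, so tables that are impossible given the margins contribute nothing to the tail sum.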
Identification of T3SS virulence genes and reconstruction of effector gene phylogenetic trees. The ab initio coding sequences predicted by Prokka from each E. tracheiphila isolate were compared against a manually curated version of the Pseudomonas Hop protein effector database (http://www.pseudomonas-syringae.org/T3SS-Hops.xls; accessed 28 August 2015, with additional non-Pseudomonas hrpT3SS effectors manually added) using BLASTP with an E value cutoff of 10⁻⁵ (Text S2). The presence and absence of effector genes were visualized with gplots (128) in R 3.2 (127).
To reconstruct the phylogeny of the cluster-specific effector genes identified in E. tracheiphila, the amino acid sequence of each gene was used as a BLASTP query against the nr database (117). An E value cutoff of 10⁻⁵ was used to acquire a phylogenetically representative sample of homologs. The sequences were aligned with MAFFT v. 6.853 (115) and trimmed with trimAl 1.2 (116). The maximum-likelihood
Average July temperatures for Texas and Massachusetts were determined by a Google search to be 33°C for day and 23°C for night for Texas and 27°C for day and 18°C for night for Massachusetts. All plants were kept in programmable Conviron growth chambers with a 16-h-light/8-h-dark cycle and 60% relative humidity. Plants were observed several times weekly for the initial appearance of wilt symptoms in the inoculated leaf, spread of symptoms to a second leaf, and plant death within a 25-day experimental period, according to references 22, 25, 28, 43, 44, 104, and 133. The sample sizes used in this experiment are n = 44 for Texas 'Dixie' squash, n = 44 for Massachusetts 'Dixie' squash, n = 34 for Massachusetts 'Marketmore' cucumbers, and n = 36 for Texas 'Marketmore' cucumbers. A one-way ANOVA to test the effects of host species (either cucumber or squash) at both Texas and Massachusetts temperatures used the model statements death = state + host species and wilt = state + host species as implemented in R (127).
Data availability. Raw reads from the sequenced isolates (Table S1) are available at the NCBI BioProject PRJNA272881, SRA no. SRP056142. The sequence filtering and analysis pipeline, Micropan parameters for pangenome analysis, modified Hop hrpT3SS database, and 'VCF_to_FASTA.sh' script used to create FASTA alignments of variant calls for recombination analysis in Gubbins are available via Figshare Project 35108 (https://figshare.com/projects/Recent_emergence_of_a_virulent_phytopathogen/35108). The concatenated core genome alignment file (237,634 amino acids) used to reconstruct the network analysis in Fig. 2A and Fig. S1 can be found at https://figshare.com/projects/Recent_emergence_of_a_virulent_phytopathogen/35108.

ACKNOWLEDGMENTS
This study was made possible by NSF postdoctoral fellowship DBI-1202736 to L.R.S. and NIH grant GM58213 to R.K.; O.Z. was in part supported by a Simons Investigator Award from the Simons Foundation.
We thank Bob Freedman and Aaron Kitzmiller at Harvard FAS for computational advice and support; Harvard Odyssey Computational Resources; Taj Azarian for helpful advice and discussion; Scott Chimileski for the Fig. 1B image; Rob Dunn for constructive comments on the manuscript; the Weld Research Building at the Arnold Arboretum for staff support, confocal microscope access, and growth chamber facilities; Miguel Coehlo for Nextera library preparation help; and Eric Alm for providing the original suggestion to sequence dozens of isolates. Infected plant samples were provided by Christian Herter Community Garden, Verrill Farms, Dana Roberts, Kristy's Barn, the University of Vermont Horticultural Farm, and Green Valley Farm. We thank them and all other farms and individuals who contributed isolates. We thank Nora Mishanec and Laura Jenny for laboratory and greenhouse assistance.
Mention of commercial products and organizations in this work is solely to provide specific information. It does not constitute endorsement by USDA-ARS over other products and organizations not mentioned.