Stability of association between Arabidopsis thaliana and Pseudomonas pathogens over evolutionary time scales

Crop disease outbreaks are often associated with clonal expansions of single pathogenic lineages. To determine whether similar boom-and-bust scenarios hold for wild plant pathogens, we carried out a multi-year multi-site survey of Pseudomonas in the natural host Arabidopsis thaliana. The most common Pseudomonas lineage corresponded to a pathogenic clade present in all sites. Sequencing of 1,524 Pseudomonas genomes revealed this lineage to have diversified approximately 300,000 years ago, containing dozens of genetically distinct pathogenic sublineages. These sublineages have expanded in parallel within the same populations and are differentiated both at the level of gene content and disease phenotype. Such coexistence of diverse sublineages indicates that in contrast to crop systems, no single strain has been able to overtake these A. thaliana populations in the recent past. Our results suggest that the selective pressures acting on a plant pathogen in wild hosts may be more complex than those in agricultural systems.


91
Color key to samples on top according to panel (a). Pseudomonas species assignments on the right. P. veronii, P. fragi and P.  Table S1). As expected, Pseudomonas was common, occurring in 92% of samples (97% of the epiphytic 107 and 88% of the endophytic samples) and representing on average 3% of the total bacterial community. The 108 genus was found at similar densities inside and on the surface of leaves (ANOVA, P>0.05) (Fig. S1b), indicating 109 no preferential colonization of either niche. While we did not detect an effect of sampling time on relative 110 abundance (ANOVA, P>0.05), abundance varied across sites (ANOVA, R 2 =12.8%, P=10 -10 ; Fig. S1c), suggesting 111 that certain site-specific characteristics may be particularly conducive to Pseudomonas proliferation. 112 By clustering of Pseudomonas 16S rDNA reads at 99% sequence similarity, we could distinguish 56 113 OTUs (Fig. 1b). The 99% threshold for distance-based clustering of reads resulted in OTU patterns more 114 congruent with a whole genome-phylogeny than the more widely used 97% sequence similarity (Fig. S2). While 115 half of the Pseudomonas OTUs could not be classified at the species level, 13 were classified as P. viridiflava, 116 which belongs to the P. syringae complex, including the most abundant OTU, OTU5. The other classifiable 117 OTUs belonged to the P. fluorescens, P. aeruginosa and the P. stutzeri species complexes (Fig. 1b). 118 To understand the factors shaping Pseudomonas assemblages, we studied variation in OTU presence 119 and relative abundances as an indication of Pseudomonas population structure. Permutational multivariate 120 analysis of variance (PerMANOVA on Bray-Curtis distances, P<0.05) indicated that differences between host 121 individuals were associated primarily with interactions between site, leaf niche and sampling time (20.0% 122 explained variance), with a smaller percentage associated with each factor independently such as site (4.0% 123 explained variance), leaf niche (2.3%) or sampling time (2.7%). An important difference between leaf niches was 124 that endophytic Pseudomonas populations were 2.6 times less diverse than epiphytic populations (Wilcoxon 125 test, P<10 -16 ) (Fig. S1d), pointing to selective bottlenecks inside the leaf being stronger. 126

A single lineage dominates Pseudomonas populations in A. thaliana leaves 127
Pseudomonas viridiflava OTU5 was overall the most common Pseudomonas OTU across samples (Fig. 1b), 128 occurring in 59% of epiphytic and 58% of endophytic samples. Across all samples, OTU5 accounted for almost 129 half of reads in the endophytic compartment (48%, range 0-99.9% in each sample), and it was the most 130 abundant endophytic Pseudomonas OTU in 52% of samples (Fig. 1c). The dominance of OTU5 was less 131 pronounced in the epiphytic samples, where it averaged 23% of all reads (range 0-99.9%), being the most 132 abundant OTU in only 23% of samples. This observation indicates an enrichment of this OTU in the endophytic 133 over the epiphytic compartment (Wilcoxon test P = 0.02, paired Wilcoxon test P = 7.4x10 -9 ) (Fig. 1d). In 134 conjunction with the reduced Pseudomonas diversity in the endophytic compartment, this is evidence for OTU5 135 members being particularly successful endophytic colonizers of A. thaliana. 136 16S rDNA amplicon reads reveal the relative abundance of microbes, but they do not inform on the 137 absolute abundance of microbial cells in a plant, what we term the 'microbial load'. The latter is perhaps a 138 more informative readout of the selective pressure exerted by a single microbial taxon than the relative 139 abundance of a taxon among all microbes. A pathogen might dominate the microbiota, but unless it reaches a 140 certain abundance, there might not be a marked decrease in host fitness (Duchmann et al., 1995;Schneider and   were called on 16S rDNA amplicon sequences, but 163 microbial load was assessed on metagenomic reads, the two assays provided independent measurements of 164 relative and absolute microbe abundance. Among all OTUs, OTU5 was the second most highly correlated with 165 total microbial load (Fig. 2b,c; Pearson correlation coefficient R=0.41, q-value=7x10 -5 ), indicating that the 166 strains represented by OTU5 are not only the most common Pseudomonas strains in these plants, but also that 167 they are either major drivers or beneficiaries of microbial infection in these plants. 168

OTU5 comprises many genetically distinct strains 170
While OTU classification based on 16S rDNA and metagenomic assignment can be indicative of the genus or 171 species-level identity of a microbe, genetically and phenotypically diverse strains of a genus will often be 172 clustered together as a single OTU (Moeller et al., 2016). To discern genetic differentiation within OTU5, we 173 therefore wanted to compare the complete genomes of OTU5 strains. From the same plants in which we had 174 analyzed the metagenomes, we cultured and isolated between 1 and 34 Pseudomonas colonies (mean=11 per 175 plant, median=12). We then sequenced and assembled de novo the full genomes of 1,611 Pseudomonas isolates 176 (assembly pipeline and statistics in Fig. S3). Eighty-seven genomes with poor coverage, abnormal assembly 7 characteristics or incoherent genome-wide sequence divergence were removed from further analysis. The 178 remaining 1,524 genomes were 99.5% complete, as estimated with published methods (Simão et al., 2015), 179 containing on average 5,347 predicted genes (standard deviation 284). Extraction of 16S rDNA sequences from 180 the whole genome assemblies demonstrated that the vast majority of all isolates, 1,355, belonged to the OTU5 181 lineage, as defined previously by amplicon sequencing. 182 Maximum-likelihood (ML) whole-genome phylogenies (Ding et al., 2018) were constructed from the 183 concatenation of 807 genes that classified as the aligned soft core genome of our Pseudomonas collection. 184 Because bacteria undergo homologous recombination, the branch lengths of the ML whole-genome tree may 185 not properly reflect the branch lengths of vertically inherited genes, but the overall topology is expected to 186 remain consistent (Hedge and Wilson, 2014 was not solely the result of a few importation 208 events of divergent horizontally transferred 209 material (Fig. 4a). 210 Comparing the position of strains on the phylogeny and their provenance identified several strains that 211 were not only frequent colonizers across plants, but also persistent colonizers over time, each isolated in at 212 least two consecutive seasons (a). Six OTU5 strains accounted for 51% of sequenced isolates, each with an 213 8 overall frequency of between 4-10%, with several found in over 20% of plants. In contrast, no strain outside of 214 OTU5 exceeded an overall frequency of 5%. Generally, non-OTU5 strains were much less likely to be 215 represented by multiple isolates and were very rarely observed in both seasons sampled. 216 217 OTU5 primarily comprises pathogenic strains, but with distinct phenotypes 218 The P. syringae/P. viridiflava complex, to which OTU5 belongs, contains many well-known plant pathogens-219 although not all P. syringae complex strains are pathogenic, with some lacking the canonical machinery required 220 for virulence (Barrett et   Apr 2016 were likely to partially or completely dominate, i.e., 308 reach frequencies above 50%, within a single plant 309 ( Fig. 5a,b). 310 Strain diversity per host individual not only 311 differed between clades, but also between seasons, 312 with the distribution of strain frequency per plant 313 changing over time. We measured the Shannon 314 Index H' (Hill, 1973)    compounds is likely to differ within OTU5 or between OTU5 and other clades, we generated a custom 346 database of all effector genes and structural genes or enzymes for compounds known to be associated with 347 host colonization in the genus Pseudomonas. We then used this database to independently annotate the effector 348 as well as phytohormone and toxin biosynthesis gene content of each isolate (Fig. S6, Table S2). 349 OTU5 strains lacked all known genes for coronatine and syringomycin/syringopeptin synthesis, but 350 auxin synthesis modules were found in almost all isolates (including isolates outside of OTU5). Genes for 351 pectate lyase synthesis were broadly conserved both in and outside of OTU5 (Fig. S6) within OTU5 encode two major flg22 variants, which are highly divergent from one another (Table S2). It is reasonable to hypothesize that the plant host has evolved mechanisms that suppress the disease 425 effect of OTU5. By itself, many OTU5 strains can reduce plant growth in gnotobiotic culture by more than 50% 426 or even kill the plant. In natural populations, the pathogenic effect appears to be mitigated, since we isolated 427 OTU5 strains from plants that did not appear to be heavily diseased. Two leaves were removed and independently processed, and the remaining rosette was flash-frozen on dry ice. 452 The flash-frozen material was processed for metagenomic sequencing and 16S rDNA sequencing of the v4 453 region. The removed leaves were placed on ice, washed in 70%-80% EtOH for 3-5 seconds to remove lightly-454 associated epiphytes. Sterilized plants were ground in 10 mM MgSO4 and plated on King's Broth (KB) plates 455 containing 100 µg/mL nitrofurantoin (Sigma). Plates were incubated at 25°C for two days, then placed at 4°C. 456 Colonies were picked randomly from plates between 3-10 days after plating, grown in KB with nitrofurantoin 457 overnight, then stored at -80°C in 15-30% glycerol. 458 459

16S rDNA v3-v4 amplicon data analysis 485
Amplicon data analysis was conducted in Mothur (Schloss et al., 2009). Paired-end reads were assembled 486 (make.contigs) and reads with fewer than 5 bp overlap (full match) between the forward and reverse reads 487 were discarded (screen.seqs). Reads were demultiplexed, filtered to a maximum of two mismatches with the tag 488 sequence and a minimum of 100 bp in length. Chimeras were identified using Uchime in Mothur with more 489 abundant sequences as reference (chimera.uchime, abskew=1.9). Sequences were clustered into OTUs at the 99 490 % similarity threshold using VSEARCH in Mothur with the distance based clustering method (dgc) (cluster). 491 Individual sequences were taxonomically classified using the rdp classifier method (classify.seqs, consensus 492 confidence threshold set to 80) and the greengenes 16S rDNA database (13_8 release). Each OTU was 493 taxonomically classified (classify.otu, consensus confidence threshold set to 66), non-bacterial OTUs and OTUs 494 with unknown taxonomy at the kingdom level were removed, as were low abundance OTUs (< 50 reads, 495 split.abund). The confidence of OTU classification to the genus Pseudomonas was at least 97%. The most 496 abundant sequence within each Pseudomonas OTU was selected as the OTU representative for phylogenetic 497 analyses. 498 All statistical analyses were conducted in R 3.2.3 (Team and Others, 2010). In order to avoid zero 499 values, relative abundance data was transformed using a log (x+a) formula where a is the minimum value of the 500 variable divided by two. Normality after transformation was assessed using Shapiro Wilk's normality test. 501 Factors influencing Pseudomonas relative abundance were studied using multi-factorial ANOVA. When 502 necessary, sites PFN and K6911 were excluded from the analysis, as they had missing data points (Fig. 1a). 503 Mean differences were further verified with Wilcoxon's non-parametric test. Differences between Pseudomonas 504 populations were assessed by calculating Bray-Curtis dissimilarities between samples using the "vegdist" 505 function of the vegan package (Oksanen et al., 2007). These distances were used for principal coordinates 506 analysis using the "dudi.pco" function of the ADE4 package (Dray et al., 2007), and for PERMANOVA to study 507 the effect of different factors on the structure of Pseudomonas populations using the "Adonis" function of the 508 vegan package.\ 509 The 16S rDNA analysis of 192 plants in Eyach for which also metagenomic shotgun data were 510 generated (see below) involved the amplification of the v4 region using the published primers 515F-806R 511 (Schmidt et al., 1991) on an Illumina Miseq instrument with 2x250 bp paired end reads, which were 512 subsequently merged. Sequences were clustered with uclust (Edgar, 2010), 99% identity, and taxonomically 513 assigned using the RDP taxonomical assignation (Wang et al., 2007). 103 OTUs were assigned to the genus 514 Pseudomonas. One OTU aligned with 100% identity over its entire length to OTU5, the most abundant OTU 515 identified in the cross-population survey of the v3-v4 region described above. 516 517 Metagenomic assessment of bacterial load 518 Total DNA was extracted from flash-frozen rosettes by pre-grinding the frozen plant material to a powder 519 using a mortar and pestle lined with sterile (autoclaved) aluminum foil and liquid nitrogen as needed to keep 520 the sample frozen. Between 100 mg and 200 mg of plant material were then transferred with a sterile spatula 521 to a 2 mL screw cap tube (Sarstedt) containing 0.5 mL of 1 mM garnet rocks (BioSpec). To this, 800 µL of 522 room temperature extraction buffer was added, containing 10 mM Tris pH 8.0, 10 mM EDTA, 100 mM NaCl, 523 and 1.5% SDS. Lysis was performed in a FastPrep homogenizer at speed 6.0 for 1 minute. These tubes were 524 spun at 20,000 x g for 5 minutes, and the supernatant was mixed with ⅓ volume of 5M KOAc in new tubes to 525 precipitate the SDS. This precipitate was in turn spun at 20,000 x g for 5 minutes and DNA was purified from Bacterial DNA, both genomic and plasmid, was extracted using the Puregene DNA extraction kit (Invitrogen). 548 Single bacterial colonies were grown overnight in Luria broth+100 µg/mL Nitrofurantoin in 96-well plates. 549 Plates were spun down for 10 minutes at 8000g, then the standard Puregene extraction protocol was followed. 550 The capacity of the protocol to extract plasmid DNA was verified by extracting the DNA from a strain whose 551 plasmids were previously identified (Pst DC3000) (Buell et al., 2003). Primers specific to these plasmids 552 successfully amplified the puregene-extracted sample. interactions with the host, we augmented the Prokka annotation with several additional annotation sets. We 571 predicted genes on the raw genome FASTA sequences using AUGUSTUS-3.3 (Stanke and Waack, 2003) and -572 genemodel=partial -gff3=on -species=E_coli_K12 settings. The protein sequence of each predicted gene was 573 extracted using a custom script. 574 We annotated effectors using BLASTP-2.2.31+ (Altschul et al., 1990) specifying the AUGUSTUS 575 predicted proteomes as query input and the Hop database (http://www.Pseudomonas-syringae.org/T3SS-576 Hops.xls) as reference database. We filtered the BLASTP results with a 40% identity query to reference 577 sequence threshold, a 60% alignment length threshold of query to reference sequence and a 60% length ratio 578 threshold of query and reference sequence (empirically determined). Hits of interest were manually extracted 579 and controlled using online BLASTP and NCBI conserved domain search. 580 Toxins and phytohormones were annotated using the same BLASTP settings as described for effectors. 581 We used custom NCBI protein databases including a set of genes involved in the toxin synthesis pathway. A 582 strain was scored as toxin pathway encoding if all selected components of a pathway were present. Hrp-hrc 583 clusters were also annotated using the formerly described BLAST and filtering settings and P. syringae pv tomato 584 DC3000 and P. viridiflava PNA 3.3 as reference sequences. 585 586

Pan-genome analysis and phylogenetics 587
The panX pan-genome pipeline was used to assign orthology clusters (Ding et al., 2018) and build alignments of 588 these clusters that were then used for phylogenetic analysis in RAxML (Stamatakis et al., 2005). The parameters used were the following: divide-and-conquer algorithm (-dmdc) was used on the diamond 590 clustering, a subset size of 50 was used in the dmdc (-dcs 50), a core genome cutoff of 70% (-cg 0.7). 591 Whole-genome phylogenies of the strains were constructed using RAxML (Stamatakis et al., 2005) 592 using the gamma model of rate heterogeneity and the generalized time reversible model of substitution. The 593 phylogenies were built from all SNPS present in the concatenated core genomes of strains identified by panX. 594 Eight-hundred and seven genes were considered as core. We performed 100 bootstrap replicates in RAxML to 595 establish the confidence in the full tree. 596 Within the 1355 isolates belonging to OTU5, 107 distinct strains were represented. One 597 representative of each strain was picked at random, then recombination importation events were identified 598 among these 107 strains using ClonalFrameML (Didelot and Wilson, 2015). ClonalFrameML estimated a high 599 recombination rate within OTU5, estimating that a substitution in the tree was six times more likely to result 600 from a recombination event than a mutation event. Specifically, ClonalFrameML estimated the following 601 parameters: the 1/δ parameter (inverse importation event tract length in bp) was estimated as 7.79x10 -3 /bp 602 (var=2.18014 -9 ) and the Posterior Mean ratio between the probability of recombination (R) and the nucleotide 603 diversity, θ, was R/θ =1.19 (var=5.07x10 -5 ). The estimated sequence divergence between imported tracts and 604 the acceptor genome ν=0.04 (var=1.13x10 -8 ). The relative effect of recombination over mutation r/m=(R/ θ) x 605 ν x δ = 6.18. Recombination tracts were removed from the alignments, and the remaining putatively non-606 recombined strict core genes (present in all 107 genomes) were used for subsequent dating of coalescence. 607 To estimate the age of OTU5 we considered only those ortholog groups that were conserved across 608 all 107 OTU5 strains. These orthologs were concatenated and ClonalFrameML (Didelot and Wilson, 2015) was 609 used to identify recombination tracts that could inflate the branch length of members of the OTU as described 610 above. TMRCA of the OTU was estimated by calculating the mid-point-root to tip sequence divergence for a 611 representative of all 107 strains within OTU5, then dividing the median value of this distance by the neutral 612 substitution rate (Kimura, 1968) (we used here the point estimate of 8.7x10 -8 with our estimate of sd=6.0x10 -8 613 (McCann et al., 2017)). While we consider all sites (degenerate and non-degenerate) in the putatively non-614 recombined core, in addition to the fact that substitution rate is likely inaccurate for the longer timescale 615 analysed in the present study, both of these inaccuracies would likely lead to the underestimation of the age. 616 617

Pathogenicity assays 618
The plant genotype Eyach 15-2 (CS76399), collected from Eyach, Germany, was previously determined to 619 represent a plant genetic background common to the geographical region. Seeds were sterilized by overnight 620 incubation at -80°C, then 4 hours of bleach treatment at room temperature (seeds in open 2 ml tube in a 621 desiccator containing a beaker with 40 ml Chlorox and 1 ml HCl (32%)). The seeds were then stratified for 622 three days at 4°C in the dark on ½ MS media. Plants were grown in 3-4 mL ½ MS medium in six-well plates in 623 long-day (16 hours) at 16°C. 12-14 days after stratification, plants were infected with single bacterial strains. 624 Bacteria were grown overnight in Luria broth and the relevant antibiotic (either 10 µg/mL of 625 Kanamycin or Nitrofurantoin), diluted 1:10 in the morning and grown for 2 additional hours until they entered 626 log phase. The bacteria were pelleted at 3500 g, resuspended in 10 mM MgSO4 to a concentration of 627 OD600=0.01. 200 µl of bacteria were drip-inoculated with a pipette onto the whole rosette. Plates were sealed 628 with parafilm and returned to the growth percival. Seven days after infection, whole rosettes were cut from 629 the plant and fresh mass was assessed. 630 For growth assays of dead bacteria, we performed growth and dilution of bacteria as above, then 631 boiled the final preparation at 95°C for 38 minutes. Plants were treated with the dead bacteria in the same 632 manner as described above. 633 902   903   Table S1. Collection Locations and dates for all samples.  904  Table S2. Genes annotated in plant-associated pathways and amino acid sequence of flg22 variants in OTU5. 905 Fig. S1. Changes in Pseudomonas populations colonizing A. thaliana leaves. 906     Fig. S2. Pseudomonas OTUs and whole-genome phylogeny. 16S rDNA sequences were extracted 936 from 1,524 Pseudomonas isolates with whole-genome sequences. Clustering based on 99% or 97% sequence 937 similarity of the v3-v4 region of the 16S rDNA is compared to an ML core genome phylogenetic tree. Colors 938 indicate group, ranked by abundance. The most abundant group corresponds to OTU5 (Bordeaux color), with 939 clustering at 99% sequence identity being more consistent with the core genome tree than 97% clustering. 940 Related to Fig. 1, 2 and S6. Toxin and phytohormone distribution. Toxins and phytohormones were annotated in the 960 1,524 genomes with a custom database and the genetic elements described in Table S2. Related to Fig. 6. 961