Comparative Genomics Identified a Genetic Locus in Plant-Associated Pseudomonas spp. That Is Necessary for Induced Systemic Susceptibility

Microbiome-associated bacteria can have diverse effects on health of their hosts, yet the genetic and molecular bases of these effects have largely remained elusive. This work demonstrates that a novel bacterial locus can modulate systemic plant immunity. Additionally, this work demonstrates that growth-promoting strains may have unanticipated consequences for plant immunity, and this is critical to consider when the plant microbiome is being engineered for agronomic improvement.

been attributed to production of the antibiotic 2,4-diacetylphloroglucinol (DAPG) (4). However, the molecular basis of many traits, such as induced systemic resistance (ISR), has remained elusive, and multiple distinct bacterial traits, including production of siderophores, lipopolysaccharide (LPS), and salicylic acid, have all been implicated (5).
We previously reported two Pseudomonas spp. that elicit induced systemic susceptibility (ISS) on Arabidopsis and can promote growth under nutrient-limiting conditions (6,7). These Pseudomonas strains suppress a subset of salicylic acid (SA)-dependent responses and promote resistance to herbivores (7). Although it is possible that ISS-inducing strains contain multiple genetic loci that affect plant growth and pathogen resistance, we hypothesized that a single bacterial trait may be responsible for both the growth and immunity phenotypes of ISS strains. Growth and immunity have a reciprocal relationship in plants, leading to growth-defense tradeoffs to the extent that plant stunting has been used as a proxy for autoimmunity (8). As a result, we hypothesized that suppression of plant immunity by Pseudomonas strains that trigger ISS may be a consequence of PGP activity. The genomes of ISS strains do not contain genes for the ACC (1-aminocyclopropane-1-carboxylate) deaminase enzyme prevalent in other Pseudomonas PGP strains (3); thus, we hypothesized that there may be a distinct mechanism of growth promotion in these strains.
Because of the high density of sampling and genome sequencing within P. fluorescens and related species, we reasoned that if ISS is an overlooked consequence of growth promotion, then (i) we should be able to identify additional ISS strains by sampling known PGP strains and additional root-associated strains, and (ii) assuming that a single unique locus was responsible, a comparative genomics approach should reveal the underlying genetic basis of ISS.
Here, we report that ISS is relatively common among Pseudomonas strains within the P. fluorescens species complex. We identified new ISS isolates, including previously described PGP or environmental isolates and new isolates from Arabidopsis roots. Using comparative genomics, we identified a single bacterial locus that is unique to Pseudomonas ISS strains. We show that the putative ISS locus is necessary to elicit ISS. While the function of genes in the locus remains elusive, a subset have previously been implicated in pathogenesis, and we found that the locus contributes to rhizosphere growth. Collectively, these data indicate that a single microbial locus contributes to a systemic immune response in a plant host.

RESULTS
ISS is a common feature of growth-promoting Pseudomonas spp. We previously reported that two strains of Pseudomonas (CH229 and CH267) elicit induced systemic susceptibility (ISS) to the foliar pathogen Pseudomonas syringae pv. tomato DC3000 under conditions where a well-characterized ISR strain (Pseudomonas simiae WCS417 [9]) conferred resistance to P. syringae pv. tomato DC3000 (6,7). To the best of our knowledge, descriptions of Pseudomonas-elicited ISS against bacterial pathogens are limited to Pseudomonas sp. strains CH229 and CH267, which were independently isolated from the rhizospheres of wild Arabidopsis plants in Massachusetts (USA). We reasoned that if ISS is common among Arabidopsis-associated Pseudomonas spp., we would be able to identify additional ISS strains from roots of Arabidopsis plants growing at distinct sites.
We isolated 25 new fluorescent pseudomonads from wild-growing Arabidopsis plants from additional sites in Massachusetts and in Vancouver, Canada. We generated ~800-bp sequences of a region of the 16S rRNA gene where strains CH229 and CH267 are 99.5% identical, but each has <96% identity to the well-characterized ISR strain WCS417. Reasoning that new ISS strains would be closely related to CH267 and CH229, we selected 3 new isolates (1 from Massachusetts [CH235] and 2 from British Columbia [PB101 and PB106]) that were >97% identical to CH267 by 16S rRNA sequencing and another 3 (from British Columbia; PB100, PB105, and PB120) that were <97% identical to CH229 and CH267 (see Fig. S1 in the supplemental material). We tested these 6 new rhizosphere Pseudomonas isolates for their ability to trigger ISS.
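The 16S rRNA identity thresholds used to group isolates can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps; the function names and the choice to skip gapped columns are illustrative assumptions, not the procedure used in this study.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def closely_related(seq_a: str, seq_b: str, threshold: float = 97.0) -> bool:
    """True when identity exceeds the threshold (97% mirrors the cutoff in the text)."""
    return percent_identity(seq_a, seq_b) > threshold
```

Under this sketch, an isolate would be placed in the "closely related" group only when `closely_related(isolate_16s, ch267_16s)` returns `True`.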
Consistent with the hypothesis that ISS may be common among closely related PGP Pseudomonas strains, we found that 2 of the 3 strains that were most closely related to CH267 (CH235 and PB101) elicited ISS (Fig. 1). Two strains with <96% identity to CH267 failed to trigger ISS: PB105 triggered ISR, and PB100 had no effect on systemic defenses (Fig. 1). PB106 and PB120 consistently enhanced susceptibility in all experiments, but to a more moderate degree (P < 0.1). Collectively, these data indicate that the ability to elicit ISS on Arabidopsis ecotype Col-0 may be a common feature among some, but not all, closely related strains of Pseudomonas spp. isolated from the Arabidopsis rhizosphere.
Because ISS seemed restricted to strains that were closely related to CH267, we obtained several additional isolates with similar 16S rRNA sequences, including Pseudomonas sp. strain UW4, Pseudomonas sp. strain Pf0-1, and Pseudomonas vancouverensis strain DhA-51. We also tested a growth-promoting strain, Pseudomonas sp. strain WCS365, that is more distantly related and to our knowledge has not been tested for ISR/ISS (Table 1). We found that UW4 and DhA-51 elicited ISS, while Pf0-1 and WCS365 did not (Fig. 1). Pseudomonas sp. strains UW4 (10) and WCS365 are well-characterized growth-promoting strains. Pseudomonas sp. strain Pf0-1 (11) is an environmental isolate. Pseudomonas vancouverensis strain DhA-51 is also an environmental isolate (12) and was previously shown to be closely related to Pf0-1 (13). Because DhA-51 is an environmental isolate that triggers ISS, these data show that the ability to trigger ISS is not specific to rhizosphere isolates.
To gain insights into the distinguishing features of ISS strains, we sequenced the genomes of the 6 new isolates (CH235, PB100, PB101, PB105, PB106, and PB120) from Arabidopsis roots as well as P. vancouverensis DhA-51 (UW4, WCS365, CH267, and CH229 had been sequenced previously). Whole-genome sequencing was used to assemble draft genomes (see Materials and Methods). We generated a phylogenetic tree using 122 conserved genes as described previously (7,14). We found that all ISS strains are closely related to one another and fall within a monophyletic group which corresponds to the Pseudomonas koreensis, P. jessenii, and P. mandelii subgroups of P. fluorescens identified in a recent phylogenomic survey of Pseudomonas spp. (Fig. 2B). However, not every isolate in this clade is an ISS strain; notably, Pf0-1 has no effect on systemic immunity despite being closely related to CH229. We reasoned that the absence of the ISS phenotype in Pf0-1 should facilitate the use of comparative genomics by allowing us to separate the phylogenetic signature from the phenotypic signature of ISS strains.
Eleven genes in a single genomic locus are unique to ISS strains and predict ISS. To identify the potential genetic basis of the ISS phenotype, we used a previously described database of orthologous genes for Pseudomonas spp. (14) to identify genes that are present in ISS strains (CH229, CH235, CH267, and UW4) but are absent in the closely related strain that has no effect on systemic defenses (Pf0-1). We used only the ISS strains with the most robust phenotypes for this analysis. We identified 29 predicted protein-coding genes that were absent in Pf0-1 but present in all of the other strains. Of these, 12 were small (<100 amino acids [aa]) hypothetical proteins. The remaining 17 predicted protein-coding genes were prioritized for further analysis and are shown in Table S1. Intriguingly, 11 of the 17 ISS-unique genes are found in a single genomic locus.
We surveyed the genomes of other Pseudomonas strains tested for ISS to determine if the presence of the 17 genes identified by our comparative genomics approach correlated with the ISS phenotype. We found that the 11 clustered genes were present in ISS strains (DhA-51 and PB101) and the strains with intermediate phenotypes (PB120 and PB106) but were absent in the non-ISS strains WCS365, WCS417, and PB105 (Fig. S2). The remaining 6 genes were all present in WCS365 and/or other non-ISS strains (Fig. S2). We chose to focus on the 11 ISS-unique genes (referred to here as "ISS locus") for further study.
We found that the 11 genes in the ISS locus are found at a single genomic locus in all 4 of the ISS strains (Fig. 2A; also, see Fig. S3 in the supplemental material). The flanking regions are conserved in the non-ISS strain Pf0-1 (Fig. 2A), indicating a recent insertion or deletion event. Within this locus, there is a single gene that is conserved in Pf0-1 in addition to two genes that are unique to each individual strain, suggesting multiple changes to this genomic region in recent evolutionary history. While all 11 genes are within the same genomic region in the ISS strains, the variability of this locus between closely related strains suggests that it may be rapidly evolving.
We surveyed the genomes of sequenced isolates available in our collection for the presence of the ISS locus. We found a number of closely related strains from various environmental sources that contained the ISS locus, as well as a more distantly related strain (Pf-5) (Fig. 2B). We tested 2 strains that contain the ISS locus (Pf-5 and GW456-L13) as well as 2 that do not (FW300-N1B4 and FW300-N2C3) and found that the presence of the ISS locus correlated with the ISS phenotype, including in the distantly related strain Pf-5. Collectively, these data show that the presence of the 11 candidate genes in the ISS locus identified by our comparative genomics approach is predictive of the ISS phenotype.

The ISS locus is necessary for ISS.
To test if the ISS locus is necessary for ISS strains to induce systemic susceptibility, we deleted the entire 15-kb locus, including the region spanning the 11 genes identified in our initial comparative genomics screen, in strains CH267 and UW4 (Fig. 2A). We tested these deletion mutants for their ability to induce systemic susceptibility and found that deletion of the entire 11-gene locus (ΔISSlocus) resulted in a loss of the ISS phenotype in both CH267 and UW4 (Fig. 3A). This indicates that the ISS locus is necessary for ISS.
The functions of the majority of the genes in the ISS locus are not apparent based on similarity to genes of known function. A predicted 2,544-bp gene was annotated in the CH267 genome and others as speE2 due to the similarity of the predicted C terminus to that of the well-characterized spermidine synthase gene speE1 (PputUW4_02826 and CP336_12795 in UW4 and CH267, respectively). CH267 speE2 has similarity to the characterized spermidine synthase gene speE in P. aeruginosa (25% predicted amino acid identity to P. aeruginosa PA1687 [16]). A second speE-like gene in the genomes of UW4 and CH267, annotated as speE1, is outside the ISS locus (PputUW4_03691 and CP336_28780 in UW4 and CH267, respectively) and is highly similar to the P. aeruginosa speE gene (~84.0% predicted amino acid identity) (16).
To test if the speE2 gene is necessary for ISS, we also constructed an in-frame deletion of just the speE2 gene in both CH267 and UW4. We found that deletion of speE2 abolished the ISS phenotype in both CH267 and UW4 (Fig. 3A). To determine if speE2 is the only gene within the ISS locus that is necessary for induction of ISS, we generated a complementation plasmid where the CH267 speE2 gene is expressed under the control of the lac promoter (plac-speE2). We introduced this plasmid into the ΔspeE2 and ΔISSlocus deletion mutants of CH267. While plac-speE2 complemented the CH267 ΔspeE2 deletion, it failed to complement the ΔISSlocus deletion (Fig. 3B), indicating that speE2 is not the only gene within the ISS locus that is required for ISS.
Fig. 2 legend (fragment): ... core Pseudomonas genes. Genome sequencing of new strains shows that the island is present in strains that enhance susceptibility but not in those that trigger ISR or have no effect. (C) Two strains with the island (GW456-L13 and Pf-5) and two without (N1B4 and N2C3) were tested for ISS/ISR. Only those with the island significantly enhanced susceptibility. Data are averages for 3 biological replicates, with 2 leaves from each of 4 plants (n = 8) per experiment. Means ± SEM are shown. Letters indicate P < 0.05 by ANOVA and Tukey's HSD test.
Because deletion of speE2 in CH267 and UW4 results in the specific loss of the ISS phenotype, the speE1 and speE2 genes are not functionally redundant. speE1 and speE2 differ in length and predicted structure (Fig. 4A). speE1 encodes a predicted 384-amino-acid protein and contains a predicted polyamine synthase domain with a predicted decarboxylated S-adenosylmethionine (dSAM) binding motif. speE2 encodes a protein predicted to have 847 amino acids. Similar to speE1, the C terminus of the speE2 product contains a predicted dSAM-binding domain; however, the product of speE2 contains predicted transmembrane domains at its N terminus (Fig. 4A).
Spermidine synthases generate spermidine by transferring the aminopropyl group of dSAM to putrescine. Previous structural and mutagenesis analysis on human and Thermotoga maritima SpeE1 enzymes revealed common residues important for catalysis (D276, D279, D201, and Y177 in human SpeE1 and the corresponding residues D173, D176, D101, and Y76 in T. maritima SpeE1) (17,18). The catalytic mechanism was proposed to be initiated by the deprotonation of the putrescine amino group by the conserved aspartic acid D276 or D173 with the aid of the side chains of D201 or D101 and Y177 or Y76 as well as the main-chain carbonyl of L277 or S174, setting up a nucleophilic attack on dSAM. In addition, residue D279 or D176 is thought to play a role in substrate binding (17,18).
To determine if SpeE2 has the potential to be a spermidine synthase, we performed an amino acid sequence alignment to see if the catalytic residues from classic spermidine synthases are conserved in SpeE2. We found that although the tyrosine residue is conserved, SpeE2 consists of different residues at the corresponding aspartic acid positions. The proposed catalytic residue D276 or D173 in the human and T. maritima enzymes corresponds to E624 in SpeE2, while residues D201 or D101 and D279 or D176 have been converted to T556 and P627 (Fig. 4B). Furthermore, we generated a sequence similarity network for SpeE2 with enzymes found in the PF17284 protein family and found that SpeE2 belongs to a distinct cluster away from any functionally characterized enzymes (Fig. 4C). Interestingly, the SpeE2 active-site residue substitutions are almost completely conserved within and unique to the SpeE2 cluster (Fig. 4C), suggesting that while Pseudomonas sp. strain CH267 SpeE2 is unlikely to act as a spermidine synthase, it may have a distinct function.
Additional roles for the ISS locus in host interactions. While speE2 is necessary for ISS, the failure of plac-speE2 to complement the 11-gene ISS locus deletion (Fig. 3C) indicates that at least one other gene in the ISS locus is likely required for ISS. We tested whether speE2 is always associated with the same larger locus across the genus Pseudomonas. When we analyzed our entire computational data set of >3,800 genomes from across Pseudomonas, we found that there was a strong correlation for the presence or absence of 9 of 11 genes (r > 0.9) (Fig. 5A). Moreover, these 9 co-occurring genes were frequently found in the same genomic region, as there were moderate to strong correlations for 9 of the 11 genes co-occurring in the same 50-kb genomic region (Fig. 5B). From a phylogenomic standpoint, we found that these genes were broadly distributed throughout the genus Pseudomonas and co-occurred even in taxonomic groups far outside the P. fluorescens clade (Fig. 5C). Within the P. fluorescens clade, the ISS locus genes are frequently found in some clades, such as the P. koreensis and P. jessenii clades, which contain most of our isolates (Fig. 5D). However, some clades are missing these genes entirely, such as the plant-associated Pseudomonas corrugata clade (Fig. 5D). Together, these genomic data indicate that despite their polyphyletic distribution among divergent clades of Pseudomonas spp., the genes in the ISS locus likely participate in conserved or similar functions.
Within the 9 genes that have a high frequency of co-occurrence, we identified a 6-gene predicted operon in the ISS locus with identical domain structure and organization that is involved in stress resistance and virulence in Francisella tularensis (19) (Fig. 6A). Another similar operon is associated with aerotolerance and virulence in Bacteroides fragilis (20). Returning to our comparative genomics database, we found that these 6 genes constitute an operon that is broadly conserved in the Pseudomonas clade and is paralogous to the 6-gene operon in the ISS locus (Fig. 6A). This raises the possibility that these six genes within the ISS locus contribute to host-bacterial interactions across diverse bacterial taxa and both plant and animal hosts (Fig. 6A).
Fig. 4 legend (fragment): (B) Multiple-sequence alignment of predicted amino acid sequence of CH267 SpeE2 and the relatively distantly related Pf-5 SpeE2 gene along with SpeE1-like proteins from CH267, E. coli, Homo sapiens, and Arabidopsis thaliana. Although the catalytic (blue) and binding site (yellow) residues are conserved in all SpeE1 homologues, both SpeE2 genes have changes in these regions (gray). (C) Sequence similarity network (SSN) of SpeE2 and protein sequences found with the PFAM domain code PF17284. Sequences that have the conserved residues D201/D101, D276/D173, and D279/D176 similar to the human and T. maritima SpeE1 are blue, while sequences that had conserved residues T556, E624, and P627 similar to SpeE2 are red. Clusters with only 1 sequence were removed for simplicity.
Beskrovnaya et al.

DISCUSSION
Plant root-associated ("rhizosphere") microbes perform a diversity of functions that benefit their plant hosts, including nutrient uptake and defense. Functional characterization of individual plant-associated bacterial and fungal strains of potential agronomic importance (i.e., growth promoters or nitrogen fixers) is widespread (5). However, closely related strains of bacteria can have very distinct effects on plant growth and defense (13), and these effects can be dependent on environmental context (1). The lack of known correlations between microbial genotype and potential effects on plant hosts presents a challenge when one is attempting to infer the effect that a microbe may have on its plant host from sequence identity alone.
Our use of comparative genomics and isolate phenotyping to identify the genetic basis of a complex microbial-derived trait indicates that this is an effective approach to identifying important microbial traits to improve plant health. For comparative genomics to be effective, traits should be controlled by single or limited genomic loci, and phylogeny should not be predictive of function. In this case, a close relative of ISS strains, Pseudomonas sp. strain Pf0-1 (>99% identical by full-length 16S rRNA to the ISS strains), does not affect systemic defenses (Fig. 1), which allowed us to use comparative genomics to identify the underlying basis. We previously used this approach to find the genomic basis of a pathogenic phenotype within a clade of commensals (14). It has been previously observed that phylogeny is not predictive of function for ISR strains (13), suggesting that comparative genomics may be appropriate to find the basis of additional plant-associated traits.
We found that the ISS locus contains genes involved in both triggering ISS and promoting rhizosphere colonization. Loss of the entire locus results in a loss of ISS and a decrease in growth in the rhizosphere; however, loss of speE2 impairs ISS but not rhizosphere growth, suggesting that there may be multiple plant association functions encoded in this locus. The functions of the speE2 gene and other genes within the ISS locus are not readily apparent from similarity of their products to previously characterized enzymes. As spermidine and other polyamines can directly enhance plant resistance through generation of reactive oxygen species (ROS) (21), it is possible that the SpeE2 enzyme converts spermidine or another polyamine to a non-defense-inducing molecule. The highly conserved nature of the portions of speE2-like genes encoding active-site residues suggests a novel function in this enzyme.
Fig. 6 legend (fragment): ... An operon with a similar configuration is also present in mammalian pathogens and has been implicated in virulence. (B) The ISS locus, but not the speE2 gene, promotes rhizosphere colonization. We tested the ΔISSlocus and ΔspeE2 mutants in CH267 and UW4 using a 48-well plate-based rhizosphere colonization assay. Data shown are from 5 days postinoculation. *, P < 0.05 between mutants in a genetic background by ANOVA and Tukey's HSD test; ns, not significant.
While enhancement of systemic susceptibility is not an obviously agronomically useful plant trait, several ISS strains promote growth and enhance resistance to insect pests (6,7). Using ISS strains might be beneficial for crops where insects are the primary pressure on crop productivity. However, the ubiquity of ISS elicited by plant growth-promoting strains illustrates the complexity of host-microbe interactions and should be considered when the microbiome is being engineered.

MATERIALS AND METHODS
Plant growth conditions. For all experiments, plants were grown in Jiffy-7 peat pellets (Jiffy Products) under a 12-h light/12-h dark cycle at 22°C. Seeds were surface sterilized by washing with 70% ethanol for 2 min followed by 5 min in 10% bleach and 3 washes in sterile water. Seeds were stored at 4°C until use. Unless otherwise indicated, seeds were sown in peat pellets (Jiffy-7) and placed in a growth chamber with 12-h days and 75 μM cool white fluorescent lights at 23°C.
Bacterial growth and 16S rRNA sequencing. Pseudomonas strains were cultured in LB or King's B at 28°C. New Pseudomonas strains were isolated from the roots of wild-grown Arabidopsis plants in eastern Massachusetts and British Columbia as previously described (6). New Pseudomonas isolates were preliminarily identified based on fluorescence on King's B and confirmed by 16S rRNA sequencing.
ISS assays. ISS and ISR assays were performed as described elsewhere (7,22). Briefly, Pseudomonas rhizosphere isolates were grown at 28°C in LB medium. For inoculation of plant roots for ISR and ISS assays, overnight cultures were pelleted, washed with 10 mM MgSO4, and resuspended to a final optical density at 600 nm (OD600) of 0.02. Jiffy pellets were inoculated 9 days after seed germination with 2 ml of the appropriate bacterial strains at a final OD600 of 0.02 (5 × 10⁵ CFU/g Jiffy pellet). For infections, the leaves of 5-week-old plants were infiltrated with P. syringae pv. tomato DC3000 at an OD600 of 0.0002 (starting inoculum, ~10³ CFU/cm² leaf tissue). Plants were maintained under low light (<75 μM) and high humidity for 48 h. Leaf punches were harvested, ground, and plated to determine CFU counts.
16S rRNA sequencing, bacterial genome sequencing, assembly, and phylogenomics. Bacterial DNA preparations were made using Qiagen Puregene kit A. 16S rRNA was amplified using 8F and 1391R and sequenced using 907R. Bacterial genomic library preparation and genome sequencing were performed as previously described (7). Briefly, bacterial DNA was isolated using Qiagen Puregene kit A and sonicated into ~500-bp fragments. Library construction was performed as previously described (7). Each genomic sample was individually indexed, pooled, and sequenced using MiSeq V3 paired-end 300-bp reads. After barcode splitting, approximately 500,000 to 1 million reads were used for each sample to assemble draft genomes of the Pseudomonas strains CH235, PB100, PB101, PB105, PB106, and PB120 and P. vancouverensis DhA-51. Genome assembly was carried out as previously described (7).
Phylogenomic tree building. To generate the 29-taxon species tree used in Fig. 2B and Fig. 4E, we made use of an alignment of 122 single-copy genes we previously found to be conserved in all Pseudomonas strains (14). From this amino acid alignment, we extracted 40,000 positions, ignoring sites where >20% of the taxa had gaps. Using RAxML v8.2.9, we inferred 20 independent trees under the JTT substitution model using empirical amino acid frequencies and selected the one with the highest likelihood. Support values were calculated through 100 independent bootstrap replicates under the same parameters.
To build the 3,886-taxon phylogeny of the Pseudomonas genus in Fig. 5C and Fig. S1, the same 122-gene alignment was used. For computational feasibility, the alignment was randomly subsampled to 10,000 amino acid positions, again ignoring sites that were highly gapped (>20%). FastTree v2.1.9 was used to build the phylogeny using default parameters. The phylogeny was rooted to a clade of Pseudomonas identified as an outgroup to all other Pseudomonas spp. as previously described (14). To more easily visualize this tree, we collapsed monophyletic clades with strong support (as determined by FastTree's local Shimodaira-Hasegawa test) that correspond with major taxonomic divisions identified by Hesse et al. (15).
To build the tree for the Pseudomonas fluorescens subclade seen in Fig. 5D and Fig. S2, we identified 1,873 orthologs specific to the P. fluorescens clade found in >99% of all strains in the clade and then aligned them all to the hidden Markov models generated by PyParanoid using hmmalign, prior to concatenation. This alignment had 581,023 amino acid positions, which we trimmed to 575,629 positions after masking sites with >10% of taxa with gaps. From this alignment, we randomly subsampled 120,000 sites for our final phylogenomic data set. Using RAxML v8.2.9, we inferred 20 independent trees in the JTT substitution model using empirical amino acid frequencies and selected the one with the highest likelihood. Support values were calculated through 100 independent bootstrap replicates under the same parameters.
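The gap masking and random site subsampling applied to the concatenated alignments can be sketched as follows. This is a minimal illustration only: the function name, the dictionary representation of the alignment, and the fixed random seed are assumptions, and the scripts actually used for the analysis are not described here.

```python
import random


def mask_and_subsample(alignment, max_gap_fraction, n_sites, seed=0):
    """Drop highly gapped columns, then randomly subsample the remainder.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    max_gap_fraction: columns with a larger fraction of '-' are masked
        (0.20 and 0.10 correspond to the thresholds quoted in the text).
    n_sites: number of columns to keep after masking.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    n_taxa = len(seqs)

    # Keep columns whose gap fraction does not exceed the threshold.
    kept = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / n_taxa <= max_gap_fraction
    ]

    # Subsample without replacement, preserving the original column order.
    if n_sites < len(kept):
        rng = random.Random(seed)  # seeded so the sketch is reproducible
        kept = sorted(rng.sample(kept, n_sites))

    return {taxon: "".join(seq[i] for i in kept)
            for taxon, seq in alignment.items()}
```

For example, `mask_and_subsample(aln, 0.20, 10_000)` would mirror the parameters given for the genus-wide tree, with `aln` standing in for the 122-gene alignment.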
Comparative genomics. Comparative genomics analyses were performed using the previously described PyParanoid pipeline and the database we built for over 3,800 genomes of Pseudomonas spp. Briefly, we had previously used PyParanoid to identify 24,066 discrete groups of homologous proteins which covered >94% of the genes in the original database. Using these homolog groups, we annotated each protein-coding sequence in the newly sequenced genomes and merged the resulting data with the existing database, generating presence-absence data for each of the 24,066 groups for 3,886 total Pseudomonas genomes.
To identify the groups associated with induction of systemic susceptibility, we compared the presence-absence data for 4 strains with ISS activity (Pseudomonas strains CH229, CH235, CH267, and UW4) and 1 strain with no activity (Pseudomonas strain Pf0-1). We initially suspected that ISS activity was due to the presence of a gene or pathway (i.e., not the absence of a gene) and thus initially focused on genes present in the ISS strains but absent from Pf0-1. We identified 29 groups that were present in the 4 ISS strains but not in Pf0-1.
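The presence-absence comparison described above amounts to a set operation, sketched below. The data structure (one set of homolog group identifiers per strain) and the function name are illustrative assumptions; PyParanoid's own output format is not reproduced here.

```python
def candidate_groups(presence, positive_strains, negative_strains):
    """Return homolog groups present in every positive strain and absent
    from every negative strain.

    presence: dict mapping strain name -> set of homolog group IDs found
        in that genome (hypothetical representation of the presence-absence
        table).
    """
    if not positive_strains:
        raise ValueError("at least one positive strain is required")
    positives = [presence[s] for s in positive_strains]
    # Groups shared by all strains with the phenotype.
    shared = set(positives[0]).intersection(*positives[1:])
    # Remove anything also found in a strain lacking the phenotype.
    for strain in negative_strains:
        shared -= presence[strain]
    return shared
```

With the strains used here, the call would take the form `candidate_groups(presence, ["CH229", "CH235", "CH267", "UW4"], ["Pf0-1"])`.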
To obtain the correlation coefficients in Fig. 4D and Fig. 5A, we coded group presence or absence as a binary variable and calculated Pearson coefficients across all 3,886 genomes. To calculate the correlation coefficients in Fig. 5B, we split the genomic database into 50-kb contiguous regions and assessed group presence or absence within each region. Because this data set is heavily zero inflated, we ignored regions that had none of the 11 groups, taking the Pearson coefficient of the 11 genes over the remaining regions.
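As a rough illustration of the correlation analysis, the sketch below computes a Pearson coefficient between two binary presence-absence vectors and filters out regions containing none of the groups of interest. The function names and the list-of-lists layout are assumptions made for this example, not a description of the code used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric (here 0/1) vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a vector is constant
    return cov / sqrt(var_x * var_y)


def drop_empty_regions(rows):
    """Discard regions (rows of 0/1 flags, one flag per group) in which
    none of the groups is present, to reduce zero inflation."""
    return [row for row in rows if any(row)]
```

For the per-region analysis, each row would hold the 11 presence flags for one 50-kb region; after `drop_empty_regions`, `pearson` would be applied to each pair of columns.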
Initial annotation of the ISS groups was based on generic annotations from GenBank. Further annotation of the 11 groups specific to the ISS locus was carried out using the TMHMM v2.0 server, the SignalP 4.1 server, and a local Pfam search using the Pfam-A database from Pfam v31.0. To identify homologous genes in the genomes of Francisella tularensis subsp. holarctica and Bacteroides fragilis YCH46, we relied on locus tags reported in the literature, which we confirmed using annotation based on another Pfam-A domain search.
Deletion of the speE2 gene and 11-gene ISS locus. Deletions in strains CH267 and UW4 were constructed by a two-step allelic exchange as described elsewhere (23). The flanking regions directly upstream and downstream of the 11-gene ISS locus or the speE2 gene were amplified and joined by overlapping PCR using genomic DNA as the template and primers listed in Table 2. Following digestion, the product was ligated into the pEXG2 suicide vector that contains the sacB gene for counterselection on sucrose (24). The recombinant plasmid was then transformed into calcium-competent Escherichia coli DH5α by heat shock. After confirmation of correct insertion by PCR and sequencing, the plasmid was transformed into WM3064 (25). Conjugation of the plasmid into CH267 and UW4 from WM3064 was performed by biparental mating on King's B medium supplemented with diaminopimelic acid, and transconjugants were selected using 10 μg/ml gentamicin and 15 μg/ml nalidixic acid. The second recombination, leading to plasmid and target DNA excision, was selected for by using sucrose counterselection. Gene deletions in CH267 and UW4 were confirmed by PCR amplification of the flanking regions with primers listed in Table 2, agarose gel electrophoresis, and Sanger sequencing.
Complementation of the speE2 gene. The speE2 gene was amplified by PCR using CH267 genomic DNA as the template, as well as the primers listed in Table 2. Following restriction digestion, the ~2.6-kb insert was ligated into the pBBR1MCS-2 vector at the multiple-cloning site located downstream of a lac promoter. The ligation mixture was then introduced into E. coli DH5α by heat shock, and transformants were selected using LB medium supplemented with 25 to 50 μg/ml kanamycin. The presence of the correct