Evolutionary divergence of the Wsp signal transduction system in β- and γ-proteobacteria

Bacteria rapidly adapt to their environment by integrating external stimuli through diverse signal transduction systems. Pseudomonas aeruginosa, for example, senses surface-contact through the Wsp signal transduction system to trigger the production of cyclic di-GMP. Diverse mutations in wsp genes that manifest enhanced biofilm formation are frequently reported in clinical isolates of P. aeruginosa, and in biofilm studies of Pseudomonas spp. and Burkholderia cenocepacia. In contrast to the convergent phenotypes associated with comparable wsp mutations, we demonstrate that the Wsp system in B. cenocepacia does not impact intracellular cyclic di-GMP levels unlike that in Pseudomonas spp. Our current mechanistic understanding of the Wsp system is entirely based on the study of four Pseudomonas spp. and its phylogenetic distribution remains unknown. Here, we present the first broad phylogenetic analysis to date to show that the Wsp system originated in the β-proteobacteria then horizontally transferred to Pseudomonas spp., the sole member of the γ-proteobacteria. Alignment of 794 independent Wsp systems with reported mutations from the literature identified key amino acid residues that fall within and outside annotated functional domains. Specific residues that are highly conserved but uniquely modified in B. cenocepacia likely define mechanistic differences among Wsp systems. We also find the greatest sequence variation in the extracellular sensory domain of WspA, indicating potential adaptations to diverse external stimuli beyond surface-contact sensing. This study emphasizes the need to better understand the breadth of functional diversity of the Wsp system as a major regulator of bacterial adaptation beyond B. cenocepacia and select Pseudomonas spp.

Importance
The Wsp signal transduction system serves as an important model system for studying how bacteria adapt to living in densely structured communities known as biofilms. Biofilms frequently cause chronic infections and environmental fouling, and they are very difficult to eradicate. In Pseudomonas aeruginosa, the Wsp system senses contact with a surface, which in turn activates specific genes that promote biofilm formation. We demonstrate that the Wsp system in Burkholderia cenocepacia regulates biofilm formation uniquely from that in Pseudomonas species. Furthermore, a broad phylogenetic analysis reveals the presence of the Wsp system in diverse bacterial species, and sequence analyses of 794 independent systems suggest that the core signaling components function similarly but with key differences that may alter what or how they sense. This study shows that Wsp systems are highly conserved and more broadly distributed than previously thought, and their unique differences likely reflect adaptations to distinct environments.


Introduction
Biofilms are extremely recalcitrant in nature, which is driven largely by the extracellular matrix comprising diverse compounds produced by individual cells [1-8]. This matrix manifests structured community growth and dynamically generates sharp chemical gradients to drive both phenotypic and genetic diversification. Exopolysaccharides (EPS) are a major component of this matrix, and they play a critical role in cell-cell and cell-surface adhesion [9-12]. EPS production or export is modulated by the second messenger cyclic di-GMP [5,13-21] in many organisms, which enables opportunistic pathogens like Pseudomonas aeruginosa to persist for years in the lungs of cystic fibrosis patients and cause extensive damage [18,22,23]. Clinical isolates of P. aeruginosa often display phenotypic heterogeneity as smooth, mucoid, or small colony variant (SCV) phenotypes [22,24-26]. Sequence analyses of SCVs commonly identify mutations in wsp (wrinkly spreader phenotype) genes that increase the intracellular pool of cyclic di-GMP [27-31].

The final dataset contained 43 residue pairs where each residue is highly conserved in its respective alignment subset; the residue pair identifies a non-conservative mutation within the H-system compared to the R-system that may be essential in cyclic di-GMP independent H-system signaling.
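The residue-pair selection described above can be illustrated with a minimal Python sketch. It assumes that the H- and R-system sequences share one alignment coordinate system, uses the fraction of sequences carrying the consensus residue as a stand-in for the conservation score, and calls a substitution non-conservative when the two consensus residues fall in different physicochemical groups. The grouping, the 0.9 cutoff, and the function names are illustrative assumptions and are not taken from the study.

```python
from collections import Counter

# Hypothetical physicochemical grouping used only for this illustration.
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def consensus(column):
    """Return (most common residue, fraction of all rows carrying it)."""
    residues = [aa for aa in column if aa in GROUP_OF]  # skips gaps and unknowns
    if not residues:
        return None, 0.0
    aa, count = Counter(residues).most_common(1)[0]
    return aa, count / len(column)  # gaps lower the fraction


def divergent_pairs(h_seqs, r_seqs, min_fraction=0.9):
    """List (position, H residue, R residue) for columns that are highly
    conserved in each subset but differ non-conservatively between them."""
    pairs = []
    columns = zip(zip(*h_seqs), zip(*r_seqs))
    for pos, (h_col, r_col) in enumerate(columns, start=1):
        h_aa, h_frac = consensus(h_col)
        r_aa, r_frac = consensus(r_col)
        if h_aa is None or r_aa is None:
            continue
        conserved = h_frac >= min_fraction and r_frac >= min_fraction
        if conserved and GROUP_OF[h_aa] != GROUP_OF[r_aa]:
            pairs.append((pos, h_aa, r_aa))
    return pairs


# Toy aligned subsets (not real Wsp sequences).
h_subset = ["MDK", "MDK", "MDR"]
r_subset = ["MLK", "MLK", "MLK"]
pairs = divergent_pairs(h_subset, r_subset)
```

In this toy input only the second column qualifies: it is invariant in both subsets, and aspartate and leucine belong to different groups.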

Motif assessment of WspB and WspD: Assessment of conservation scores in WspB and WspD found that the conserved sites of these proteins were not similar. To visualize this difference, we collected the conservation scores of residues with a high conservation threshold (≥ 0.80) that were continuous for 3 or more residues. The identified islands of conservation plus 2 flanking residues on both ends were selected from the alignment to generate a sequence logo. Sequence logos were generated using WebLogo

The core signaling components of the enteric chemotaxis (Che) system and the Wsp system in Pseudomonas spp. and B. cenocepacia are expected to be mechanistically similar due to the high sequence conservation of the functional domains (Fig. 1A). However, the overall sequence homology among these three systems is relatively low, and the Wsp proteins between P. fluorescens Pf0-1 and B. cenocepacia HI2424 share greater homology (Table S1). This suggests that the Wsp system in B. cenocepacia HI2424 functions similarly to the Pseudomonas Wsp system, but the former lacks WspR and is instead predicted to phosphorylate WspH, which lacks WspR's GGDEF enzymatic domain required for cyclic di-GMP production [43,44] (Fig. 1). To test if the Wsp system in B. cenocepacia HI2424 regulates cyclic di-GMP production and/or activates an unidentified cognate diguanylate cyclase, we collected cyclic di-GMP production and biofilm formation in Pseudomonas spp. [19,32,39]. The B. cenocepacia HI2424 mutants used here, wspA (S258W) and wspE (D652N), were previously isolated and demonstrated to increase biofilm formation [43,44]. Conversely, the mutation in wspR (A113D) is predicted to terminate Wsp signaling as the mutated protein could no longer be phosphorylated [35]. The mutation in wspH (L135F) has been shown to produce a smooth colony phenotype with defective biofilm formation [43], suggesting a reduced signaling output, if any at all.
We first assessed colony morphologies of the wsp mutants for the iconic wrinkled phenotype that reflects increased cyclic di-GMP production across diverse species [19,27,28,32-34,37,39,45-47]. Mutations in the wspA and wspE genes of both P. fluorescens Pf0-1 and B. cenocepacia HI2424 produce similar wrinkled phenotypes, while the wspR mutant of P. fluorescens Pf0-1 and the wspH mutant of B. cenocepacia HI2424 both exhibit smooth morphologies (Fig. 2A), as expected at low cyclic di-GMP levels. We next quantified the intracellular levels of cyclic di-GMP in the same set of mutants to test whether the altered colony morphologies indeed reflect increased cyclic di-GMP production. As predicted, cyclic di-GMP levels in wspA and wspE mutants of P. fluorescens Pf0-1 are significantly greater than the WT, unlike the wspR mutant (Fig. 2B). In contrast, none of the mutants of B. cenocepacia HI2424 significantly differ in cyclic di-GMP levels compared to the WT (Fig. 2B). This indicates that either the Wsp system in B. cenocepacia HI2424 does not regulate cyclic di-GMP production or that its influence on cyclic di-GMP production is rapidly buffered. Regardless, there is a clear contrast here between the colony morphologies and cyclic di-GMP levels in the two species. Although the wrinkled colony morphology was positively correlated with biofilm production in these B. cenocepacia HI2424 wsp mutants [44], this appears to be achieved through a cyclic di-GMP independent mechanism. This is not particularly surprising given that WspH lacks a diguanylate cyclase domain and instead possesses a hybrid histidine kinase domain. A recent wsp phenotype suppressor study in B. cenocepacia HI2424 indicated that Bcen2424_1436 (RowR) is critical for WspH signaling and subsequent polysaccharide synthesis [44].
RowR is a predicted DNA-binding response regulator of an uncharacterized two-component transduction system; however, its direct interaction with WspH remains to be resolved.

Despite the overall sequence similarity, the Wsp system in B. cenocepacia HI2424 appears functionally divergent from the WspR system in Pseudomonas. However, comparable wsp mutations in B. cenocepacia HI2424 and clinically persistent Pseudomonas spp. converge on biofilm formation, producing similar clinically relevant phenotypes [22,59,60]. To date, the WspH system has only been reported in B. cenocepacia HI2424 [43,44] and its relative uniqueness remains unknown. Similarly, the shared characteristic synteny of the wsp operon has only been described in four Pseudomonas strains [27,28,33,61], and the extent of its taxonomic distribution also remains unknown.

Wsp systems are exclusive to the β- and γ-proteobacteria and the WspH system is restricted to Burkholderia
To assess the phylogenetic distribution and the evolutionary history of the Wsp system, we constructed a database of Wsp homologs from all publicly available bacterial genomes in GenBank (see Methods). Sequence similarity alone makes it difficult to bioinformatically distinguish Wsp homologs from those that function in chemotaxis (Fig. 1A) [62,63]. Fortunately, chemotaxis genes are infrequently encoded as a single operon [64,65] in contrast to all annotated wsp genes (Fig. 1B). We thus identified syntenic Wsp homologs of P. fluorescens Pf0-1 and B. cenocepacia HI2424 with at least 30% sequence identity. Our analysis discovered 794 unique wsp gene clusters with conserved wspA-wspF synteny. Importantly, using the wsp genes from P. fluorescens Pf0-1 (Data S1) or B. cenocepacia HI2424 (Data S2) as independent queries generated overlapping results, indicating that synteny is a robust search parameter for identifying previously unannotated Wsp systems. All 794 identified wsp gene clusters were associated with either a wspR homolog (588) or a wspH homolog (206) as observed in P. fluorescens Pf0-1 or B. cenocepacia HI2424, respectively (Fig. 1B). We also found that no identified genome contains both H- and R-systems, and either system is present at a single instance per genome. For the sake of simplicity, we refer to wsp clusters with a wspH homolog as H-systems and those with a wspR homolog as R-systems.

We next constructed a phylogenetic tree to visualize the taxonomic distribution of the H- and R-systems, which reveals that they are present exclusively in β- and γ-proteobacteria (Fig. 3A). The R-system is distributed across both the β- and γ-proteobacteria while the H-system is limited to Burkholderia. The twelve Bordetella strains in our dataset form two distinct clades, and three Burkholderia strains (i.e. Proto-Burkholderia) branch out earlier from the remaining Burkholderia and Paraburkholderia clades. Proto-Burkholderia possesses the R-system as do the Paraburkholderia, suggesting that the R-system pre-dates the H-system found exclusively in Burkholderia. Further expanding the Burkholderia clade shows that the R-system is present in only one strain (Fig. 3B)

Phylogenetic incongruence reflects multiple horizontal transfer events of the R-system
We evaluated potential horizontal transfer events of the Wsp system by assessing the incongruence between the species and Wsp phylogenies [66,67]. A phylogeny of the 794 Wsp systems was constructed using the concatenated peptide sequences of the core Wsp signaling proteins (WspA, WspB, WspC, WspD, WspE, and WspF), which diverges into five distinct clades (Table S2, Fig. S1). The sequence variations in the Wsp signaling core alone clearly differentiate the H- and R-systems even in the absence of WspH and WspR. Significant differences between the species (Fig. 3) and Wsp (Fig. S1) trees strongly suggest that the Wsp system has been subjected to multiple horizontal transfer events as summarized in Fig. 4.

We observe that the R-system likely originated in Azoarcus (clade 1) then radiated throughout the β-proteobacteria and into Pseudomonas (Fig. S1), the sole member of the γ-proteobacteria (Fig. 3). In clade 2, the R-systems of Pseudomonas, Bordetella, and Achromobacter share a common node with Cupriavidus (Fig. 4A), yet these genera are taxonomically distinct (Fig. 4B). This observation, as supported by high bootstrap values in both phylogenies, indicates that Cupriavidus or its ancestor served as the common source for the horizontal transfer of the R-system into the phylogenetically distant Pseudomonas, Bordetella, and Achromobacter genera (Fig. 4B). Interestingly, each genus in clade 2 comprises opportunistic pathogens that are most frequently associated with the respiratory environment (Table S3, Fig. S2). The R-systems in clades 4 and 5 of the remaining β-proteobacteria appear to share a common ancestry (Fig. 4C) but show that the R-system from the ancestor of Burkholderia and Paraburkholderia likely transferred horizontally into Ralstonia and Pandoraea (Fig. 4D). This is clearly observed in the phylogenies where Ralstonia and Pandoraea taxonomically diverged before Burkholderia (Fig. 4D), but the acquisition of their Wsp systems occurred after Burkholderia (Fig. 4C). Interestingly, the Wsp system of Paraburkholderia is not confined to a single node in the Wsp phylogeny and is instead scattered among Ralstonia, Pandoraea, and Burkholderia (Fig. 4C). This indicates that the Wsp systems of Paraburkholderia have high sequence variation which could manifest unique functions such as responding to diverse stimuli or interacting with other response regulators.

Much like the taxonomic phylogeny, the H-system in Burkholderia forms a monophyletic clade, indicating that the signaling core of the H-system is highly conserved but is distinct from the R-system. The unique presence of the R-system in B. cepacia MSMB1184WGS presents an interesting case. Our analysis suggests that the H-system of this strain was independently replaced with an R-system after the radiation of the H-system in Burkholderia (Fig. 4C). Another possibility is that the R-system of this strain is related to the original source for the evolution of the H-system. Although many of the organisms in our dataset are associated with respiratory infections, we found that the Wsp systems are present in both opportunistic and non-pathogenic species (Table S3, Fig. S2). This suggests that horizontal transfer events of Wsp predate adaptations to the human host and that individual Wsp systems have functionally diverged beyond the emergence and integration of the wspH gene.

Sequence conservation of Wsp systems and acquired mutations converge on predicted functional domains and unannotated regions
Missense mutations in wsp genes that impact biofilm formation are commonly identified in clinical isolates and experimental evolution studies [19,27,28,32-34,37,39,45-47]. Such mutations likely occur in key functional residues but the Wsp system as a whole remains poorly annotated. We thus aligned the amino acid sequences of all Wsp proteins in our dataset to generate a consensus sequence, and annotated functional domains based on homology to the enteric chemotaxis (Che) system and empirical studies of the Pseudomonas Wsp system (Tables S4 and S5). We then assessed the 794 sequence alignments for conservation using a weighted Shannon entropy algorithm where a conservation score near one indicates high conservation and a conservation score near zero indicates weak conservation [54]. Regions exceeding a conservation score of 0.8 are highlighted in red since this threshold reliably captures known functional residues [54]. Lastly, we compiled all wsp missense mutations from the literature that have been predicted to impact the signaling of the respective Wsp system, then mapped them to our consensus sequence (Table S4). We summarize the sequence conservation, newly annotated functional domains, and sites of missense mutations in Fig. 5, and the sequence conservation profiles of H- and R-systems independently in Fig. S3. The H-systems share consistently higher sequence conservation across all Wsp proteins, which reflects their phylogenetic isolation (Fig. S3). In general, our results complement previous reports and speculations on Wsp function but also reveal surprising patterns.

WspA exhibits strikingly high sequence conservation across four distinct regions. Reflecting on the significance of the trimer-of-dimer interaction (signaling) domain of WspA [19], the corresponding region is extremely conserved.
This particular region also shares 74.5% sequence similarity to the enteric chemotaxis Tsr signaling domain [68,69], and we found that the signaling domain in WspA likely extends beyond those reported [19] to include residues 378-432 (Table S5). Mutations within this signaling domain likely alter the stability of the trimer-of-dimer interactions, resulting in either increased or decreased signaling to WspE [19]. Two additional regions exhibit high conservation (residues 291-300 and 499-508), which we predict to function as the sites methylated (CH3) by WspC (Table S5). Although methylation of WspA has never been experimentally confirmed [62], high sequence conservation coupled with homology to the chemotaxis system strongly suggests that these sites are methylated. Lastly, the chemotaxis Tsr contains a docking site for the methyltransferase CheR (Fig. 1A), which is present as the last five residues of the Tsr C-terminus [70]. Although WspA does not contain the same motif, we observe high conservation of the last 9 amino acids (536-545) at the C-terminus. This region likely represents the docking site for WspC and WspF. Interestingly, there is relatively low sequence conservation in the extracellular sensory domain (Fig. 5). The same pattern holds true across the R-system but the H-system shows much greater levels of conservation (Fig. S3). These results suggest that these two systems may respond to unique extracellular stimuli, which could also apply across the R-system given the extent of sequence variations observed.

has 7 motifs ranging between 5-19 residues. Comparisons of these motifs reveal that they are unique to their respective proteins and have no known or predicted function.
Although we were unable to bioinformatically predict functional domains in either protein, their unique conservation signatures described here likely represent sites for interacting with WspA and WspE (Fig. 1A), and potentially with other non-Wsp proteins to modulate subcellular localization. Interestingly, no mutation has ever been reported in the wspB gene for the H-system (Fig. 5).

Both the conserved regions and reported mutations in WspC and WspF largely associate with our annotated domains (Fig. 5). WspC is predicted to function as the main activator of the Wsp system and WspF acts as the repressor (Fig. 1A). Consequently, wspF mutations from the literature exclusively act to turn on the Wsp system while mutations in wspC exclusively turn off the Wsp system (Fig. 5). However, there is a striking pattern here where no mutation has ever been reported for either protein of the H-system. This is very surprising given that WspC and WspF are predicted to function as the main switch of the Wsp system. Collectively, these results suggest that WspC and WspF indeed function to methylate and de-methylate WspA as predicted, but the methylation state of WspA may have reduced influence on the activity of the H-system compared to the R-system.

We observe high conservation in the REC domain (residues 648-702) of WspE (Fig. 5) and mutations within specific residues that appear to activate WspE in a WspA-independent manner [18]. The HATPase domain shows high conservation, which is expected as this region is essential for binding to ATP and ultimately phosphorylating WspF and WspR/WspH. We observe high conservation in unannotated regions that flank the HATPase domain that are frequently mutated in the H-system but entirely unaffected in the R-system. Given that WspE interacts with either WspR or WspH, this particular region may be uniquely important for the H-system.
It is likely that these WspE activating mutations observed exclusively in the H-system manifest a conformational change that initiates HATPase function in the absence of a stimulus. The region between the HPT and HATPase domains in the chemotaxis CheA comprises the P2 and P3 domains, which are responsible for CheY (response regulator) and CheB (methylesterase) docking and CheA dimerization, respectively (Fig. 1A). However, BLAST assessments reveal no significant similarities between WspE and CheA for these regions, and therefore WspE lacks this annotation. It is possible that the H-system mutations in the 3' adjacent HATPase region may affect

phosphorylate an unknown response regulator [43]. We observe comparably high conservation in our large dataset within the WspH/R REC domain (Fig. 5). As expected, we observe weak conservation of this C-terminus in our WspH/R alignment and greater conservation when H- and R-systems are compared independently (Fig. S3). Interestingly, only the GGDEF region of the WspR C-terminus exhibits high conservation.

Identification of residues that are uniquely conserved between H- and R-systems
Among the 2,899 consensus amino acid residues of the H- and R-systems, we have identified 43 residues that are likely important for the specialized function of the H-system (see Methods). These 43 residues are highly conserved in both H- and R-systems but also unique to each system (Table 1), and represent non-conservative substitutions that occurred within the H-system [57]. Selective pressures have forced nearly all H-systems to retain these residues, suggesting that they are essential to H-system signaling. Notably, 23 of the 43 identified residues are in the C-terminus of WspH, which does not encode an enzymatic domain like WspR, but is instead a histidine kinase speculated to phosphorylate an unknown response regulator that stimulates biofilm production [43]. Seven residues occur in the REC domain of WspH, which is predicted to interact with and become phosphorylated by WspE. WspE of the H-system has three substitutions, with two occurring in the histidine phosphotransfer (HPT) relay domain and one in an unannotated region of the C-terminus. We predict that these residues are unique to WspE

increased biofilm formation associated with elevated cyclic di-GMP production in diverse bacterial species. However, our study clearly demonstrates that these wsp mutations in B. cenocepacia HI2424 do not alter cyclic di-GMP production in contrast to comparable wsp mutations in P. fluorescens Pf0-1. Despite the striking difference in the functional output of H- and R-systems, they share high levels of sequence conservation within and beyond key functional domains that overlap with the enteric chemotaxis system. One major difference we observed was the complete absence of reported mutations in wspC and wspF genes of the H-system in contrast to those of the R-system. We also identified 43 highly conserved amino acid residues across all Wsp proteins that are uniquely modified in the H-system.
These specific residues likely differentiate mechanistic variations between the H- and R-systems, and we suspect that the methylation state of WspA exerts a relatively reduced role in the signaling cascade of the H-system compared to the R-system.

The Wsp proteins of the R-system exhibit greater overall sequence variation compared to those of the H-system. This is not surprising since the H-system is phylogenetically restricted to Burkholderia spp. and the R-system is much more divergent. However, nearly all of the annotated functional domains are highly conserved across the R-system with the exception being the extracellular sensory domain of WspA. The external stimulus of the Wsp system long remained a mystery until a recent study demonstrated surface-contact to be the main stimulus in P. aeruginosa [38]. The relatively large sequence variation observed within this extracellular sensory domain indicates strong potential for independent adaptations to diverse external stimuli. We found that WspB and WspD proteins exhibit the least amount of sequence conservation, yet we did not observe any instance of a wsp operon lacking either of these proteins, strongly indicating that they are both functionally important. In contrast to the enteric chemotaxis system which utilizes CheW to physically bridge the methylation and phosphorylation signaling modules, the Wsp system is thought to utilize both WspB and WspD in an analogous manner. However, there is clear evidence in P. aeruginosa that WspB and WspD are not functionally redundant [19] and we observed distinct sequence conservation patterns between these two proteins. We also found that all mutations reported in WspB and WspD proteins act to deactivate the R-system while mutations in WspD of the H-system exclusively act to turn it on. Furthermore, no mutations have been reported for WspB of the H-system.
Given the tremendous influence of WspD on P. aeruginosa's Wsp signaling

File S1. Python script executed on Data S1-S2.
File S2. R script executed by File S1 to download RefSeq genome assemblies.
File S3. R script executed by File S1 to parse File S2 output for Bac120 set creation.
File S4. R script executed by File S1 to parse File S3 output and finalize Bac120 dataset.

Acknowledgements
We thank M. Bentley for developing the initial code for phylogenetic analyses. This study was

Figure 1. Comparison of the Wsp signal transduction system to the enteric chemotaxis system. A)
A schematic comparison of the chemotaxis (Che) system of E. coli to the WspR system of Pseudomonas and the WspH system of Burkholderia. The Che system modulates the direction of flagellar rotation in response to the binding of attractants to the receptor (e.g. serine and Tsr). The Wsp system is reported to respond to surface contact in P. aeruginosa, but the signal output varies between activating WspR (diguanylate cyclase) or WspH (function unknown). The panel on the right depicts sequence conservation of corresponding proteins among the Che, WspR, and WspH systems as presented numerically in Table S1. Dark green proteins show the greatest conservation while light green proteins show the least conservation. CH3 = methyl group, P = phosphate. B) The wsp genes in P. fluorescens Pf0-1 and B. cenocepacia HI2424 share synteny except that wspR is the terminal gene of a monocistronic operon and wspH is absent in the latter; wspH appears to be encoded as an independent transcription unit upstream from the wsp operon.

The mucoid patches observed in the WT P. fluorescens colony represent rsmE mutants that naturally emerge (Kim et al., 2014). B) LC-MS/MS data show that the wrinkly phenotype correlates with increased levels of cyclic di-GMP in Wsp mutants in P. fluorescens (orange bars) but not in B. cenocepacia (teal bars). Plotted are the means of three replicates; error bars represent the standard deviation, n.s. denotes no significant difference, and ** denotes a significant difference (ANOVA p < 0.0001; Tukey HSD p < 0.01).

Figure 3. Phylogenetic analysis shows that the Wsp system is restricted to the β- and γ-proteobacteria and wspH is unique to Burkholderia. A) A maximum likelihood species tree of organisms with the H-system (teal) or the R-system (orange). For simplicity, the tree was collapsed at the genus level with the values within parentheses indicating the number of strains in each branch (detailed in Table S3). 588 strains possess the R-system, and 206 strains possess the H-system which is restricted to Burkholderia. B) An expansion of the Burkholderia-Paraburkholderia subgroup shows that only one strain (Burkholderia cepacia MSMB1184WGS) possesses the R-system. An independent phylogenetic assessment of Wsp proteins identified five unique Wsp system clades

[Figure panel labels: Wsp Phylogeny; genera shown are Pseudomonas, Bordetella, Achromobacter, Cupriavidus, Burkholderia, (Proto)Burkholderia, Paraburkholderia, Pandoraea, and Ralstonia]

Figure 5. Evaluating the conservation of Wsp proteins identifies key residues that are likely essential to all Wsp signaling systems. The 794 peptide sequences for each protein were used to generate individual alignments. Annotation data is derived from NCBI CDD (conserved domain database), Prosite, or Che homology as indicated in Table S5. Reported naturally occurring missense mutations from the literature in the H-system are shown in blue and those in the R-system are shown in orange. Engineered missense mutations reported in the literature are indicated in white. Mutations that turn on the respective Wsp system are indicated as circles while those that turn off the system are indicated as triangles. The y-axis represents the Shannon Entropy evaluation for each protein alignment where weighted values near 1 indicate high sequence conservation and values near zero indicate weak sequence conservation.
Regions where the weighted Shannon Entropy metric equals or exceeds 0.8 are shaded in red and denote regions likely to have functional or structural importance. Many of the previously identified mutations reside in these regions of high conservation but the functional role of these residues remains unknown.
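
The conservation score plotted in Figure 5 can be illustrated with a simplified Python sketch. It computes an unweighted, normalized Shannon entropy per alignment column and reports 1 minus that value, so that 1 means an invariant column. The weighting scheme of the cited algorithm is not reproduced here, and the gap handling (gaps are ignored) and function names are assumptions made for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column):
    """Return 1 - (Shannon entropy / maximum entropy) for one column.

    Gaps and non-standard symbols are ignored (a simplifying assumption);
    a column with no standard residues scores 0.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log(n / total) for n in Counter(residues).values()
    )
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))


def alignment_conservation(sequences):
    """Score every column of a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    return [column_conservation("".join(col)) for col in zip(*sequences)]


# Toy alignment (not real Wsp sequences).
toy_alignment = ["MKTA", "MKSA", "MRTG", "MKT-"]
scores = alignment_conservation(toy_alignment)
mean_score = sum(scores) / len(scores)  # per-protein average, as in Table S1
high_positions = [i + 1 for i, s in enumerate(scores) if s >= 0.8]
```

The first column of the toy alignment is invariant and scores 1, while the mixed columns score below 1; averaging the per-column scores gives a single value per alignment in the manner of the Table S1 summary.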

* Reflects the amino acid position within the H-system and R-system alignments where the identified residue is highly conserved in its respective alignment but represents a nonconservative mutation when compared.
§ Domain evidence is found in Table S5 (HPT = Histidine phosphotransfer domain, REC = receiver domain).
† The amino acids at the specified position * within the R-system and H-system alignments are highly conserved as reflected by the R-system and H-system conservation scores.

Figure S1. Phylogenetic analysis of the core Wsp proteins indicates that the R-system predates the H-system. The H-system (teal) and R-system (orange) gene tree was constructed using the amino acid sequences of the core Wsp proteins (WspA, WspB, WspC, WspD, WspE, and WspF). The phylogeny is rooted to Wsp homologs in Caulobacter crescentus reported in

Figure S2. Opportunistic pathogens with a Wsp system are frequently associated with the respiratory environment. Organisms represented in the Wsp dataset were classified as either opportunistic pathogens (red) or no evidence of pathogenicity (rose) based on a literature search at the species level. Species identified as opportunistic pathogens were then sub-categorized by common site of isolation (blue). Counts of unique species for each genus are reported on the x-axis. Genera without any indication of pathogenicity were excluded from this figure but the full analysis with references is summarized in Table S3.

Figure S3. Discrete conservation of Wsp proteins within H- or R-systems. Amino acid sequences used to generate Figure 5 were divided into an H-system plot (blue) or an R-system plot (orange). Annotation data is derived from NCBI CDD (conserved domain database), Prosite, or Che homology as indicated in Table S5. Reported naturally occurring missense mutations from the literature in the H-system are shown in blue and those in the R-system are shown in orange.
Engineered missense mutations reported in the literature are indicated in white. Mutations that turn on the respective Wsp system are indicated as circles while those that turn off the system are indicated as triangles.

Supplementary Figures and Tables
The y-axis represents the Shannon Entropy evaluation for each protein alignment where weighted values near 1 indicate high sequence conservation and values near zero indicate weak sequence conservation. The horizontal dashed line indicates where the weighted Shannon Entropy metric equals 0.8, denoting residues of greater functional or structural importance.
Figure S4. Sequence logos of conserved regions in WspB and WspD. Individual domains of high conservation between WspB and WspD show little to no similarity. Sequence logos were generated to represent the highly conserved regions in Figure 5 for WspB and WspD. Islands of high conservation were identified by the 0.8 Shannon entropy metric. Two residues immediately flanking the conserved region are included in the sequence logos to ensure the entire conserved region is depicted. The values indicated on the x-axis denote the relative amino acid position within the coding sequence. No overlapping signatures were found between WspB and WspD, which is surprising given their proposed similar function.
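
The island selection behind these logos (scores of at least 0.8 over 3 or more contiguous residues, extended by 2 flanking residues on each side) can be sketched in Python as follows. The function name, the half-open 0-based intervals, and the toy scores are illustrative assumptions; the authors' own script is not shown in this text.

```python
def conserved_islands(scores, threshold=0.80, min_run=3, flank=2):
    """Return half-open, 0-based (start, end) intervals covering each run of
    at least `min_run` consecutive scores >= `threshold`, widened by `flank`
    positions on both sides and clipped to the sequence bounds."""
    scores = list(scores)
    islands = []
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last column.
    for i, score in enumerate(scores + [float("-inf")]):
        if score >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                start = max(0, run_start - flank)
                end = min(len(scores), i + flank)
                islands.append((start, end))
            run_start = None
    return islands


# Toy per-residue conservation scores (not real WspB or WspD values).
toy_scores = [0.1, 0.2, 0.9, 0.85, 0.95, 0.3, 0.2, 0.9, 0.9, 0.1]
islands = conserved_islands(toy_scores)
```

Here the run at positions 2 to 4 qualifies and is widened to the interval (0, 7), whereas the two-residue run at positions 7 and 8 is too short. Islands whose flanks overlap are returned separately rather than merged; each interval could then be sliced out of the alignment and passed to a logo generator.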
Table S1. Sequence conservation assessment of the Wsp signal transduction system and the enteric chemotaxis Che system. † Conservation scores assessed through Shannon entropy analyses (see Methods). Score ranges from 0 to 1 with 0 indicating no conservation and 1 indicating complete conservation. The score was determined for each residue in the alignment. The average conservation score of the protein alignment is reported for each system comparison.