Evolutionary forecasting of phenotypic and genetic outcomes of experimental evolution in Pseudomonas

Experimental evolution is often highly repeatable, but the underlying causes are generally unknown, which prevents extension of evolutionary forecasts to related species. Data on adaptive phenotypes, mutation rates and targets from the Pseudomonas fluorescens SBW25 Wrinkly Spreader system combined with mathematical models of the genotype-to-phenotype map allowed evolutionary forecasts to be made for several related Pseudomonas species. Predicted outcomes of experimental evolution in terms of phenotype, types of mutations, relative rates of pathways and mutational targets were then tested in Pseudomonas protegens Pf-5. As predicted, most mutations were found in three specific regulatory pathways resulting in increased production of Pel exopolysaccharide. Mutations were, as predicted, mainly found to disrupt negative regulation with a smaller number in upstream promoter regions. Mutated regions in proteins could also be predicted, but most mutations were not identical to those previously found. This study demonstrates the potential of short-term evolutionary forecasting in experimental populations. Impact statement Conservation of genotype-to-phenotype maps allows successful prediction of short-term evolution in P. protegens Pf-5 and lays the foundation for evolutionary forecasting in other Pseudomonas.

species and it cannot provide a test of prediction from general principles. In many 131 cases mutants isolated after selection for high-level antibiotic resistance also lacks the 132 complexity that is inherent to many phenotypic traits where the genotype-to-133 phenotype map involves a large number of functional interactions and complex 134 regulation ( Figure 1C). This complexity comes to light in that adaptive mutations to 135 new environments are commonly found in global regulators of gene expression such 136 as genes involved in the stringent response, DNA binding proteins, supercoiling and 137 core genes for RNA and protein synthesis (Barrick, et al. 2009;Conrad, et al. 2009 When the wild type SBW25 is placed into a static growth tube the oxygen in the 151 medium is rapidly consumed by growing bacteria (Figure 2A). However oxygen 152 levels at the surface are high and mutants that are able to colonize the air-liquid 153 interface have a major growth advantage and rapidly increase in frequency ( Figure  154  interface (Spiers, et al. 2002;Spiers, et al. 2003). The WS phenotype is caused by 161 mutational activation of c-di-GMP production by a diguanylate cyclase (DGC) 162 ( Figure 2C) (Goymer, et al. 2006). While many different DGCs can be activated to 163 reach the WS phenotype, some are greatly overrepresented due to larger mutational 7 target sizes leading to a hierarchy of genetic routes to WS ( Figure 2D) (McDonald, et 165 al. 2009;Lind, et al. 2015). The genotype-to-phenotype map to WS has been 166 characterized in detail (Goymer, et al. 2006;McDonald, et al. 2009;Lind, et al. 2018) 167 allowing the development of mathematical models of the three main pathways to WS 168 (Wsp, Aws and Mws) and the prediction of evolutionary outcomes ( Figure 2E) (Lind, 169 et al. 2018). 170 171 172 Figure 2. The Pseudomonas fluorescens SBW25 "wrinkly spreader" model system 173 has several properties that could allow its extension to other species (A) The ancestral 174 strain that has smooth colony morphology on agar plate is inoculated into static 175 growth tubes and incubated for several days. Depletion of oxygen in the medium 176 leads to competition for access to the oxygen-replete surface which is colonized by 177 mutants with enhanced ability for cell-cell adherence and adherence to the wall of the 178 tube. The most successful of these mutant types is the wrinkly spreader that has a 179 distinctive colony morphology due to overproduction of exopolymeric substances 180 (EPSs) of which a cellulosic polymer is the main structural component (Spiers, et  regulators. In SBW25 cellulose-based mats are most successful (encoded by the wss 244 operon). A secondary exopolysaccaride (PGA), encoded by pgaABCD, can also be 245 used to form a stable mat (Lind, et al. 2017b). Fuzzy spreaders (FS) forms rafts that 246 collapse after becoming too large and the mutational cause is inactivation of fuzY, access to oxygen is limiting for growth. This could be achieved simply by changes in 262 gene expression in the wild type, but for an experimental evolution study a mutational 263 solution is sought and environmental conditions are chosen so that the wild type strain 264 does not colonize the air-liquid interface. However changing environment presents a 265 further challenge because, as discussed above, it often leads to a different spectrum of 266 adaptive mutations. Thus a foundational requirement for an extended experimental 267 evolution system to be successful for different species is that the evolutionary 268 solutions are robust to differences in environmental conditions. 269 cellulose-based biofilms are superior in other species as well and that they will be the 278 primary structural solution when available as for P. syringae, P. putida and P. 279 stutzeri. For the three species lacking genes for cellulose biosynthesis, other EPSs are 280 predicted to be used. Based on studies of P. aeruginosa the primary EPS required for 281 pellicle formation at the air-liquid interface in this species is Pel, encoded by the 282 pelABCDEFG operon, which is also present in the Pf-5 genome and is predicted to be 283 the primary phenotypic solution for these species. The genome of P. savastanoi lacks 284 genes for biosynthesis of cellulose and Pel as well as other EPSs that are known to be 285 able to support mat-formation, such as PGA and there is not sufficient data at this 286 point to make a prediction of which one is likely to be the primary phenotypic Disabling mutations are expected to be more common than enabling mutations and 304 therefore the prediction is that most mutations will be in genes where loss-of-function 305 mutations produce an adaptive phenotype ( Fig 2D) (Lind, et al. 2015). This is the case 306 for the large majority of mutations in SBW25 including those activating main DGCs disruption of the genes underpinning the FS phenotype (fuzY, PFLU0478) and CC 309 phenotype (nlpD, PFLU1301) (McDonald, et al. 2009;Ferguson, et al. 2013;Lind, et 310 al. 2017b). Next in the hierarchy of mutations are promoter mutations, increasing 311 transcription, and promoter capture events (Lind, et al. 2015). Less common are 312 intragenic activating mutations that enable a particular function by for example 313 increase in catalytic activity or strengthening of interactions to another molecule or 314 another domain of the same protein (Lind, et al. 2015). Gene duplications occur at a 315 high rate and clearly have the ability to increase gene expression of DGCs, but they 316 have not yet been found to cause WS in SBW25, possibly because a two-fold increase 317 in gene expression is insufficient. 318 319 Prediction of pathways used 320 There are at least 16 different pathways to the WS phenotype in SBW25 with similar 321 fitness, but they are used at frequencies that vary over several orders of magnitude 322 based on the differing capacity to translate phenotypic variation into phenotypic 323 variation ( Figure 2D) (Lind, et al. 2015). Mutations in three pathways, Wsp, Aws, and 324 Mws account for >98% of WS mutations and based on a detailed understanding of the 325 molecular functions of the genes involved of each pathways mathematical models 326 predicting at which relative rates the pathways should be used were constructed 327 ( Figure 2E) (Lind, et al. 2018). The prediction results varies depending on the rates of 328 disabling and enabling mutations, but if it is assumed that disabling mutations are at 329 least an order of magnitude more common than enabling mutations the models predict 330 that Wsp will account for about 53%, Aws 28% and Mws 19% of the WS mutations 331 ( Figure 2E) (Lind, et al. 2018). 332 333 Less common promoter mutations will also appear at rates at least a magnitude lower, 334 but which DGCs that will be transcriptionally activated cannot be easily predicted 335 except for assuming it will be homologs of the ones used in SBW25. These DGCs 336 must be catalytically active and also be localized to the membrane (Farr, et al. 2017). 337 Possibly the subset of DGCs that are primarily activated by mutations to their 338 promoters is mainly determined by mutation rate and a higher mutation rate might be 339 caused by higher transcription and also influenced by gene direction (Sankar,et al. 340 DGC is likely to rule out these DGCs. The DGCs that can be activated by intragenic 343 activating mutations cannot now be predicted beyond the simple prediction that these 344 are the same genes as in SBW25 (Lind, et al. 2015). 345 346

Prediction of mutated genes 347
In addition to predicting the relative rates of the three main pathways, the previously 348 described mathematical model can also predict which proteins are likely to be 349 mutated (Lind, et al. 2018). High rates of WS mutations are predicted for WspF, 350 WspA, WspE, AwsX and AwsR and MwsR ( Figure 2E). A significantly lower rate of 351 enabling mutations is also predicted to occur in WspC, WspR and AwsO ( Figure 2E   that are subject to negative regulation ( Figure 1D). In addition the prediction that 445 promoter mutations would be the second most common type of mutation was 446 successful with two mutations found upstream of the aws operon, which were 447 predicted to disrupt the terminator of a high expression ribosomal RNA operon 448 representing a promoter capture event. Promoter mutations were also found upstream 449 of PFL_3078, which is the first gene of a putative EPS locus (PFL_3078-3093) that 450 has not previously been described and that is only present in closely related strains. WspE, AwsX, AwsR and MwsR, but in most cases they were not identical to those in 469 SBW25 ( Figure 5 - L1024Q G1084R -129 g-> a (2) In total 60 wells were inoculated and subjected to experimental evolution for five 477 days after which air-liquid interface colonization was observed for the majority of the 478 wells. Mutants with clearly visible changes in colony morphology were isolated from 479 43 wells. Typically a single type of divergent colonies was observed and one colony 480  Figure 6A, 6B). 500 The other type was less wrinkly, had similar motility as the wild type and promoter 501 mutations upstream of the PFL_3078-3093 operon ( Figure 6A, 6B). 502 Two types of fitness assays were performed, similarly as previously described (Lind, 504 et al. 2015) to measure differences in fitness between the different DGC mutants and 505 the alternative phenotypic solution with the mutation upstream of PFL_3078. The first 506 assay measures "invasion fitness" where the mutant is allowed to invade a wild type 507 population from an initial frequency of 1%. This confirms that the mutations are 508 adaptive and that mutants can colonize the air-liquid interface. The invasion assays 509 showed that all reconstructed mutants could rapidly invade an ancestral wild type 510 population ( Figure 7A, Figure 7 -source data). Although there were significant 511 differences between selection coefficients of the mutants (one-way ANOVA p < 512 0.0001), no mutant was significantly different from the most common mutant (WspF 513 V271G, two-tailed t-test P > 0.01). 514

515
The second fitness assay measures "competition fitness" and here each mutant is 516 instead mixed 1:1 with the most common WS type (WspF V271G) at the start of the 517 competition. The competition assay showed that the ancestral wild type was rapidly 518 outcompeted by the mutants also at a 1:1 initial ratio ( Figure 7B). There was 519 significant variation in fitness between the WS mutants (one-way ANOVA P < 520 0.0001) and the AwsX had significantly lower selection coefficient (two-tailed t-test P 521 < 0.009) compared to the reference WspF V271G and one of the MwsR mutants 522 (E1081K) had significantly higher selection coefficient (two-tailed t-test P < 0.003). 523 The alternative phenotypic solution used by the PFL_3078 promoter mutant resulted 524 in the lowest fitness (s = -0.1, two-tailed t-test p < 0.005) meaning that it is expected 525 to be rapidly outcompeted by the WS mutants ( Figure 7B). 526  test this prediction the pel operon (PFL_2972-PFL_2978) was deleted from Pf-5 and 544 combined with previously characterized WS mutations and fitness was measured. 545 Both invasion fitness ( Figure 8A) and competition fitness ( Figure 8B) was 546 significantly lower (two-tailed t-tests p < 0.01) compared to isogenic strains with an 547 intact pel operon ( Figure 7A, 7B) except invasion fitness for the AwsX mutant (two-548 tailed t-tests p < 0.08, one outlier). This suggests that Pel polysaccharide serves as an 549 important structural component for colonizing the air-liquid interface and that its 550 production is activated by mutations leading to increased c-di-GMP levels. Although 551 deletion of pel in WS mutants resulted in less wrinkly colony morphology it did not 552 result in a smooth ancestral type. Neither did deletion of pel abolish the ability to 553 colonize the air liquid interface ( Figure 8C) or the ability to invade wild type 554 populations ( Figure 8A). This suggests that production of an additional EPS 555 component is induced by increased c-di-GMP levels caused by mutations in Wsp, 556 Aws and Mws, at least in the absence of pel. As expected if the motility defect 557 observed for WS mutants are primarily caused by high c-di-GMP levels rather than 558 high production of Pel, the motility was also reduced for Wsp, Aws and Mws mutants 559 with the pel operon deleted ( Figure 8D). The extension of the P. fluorescens SBW25 experimental evolution system to related 577 species shows promise for true testing of evolutionary forecasting models. While 578 there is a diversity of DGCs and EPSs between species leading to differences in 579 forecasts, the conserved role of c-di-GMP and limited number of phenotypes allow 580 the use of previous data to improve predictions and makes the experimental system 581 robust to changes in environmental conditions. The test of initial forecasts for P. 582 protegens Pf-5 presented here provides support for the ability to predict some aspects 583 of both genetic and phenotypic evolution while recognizing that the probability of 584 specific mutations cannot in most cases be predicted. prediction that overproduction of structural exopolysaccharides, rather than fuzzy, 603 cell-chaining or mucoid types, would be the primary solution was successful. One of 604 the two phenotypes found here used the Pel EPS, which could be predicted based on 605 its role in P. aeruginosa. However it appears to use a secondary EPS as well, that 606 remains to be identified, given that mutants lacking Pel but with activated DGCs still 607 colonize the air-liquid interface and have a distinct colony morphology. The second 608 phenotype used another EPS, encoded by PFL_3078-3093, which had not previously 609 be described and given that several EPS loci are usually encoded in Pseudomonas 610 genomes its use could not be predicted. However, repeating experimental evolution 611 using other Pseudomonas species is likely to provide more information about which 612 EPSs can be used to colonize the air-liquid interface and their relative fitness to allow 613 improved phenotypic predictions. Deletion of the pel operon, the unidentified 614 secondary EPS and PFL_3078-3093 and subsequent experimental evolution could 615 reveal less fit phenotypic solutions that are expected to exist including fuzzy types 616 caused by defects in LPS modification, cell-chaining types with defects in cell 617 division, adhesive proteins or mucoid types using alginate or levan, two EPSs with 618 lower structural stability. 619 620 The general prediction of types of mutations, as described in the hierarchy in Figure  621 2D, was also successful although the relatively few mutants identified here did not 622 allow for detection of rare activating mutations or double inactivating mutations. The 623 existence of such types could be confirmed by deletion of the main pathways (Wsp, 624 Aws, and Mws) followed by experimental evolution as previously described for 625 SBW25 (Lind, et al. 2015). mutations were found over 9 kb upstream of the aws operon, in between a ribosomal 632 RNA operon and the recCBD operon, which encodes key genes for recombination. 633 The molecular effects of these mutations have not been further investigated, but the 634 resulting WS phenotype is dependent of the presence of the aws operon, deletion of 635 which reversed the phenotype. This is consistent with an up-regulation of c-di-GMP 636 by AwsR presumably caused by increased transcription. The mutation is located in 637 the predicted terminator of the ribosomal RNA operon and increased transcriptional 638 read-through could put the aws operon under control of a very strong rrn promoter 639 that is most highly transcribed during exponential growth. This could explain the 640 relatively mild colony morphology phenotype as well as high motility of this WS 641 mutant ( Figure 6). The mathematical null model (Lind, et al. 2018) successfully predicted that Wsp 644 would be the most commonly used pathway followed by Aws, and Mws the most 645 rare. However, the number of mutants isolated here is rather small and the high 646 frequency of Wsp mutants seems mainly to be caused by a mutational hot spot in 647 wspF. Still the prediction that the three pathways together would contribute the large 648 majority of adaptive mutations (40 out of 43) is not trivial given that in SBW25 at 649 least 13 additional pathways are available to the high fitness WS phenotype (Lind, et 650 al. 2015). It is also worth noting that direct use of mutation rate data from SBW25 651 (Lind, et al. 2018) would result in poorer predictions than the mathematical null 652 model due to a strong mutational hot spot in awsX in that species. 653 654 For the multi-protein pathways Wsp and Aws, the null model predicted ( Figure 2E structure and the other one directly disrupting the phosphorylation active site in the 672 signal receiver domain. No mutations were found in the surface exposed regions 673 hypothesized to be involved in interactions with WspA and WspE, which could be 674 due to differences in function between SBW25 and Pf-5 or simply that they appear at 675 lower frequency and would be detected if additional mutations were isolated. The sole mutation in WspE is, as predicted, located in the direct vicinity of the phosphorylation 677 active site. Mutations in AwsX were amino acid substitutions throughout the gene as 678 well as in frame deletions inactivating the gene as predicted. Mutations in AwsR and 679 MwsR were also found in predicted regions, but no mutations were found in the 680 periplasmic region of AwsR, which is the most commonly targeted region in SBW25. 681 Known mutational hot spots in awsX, awsR and mwsR in SBW25 (Lind, et al. 2018) 682 were not conserved in Pf-5 resulting in divergent spectra of mutations, while mutated 683 regions and predicted functional effects remain conserved between the two species. 684

685
The diversity of phenotypic solutions observed after experimental evolution is 686 dependent on fitness differences between the phenotypes, but also on the rate of 687 which phenotypes are introduced by mutations, which is dependent on the genetic 688 architecture underlying the trait as well as mutational biases. The Pf-5 strain has at 689 least three DGC pathways (Wsp, Aws and Mws) that are subject to negative 690 regulation leading to prediction of a high rate of WS mutants, which are then expected 691 to outcompete other phenotypic solutions. If instead only one of these pathways were 692 present, a larger diversity of phenotypes would be expected to be observed with 693 relative fitness becoming less important as the first mutant that gains a foothold at the 694 air-liquid interface will have a large advantage and priority effects, i.e. being first, 695 will increasingly determine which adaptive mutants are observed. 696 697 Given that the mutational target upstream of PFL_3078-3093 is likely be relatively 698 small and that these mutants are rapidly outcompeted by all WS types tested, their 699 relatively high frequency (3/43) is unexpected. Possibly this is due to a higher 700 mutation rate at these sites (Sankar, et al. 2016) or that population structure limits 701 direct competition between these different phenotypes and reduces the importance of 702 relative fitness. In SBW25 low fitness phenotypes that colonize the air-liquid 703 interface based on LPS modification or cell-chaining are observed prior to the rise of 704 WS to high frequencies (Lind, et al. 2017b)  known about the effects of (for now) unpredictable mutational biases and differences 714 in fitness between different WS types many of the forecasts will inevitably fail. 715 However hopefully they will fail in interesting ways thereby revealing erroneous 716 assumptions. The ability to remove common genetic and phenotypic pathways 717 provides a unique opportunity to also find those pathways that evolution does not 718 commonly use. This is necessary to determine why forecasts fail and update the 719 predictive models for the next cycle of prediction, experimental evolution and mutant 720 characterization to define the information necessary to predict short-term evolutionary 721 processes. Sanger sequencing were performed by GATC biotech and used to sequence candidate 774 genes to find adaptive mutations and to confirm reconstructed mutations 775 (oligonucleotide primer sequences are available in Table S3).

Reconstruction of mutations 778
Nine mutations representing all candidate genes found using Sanger or Illumina 779 sequencing were reconstructed in the wild type ancestral P. protegens Pf-5 to show 780 that they are the cause of the adaptive phenotype and to be able to assay their fitness 781 effects without the risk of secondary mutations that might have occurred during 782 experimental evolution. A two-step allelic replacement protocol was using to transfer 783 the mutation into the ancestor.  Table S3. second assay instead measures competition fitness in a 1:1 competition against a 825 reference mutant strain, which here was chosen to be the WspF V271G mutant 826 because it was the most commonly found in the experimental evolution study and thus 827 is highly successful either because of a high rate of emergence, i.e. a mutational hot 828 spot, or higher fitness than most other WS mutants. In addition, the WspF V271G 829 mutant has a temperature sensitive colony morphology phenotype in that it is highly 830 wrinkly at 30°C, but only have a very mild phenotype when grown at room 831 temperature, thus allowing it to be distinguishable from both the smooth ancestor and 832 all other wrinkly mutants isolated here. 833 834 Fluorescent reference strains of the wild type ancestor and the WspF V271G mutants 835 were created using a miniTn7 transposon (miniTn7(Gm) PA1/04/03 Gfp.AAV-a) 836 (Lambertsen, et al. 2004) that allows integration at a defined locus (attTn7) in the 837 chromosome. This allows the colonies to be distinguished not only by morphology 838 but also by fluorescence under blue/UV light and gentamicin resistance, which 839 provides a way to ascertain that secondary adaptive mutants that might occur during 840 the competition experiment do not bias the results (for example the ancestor could 841 evolve WS types or a WS mutant can evolve to cheat on the other type by inactivation 842 of EPS production or reduced c-di-GMP signalling). Introduction of the transposon into P. protegens Pf-5 was performed by tri-parental conjugation from E. coli with 844 helper plasmids pRK2013 (conjugation helper) and pUX-BF13 containing the 845 transposase genes) using the same conjugation protocol described above. 846

847
The invasion assay was performed by mixing shaken overnight cultures of the 848 competitor 1:100 with the GFP-labeled reference ancestor followed by 1000-fold 849 dilution and static incubation at 36°C for 48 h in TBSGM medium in deep well plates 850 (1 ml per well, using only the central 60 wells). For the competition assay, the GFP-851 labeled reference strain WspF 271G was mixed 1:1 with the competitor and diluted 6-852 fold and grown for 4 h (shaken at 30°C), before plating to determine initial ratios, to 853 ensure the cells were in a similar physiological state at the start of the competition. 854 The competition cultures were then diluted 1000-fold in TBSGM medium and grown 855 in deep well plates (1 ml per well, using only the central 60 wells) static for 24 h at 856 36°C. Selection coefficients (s) were calculated as previously described (Dykhuizen 857 1990), where s = 0 is equal fitness, positive is increased fitness and negative is 858 There is no simple way to find all genes that can function as structural or regulatory 885 genes to allow colonization of the air-liquid interface. Thus the selection in Figure 3B