Disentangling the effects of genetic architecture, mutational bias and selection on evolutionary forecasting

4 Peter A. Lind , Eric Libby , Jenny Herzog and Paul B. Rainey 5 6 New Zealand Institute for Advanced Study, Massey University at Albany, Auckland, 7 0745, New Zealand. 8 Department of Molecular Biology, Umeå University, SE-901 87 Umeå, Sweden. 9 Santa Fe Institute, Santa Fe, New Mexico, United States of America. 10 Department of Mathematics, Umeå University, SE-901 87 Umeå, Sweden. 11 Department of Microbial Population Biology, Max Planck Institute for Evolutionary 12 Biology, Plön 24306, Germany. 13 Ecole Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI 14 Paris-Tech), CNRS UMR 8231, PSL Research University, 75231 Paris, France. 15 16 17

progress has been made for phenotypically diverse asexual populations subject to 49 strong selection. Effective approaches have drawn upon densely sampled sequence 50 data and equilibrium models of molecular evolution to predict amino acid preferences 51 at specific loci (Luksza and Lassig 2014). Predictive strategies have also been 52 developed based on selection inferred from the shape of coalescent trees (Neher, et al. 53 2014). In both instances the models are coarse-grained and sidestep specific 54 molecular and mutational details. 55 56 There is reason to by-pass molecular details: mutation, being a stochastic process, 57 means that for the most part details are likely to be idiosyncratic and unpredictable. 58 But an increasing number of investigations give reason to think otherwise -that 59 adaptive molecular evolution might follow rules (Pigliucci 2010;Stern 2013;Laland, 60 et al. 2015). This is particularly apparent in studies of parallel molecular evolution 61 (Colosimo, et al. 2005 To what extent do these details matter? 72 73 Mutations arise randomly with respect to utility, but genetic architecture can influence 74 the translation of mutation into phenotypic variation: the likelihood that a given 75 mutation generates phenotypic effects depends on the genotype-to-phenotype map 76 (Alberch 1991 evolution is re-run mutations generating the adaptive type arise in one of three DGC-158 encoding pathways (Wsp, Aws, or Mws) ( Figure 1A). Subsequent work revealed that 159 when these three pathways are eliminated from the ancestral type that evolution 160 proceeds along multiple new pathways (Lind, et al. 2015). Preferential usage of Wsp,161 Aws and Mws pathways stems from the fact that they are subject to negative 162 regulation and thus, relative to pathways subject to positive regulation, or requiring 163 promoter-activating mutations, gene fusion events, or other rare mutations, present a 164 large mutational target. 165 166 Given repeatability of WS evolution, knowledge of the Wsp/Aws/Mws pathways, 167 plus genetic tools for mechanistic investigation -including capacity to obtain WS 168 mutants in the absence of selection -the WS system offers a rare opportunity to 169 explore the feasibility of developing bottom-up strategies for evolutionary 170 forecasting. Our findings show that mechanistic-level predictions are possible, but 171 also draw attention to challenges that stem from current inability to a priori predict 172 locus specific mutational biases and environment-specific fitness effects. 173

175
Obtaining an unbiased measure of pathway-specific mutation rates to WS 176 Knowledge of the rate at which mutation generates WS types via each of the Wsp, 177 Aws and Mws pathways -unbiased by the effects of selection -provides a 178 benchmark against which the predictive power of null models can be appraised. To 179 achieve such measures we firstly constructed a set of genotypes containing just one of 180 the three focal pathways: PBR721 carries the Wsp pathway but is devoid of Aws and 181 Mws, PBR713 carries the Aws pathway but is devoid of Wsp and Mws, while 182 PBR712 harbours the Mws pathway but is devoid of Wsp and Aws. Into each of 183 these genotypes a promoterless kanamycin resistance gene was incorporated 184 immediately downstream of the promoter of the cellulose-encoding Wss operon and 185 fused to an otherwise unaffected Wss operon ( Figure 1B) WS to be obtained without the biasing effects of selection for growth at the air-liquid 197 interface ( Figure 1B). The results are shown in Figure 2. 198 199 The mutation rate was highest for the Aws pathway (6.5 × 10 -9 ); approximately 200 double that of Wsp (3.7 × 10 -9 ) and an order of magnitude higher than that of the Mws 201 pathway (0.74 × 10 -9 ) (Figure 2). The rate at which WS mutants arose from the 202 ancestral genotype in which the three pathways are intact (11.2 × 10 -9 ) was 203 approximately the sum of the rates for the three pathways (11.0 × 10 -9 ) confirming 204 that the Wsp, Aws and Mws pathways are the primary routes by which WS types 205 evolve (Lind, et al. 2015 Much is known about the function and interactions among components of each of the 216 three focal pathways. This knowledge allows development of models that capture the 217 dynamic nature of each pathway and thus allow predictions as to the likelihood that 218 evolution will precede via each of the three pathways. An unresolved issue is the 219 extent to which these models match experimental findings. Following a brief 220 description of each pathway we describe the models. 221 222 produces c-di-GMP and a phosphodiesterase (PDE) domain that degrades c-di-GMP. 257 Little is known of the molecular details determining its function, but both catalytic 258 domains appear to be active (Phippen, et al. 2014 If the specific effect of changing each nucleotide (and sets of nucleotides) was known 263 then models for each pathway would not be required, but such knowledge does not 264 exist. We therefore take a simplifying approach in which attention focuses on the 265 interactions between components that correspond to reactions whose rate can either combinations of enabling, disabling, and no effect changes to reaction rates and 291 determine the likelihood that a WS type is generated. For the Wsp system this 292 amounts to 3 6 or 729 combinations. An example of one set of the possible mutations 293 (m i ) in Wsp is 1, −1, 0, 0, 0, 0 (an increase in r 1 , a decrease in r 2 , but no change in r 3 , 294 r 4 , r 5 , or r 6 ( Figure 3A)). 295

296
Predicting the pathways that evolution follows and genetic targets 297 To determine whether mutations producing WS occur more often in Wsp compared to 298 Aws or Mws pathways, we adopt a Bayesian approach in which the probability that a 299 particular pathway is used is decomposed into two terms: the probability that a 300 particular set of mutations (m i ) occurs in Wsp (or Aws, or Mws) represented as P (m i 301 ∈ Wsp) and the probability that those mutations give rise to a wrinkly spreader 302 represented as P (WS |m i ∈Wsp) (or Aws, or Mws). 303 304 (1) 305 306 To estimate P(m i ∈Wsp) we assume fixed probabilities of enabling and disabling 307 mutations and compute the product. Thus, the probability of m i = 1, −1, 0, 0, 0, 0 is 308 p e p d (1 − p e − p d ) 4 , where p e is the probability of a mutation with an enabling effect 309 and p d is the probability of a mutation with a disabling effect. Recognising the value 310 of accommodating the possibility of localised mutational bias we note that p e and p d 311 can be adjusted for the affected reactants. The second term, P (WS |m i ∈Wsp), 312 requires knowing both how gene products interact and how these interactions result in 313 a phenotype. This information is estimated based on the pathway dynamics 314 represented in Figure 3 and The results of simulations are shown in Figure 4. Figure 4A shows that the Wsp 319 pathway is predicted to be the target of mutation 1.3 -2.1 times more often than the 320 Aws pathway while Figure 4B shows that the Mws pathway is predicted to be the 321 target of mutation 0.7 -1.0 times less often that the Aws pathway. While these results 322 agree with the experimental data showing Mws to be least likely pathway to be 323 followed, the predictions are at odds with the mutation rate data showing WS types to 324 be twice as likely to arise from mutation in Aws, versus Wsp. The causes of this 325 discrepancy are described in the following section. The distribution of mutations for each of the three pathways is indicative of bias. As 391 shown in Figure 5B, almost 29% of all WS-causing mutations (adjusted for 392 differences in mutation rates between the three pathways) were due to an identical 33 393 base pair in-frame deletion in awsX (Δt229-g261, ΔY77-Q87), while a further 13 % 394 were due to an identical mutation (79 a->c, T27P) in awsR. At least 41 different 395 mutations in Aws can lead to WS: if mutation rates were equal for these sites the 396 probability of observing 20 identical mutations would be extremely small. In fact 10 million random samplings from the observed distribution of mutations failed to 398 recover this bias. While the Wsp pathway also contains sites that were mutated more 399 than once (six positions were mutated twice, one site three times and one five times), 400 sources of mutational bias in Wsp were less evident than in Aws ( Figure 5B). 401

402
The mathematical models presented above assumed no mutational bias thus the lack 403 of fit between mutation rate data and predictions from the models. Nonetheless, 404 changing specific reaction rates within the models readily incorporates such 405 knowledge. For example, the mutational hotspot in awsX affects reactions r 2 and r 3 in 406 the Aws differential equation system ( Figure 3B, Figure 3 -figure supplement 2). 407 The effect of a five-fold change in the probability of enabling/disabling change in 408 these reactions leads to the prediction that the Aws pathway is more likely to generate 409 WS types than Wsp for most probability values (see Figure 4D). The only area of 410 parameter space in which evolution is more likely to utilise the Wsp pathway is for 411 rare mutations that have a high probability of enabling change (>> 10 -2 ). One 412 interesting consequence is that it changes the phase-space over which evolution of 413 WS via mutations in the Wsp pathway is more likely with respect to the Aws 414 pathway. In Figure 4A, evolution is most likely to proceed via the Wsp pathway when 415 the probability of disabling change is greater than the probability of enabling change. 416 In contrast, when the likelihood of producing WS types is affected by the mutational 417 hotspot in awsX, then evolution will proceed via Wsp only when the probability of 418 enabling change is greater than the probability of disabling change ( Figure 4D  defining the genotype-to-phenotype map, which successfully predicted mutational 625 targets and the relative likelihood that evolution followed each of the three principle 626 pathways. Importantly, genetic architecture is likely to be transferable between 627 different species, which stands to allow the formulation of general principles and 628 evolutionary rules (Lind, et al. 2015). Despite the simplicity of the mathematical null 629 models, which contain only general information about functional interactions, we 630 successfully predicted mutational targets including previously unknown mutations in 631 wspA. Without information about fitness and mutational biases, however, only order 632 of magnitude predictions of mutant frequencies can be made. Thus, it is possible to 633 predict that Wsp, which is subject to negative regulation will be more common than a 634 DGC that requires enabling mutations (Lind, et  The differential equation models describe the interactions between proteins in each of 780 the three WS pathways. In order to solve the differential equations, two pieces of information are required: i) the initial concentrations of the molecular species and ii) 782 the reaction rates. Although this information is unavailable a random-sampling 783 approach was used to generate different random sets of initial concentrations and 784 reaction rates. Each random set was used to establish a baseline of potential WS 785 expression making it possible to evaluate whether a set of mutations results in a WS 786 type. Effectively, this approach allows sampling of the probability distribution P (WS 787 |m i ∈ Wsp) used in our Bayesian model. 788

789
We randomly sample 1,000 different sets of reaction rates and initial concentrations 790 from uniform priors: reaction rates were sampled randomly from a uniform 791 distribution on log space (i.e. 10 U[−2,2] ) and initial concentrations of reactants were 792 sampled from a uniform distribution U[0,10]. For each set, the appropriate differential 793 equation model was integrated and the steady state concentration of the compounds 794 that correspond to a wrinkly spreader (RR in Aws, R* in Wsp and D* for Mws) 795 computed. This served as a baseline for the non-WS phenotype that was used for 796 comparison to determine whether combinations of mutations result in increased WS 797 expression. After obtaining the baseline, we implemented particular combinations of 798 enabling/disabling mutations (a m i ). Ideally, a distribution linking enabling/disabling 799 mutations to a fold change in reaction rates would be used, but this information is 800 unavailable. In order to progress the effect sizes for enabling and disabling mutations 801 were sampled from 10 U[0,2] and 10 U[−2,0] , respectively, and then multiplied by the 802 reaction rates. The differential equations were then solved for the same time that it 803 took the baseline to reach steady state. The final concentration of R* ( Figure 3A), RR 804 ( Figure 3B) and D* ( Figure 3C) was then compared to the baseline and the number of 805 times out of 1,000 that the WS-inducing compound increased served as an estimate of 806 P (WS|m i ∈ Wsp). The probability distribution stabilized by 500 random samples 807 and additional sampling did not produce significant changes (data not shown). 808

809
The absence of empirical data on reaction rates, initial concentrations, and expected 810 mutation effect size meant using a random sampling approach requiring estimates for 811 parameter ranges. Parameter ranges were chosen to be broad enough to capture 812 differences spanning several orders of magnitudes while allowing numerical 813 computations for solving the differential equations. To assess the effect of these 814 ranges on the results, the sampling procedure was repeated for WSP for three 815 different parameter regimes i) an expanded range for initial concentrations [0-50], ii) 816 an expanded range for reaction rates 10 [-3,3] , iii) a compressed range for mutational 817 effect size 10^[ -1,1] . This analysis shows that qualitative results are robust to these 818 changes (see Figure S1).    AwsX increase rate mutation rate five times. Because the mutational hotspot increases 1201 the likelihood that AwsX's reactions will be altered, r 2 and r 3 have an increased 1202 probability of enabling or disabling change. As a consequence, r 2 shows the biggest 1203 change as it contributes up to 70% (as opposed to 35% without the mutational 1204  Figure S1. Parameter sensitivity analysis. To assess the effect of the chosen parameter (A) ranges on our results, we redid our sampling procedure for WSP for three different parameter regimes: (B) an expanded range for initial concentrations [0-50], (C) an expanded range for reaction rates 10 [-3,3] , (D) a compressed range for mutational effect size 10 [-1,1] . We found that our qualitative results are robust to these changes.