Distribution of mutation rates challenges evolutionary predictability

Natural selection is commonly assumed to act on extensive standing genetic variation. Yet, accumulating evidence highlights the role of mutational processes creating this genetic variation: to become evolutionarily successful, adaptive mutants must not only reach fixation, but also emerge in the first place, i.e. have a high enough mutation rate. Here, we use numerical simulations to investigate how mutational biases impact our ability to observe rare mutational pathways in the laboratory and to predict outcomes in experimental evolution. We show that unevenness in the rates at which mutational pathways produce adaptive mutants means that most experimental studies lack power to directly observe the full range of adaptive mutations. Modelling mutation rates as a distribution, we show that a substantially larger target size ensures that a pathway mutates more commonly. Therefore, we predict that commonly mutated pathways are conserved between closely related species, but not rarely mutated pathways. This approach formalizes our proposal that most mutations have a lower mutation rate than the average mutation rate measured experimentally. We suggest that the extent of genetic variation is overestimated when based on the average mutation rate.


INTRODUCTION
Natural selection acting on extensive standing genetic variation is usually considered the main determinant of adaptive evolution. This largely ignores biases in the origin of genetic variation, which may play a major role in guiding evolution [1][2][3][4][5][6]. Although it is debated in general [7][8][9][10], the importance of such biases is obvious when there are multiple genetic pathways to an adaptive phenotype with similar fitness. Practical consequences are likely on short time scales as well, for example in pathogen evolution, colonization of new ecological niches or in weak mutation, strong selection contexts in experimental evolution. In this article, we focus solely on the role of mutational biases in determining evolutionary outcomes in bacteria and do not consider natural selection, population size or population structure. This simplification does not imply that these factors are less important. Origin-fixation models can be used to include the evolutionary processes governing the fixation dynamics of mutants [11], but our aim here is to demonstrate the role of mutational processes for the predictability of short-term evolution.
Biases in the production of genetic variation may result from two main causes. First, some genes may be more commonly observed to mutate simply because they have a larger number of mutations that can produce the phenotype, and thus represent a larger target size. For instance, when adaptive mutations may increase the expression of a gene either through a positive regulator or through a negative regulator, then evolution often promotes the latter since loss-of-function mutations are more common that gain-of-function mutations [12,13]. Secondly, differences in the rates of specific mutations leading to an adaptive phenotype may also produce a bias. Bacterial mutation rates vary from <10 −12 gen −1 for specific deletions between distant endpoints without sequence homology [14], to 10 −3 gen −1 for duplications between long sequence homologies [15], with order-of-magnitude differences even for the same type of base pair substitution in the same genomic region [16]. Mutational hotspots are often the main determinant for the adaptive mutants found in experimental evolution studies, which suggests that they are relatively common [17][18][19][20][21][22].
Experimental studies have measured mutation rates using two methods aiming at minimizing the effect of selection: mutation accumulation experiments, which are possible in a wide range of model organisms [23], and Luria-Delbrück fluctuation tests [24,25], designed specifically for microbes.
In mutation accumulation experiments, populations pass through small bottlenecks so as to reduce their effective size and let mutations fix largely independently from their fitness effect. The number and spectrum of mutations that have occurred over all generations can be determined using genome resequencing. Estimates of the average genomic mutation rate in bacteria range from 0.8×10 −10 to 5.3×10 −10 bp −1 gen −1 for base pair substitutions, with the transition G:C>A:T being ten times more common than the transversion G:C>C:G, while insertion/deletion rates range from 0.2×10 −10 to 2.0×10 −10 bp −1 gen −1 [26][27][28][29][30][31][32][33][34]. However, practical limitations imply that each of these estimates is typically based on an observation span of less than 100 mutations. This means that repeated mutations are not observed, and differences in the rate of specific mutations cannot be assessed.
In fluctuation tests, mutants are selected from independently grown microbial cultures. Estimations of the mutation rate to a particular phenotype, for example antibiotic resistance, relies on the Luria-Delbrück distribution for the number of mutants per culture [24,25]. The spectrum of mutations can then be accessed by sequencing independent mutants and calculating the rate of each mutation as a fraction of the total mutation rate. Since the number of possible mutations to the adaptive phenotype is limited, it is possible to find repeated mutations and to measure differences in the rate of specific mutations. As mutation rates are not measured at the genomic level but often only in a single gene, the genomic context may impact the results. Fluctuation tests also tend to overestimate the average mutation rate per mutation since rare, unobserved mutations increase the number of possible mutations while leaving the total mutation rate almost unchanged.
Here, we use numerical simulations to investigate how biases in the production of genetic variation affect evolution in the absence of differences in fitness between mutants. To account for the wide range of mutation rates in a single genome, we model them as a distribution of mutation rates (DMR) rather than as a constant value. We only intend this theoretical model to shed light on the limitations of empirical studies. Because this approach concerns a DMR over a population of mutations, it involves a different scale than the distribution of (mean) mutation rates over a population of individuals, including mutators or cells in hypermutable states [35]. Combining insights from mutation accumulation experiments and fluctuation tests leads to the conclusion that a model DMR must have a higher average than median, because it has a long tail of few high-rate mutations. This means that the mutation rate for most mutations is lower than traditionally assumed, as methods for estimating mutation rates are sensitive to extreme cases encountered in mutational hotspots.
Considering the evolution from a wild-type to a phenotypically defined mutant, we show that characterizing the genotypeto-phenotype map (GPM) underlying this transition requires isolating a very large number of mutants when there are major differences in the number of possible mutations per gene. In particular, finding mutations in the same genes repeatedly is no reliable clue that other possible mutations are negligible, as long as the total number of mutants sequenced remains insufficient.
We also find that accurate experimental estimation of the genes' relative mutational target size requires sequencing a much larger number of mutants than traditionally thought. Assuming a log-normal DMR, we show with a 95 %-confidence interval that experimental data is compatible with a median mutation rate more than an order of magnitude lower than the average mutation rate. Finally, we show that the estimated log-normal DMR limits the predictability of experimental evolution on the level of individual mutations, but not at the level of genes, if there are many possible adaptive mutations per gene.
We interpret our results using the well-established wrinkly spreader (WS) bacterial experimental evolution model system in Pseudomonas fluorescens SBW25. Here, adaptive mutants, called wrinkly spreaders, rapidly appear during static growth and colonize the oxygen-replete air-liquid interface. Many different genetic pathways can lead to the WS phenotype, but pathways under negative regulation are mutated much more frequently, because they have a larger number of possible mutations. Rare pathways, with similar fitness, are only observed after the commonly used pathways are deleted [12]. It is also possible to measure the mutation rate to the wrinkly spreader phenotype in the absence of selection using fluctuation tests [21]. This revealed strong mutational hotspots that have a major impact on what mutants are available for natural selection to act upon [21].
The experimental results and models from these studies [12,21] were used to extend predictions to other Pseudomonas species [36]. An experimental test for Pseudomonas protegens Pf-5 revealed that WS mutations were mainly found in the same genes after experimental evolution but that mutational hotspots were not conserved between species. Thus, knowledge about the GPM was extendable to another species but information about mutational hotspots did not improve predictions. Although differences in competitive fitness between wrinkly spreader mutants was a major contributor to the outcome of experimental evolution in these studies [21,36], we do not expect such differences to be conserved between species or between slightly different environments. Thus, it is not possible to produce a prediction of relative fitness effects for another Pseudomonas species without experimental data. Therefore, we assume equal fitness for all wrinkly spreader mutants and our null model for evolutionary forecasts only assumes a similar GPM where the uncertainty added by mutational hotspots can be included by sampling from a DMR. While we here focus on the extended wrinkly spreader model system to produce experimentally testable predictions, our results are general and can be applied to other model systems where the origin of genetic variation plays a major part in determining evolutionary outcomes.

Modelling approaches
We focus on the transition from the wild-type to a mutant phenotype in a given species and consider exclusively mutation rates, because they are the origin of genetic diversity. In particular, we do not model selection (in terms of variation in the fitness of mutants), population size or population structure, and we treat molecularly different mutation types (transitions, transversions, insertions, deletions) equally from a modelling point of view. These modelling choices are not meant to diminish the importance of these factors in determining evolutionary outcomes. Fig. 1 provides an overview of our modelling approaches. We assume a fixed set of mutations and use two representations: mutation pie charts (MPCs) and a distribution of mutation rates (DMR).
When focusing on the relative probability of each mutation, independently from absolute rates, it makes sense to use a pie chart representation: an MPC has probability 1, distributed among its shares. Each share accounts for one mutational unit, which may represent a mutation (in a coding or non-coding region), a gene or a pathway. Thus, we model the observation of a mutation's occurrence in the laboratory as a single, independent, random draw sampled from an MPC. In order to compare different MPCs, we introduce a metric summarizing it into a single number, the unevenness index U. It is commonly used in ecology and is based on Shannon's diversity index. The unevenness index quantifies the global disbalance between the shares of the MPC, from perfect evenness, where all shares have the same size (U=0), to maximal unevenness, where one share takes up the whole MPC (U=1). Other metrics are possible (Appendix C). Approaches used. Graphical abstract of the approaches used in this article, showing the interpretation of the mutation rate for mutational elements defined on a molecular basis, the corresponding mutation pie chart (MPC) with relative mutation rates, as well as a conceptual distribution of mutation rates (DMR) considering absolute rates.
When focusing on the absolute mutation rates however, we model the rate for each mutation as sampled from a DMR, which accounts for the variation in the rates of specific mutations. This allows us to investigate the effects of this variation on the repeatability of experimental evolution. It obviously makes sense to consider a DMR when an MPC has a large number of mutational elements. But interestingly, we can think of any MPC as a set of mutation rates sampled from a DMR. In particular, the same mutations sampling their rates twice independently from the same DMR to form two parallel MPCs can model closely related species.
We model the DMR as a log-normal probability distribution to satisfy several properties: (1) most mutation rates are low and very close to 0 (around 10 −10 gen −1 for instance), (2) some are higher, (3) none are zero and almost none are extremely low (e.g. <10 −50 gen −1 ).
Whether the support of the distribution covers all positive numbers or only the [0, 1] interval makes no difference in the context of mutation rates, where it suffices that values above 10 −2 gen −1 are practically impossible. Among other distributions, such as the Gamma or the Beta distribution, the log-normal distribution is the only one to always satisfy condition 3., according to which any genetic sequence should in principle be able to mutate. The log-normal distribution is the distribution of the exponential of a normally distributed random variable. Its parameters μ and σ are the parameters of the underlying normal distribution.

Empirical data
To provide a direct link between theoretical results and laboratory experiments, we refer to the wrinkly spreader (WS) system in Pseudomonas fluorescens SBW25. One of the best characterized systems in experimental evolution, it has been used to address many fundamental questions in evolutionary biology [12,21,[37][38][39][40][41][42] and can be extended to other Pseudomonas species to test evolutionary forecasts [36]. Here, we use an estimate of 500 mutations that can produce the WS phenotype, divided unevenly between 16 genes with 2 to 120 mutations each (Appendix B, Table 2). We include base pair substitutions and small insertions and deletions that are all treated equally in terms of probability of mutation. We also use previously published [21] experimental data from fluctuation tests for the aws operon within this system (Appendix B, Table 3).
When P. fluorescens is inoculated into static growth tubes, access to oxygen selects for mutants with an increased ability to colonize the air-liquid interface. The most successful type of such mutants is called wrinkly spreader because of its distinct colony morphology on agar plates. Single mutations characterize the GPM of the WS system, so that experimental data makes it possible to estimate the number of target genes and the number of mutations per gene: evidence reveals major differences, with prominent mutational hotspots biasing evolution towards certain genes [12,21,41,43]. Although WS mutants have a diverse genetic basis, they have similar fitness when competed against the ancestor [12] and can also be isolated independently from fitness effects [21]. Thus, their evolutionary dynamics ideally fits our question since mutational phenomena play an important role compared to selection and commonly observed mutational pathways do not generally have higher fitness than rare ones [12]. However, differences in competitive fitness between wrinkly spreader mutants make mutations in some genes unlikely to be found after experimental evolution even when they have a high mutation rate [21]. This complicates evolutionary forecasts based on a GPM model because there is no way to a priori predict the fitness of mutations in different genes, as discussed previously [36], which makes a null model without selection a reasonable starting point.

Characterizing the genotype-to-phenotype map by completion experiments
Understanding the genotype-to-phenotype map (GPM) is fundamental to explain why evolution uses some routes rather than others. Experimentally however, it requires finding rare evolutionary pathways, sometimes orders of magnitude rarer than those commonly observed. As practical constraints limit how many mutants laboratory experiments can characterize, it is important to estimate the number of mutants required to find all genetic elements that can mutate to obtain a specific phenotype.
Repeatedly finding mutations in the same gene is often interpreted as an indication that the mutation pool has been exhausted. In order to explore theoretically to what extent this is valid, we draw independent, random observations from a known MPC, until all possible mutational elements have been drawn at least once. The result of such a completion experiment is the number of observations collected in order to achieve completion, simulating the number of mutants that a laboratory experiment would need to sequence. Although completion is not the only information of interest to experimentalists, they can use this simple theoretical standard to picture the statistical power of an experimental design.
We confirm that repeatedly finding the same mutations indeed indicates that the mutation pool has been exhausted, but only when the total number of observations is large enough. Simulations show that large enough can mean an impractically high number, especially when the possible mutations have very uneven mutation rates. Importantly however, the notion of large enough is not directly determined by the rarity of the rarest mutation.

Reliability of experimental results depends on counts
The exact number of times a single type of mutation has been repeatedly observed, although it does not improve our molecular understanding of the phenotypic change, provides information on how reliably we picture available mutations. Indeed, Fig. 2 confirms experimental studies suggesting that, by themselves, repeated observations do not necessarily mean that most genetic pathways have been discovered [12]. For instance, the ideal case of the top panel shows a median at 52. This means that, at best, in half of the cases, three times as many observations as the actual number of pathways were necessary to get a complete picture of available mutational pathways. If we wanted to ensure completion in practice (97.5 % of cases), 101 observations would be required. This is because completion experiment results are distributed over a broad range. Thus, our picture of available mutational pathways heavily relies on luck, as long as we consider sequencing a limited number of mutants, which produces highly variable and thus unreliable results. As a consequence, it is essential to carefully record our observation span, i.e. the exact count of mutants experimentally sequenced, even if newly sequenced mutants do not reveal any new mutation.

Rarest mutation alone does not capture completion speed
Counterintuitively perhaps, the number of experimental samples required to observe all possibilities does not critically rely on the rarest mutation pathway. We could rationally think that we have achieved completion as soon as we have discovered the rarest mutation. In other words, we could reduce an MPC to only two areas: the rarest mutation and all other mutations. But this approach is not valid to estimate a required observation span: it leads to a gross underestimation. Indeed, the middle and bottom panels of Fig. 2 suggest that all mutational elements may influence our ability to complete the discovery of all possible mutations, not only the most uncommon ones. This means that the impact of the sheer number of mutational elements may override matters of relative probability, since having more mutational elements obviously makes completion slower (Fig. 3). We provide further illustration of the relationship between completion experiments and unevenness in Appendix C (Figs 13 and 14).

Unevenness slows down completion
Beside the observation span, MPC-evenness makes it easier to achieve completion quickly, so that the even case provides a best-case scenario for a fixed number of mutational elements. Indeed, the two upper panels in Fig. 2 suggest that the more uneven the MPC, the more observations we must collect to find all mutational elements. Fig. 3 not only confirms that this is true on average and in median, but furthermore suggests that a higher unevenness monotonically increases completion time.
Since maximal unevenness means that one mutation has 100 % probability while all others have 0 % probability, U=1 implies an infinite completion time. Whereas any other MPC increases the number of mutants to sequence, the even case sets an ideal standard where completion experiments are the fastest.
In the best-case scenario of an even m-MPC, minimizing completion result with m available mutations (Appendix C), several formulas are available to inform the design of laboratory studies. The expected number of observations required to discover all mutational elements is approximately m (ln m + γ+1/(2m)), where γ ≈ 0.577 is the Euler-Mascheroni constant. The variance is approximately π 2 m 2 /6 − (ln m+1+γ) m − 1/(12m). The expected number of additional observations required to observe a new, (k+1)-th mutation after discovering k different mutations is m / (m − k). And the maximum-likelihood estimation of the total number of mutations m given the number of elements already discovered k and the total sample size obs of observations gathered satisfies [44]: Let us illustrate how we can use this best-case scenario to generate information about a completion experiment. Suppose we are interested in a mutant phenotype produced by a number of mutations that we estimate to be m=50. Were all mutation rates equal, so would we need to sequence 225 mutants on average to discover all 50 mutations. The standard deviation would be 62, meaning that, with some luck, an observation span of 300 mutants could suffice. Now, suppose we have already found 40 of the 50 mutations: we can expect to find a new one within five additional observations. All of that would be the best-case scenario: we must actually expect longer times in the likely case where mutation rates are uneven. But these simple formulas readily provide us with the minimum observation span for an evolution experiment study to not be underpowered, i.e. allowing on average the observation of the mutation range.

Assessing the relative frequency of mutational elements
Related to the question of finding all genetic elements of the GPM, another important and experimentally more difficult problem is to quantify the probability that a certain genetic element mutates rather than another. This corresponds to estimating the MPC in terms of the relative share of each mutational element, which is a prerequisite for predicting evolution based on a model GPM.
As the number of observations increases to infinity, the law of large numbers almost surely ensures that we end up discovering all mutations, and that their relative frequencies converge to their relative mutation rates. Thus, we can theoretically access the shape of the MPC in an asymptotic manner. Does this mean that we need an infinite number of experimental samples? No, luckily, things are more complex. Indeed, our simulations suggest that a correct estimation of a 16-MPC is possible, although luck plays an important role, especially if the MPC is very uneven.

MPC-estimation converges fast in mean/median
Collecting more and more data improves our estimation from experimental samples at the level of the mean and at the level of the median. On the one hand, Fig. 4 (top) shows that our attempts get closer and closer to a correct estimation of the real MPC as modelled by the unevenness: the expected error and the median error rapidly decrease when more mutations are observed, so that sequencing a few hundred mutants yields good approximations of the 16-MPC on average and in median.

MPC-estimation converges slowly in variance
Collecting more data also improves our estimation from experimental samples when considering the variance of the estimation. Indeed, Fig. 4 (bottom) suggests that the variability of the error shrinks. However, this variance drop happens on a much longer time scale: although the expected value for the error and the median error are already low after sequencing 200 mutants from a 16-MPC, it remains very variable for a few more hundreds of mutants to collect. Importantly, after a high number of observations, the improvement of the expected value for the error becomes negligible. For instance, after sequencing 1000 mutants from a 16-MPC, it would not be worth extending the observation span to 10 000 since the expected difference between the real unevenness and its estimation would remain almost identical.

Unevenness makes convergence more uncertain
Finally, Fig. 4 suggests a moderate impact of the unevenness of the true MPC on our estimation when accumulating observations. The more even the true MPC, the faster the estimated MPC becomes correct. On the contrary, if the true MPC is more uneven, then we need to sequence more mutants in order to correctly estimate its unevenness. This does not hold for the trivial case where U=1, since having a unique available mutation yields and immediate correct estimation of the MPC. These findings suggest that, in addition to finding all possible mutations, estimating their relative frequencies also requires an observation span of ideally several hundred mutants to sequence.

Viewing absolute mutation rates as a distribution
If mutation rates vary over orders of magnitude even for the same type of mutation [16], then modelling them by a single average value may lead to erroneous conclusions. The concept of a distribution of mutation rates (DMR) can help think about mutation rate variations in a more nuanced way. So, we investigate how a DMR would affect our ability to predict evolutionary outcomes, exemplified by the WS model system.
In the following, we use the maximum-likelihood estimate summarized in Table 1 as our calibrated mathematical tool, unless otherwise specified. This calibrated model is only indicative. Indeed, it is not clear how better or more data would impact the values of the estimated parameters. Obviously, the confidence interval could become narrower, but it does not seem obvious that the mean or variance of the log-normal DMR would have to increase or decrease.
Following empirical observations into assuming a skewed DMR implies that the average mutation rate does not correspond to the median, nor to the mode. Indeed, from the 95 %-CI, the data supports log-normal DMRs with a median <3.25×10 −11 gen −1 and a mode <1.37×10 −11 gen −1 , substantially lower than the assumed mean. Although they are sensitive to noise, medians should always be preferred when outliers are expected, since they are statistically stable. But the question is then how to interpret the average values obtained experimentally. Specifically, how does the DMR view of mutation rates inform the repeatability of experimental evolution experiments?

Gene mutation rates repeatably reflect large discrepancies in target size
In the DMR framework, the extent to which experimental results may be repeatable, either within a single species or between species with similar GPMs, depends on two opposing forces: the deterministic force of the MPC-setup representing mutational target size, which promotes repeatability; and the stochastic force of the heavy-tailed DMR representing the variability of mutation rates, which acts against repeatability. Here, we refer to mutations defined at the molecular level as mutations, and to larger units at the gene or metabolic pathway level as gene, as pathway or as (mutational) target, so that the pathway rate is the sum of the rates for each of its mutations. This way, the number of mutations represents the size of a mutational target.
Our numerical simulations suggest that gene mutation rates reflect large discrepancies in target size. Indeed, even though the calibrated log-normal DMR allows for the mean to be 100-fold larger than the modal mutation rate, it was not able to overcome the determinism of common pathways having 25 times more sites than rare pathways (Fig. 6). Thus, the MPC-setup determines the most common pathways in most cases, i.e. the fact that two mutational targets have substantially different sizes often ensures that the larger one has a significantly higher mutation rate. Here, substantially quantifies the extent to which repeatability of the most common pathways is robust to DMR-induced fluctuations.
Which determinations of the MPC-setup are robust to the stochasticity resulting from the DMR? No analytical answer is known in the case of a log-normal DMR [45], but our numerical results suggest that a target size 15 times larger indeed ensures a higher effective mutation rate in practice (Fig. 7). This result relies on realistic values for parameter σ of a log-normal DMR and 95 % certainty. Thus, orders-of-magnitude differences in the MPC-setup introduce a repeatability which the DMR-stochasticity cannot counteract. This holds for all log-normal DMRs that are compatible with the aws data Appendix B (Table 3), even the most heavy-tailed, which introduces the highest stochasticity. In total, 10 6 12-MPCs (aws setup) were sampled from the log-normal distribution with mean E obs corresponding to each candidate value for σ. Bottom: maximum likelihood estimation (solid red) for the DMR, corresponding to the parameters shown in Table 1; lower (dotted blue) and upper (dashed blue) boundaries of the CI assuming the same mean E obs (dashed orange line).The bottom panel makes it clear that the bulk of the DMR could be different from the mean to an unintuitive extent.   Table 2) gives an unevenness distribution (blue) with median 0.212 (dashed magenta line) and 95 % between 0.163 and 0.334 (shaded orange area). MPCs are shown for those three quantiles. Each colour accounts for a pathway and each colour shade accounts for a mutation. In the wrinkly spreader system, blue corresponds to the aws pathway, orange to dgcH, green to mwsR, and red to wsp. Fig. 7. Robustness of MPC against DMR stochasticity. Consider two mutational pathways A and B of respective target sizes n and n/r. Parameter σ of the log-normal DMR (and therefore its coefficient of variation) determines the threshold ratio r thresh under which >5 % of cases challenge the robustness of the MPC-setup: in the region strictly below the curves, r is too low to ensure at 95 % that A will have a higher mutation rate than B. The average (solid coloured lines) and median (superimposed dotted white lines) cut-off r thresh was measured out of 10 6 simulations. The coloured area indicates the 95 %-confidence interval for σ based on the aws data and the unevenness metric, with the maximum likelihood as a solid red line.

Variability of MPC-unevenness from DMR models interspecies differences
We can interpret the variance of the MPC-unevenness distribution as a dimension of the potential difference between species. If the DMR produces a narrow range of unevenness values, it is likely that the relative mutation rates for different pathways will be conserved across species and then knowledge about one species can be used to make evolutionary forecasts for closely related species, assuming the same GPM [36]. Assuming perfect knowledge of all mutation rates in one species, the narrowness of the unevenness distribution allows us to picture how much another species should be close in terms of the relative frequency of available mutations.
In Fig. 6 for example, we mimic the data from Appendix B ( Table 2) using the calibrated log-normal DMR from the aws data. Each mutation independently sampling its rate may result in more even (MPC on the left) or more uneven (MPC on the right) mutation rates. The median unevenness is 0.212, so that the data, with U obs =0.315, suggests a relatively uneven observed MPC.

DMR variance increases MPC-unevenness, DMR mean has no impact
The variance of the DMR favours unevenness in resulting MPC, whereas its average has no impact (Fig. 8). Indeed, the mean does not affect the MPC-unevenness distribution, since multiplying the rate of all mutations by the same factor does not impact their relative share in the MPC. However, the coefficient of variation of the DMR greatly impacts the unevenness: the higher the relative variance, the higher the unevenness. When the variance increases, the likely discrepancy between rare and frequent mutations becomes larger and the probability that few mutations dominate the MPC increases.

DMR shape impacts the absolute total mutation rate (TMR) when there are few mutation sites
The probability for a phenotype to appear in a population depends on the total mutation rate (TMR) from the wild-type to that phenotype. The TMR is the sum of the rates for all mutations responsible for the mutant phenotype. Variation in the TMR between species could lead to major differences in experimental observations, for example the prediction that WS mutants should evolve or not in different species [36]. Assuming that the mutation rates are independently drawn from the same DMR with defined and finite expected value − r and standard deviation s, the central limit theorem ensures that, as the number of mutations n increases, the distribution of the TMR converges to a normal distribution with location n − r and variance ns 2 . Therefore, whereas the TMR to phenotypes arising from a limited number of mutations largely depends on the exact shape of the DMR, an averaging phenomenon occurs in phenotypes resulting from a high number of mutations. This has an impact on the TMR distribution's skewness ( Fig. 9 and Appendix C, Table 4). Indeed, since the DMR is skewed toward lower mutation rates, a TMR distribution resulting from few mutations is also skewed and has a median and a mode lower than the mean. On the contrary, having more mutations implies that the TMR tends to become symmetrically distributed with its median, mode and mean converging. This effect shows in the unevenness distribution since the unevenness of potential MPCs becomes less variable when more mutations are involved (Fig. 8).

DISCUSSION
Although they constitute a fundamental evolutionary mechanism by introducing new genetic variation, mutations have traditionally been seen as simply providing the raw material for natural selection to act upon, with limited ability to influence the course of evolution [1,4]. However, biases in the origin of genetic variation may play a major role since they mean that frequencies at which particular mutants appear can differ by orders of magnitude, either in terms of phenotypes or in terms of genetic elements [2,4]. It is common to interpret mutational patterns to infer biological information about mutational target size, genetic constraints or the underlying causes of repeated evolution. We suggest that researchers should consider the full range of possible explanations in terms of the influence of selection, mutational hotspots and differences in the capacity of genes to translate genotypic variation into phenotypic variation, without making implicit simplifying assumptions. As we show here, it will in many cases not be practically possible to exclude alternative explanations, as it might require showing the absence of alternative mutational pathways or unfeasible measurements of mutation rates.

Relative mutation rates and how to deal with rare mutations
In this study, we used numerical simulations to explore how to take into account mutational biases and differences in the number of adaptive mutations per gene when interpreting empirical data, including results on evolutionary predictability. We showed that the number of possible adaptive routes is likely to be underestimated since most experimental studies, having an insufficient observation span, lack statistical power (Fig. 2), which leads to an incomplete characterization of GPMs. Indeed, for the model system of the wrinkly spreader in Pseudomonas, directly finding all 16 genes that can be mutated by a single mutation to cause a WS phenotype would require sequencing several hundreds of independent mutants, while accurately estimating their probability would require thousands of observations. Unevenness in the rates of available mutations scales up requirements beyond that of this theoretical, best-case scenario (Figs 2 and 3), rendering them impractical in laboratory settings.
Thus, the brute force approach of assuming that population sizes overcome the rarity of all mutations in microbiology experiments may be unable to reveal rare mutations in practice. A more strategical approach may be employed in cases where the most Fig. 9. Distribution of the total mutation rate to the phenotype. Effect of the number of mutation sites on the total mutation rate, presented as a distribution centred around its mean and divided by its standard deviation (which would standardize a normal distribution). The examples shown use a log-normal DMR model to simulate 10 6 for each number of mutation sites.
common pathways can be disabled [12], which has led to the proposal of a hierarchy of mutational pathways in terms of negative regulators (commonly mutated), promoter mutations (intermediate) and rare activating mutations. Our results provide theoretical support for this view, because we showed that the speed at which we correctly picture the breakdown of mutation rates among pathways was short on average (Fig. 4), in the range of laboratory experiments.

Distribution of absolute mutation rates and repeatability of evolution
A large variation in mutation rates not only hinders the qualitative and quantitative discovery of mutational pathways, it is, additionally, a fundamental limitation for predicting evolutionary outcomes based on an understanding of GPMs [21]. To illustrate this with an estimated GPM for the wrinkly spreader system in Pseudomonas [36], we calibrated a log-normal DMR model using a small experimental dataset from Pseudomonas fluorescens SBW25 (Fig. 5 and Table 1). It is unclear to what extent this data is representative for Pseudomonas, but the observed between-mutation ratio of 20 (Appendix B, Table 3) is similar to the ratio of 30 reported in Pseudomonas aeruginosa [46]. We also note the variation in mutation rates is likely to be higher than this experimental data suggests because possible mutations with lower rates are not observed experimentally, which leads to an overestimation of the average or median mutation rate per site.
We suggest that the DMR approach helps to disrupt thinking about mutation rates in terms of a single, constant value equated with the average (Fig. 6). We found that it formalizes the fact that mean mutation rates are higher than median mutation rates ( Fig. 5 and Table 1). Therefore, most mutations have a lower rate than the average typically measured experimentally in mutation accumulation experiments. Put differently, natural selection acts upon a basis of genetic variation that is lower than traditionally assumed. Our indicative calibrated log-normal DMR model allowed us to quantify that the median origin of genetic variation could actually be an order of magnitude below the mean mutation rate (Fig. 5 and Table 1). This would extend to large populations of 10 9 [2] the range where mutational biases influence evolutionary outcomes, even though selection is the dominant factor in such cases.
Thus, evolutionary outcomes result from both a bias in variation origination (mutation) and a bias in variation fixation (selection). We should consider mutational biases as a key evolutionary driver beside selective processes because the standing variation selection acts upon is not as extensive as commonly assumed. A long history of origin-fixation models [11] supports this view, where evolutionary repeatability can be broken down into a contribution from selection and a contribution from mutation. The fixation step is often represented by random draws from a distribution of fitness effects. We cautiously suggest that a promising way forward may be to define the origin part of origin-fixation models as a DMR instead, which would allow for an estimation of the relative contribution of mutation versus selection. How this should be done exactly and how useful this approach may prove is an open matter for future studies.
Numerical results based on our calibrated model suggest that evolution may be predictable at the gene level in the following sense: pathways intrinsically more common by a factor 15 should indeed emerge more often in experimental studies. We can interpret this intrinsic predominance in terms of target size (Fig. 7), but also in light of a molecular understanding [12]. This provides theoretical and quantitative support to experimental studies exploiting the rate hierarchy of mutational pathways [12] in a forecasting perspective [36].
Additionally, if we interpret a DMR as a null-model for repeatability within and between species, its variance ( Fig. 8) directly relates to the probability of parallel evolution [4], which we discuss further in Appendix C. Our findings suggest low levels of exact mutational parallelism between different species: while molecular evolution in a single species may be highly repeatable because of mutational hotspots, rare adaptive pathways with few possible mutations are likely to differ between species (Fig. 6). Therefore, we predict that, when mutating to a wrinkly spreader phenotype, other Pseudomonas species will also use the main known pathways: wsp, aws, mwsR and dgcH. However, since each comprises many possible mutations, we can expect the particular mutations used for each to differ between species. We predict that genes with few possible mutations only appear common in a particular species where they happen to have a high mutation rate: we expect them to differ across species with little to no predictability. Moreover, since the wrinkly spreader phenotype is due to a large number of possible mutations, we can predict that the total mutation rate approximately follows a normal distribution across different species (Fig. 9). These predictions rely on the null-model that different species share the same DMR and the same GPM.

CONCLUSIONS
We have argued that the mathematical shape of a DMR was a matter of modelling on the ground of first principles. Nevertheless, future experimental data on mutation rates could help improve the DMR model. Thus, we see a great need for more extensive experimental studies collecting and characterizing many independent mutants [1]. It would be especially interesting to study natural genomic contexts where multiple genes with large mutational target size present adaptive mutations. There are well-known differences in the average rates of different types of mutations as discussed in the introduction. Yet, little is known about the DMR for different types of mutations and, with additional experimental data on mutation rates of different types of mutations, it might be possible to use different DMRs for transitions, transversions, insertions and deletions to improve evolutionary forecasts and better quantify their uncertainty. In a similar way, it would also be possible to use different DMRs for different environments, for example to include different mutational spectra during aerobic and anaerobic growth [47].
Finally, we showed that information on MPCs, i.e. on relative mutation rates for the different mutations available, provides direct constraints on compatible DMRs. Therefore, finding ways to adequately measure and report the wide range of mutation rates, both in absolute and in relative terms, would generate empirical material to inform our model. Whether the mutation rate variation spectrum is similar across species remains to be discovered, as well as the ways in which the environment and DNA repair systems influence it. We hope to see much progress in the field since addressing these questions does not entail intrinsic technical difficulties. Thus, we encourage researchers to make experimental data available in an accessible format, so that the methods we developed here to calibrate a DMR model for mutation bias, based on such data, can easily be transposed to other experimental models.

METHODS
We ran numerical experiments in Python. Appendix A presents the rationale of their experimental design in more detail. The code is available at https://github.com/anthony-sun/Mutationrates.git. We only use experimental data published previously, described in Appendix B. Our numerical completion experiments (Figs 2 and 3) modelled the observation of each occurrence of a mutation in the laboratory as a single, independent, random draw sampled from an MPC. Estimation experiments from an unknown MPC (Fig. 4) compartmentalized the true MPC to estimate, the observation of randomly sampled mutants, and algorithmic estimations based on the observed mutants. Finally, repeatability (Fig. 7) was investigated by setting up known values for target size ratio between two hypothetical genetic targets (deterministic factor), and for the coefficient of variation of a log-normal DMR (stochastic factor).

APPENDIX A: NUMERICAL APPENDIX Completion experiments from known MPC
If we model the observation of each occurrence of a mutation in the laboratory as a single, independent, random draw sampled from an MPC, we can use the coupon collector's problem to obtain theoretical results. Indeed, the non-equiprobable extension of this mathematical problem then tells us about the number of observations needed to discover all possible mutational pathways: its expected value is known exactly [48][49][50] as well as its variance [50,51].
Yet, the exact mathematical formulas involved are too complicated to be useful in an experimental context. Indeed, laboratory studies need, on the one hand, working approximations rather than exact mathematical formulas for the variance and, on the other hand, an estimation of the median number of mutants to be collected rather than the expected number.
We propose a simple way to visualize the inherent variation that laboratory observations impose. Starting from a given pie chart of mutational pathways, we numerically run multiple completion experiments. Thus, we can access the distribution of the completion experiments' results (Figs 2 and 3).

Estimation experiments of unknown MPC
We evaluate two sorts of information in the framework of a laboratory experiment investigating available mutational pathways responsible for the transition from a given wild-type to a given mutant phenotype. First, how many mutants do we need to observe in order to correctly estimate the total number of mutational pathways available? Secondly, some being more common than others, what does it take to correctly estimate the relative probability of each mutational pathway?
To evaluate our ability to correctly infer this information on available mutational pathways based on experimental evolution in the laboratory, we use computer simulations as numerical experiments. Indeed, that evaluation is impossible in practice since we cannot access the correct data we are trying to infer. Computer simulations allow us to start by positing what the truth could be in order to test to what extent experiments are able to recover it. For instance, we assume that four mutational pathways are available, all equally common first for simplicity (25, 25, 25, 25 %), and simulate the discovery of one, two, etc. mutants, each with a single mutation corresponding to one of the pathways.
To model the fact that the true MPC is unknown, we design numerical experiments with three compartments (to produce Fig. 4).
The MPC that experimentalists want to estimate is coded in a truth compartment. It determines the observation compartment, where randomly sampled mutants appear. We code algorithmic estimations in the modelling compartment based on the observed mutants. Thus, experimental estimations have no direct access to the truth compartment.

Repeatability threshold
Consider a focal, minor mutational pathway A and a major mutational pathway B, with a target size r times larger. The only way for A to have a higher effective mutation rate is to have a mutational hotspot allowing it to overcome the determinism of the MPC-setup of target sizes. Thus, a higher coefficient of variation of the DMR increases stochasticity, favours mutational hotspots and decreases repeatability. In the log-normal framework for the DMR, the coefficient of variation is directly tied to parameter σ. Therefore, parameter σ of the log-normal DMR determines a threshold value r thresh above which the determinism introduced by target size is robust with 95 % certainty against the stochastic aspect introduced by the DMR. Realistic values for σ keep this threshold below 15 (Fig. 7).

Description of the wrinkly spreader model system
The wrinkly spreader (WS) system in Pseudomonas fluorescens SBW25 is one of the best characterized experimental evolution systems and has been used to address many fundamental questions in evolutionary biology [12,21,[37][38][39][40][41][42]. The experimental setup is simple: bacteria are inoculated into static growth tubes, and competition for access to oxygen leads to strong selection for mutants with increased ability to colonize the air-liquid interface. Several types of mutants are distinguishable by their colony morphology on agar plates, which allows for isolation of many independent mutants even when present at low frequencies in a population.
The most successful phenotype is the WS morphotype, whose increased production of exopolysaccharides increases cell-cell and cell-wall adhesion [52]. The WS phenotype is caused by mutational activation of diguanylate-cyclases that increase the production of the second messenger c-di-GMP, a signal for biofilm formation and increased exopolysaccharide production [12,41,53]. The WS morphotype has been found to be caused by single mutations in 19 genes in 13 regulatory pathways, all with a similar fitness increase when competed against the ancestor [12,21]. However, mutations in some genes are rarely observed experimentally due to the small number of possible WS mutations [12]. Therefore, the spectrum of WS mutations is almost completely dominated by mutations in four main pathways, wspABCDEFR, awsXRO, mwsR and dgcH. All are under negative regulation, which can be disrupted by a large number/range of mutations to produce the WS phenotype [21,41].
It is also possible to isolate WS mutants without selecting for the ability to colonize the air-liquid interface, by using a reporter construct where mutation rates can be measured in a Luria-Delbrück fluctuation assay [21,43]. This has led to the realization that mutational hotspots play a major part in determining the outcome of experimental evolution in this model system [21,41]. Pseudomonas protegens Pf-5 also evolves similar morphotypes with mutations in the same main pathways [36]. However, it appears that mutation hotspots are not conserved between species, which could limit our ability to forecast evolution in one species based on an understanding of the GPM of another [36].

Estimation of the number of WS mutations
To examine the consequences of having major differences in the number of adaptive mutations per gene, we estimated the number of mutations that could produce the WS phenotype (Table 2). This GPM model is based on incomplete empirical data combined with an understanding of the molecular function of some of the proteins involved. It only serves to provide a realistic example of a model system with order-of-magnitude differences between the number of adaptive mutations per gene. It is not a statement about the true number of WS mutations per gene, which is difficult to measure experimentally, as we mention in the main discussion.
For the six genes with the smallest number (<5) of estimated mutations (Table 2, bottom), we base our estimates on experimentally observed mutations in a strain where the common wsp, aws and mws pathways have been deleted [12]. Given that some mutations in these genes were observed only once, this is likely to be an underestimate.
For dgcH, 24 different mutations were observed [12]. The mutational pattern makes it clear that there is a roughly 500 bp region where any disabling mutation, including small in-frame deletions, lead to a WS phenotype. Since many of these mutations were observed only once, we estimate that the number of possible WS mutations is 50 for dgcH.
Identifying all possible WS mutations in the common pathways would require characterization of thousands of independent mutants. Lacking such data, we base our estimates on the number of mutations observed [21,41,43] combined with knowledge that any loss-of-function mutation in wspF or awsX produce a WS type [41,54]. The wspF gene is about 1000 bp, so there are 3000 different possible base pair substitutions as every base can mutate to the three others. Roughly one third of these are synonymous mutations that are unlikely to have major effects on protein function, leaving 2000 base pair substitutions producing missense or nonsense mutations. Based on site-directed mutagenesis studies [20,[55][56][57][58], we suggest that at least 5 % of these 2000 mutations would severely compromise protein function, so that 100 different WS base pair substitution mutants would be possible in wspF. Additionally, there should be many possible small insertions and deletions that disrupt function. Thus, the 120 proposed possible WS mutations in wspF ( Table 2) are likely to be an underestimate.
We chose to estimate how high the number of possible WS mutations in the common and rare pathways should at least be. The total mutation rate to the WS phenotype, measured by fluctuation tests, is 10 −8 gen −1 [21], which with 500 estimated total mutations in Table 2 gives an average mutation rate of 2×10 −11 gen −1 . Given that two mutation hotspots in awsX and awsR account for 42 % of the total mutation rate [21], the median mutation rate is lower than the average.

WS mutation rate data for the aws operon
The mutation rate to the WS phenotype has previously been measured in fluctuation tests using a reporter construct [21]. We used the experimental data for the aws pathway in Appendix B (Table 3) to calibrate the log-normal DMR model.

Discussion of the MPC metrics
Here, we compare the metric that we use to summarize a given MPC by a single number, the unevenness index, to two other possible metrics, the Gini inequality coefficient and the sum of the two largest shares (SLS2). All metrics quantify the unbalance between the shares of the MPC and give a value between 0 (perfect balance) and 1 (maximal unbalance). All depend solely on the relative proportion of the MPC's shares and are insensitive to absolute mutation rates: they quantify the unbalance of the whole MPC, not the probability of any single mutational element. This article focuses on the unevenness index U, based on Shannon's diversity index H′: where n is the total number of mutational elements, and p i the probability of a particular element i (with all such probabilities summing to 1).

Alternative metrics
We present two alternative metrics and show that, in both cases, their range depends on the number of mutational elements, which makes them inconvenient when the total number of mutational elements may vary or be unknown as in the laboratory.
Gini inequality coefficient: the Gini inequality coefficient, used in economics, is based on pairwise comparisons between all mutational elements of the MPC: The Gini coefficient is 0 if and only if all mutation elements have the same probability. When mutations have a 100 % probability to come from a single site, it increases up to 1-1 /n, which gets closer to one if there are many mutation sites.
Sum of the two largest shares: the sum of the two largest shares of an MPC is the most intuitive measure of disbalance from an experimental point of view: The lowest possible value for the SLS2 is 2/n, which gets closer and closer to 0 when the number of mutational elements increases. It corresponds to the MPC is perfectly balanced. The SLS2 is one if and only if two mutational elements take up the entire MPC.

Relationship between the different metrics
In general, each single value for a given metric corresponds to multiple possible configurations for an MPC. This variation means that there is no exact correspondence between a value of the unevenness and the other MPC metrics.
However, Fig. 10 shows that this variation is moderate and that there are clear statistical correlations between the three metrics, when the number of mutations considered is fixed. All proposed MPC metrics capture in a single value the imbalance between mutation rates: the higher they are, the less equal the mutation rates. When the number of mutations increases, the Gini coefficient tends to give a more and more distorted image of the MPCs in that most MPCs with some substantial unevenness have similar, high Gini coefficients. Consequently, the Gini coefficient distinguishes balanced MPCs more effectively as well. By contrast, a high number of mutations makes the SLS2 correlate more linearly with the unevenness.
All in all, the approach we chose in the main text could use the Gini coefficient as well. Indeed, using the distribution of any MPC metric works in principle for all. Fig. 11 shows that this distribution is qualitatively similar for the unevenness and the Gini coefficient.

Unevenness and completion experiments' results
Reproducibility with a different number of mutations: Fig. 12 shows that the qualitative impact of the unevenness on the results of completion experiments is the same as in the main text even when the number of mutations considered changes.
The unevenness is relevant to analyse the results of completion experiments: The unevenness captures only a part of the variability of MPCs: for the same unevenness, many MPC configurations may be possible. Fig. 13 shows that the variability that is not captured by the unevenness metric has a moderate impact on the variability of completion experiments. Importantly though, the median number of observations required for completion remains similar. The mean may vary more. In all cases, measures of dispersion seem to be of the same order of magnitude. By contrast, Fig. 14 shows that variations in the unevenness capture dramatic changes in completion experiments' results, which can be shifted by an order of magnitude. This not only applies to their dispersion but also to their mean and median.
Theoretical results about the particular case of an even MPC

Minimum for the expected result of completion experiments
We model the expected number E{C m } of observations to gather in order to find all possible mutations as the expected waiting time in a generalized coupon collector's problem. Flajolet et al. [49] prove that: where m is the total number of possible mutations (coupons), p i is each mutation's probability, and x is the observation span (number of coupons collected). Here, we prove that, for any given number m≥2 of mutations (shares on the pie chart), the expected value is minimal in the equiprobable case, where all mutations have the same mutation rate.
Obviously, what we want to prove is implied by the fact that, for any set → p of m probabilities p i summing to one and for any x≥0: ( since no factor can be negative. That inequality is trivially true if any of the probabilities p i vanishes. But in general, this corresponds to a simple, convex optimization problem better formulated with a logarithm transform: with the Lagrangian function: ) .
The following, necessary Karush-Kuhn-Tucker conditions provide the solution: Thus, the equiprobable case minimizes the expected number E{C m } of observations required to find all possible mutations. Actually, an optimization approach similarly proves that the same holds for any number j (between 2 and m) of mutational elements to be found: the expected number of observations required is minimal in the equiprobable case.

Formulas for the case of an even MPC
In the case of an MPC with unevenness U=0, the expected number of observations required to discover all mutational elements is where γ ≈ 0.577 is the Euler-Mascheroni constant.
We can further approximate the standard deviation using the formula Moreover, the number of additional observations required to observe a new, k+1th element after discovering k different mutational elements is geometrically distributed, with expectation m/(m−k).
Modelling the discovery of mutational elements from an even MPC as the coupon collector's problem in mathematics, the even case further allows for an estimation of the total number of mutational elements m given the number of elements already discovered k and the total sample size obs of observations gathered. Indeed, Dawkins (1991) shows that the total number m of mutational elements achieving the maximum likelihood satisfies obs = m m + m m−1 + · · · + m m−k+1 = m where H m and H k-1 are the m-th and k-1-th harmonic number respectively.
Impact of the number of mutations on distribution of the total resulting mutation rate (TMR) Table 4 summarizes the theoretical impact that the number of mutations has on the distribution of the total resulting mutation rate (TMR).

Theoretical discussion on evolutionary repeatability and log-normal DMR
Our results agree with several theoretical expectations regarding the repeatability of evolution [4] (p. 159-161). In particular, the formalism of a log-normal DMR perfectly fits the expectation that the probability of observing parallel evolution strongly depends (main text, Fig. 8) on the coefficient of variation of the DMR, modelled by parameter σ, and not on the absolute mutation rates, modelled by parameter μ. In the log-normal DMR framework, the probability Pr para of parallel evolution [4] (eq. 8.8, p. 160) reduces to the simple expression: where n is the number of evolutionary possibilities. It allows us to interpret the aggregation of molecularly defined mutations, expected to increase evolutionary parallelism [4] (p. 161), in terms of different scales: thus, it becomes obvious that a scale with a lower number n of mutational elements (gene or metabolic pathway, as opposed to the nucleotide level) increases the predictability of evolution, as observed experimentally [59].
Our numerical quantification of the repeatability of experimental evolution illustrated the balance between the determinism of the MPC-setup and the stochasticity of the DMR-variance. It explains why experiments always observe common mutated genes, while rare pathways may only be found in an unreliable fashion among replicates, or across species. As a consequence, we should interpret mutational pathways found in several similar experiments as corresponding to a large target size (common mutations), while pathways found by some studies only should have a significantly lower target size (rare mutations). Repeatedly finding rare mutations in some studies only is in turn strong evidence that the observation span is too narrow to conclude, suggesting that all studies taken together remain underpowered.