The rich phase structure of a mutator model

We propose a modification of the Crow-Kimura and Eigen models of biological molecular evolution to include a mutator gene that causes both an increase in the mutation rate and a change in the fitness landscape. This mutator effect relates to a wide range of biomedical problems. There are three possible phases: mutator phase, mixed phase and non-selective phase. We calculate the phase structure, the mean fitness and the fraction of the mutator allele in the population, which can be applied to describe cancer development and RNA viruses. We find that depending on the genome length, either the normal or the mutator allele dominates in the mixed phase. We analytically solve the model for a general fitness function. We conclude that the random fitness landscape is an appropriate choice for describing the observed mutator phenomenon in the case of a small fraction of mutators. It is shown that the increase in the mutation rates in the regular and the mutator parts of the genome should be set independently; only some combinations of these increases can push the complex biomedical system to the non-selective phase, potentially related to the eradication of tumors.

Among other findings, it has been shown that the small mutation rates for the mutator gene lead to the mixed phase with a small fraction of the mutator allele, but an increase in these rates turns the system to the mutator phase. The infinite population model with a constant mutation rate has been solved in ref. 29 (see also the recent review 30 ).
When we map the evolutionary dynamics to statistical physics, the genome length plays the role of lattice size. In statistical physics, different phases arise as different analytical expressions for free energy at a large lattice size. Similarly, we need the large genome length to define the evolutionary dynamics phases. In this paper, we derive these different expressions for the mean fitness in mixed and mutator phases. The most realistic case, when back mutation for the mutator gene can be neglected due to its small impact during the time of the experiment, gives a very simple situation: in a mutator phase in a steady state, there is a zero fraction of normal sequences, even for small genome length, but there is a finite fraction of normal sequences in the mixed phase.
Several different classes of evolution models form the basis of mutator phenomenon analysis: Class I Phenomenological models 5,6 . Such models consider deterministic or stochastic equations for a set of several genotypes. Models in this class are relatively simple but useful for describing the specific types of experiments, e.g., investigations that show how the mutator mechanism competes with the ordinary evolution scheme with a high mutation rate 5,6 .
In ref. 5, a simple infinite population model without selection has been exposed, and it has been assumed that cancer arises after a given number of mutations in oncogenes. The probability of the start of cancer development at a fixed time has been calculated for the mutator mechanism as well as without mutators. The accuracy of the derivations has been well controlled using a small parameter: the ratio of the ordinary mutation rate to the mutation rate due to mutators. A different phenomenological model has been proposed without derivation details 6 , which discusses three types of fitness landscapes. This concept is of interest, and we consider one of the fitness landscapes in our work. However, the phenomenological models as a methodology cannot describe the different phases of the evolutionary dynamics accurately.
Class II Infinite population microscopic models. Models in this class are defined in the binary sequence space with a finite genome length 16,17 . The Crow-Kimura 18,22,31 and Eigen [23][24][25]32 models, which are the quasispecies models mentioned above, have been widely investigated over recent decades, especially in the context of virus evolution. Mathematically, each of these models is described via a system of nonlinear master equations, which can be mapped to the chain of linear ordinary differential equations with some nonlinear transformation. In the Crow-Kimura setting, mutation and selection are parallel processes, while in the Eigen model, mutation is connected to selection. In smooth fitness landscapes, the steadystate distributions of the Crow-Kimura and Eigen models can be mapped to each other. In the large genome limit, the Crow-Kimura model is equivalent to a discrete-time Eigen model 33 and is very close to branching processes. Moreover, the Wright-Fisher model 34,35 for large populations can be mapped to a discrete-time Eigen model 33 .
Class III Finite population microscopic models 36 . In this case, models are also defined in the binary sequence space 36 . In ref. 4, numerical simulations have been performed for the Moran model by considering the Muller ratchet in the presence of mutators. In ref. 7, the growth population version of the Wright-Fisher model 34,35 for the four-dimensional landscape has been formulated; there are neutral, deleterious and advantageous mutations, as well as a sufficient number of mutations in the part of the genome that brings about the mutator effect. According to these results, the mutator phenotype evolves only at a moderate level of selection for the advantageous mutations.
Class IV Infinite population and infinite number of genotypes models without backward mutations 29,30 . These models are useful to describe the virus evolutionary dynamics for a short period of time. Compared with the microscopic models, which represent different possible phases and collective effects in evolution, class IV models are less fundamental.
In the current paper, we consider the microscopic infinite population models (class II). The discrete-time version of the Eigen model [23][24][25] was originally introduced to describe the self-replication of macromolecules at the origin of life (see also 37,38 ). In this model, the selection and mutation happen together (mutation directly follows replication), while in the Crow-Kimura model, originally proposed to describe biological speciation, mutation and selection happen in parallel. In both models, the genome is represented by a chain of L letters (genes), which take the values ± 1, similar to the Ising spin 39,40 . There is a mutation rate of μ per letter. We take the sequence with all "+ " letters as the reference sequence. All of the genomes with the same total number l of "− " letters (l point mutations from the reference sequence) have fitness r l ≡ f(1 − 2l/L), where f(x) as a fitness function, x = 1 − 2l/L. For the initial symmetric distribution, the model can be described via differential equations of L + 1 variables P l , corresponding to the total probabilities of the sequences with l mutations in the genome. The mean fitness of the Crow-Kimura model has been calculated using algebraic methods 41 and the Hamilton-Jacobi equation (HJE) method 42,43 . The exact dynamics had been derived using the HJE 44 . Here, we construct a mutator gene model on the basis of the Crow-Kimura model 18,22,31 and solve it for the general fitness case, by providing also the solution for the Eigen model version of the mutator phenomena.
Our solutions for both models are consistent with each other. We find the analytical solution for the mean fitness for the general fitness function, including the multidimensional case, and clarify and calculate the order parameters and the phases of the model, which was an open problem until now. Previous work 5,6 has been focused on the comparison between the efficiencies of cancer development via ordinary evolutionary dynamics and the evolutionary dynamics using the mutator mechanism. Our quantitative results of the phase structure (shown in Fig. 1) depict the borders of different phases. The procedure to obtain Fig. 1 is described below.
The phase structure may have biomedical implications. In ref. 9, it was proposed that the transition from the mixed phase to the non-selective phase is associated with the possible eradication of the tumor. We provide the solution for two types of fitness landscapes; a symmetric case, where the fitness landscape depends on the total number of mutations from the reference sequences, and a random fitness landscape.
Key technical terms and notations are listed in Tables 1 and 2, respectively.

Results
The model with a symmetric fitness landscape. In this section, we consider a modification of the Crow-Kimura model 18,22,31 for evolutionary processes under the influence of a mutator gene. The chain of (L + 1) genes defines the genome, which can be formally subdivided into two parts; a regular part with L genes and a mutator gene, whose mutations change the fitness and the mutation rate in the regular part of the genome. We represent each gene with two alleles by ξ θ = ± 1, θ = 0, 1, … , L. The regular set of genes is denoted by the sequences The mutator gene in the state ξ 0 = + 1 determines the normal dynamics of the regular part with a mutation rate μ 1 . This wild-type form of the gene corresponds to the most common phenotype in the population. If the state of the mutator gene changes to ξ 0 = − 1, then the evolutionary process switches to a different regime with a mutation rate μ 2 . The mutant-type gene defines an altered phenotype, which is called a mutator phenotype. We use α 1 and α 2 to designate forward (ξ 0 : +1 → − 1) and backward (ξ 0 : − 1 → + 1) mutation rates of the mutator gene. We assume that the fitness landscape is symmetric, which means it depends solely on the number of mutations, l, from the reference sequence (without loss of generality, we take the reference sequence to be S = (+ 1, … , + 1)). For this purpose, it is natural to use the notion of Hamming distance between S and Σ ξ ≡ = − θ θ = S l d L : ( because all sequences in the same Hamming class have the same value of fitness. To facilitate further analysis, we introduce the variable for the mean value, . The state of the system is represented by two probability distributions. We use P t ( ) l to denote the relative frequency of normal genome sequences (wild-types) with l mutations (in the l-th Hamming class) at the time moment, t, and Q t ( ) l for the relative frequency of sequences with an increased mutation rate (mutator-types) with l mutations, and we have a probability balance condition for any time moment, All admissible transitions between system states can be seen as arrows in Fig. 2 45,46 . The parameters of the model are α 1 = a, α 2 = 0, μ 1 = 1, and μ 2 = μ. There are three phases: mixed phase with 0 < s < 1, 0 < q < 1, non-selective phase with s = 0, 0 < q ≤ 1 and mutator phase with 0 < s, q = 1. The border between non-selective and mutator phases is given by μ = J, the border between non-selective and mixed phases is given by a = J − 1, between mixed and mutator phases is given by a + 1 = μ line. From the bio-medical perspectives we distinguish the mutator I and mutator II, mixed I and mixed II subphases. From the mutator I, the system transforms to the non-selective phase simply increasing the μ. From the mixed II the system transformers to the non-selective phase simply increasing the a ≡ α 1 . From the mutator II and mixed I subphases we need change both a and μ to transform the system to the non-selective phase. We have the same picture in case of Eigen model with a fitness A for the peak sequence and fitness 1 for other sequences. We have for the mixed phase R = e −(h+γ) A, for the mutator phase R = e −μγ A and for the nonselective phase R = 1. The border between non-selective and mutator phases is given by μγ = lnA, the border between non-selective and mixed phases is given by h = lnA − γ, between mixed and mutator phases is given by h = (μ − 1)γ line. To describe the evolution of the probability distribution under consideration, we use the following system of 2(L + 1) differential equations: Here, f(x l ) is a fitness function for the normal genome sequence, and g(x l ) is a fitness function for the mutator sequence. As long as changes in the mutator gene significantly affect both the fitness and mutation rate in the regular part, fitness functions f(x l ) and g(x l ) can be different, and μ 2 is commonly 10-100 times larger than μ 1 2,10 . Note that the rates α 1 and μ 1 also differ because that α 1 is proportional to the number of possible mutagenic loci. The coefficients (L − l + 1) and (l + 1) appear in the Eq. (1) transition terms according to the combinatorial formulae for Hamming class probabilities 41,47 . To investigate Eq.(1), we consider a nonlinear transformation 48 : It is easy to show that P l and Q l are the solutions of the following system of linear equations: In Eqs (1) and (2), P l and Q l are the probabilities under the normalization constraint, while P l and Q l of Eq. (3) are not normalized.
In this paper, we focus on the following characteristics of the model in a steady state: the mean fitness R, the total surplus s (the expected value of x l ), the surplus for the wild-type part of the population s 1 and for the mutator-type part s 2 , and the fraction of the mutator sub-population q: The surplus s shows the average number l of mutations in the population, = − . l L s (1 )/2 The maximum of the distributions P l and Q l is attained at points L(1 − s 1 )/2 and L(1 − s 2 )/2 44 . Therefore, the value of the mean fitness for wild-type sequences is equal to f(s 1 ), and that for mutator-type sequences is equal to g(s 2 ), as the distribution is narrow due to the large constant L in the exponent in Eq. (5) below. Assuming a smooth distribution, we obtain the following from Eqs (1) and (2): Calculating the mean fitness in growing populations is crucial for understanding evolutionary dynamics and is the primary concern of this investigation. Depending on the values of the main model parameters, we can obtain different analytical expressions for the mean fitness, which correspond to different phases of the system. The key problem is to first solve the large L limit, and clarify the following items:

Lethal mutations
A mutation that results in the premature death of the organism carrying it.
Selective phase A state of evolutionary system of binary sequences, when the majority of the population is concentrated near the peak genome (with the maximum fitness value) in the sequence space.
Non-selective phase A state of evolutionary system of binary sequences, when the population is diluted in the sequence space.
Solvable cases of fitness landscape Random fitness landscape and symmetric fitness landscape.

Random fitness landscape
The values of fitness defined on a sequence space are random numbers with some distribution.

Symmetric fitness landscape
The values of fitness defined on a sequence space by a function of the total mutation number.

Mutator phase
The absolute majority in the population has a mutator-type (mutator allele of the mutator gene) after long enough time of evolution.
Mixed phase There is a finite fraction q of mutator sequences in the population after longtime evolution (0 < q < 1).  These four questions are discussed in this paper; we show that both negative and positive answers exist for each. The second question (B) leads to the most considerable distinction in the analysis. In the Results section, we first investigate the case s 1 = s 2 ; then, in the Methods section, we describe a more complex situation with s 1 ≠ s 2 . Moreover, contrary to the classical evolution models for quasispecies, here, the evolutionary picture depends on the genome length.
The case of forward and backward mutations of the mutator gene. Let us consider a smooth solution of Eq. (3) at the limit L → ∞ after the simplification mentioned in the previous section. We assume the following ansatz 42,44,49 : l where we denoted x = 1 -2l/L for large values of L. This form of expressions for P l and Q l allows the application of the HJE method to the system of equations (5) (see Methods). Putting our ansatz (5) into Eq. (3), we get a system of two equations for the variables v 1 , v 2 , du/dx and du/dt. Looking at these equations as a system of two linear algebraic equations for v 1 and v 2 , we get a zero determinant condition for the non-zero solution of v 1 and v 2 , so the final equation contains only du/dt and du/dx. The fact that L disappears in the Hamilton-Jacobi equation in the large L limit shows the correctness of our ansatz. We derive the mean fitness investigating the Hamiltonian without solving it and calculating the function u(x) in the steady state. ξ θ , θ = 0,1, … , L ξ 0 = ± 1 -the state of the mutator gene, ξ θ = ± 1 (1 ≤ θ ≤ L) -the states of genes in the regular part of the genome.
r l The fitness value after l mutations in the Crow-Kimura model.
The probability of the wild-types and mutator-types with l mutations.
x l = 1 − 2l/L The average gene state in the genome sequence that corresponds to l mutations from the reference sequence.
The fitness functions of the wild-type and mutator-type genome sequences which assigns the reproductive rate to the average gene state x l μ 1 , μ 2 The mutation rate for the wild-types and mutator-types. μ The substitution used to denote briefly: μ 2 = μ for the choice μ = 1.
The transition rates from the wild-type to mutator-type and vise versa. α The substitution used to denote briefly: α = α 1 for the choice α 2 = 0.
s The surplus of the whole population: the average state of the gene.
The surpluses for the wild-types and mutator types.
l The average number of mutations, which is equal to The total probability of the mutator-types.

R
The mean fitness.
The relative probabilities of the wild and mutator types after l mutations, x = 1 -2l/L, Eq. (5).
The potential of evolutionary dynamics, Eq. (26). k The coefficient in the linear fitness, f(x) = kx.

J
The peak fitness in the single peak fitness landscape for the Crow-Kimura model.

U
The total number of mutations for one generation in population genetics. h The transition probability wild-type to mutator-type for one generation in population genetics. ε The time period of one generation.
The probabilities of the i-th sequences for the wild type and mutator type in the Eigen model.
The transition probabilities between the sequences in wild-type and mutator-type in the Eigen model.
The probabilities of errorless replication per nucleotide for the wild-type and mutator-type.
The probabilities of errorless replication for the genome the wild-type and mutator-type.
The Hamming distance between the i-th and j-the genomes A The peak fitness in single peak fitness landscape for the Eigen model. τ The generation period of the virus.  We obtain the following formulae for the mean fitness R using the potential function V ±
Having the expression for the mean fitness R, we can calculate the surplus of distribution s from the following equation (see Methods): It is important to note that Eq. (7) is valid whether s 1 = s 2 or not. However, for α 1 · α 2 ≠ 0, we have that s 1 = s 2 . In this case, substituting the solution of R and s, we get the following system of equations for the surplus s and ratio v 2 (s)/v 1 (s) in the steady state (see Methods) : In the limit of large t and L, we have q = v 2 (s)/(v 1 (s) + v 2 (s)). For the case f(x) ≡ g(x), we get a simple relationship from (8): The model with uni-directional mutations of the mutator gene. The probability of back mutation (from mutant to wild-type) is small and can be neglected in finite populations. Let us generalize this idea, and consider the infinite population model with α 2 = 0, α 1 = a (μ 1 = 1, μ 2 = μ). In this case, the expressions obtained for potential functions V ± (x) are still valid, but it is necessary to take into account both branches of the potential, V + and V − , and define the mean fitness as their maxima.
Taking V − (x) in Eq. (6), we get the mutator phase (here and below, we use terminology from 17 ): The latter equation for the surplus has the same form as for the ordinary Crow-Kimura model 41 . It can be derived directly from system (1) if we assume that the vast majority of the population has the mutator allele and omit the contribution of P l in the second equation of Eq. (1). In the mutator phase, the mean fitness is defined by the fitness landscape for the mutator genome. Further, we will consider the case when two fitness landscapes are identical, e.g., f(x) = g(x).
Taking V + (x) in Eq. (6) under the condition α 2 = 0, α 1 = a, μ 1 = 1, μ 2 = μ leads to the following expression for the mean fitness of the mixed phase (see Methods): x 2 Figure 2. The scheme of available transitions for the system states (arrows denote transitions). The upper chain corresponds to the genome without a mutator allele (wild-type); the lower chain corresponds to the genome with a mutator allele (mutator-type). l is the number of mutations in the regular part of genome.
Scientific RepoRts | 6:34840 | DOI: 10.1038/srep34840 In the mixed phase, the mean fitness is defined by the wild-type fitness function and the mutation rate. However, numerical calculations for the linear fitness landscapes f(x) = kx or simple form quadratic fitness landscapes f(x) = kx 2 /2 show that the vast majority of the population has the mutator allele for a large enough genome length. It is worth pointing out that this result (the majority of population has a mutator-type) is correct for any smooth fitness landscapes for a long enough genome length. We verify that for the single peak fitness function, there is a finite fraction of wild-type sub-population at the large genome limit. Such results will be presented below.
Two phases have been found for the discrete-time Eigen model in ref. 17, where the mean fitness has been calculated for f(x) = k(x − 1), k ≫ 1. This model can be exactly mapped into the Crow-Kimura model at the large L limit, when the mutation rate per gene is fixed. Figure 3 gives a comparison of our analytical results for the mean fitness R with the numerics. Figure 3 shows that our analytical results are quite reliable. The single peak model with uni-directional mutations of the mutator gene. Let us consider a single peak fitness function, f(x = 1) = g(x = 1) = J and f(x ≠ 1) = g(x ≠ 1) = 0. We have three possible phases in a steady state (Fig. 1). We consider either the steady state of Eq.  25 . We provide the following expressions for the mean fitness in the mutator phase R mu , mixed phase R mix and non-selective phase R ns ; see Fig. 1: We first calculate P 0 considering the equation for dP 0 /dt in Eq. (1) and ignore P 1 term. The fraction of the wild-type sequences in the l-th Hamming class in the population is P l = P 0 (1/J) l , which can be derived from Eq. (55) This expression allows us to calculate the mutator allele probability q using the equivalence 0 At L = 5000, the accuracy of our analytical result is approximately 0.1%, as seen in Fig. 4. Figure 4 and Eq. (14) show that the fraction of the wild allele does not approach to 0 even for a large L.

The model with uni-directional mutations of the mutator gene and a smooth fitness landscape.
As previously mentioned, we have the mixed and mutator phases. In the mutator phase, the numerics give that the mutator sub-population dominates with q = 1, even for finite L, so we can ignore the first chain of transitions and reduce the system to the standard Crow-Kimura model with mutation rate μ 2 and fitness function g(x).
Let us now consider the mixed phase. If α 1 = a, α 2 → 0, then Eq. (8) gives q → 1 for this case. Assuming an ansatz ≡ = P t P x t L u x t ( ) ( , ) exp( ( , )) l , we obtain: 1 The smooth curves in Figs 5 and 6 obtained by numerical calculations illustrate that the 1 − q strongly depends on the genome length, L. For the small genome length L, Fig. 7 obtained by numerical calculations support the behavior ∼ . q a (16) As has been derived in ref. 29, for the large L, we get 1 − q ≪ 1. In the Methods section, we calculate (1 − q) for the system with a general smooth symmetric fitness function and parameters a, μ. For linear fitness functions f(x) = kx and small values of a and μ 1 /μ 2 ≡ 1/μ, we obtain a simple expression:  This result is well supported by the numerics, as seen in Fig. 5 and Table 4 for 1 − q ≪ 1, when the exponent is a large negative number. The transition between two regimes (16) and (17) is for the values of L where the expression of the exponent in Eq. (17) is ~1.
In population genetics, the fitness difference for one mutation is denoted by S and corresponds to the linear coefficient k; U is the total number of mutations per generation. We use the equalities U = εμ 1 , S = ε2k/L and h = εα 1 , where ε is the discrete time step of one generation and h is the transition probability to the mutator allele during one generation. Applying the discrete-time Eigen model to Crow-Kimura model mapping, we get a condition for the exponent: Otherwise, the numerics give q ≪ 1 instead of Eq. (17). The linear fitness case can be solved exactly. We introduce the generation functions Q(z) = ∑ l Q l z l and P(z) = ∑ l P l z l , so rewriting Eq. (1) with these variables gives a system of equations for the generation functions: It is easy to calculate R from the first equation with the constraint that P(z) is an L-th order polynomial. Another constraint is the probability balance condition, P(1) + Q(1) = 1. Solving this system, we can calculate P l , Q l with relative accuracy O L (1/ ). The generation function method has been applied before for investigation of the Crow-Kimura model 51 .

The Eigen model with a mutator gene and random fitness landscape. The probability distribu-
tion for the Eigen model is defined for genome sequences with the wild and mutator type (p i and q i , respectively, 0 ≤ i < 2 L ). Consider now the following system of equations where r i is the fitness function, Q ij and Q ij are probabilities of mutation from S j to S i , , w is the probability of errorless replication per nucleotide for sequences with normal mutator gene and 1 − e −h ≈ h is the transition probability from the wild type to the mutator type. The diagonal terms of the mutation matrix are w) is the mutation parameter in the Eigen model, , ŵ is the probability of errorless replication per nucleotide for sequences with a mutator allele. The mutation parameter for the mutator type is γ = −L w (1 ). Here, d(i, j) is the Hamming distance between two sequences, S i and S j .   For the single peak fitness function with r 0 = A and r i = 1, i ≥ 1 the mean fitness is similar to the one obtained for the Crow-Kimura model: QAe −h for the mixed phase, Q A for the mutator phase and 1 for the non-selective phase. The random fitness landscape is a reasonable approximation for real biological data 52 . However, it was shown that it is almost equivalent to the single peak fitness landscape 33 . If we assume a log-normal distribution of Wrightian fitnesses, r i : i i 2 2 then, we have the maximal value of fitness = A c ln 2 33 , while for the majority of the population, fitness is equal to 1. Using the formulae for the single peak fitness case from the Methods section, we obtain: It is interesting that there is no μ dependence in our expression. Taking the case of strong selection, s ≫ U, we obtain the situation when q ≪ h/U, as in the case of refs 53 and 54. According to ref. 53 (Table 1)

Discussion
In this work, we have investigated the mutator phenomenon in the evolutionary process in a stable environment for infinite population size. Previous studies have obtained some results for a microscopic model with uni-directional mutation (wild-type → mutator-type) and linear Malthusian fitness in a 1-dimensional landscape. The mutator and mixed phases have been considered theoretically in the literature, so we continued to focus on the phase structure in this research. We have calculated exactly, for the first time, the phase structure of the model for the general fitness function, the analytical expression for the mean fitness for any scheme of mutations between normal and mutator states in the d-dimensional fitness landscape (see Methods). Moreover, we have discussed the important case of the random fitness landscape. We have derived the formula for the mutator-type probability, q, for the large genome length. This mathematical problem is highly non-trivial, as it is non-perturbative via the value of the back mutation rate: see Eqs (8,9) and (17). Equation (17) shows that the mutator-type takes over the entire population for the model with a linear fitness function. This effect has been observed experimentally 27,56 . In ref. 17, an expression for the mutator allele probability has been derived when the mutator-type is the minority and the fitness function is linear. Our results confirm these findings 17 ; furthermore, our method provides an analytical solution for the model in this case. When condition (18) is broken, the wild-type takes over the population, which has also been proven experimentally 54 . The value of q depends on the genome length (q ≪ 1 or 1 − q ≪ 1; see Fig. 5), and for the surplus, both alternatives hold (s 1 = s 2 or s 1 ≠ s 2 ). Thus, the finite size corrections could be large, even for the large genome length L ~ 10 4 .
A careful investigation of the evolutionary models is essential for biological applications. Looking at possible phases with order parameters is needed to provide an appropriate level of accuracy. As long as cancer is assumed to be a collective phenomenon, even with some collective intellect 57 , it is necessary to use analytical methods for complex systems. Let us compare our results with those by Desai and Fisher 29 who solved a phenomenological model, while we rigorously solved the microscopic model. For "small" genome sizes, the mutator fraction is proportional to the small parameter α, as has been found in ref. 29. What we claim is q → 1 for sufficiently large genome length for any smooth fitness function. The latter case is a non-perturbative phenomenon, and it is not represented in the approach of 29 , as they neglected the backward mutation between different classes of wild-types. The approach by Desai and Fisher 29 cannot be applied for the steady state of our model, as it is completely wrong for that case; however, Desai and Fisher obtained 29 a useful (especially for the cancer case) analytical estimate for the short-time dynamics for the involved case of general fitness function.
Theoretical results concerning the error threshold 58 have a practical impact for viruses and cancer, and it has been recently suggested to use an error catastrophe as a new therapeutic strategy for the solid cancer treatment 10 . A phase structure shown in Fig. 1 qualitatively describes the different stages of cancer; the mixed phase s ≠ 0 characterizes the early non-aggressive version of tumor, and the mutator phase is an aggressive stage of cancer, possibly with metastasis. To eradicate the tumor according to the strategy suggested in ref. 9, it should be transferred to the non-selective phase s = 0. As seen in Fig. 1, there are four different situations. In the mixed II subphase, we can push the tumor to the non-selective phase (the eventual goal of error-catastrophe therapy) to increase the forward mutation rate for the mutator gene α 1 , which is responsible for the genome stability. In the mutator I subphase, we should increase the mutation rate μ to push the tumor to the non-selective phase. In the mutator II and mixed I subphases, we need to increase both versions of mutation rates. The phase structure has the same form for the random fitness case and a similar form for other fitness landscapes.
Further research should focus on the finite population version of the mutator model, as the population size can drastically affect the evolutionary dynamics 8 . The problem has already attracted attention 19 , but it is reasonable first to solve a simpler case of infinite population limit.
To summarize, let us directly reply to the four questions stated in the beginning of the Results section.
• A. There is a phase transition between two phases, selective and non-selective, for the random fitness landscape and for the last two types of fitness landscapes considered in ref. 6

Methods
To perform the analytical investigation Eq. (3), we use the HJE approach. Ignoring O(1/L) correction terms and using Eq. (5) and the formulae P l±1 = v 1 e Lu±2u′ and Q l±1 = v 2 e Lu±2u′ in Eq. (3), we get:  Deriving V ± (x) according to Eqs (25) and (26), we get Eq. (6) to calculate the mean fitness R. At the maximum point x = s of the distribution, we take ′ = ′ = − = + u u H R 0, t in Eq. (26) to obtain Eq. (7) for the surplus s. Putting to Eq. (24) the expression for the R, and looking the point x = s, we derive Eq. (8).
If f(x) = g(x), α 1 = a, α 2 = 0 (μ 1 = 1, μ 2 = μ), then, taking into account the ansatz P l = e Lu(x,t) , we derive an equation: x e x e ( ) 1 2 The mutator-type probability. Consider the general case s 1 ≠ s 2 and parameters α 1 = a, α 2 = 0, μ 1 = 1, . For Q l ~ P l , we consider the second equation in (1) with the substitution P l = e Lu(x,t) in a steady state. We derive an equation for Q l : We can write an approximate solution: l If P l ≪ Q l , then, substituting =Q Lu x t exp[ ( , )] l , we get the following equation for û: The expression for s 2 can be derived from R = f(s 2 ). We took the solution (27) at the interval [s 3 , 1] (where s 3 is a switching point between s 1 and s 2 ) and the solution (30) at the interval [s 1 , s 3 ]. We use the concatenation of these two solutions, assuming the smoothness of the derivative ′ = ′ u s u s ( ) ( ) 3 3 , thus we obtain the following equation for s 3 : Putting a condition u(s 1 ) = 0 (which is equivalent to ∑ l P l = 1) and ∑ ∼ Q Q l l l 0 , l 0 = (1 + s 2 )L/2, we get: Here, u′ and ′ u are calculated using Eqs (27) and (30), respectively. The expression in the exponent in Eq. (32) is exact at the limit L → ∞ .
These analytical results allow us to predict, which allele takes over the population. The analysis of , then the mutator-type dominates. If 1 − K ≪ 1, then the wild-type takes over the population.
The case of small mutation rate for mutator allele a. Let  We are looking the case of large μ, therefore, we can replace s 3 by s 1 : This solution is correct when the following condition is met: The mean fitness for the model with d-dimensional fitness landscape and mutator gene. The multidimensional model has been discussed in ref. 19, where two parts of genome were considered; the first part is with lethal mutations and the second part is with deleterious and advantageous mutations. The genome is formally fractured into d parts with the lengths Ly i . For the wild-type, the fitness is equal to ∑ i f(x i ), and for the mutator-type, the fitness is ∑ i g(x i ), where x i = 1− 2l i /(Ly i ) and l i is the number of "− " spins (alleles) in the i-th part of genome. We denote mutation rates as μ i for the wild-type and ν i for the mutator-type, see ref. 49 for the multidimensional fitness model without a mutator gene. For these modified expressions, we calculate the mean fitness R as a maximum of the potential function V ± , defined as: x h x ( 1 1) 2 For the single peak fitness, we obtain for the selective phase 58 : = .
− R QAe (41) h For the non-selective phase, we simply have R = 1.
In the mutator phase, we ignore the first chain of equations in Eq. (20) and get R as the mean fitness of the Eigen model with the value: As all the sequences, besides p 0 , q 0 , have a fitness value 1, we have an equation (q 0 + p 0 )A + (1 − q 0 − p 0 ) = R. Therefore