Mutation rate of SARS-CoV-2 and emergence of mutators during experimental evolution

Abstract Background and objectives To understand how organisms evolve, it is fundamental to study how mutations emerge and establish. Here, we estimated the rate of mutation accumulation of SARS-CoV-2 in vitro and investigated the repeatability of its evolution when facing a new cell type but no immune or drug pressures. Methodology We performed experimental evolution with two strains of SARS-CoV-2, one carrying the originally described spike protein (CoV-2-D) and another carrying the D614G mutation that has spread worldwide (CoV-2-G). After 15 passages in Vero cells and whole genome sequencing, we characterized the spectrum and rate of the emerging mutations and looked for evidences of selection across the genomes of both strains. Results From the frequencies of the mutations accumulated, and excluding the genes with signals of selection, we estimate a spontaneous mutation rate of 1.3 × 10−6 ± 0.2 × 10−6 per-base per-infection cycle (mean across both lineages of SARS-CoV-2 ± 2SEM). We further show that mutation accumulation is larger in the CoV-2-D lineage and heterogeneous along the genome, consistent with the action of positive selection on the spike protein, which accumulated five times more mutations than the corresponding genomic average. We also observe the emergence of mutators in the CoV-2-G background, likely linked to mutations in the RNA-dependent RNA polymerase and/or in the error-correcting exonuclease protein. Conclusions and implications These results provide valuable information on how spontaneous mutations emerge in SARS-CoV-2 and on how selection can shape its genome toward adaptation to new environments. Lay Summary: Each time a virus replicates inside a cell, errors (mutations) occur. Here, via laboratory propagation in cells originally isolated from the kidney epithelium of African green monkeys, we estimated the rate at which the SARS-CoV-2 virus mutates—an important parameter for understanding how it can evolve within and across humans. We also confirm the potential of its Spike protein to adapt to a new environment and report the emergence of mutators—viral populations where mutations occur at a significantly faster rate.


BACKGROUND AND OBJECTIVES
Mutation is the principal process driving the origin of genetic diversity. The mutation rate is a function of replication fidelity and represents the intrinsic rate at which genetic changes emerge, upon which selection can act. The substitution rate, instead, is a measure of mutation accumulation in a given period of time and embeds the effects of selection [1]. These rates and the spectrum of mutations that emerge and spread are fundamental to our understanding of how organisms evolve and how new variants are purged or established in natural populations.
Laboratory microbial evolution experiments allow to determine how fast mutations accumulate [2,3], and combining them with high-throughput sequencing is one of the best methods to estimate mutation rates, determine how they vary along the genome [4] and study the extent to which convergent evolution occurs [5,6].
In DNA-based microbes, the genomic mutation rate per-cell per-generation, estimated in laboratory conditions, is close to a constant, but can span orders of magnitudes when measured per-nucleotide [7]. In RNA viruses, the replication fidelity also varies remarkably [8,9]. The basic mutation rates, expressed as nucleotide substitutions per-site per-cell infection (s/n/c), range between 10 À6 and 10 À3 for the several positive ssRNA viruses which have been studied [10]. Importantly, our current knowledge of the mutation rate of the human beta-coronavirus SARS-CoV-2-cause of the COVID-19 pandemic [11]-is based on estimates from different coronaviruses [10,[12][13][14]. While there are estimates of the per-site per-year mutation rate of SARS-CoV-2 ($1.12 Â 10 À3 nt À1 year À1 [15]), we still lack a direct quantification of the amount of mutations that can be generated in a single infection cycle. Here, via experimental evolution in Vero cells of two natural variants of SARS-CoV-2 [16], followed by whole genome sequencing, we characterized the spectrum of the emerging mutations and estimated their spontaneous mutation rate per-site per-cell infection. We further identified specific targets of selection that occurred during the experimental evolution. Notwithstanding that the rates and the mutations observed here might depend on the specific laboratory conditions (which are far from those in the natural environment of SARS-CoV-2), our study provides important information about the basic biology of this virus and can motivate future researchers to further investigate its adaptive potential and genetic constraints.

Virus growth and in vitro assay
Vero E6 (African green monkey, Cercopithecus aethiops kidney epithelial cells, ATCC V R CRL 1586 TM ) cells were cultured at 37 C and 5% CO 2 in Minimum Essential Medium (MEM 1Â, Gibco V R ) supplemented with 10% fetal bovine serum (FBS), penicillin (100 units/ml) and streptomycin (100 mg/ml) þ fungizone. The two clinical isolates Portugal/PT0054/2020 and Portugal/PT1136/2020, isolated at the National Institute of Health Doutor Ricardo Jorge (INSA), were used to produce the ancestors of the experimental evolution, CoV-2-D and CoV-2-G, which seeded the two laboratory evolution experiments. For this, the initial SARS-CoV-2 stock was produced by infecting Vero E6 cells (freshly grown for 24 h) and incubating the cells for 72 h. The culture supernatant was stored in aliquots at À80 C. The Median Tissue Culture Infectious Dose (TCID 50 ) of viral stock was calculated according to the method of Reed and Muench [17]. All work with infectious SARS-CoV-2 strains was done inside a Class III microbiological safety cabinet in a containment level 3 facility at the Centre for Vectors and Infectious Diseases Research (INSA).
From the stored stocks, two 96-well plates fully inoculated with 50 ml of Vero E6 cells (2.0 Â 10 4 cells) grown for 24 h were infected with 50 ml of the SARS-CoV-2 strains viral suspension (2.0 Â 10 3 viruses) at a multiplicity of infection of 0.1. MEM supplemented with 10% FBS, penicillin (100 units/ml) and streptomycin (100 mg/ml) þ fungizone was added to each well (50 ml) and the plates were incubated for 24 h. Each well had a final volume of 150 ml. Every day, for 15 days, serial passages were done by passaging 50 ml of the culture supernatant to 96well plates (one for each SARS-CoV-2 strain under study) fully inoculated with 50 ml of Vero E6 cells (2.0 Â 10 4 cells) using the same procedure and incubated in the same conditions. At Day 15, total nucleic acids were extracted from 100 ml of viral suspension of each well in each plate (96 samples of Day 15 for each strain) using the automated platform NUCLISENS easyMAG (Biom erieux). Confirmation of nucleic acid integrity and rough concentration estimative was made before sequencing experiment by RT-qPCR of eight random chosen samples from each plate at Day 15 (CoV-2-D and CoV-2-G) using Novel Coronavirus (2019-nCoV) RT-PCR Detection Kit (Fosun Diagnostics). Samples from inoculation suspension (Day 1) were also analyzed. All samples presented values of 7-10 Ct (cycle threshold). When we infect the cells with 2 Â 10 3 plaqueforming units (PFUs), after 24 h, the number of PFUs is around 2 Â 10 6 . So, assuming no major fluctuations in the viral load of the transferred suspension throughout the 15 passages and assuming a yield of approximately 1000 PFU/cell [14], the estimated number of replication cycles per passage is around 1 (i.e. (2 Â 10 3 ) Â 10 3 ¼ 2 Â 10 6 ).

Simulations of the neutral mutation accumulation
To obtain a non-equilibrium neutral expectation of the site frequency spectrum of mutations, we performed forwardsimulations to model mutation accumulation using the mutation rate inferred from the experiment. We model an organism with a bi-allelic genome of size L ¼ 30 000 ($SARS-CoV-2). An initially isogenic population undergoes 15 cycles of growth, mutation and bottleneck, according to the following life cycle: 1. A clonal population starts with an inoculum size of 2000. 2. Each genome replicates Â times. We assume the burst size Â to be Poisson distributed with mean 1000. 3. For each of the replicating genomes, we introduce a Poisson number of mutations with mean 0.1 (corresponding to a rate of 3.3 Â 10 À6 nt À1 cycle À1 ). We assume mutations to emerge with uniform probability in the parental genome and we allow for back-mutation. 4. After replication and mutation, we sample 1/1000 of the individual genomes. 5. Repeat steps 2-4, 15 times.
We validated the simulation code by confirming expected outcomes: mutations accumulate linearly over time and the posterior estimation of the mutation rate retrieves the original value (bottom distribution in Supplementary Fig. S3b).
After 15 cycles, we collect the artificial genomes from 100 independent simulations, and compute their site frequency spectrum as in the experiment.
In order to test whether cross-well contamination could justify the observed site frequency distribution, we modified the previous algorithm by introducing migration. At each Cycle t, after each bottleneck event, a fraction of viral genomes (m ¼ 0.1) is replaced by migrants sampled from a pool of genomes that have undergone t cycles of growth.
In order to test whether a more heavy-tailed distribution of burst size could explain the observed site frequency distribution, we modified the previous algorithm by assuming a log-normal burst size distribution (x ¼ e z ; z $ Normalðl ffi 6:4; r ¼ 1Þ), such that its mean mean x ð Þ ¼ e lþ s 2 2 ¼ 1000 is equivalent to the previously assumed Poisson distribution. The algorithm was written in R (version 3.6.1) and the results analyzed in RStudio [25].

Mutation accumulation rates in all, synonymous and non-synonymous sites
To quantify the rate at which mutations accumulate during the experiment, we compute M a r ð Þ ¼ P fr PÃL r , where f r is the frequency of all mutations observed in region r, P ¼ 15 is the number of passages and L r is the length of region r. For the genome-wise mutation accumulation L r ¼ 29903, the entire genome of SARS-CoV-2. We also computed the mutation accumulation rates at synonymous and non-synonymous sites. In these cases, the synonymous rate is given by M a r; syn ð Þ ¼ P fr PÃL r Ãp r;syn , where p r;syn is the proportion of mutations in region r, leading to synonymous changes. Equivalently the non-synonymous rate is M a r; n syn ð Þ ¼ P f r PÃLr Ãpr;n syn . In practice, assume the region of interest has sequence r: ATGTTT. For each base, we count the proportion of mutations that would change (or not) the corresponding amino acid. In the example p r; 17 mutations out of the possible 18 are non-synonymous and only one is synonymous (ATGTTc). Therefore, in this example, the total size is L r ¼ 6, p r;n syn ¼ 17=18 and p r;syn ¼ 1=18. Following this method, we calculated the genome-wise and gene-specific mutation accumulation rates in all, synonymous or non-synonymous sites, reported throughout the entire manuscript. The genomic sequences of each region were retrieved from NCBI (entry: NC_045512).

pN/pS calculation and confidence interval
To test for the action of selection, we estimated the ratio of non-synonymous to synonymous polymorphism rates (pN/pS) [26] for each of the two lineages. Within a given region r, we computed pN(r) as the summed frequencies of all the observed non-synonymous mutations over the number of all possible non-synonymous changes in that region: pN r ð Þ ¼ P fr N r;nsyn , where N r;syn ¼ 3L r p r;n syn . Equivalently, we computed the synonymous counterpart: pS r ð Þ ¼ P fr N r;syn . In the previous example, within the region r: ATGTTT, N r;n syn ¼ 17 while N r;syn ¼ 1. Finally, the pN/ pS statistics is the ratio of pN and pS and its expected value is 1 under neutrality. We report the pN/pS together with its bootstrap-based confidence interval . In particular, we report the 95% percentiles of the pN/pS distribution obtained by resampling the observed synonymous and non-synonymous mutations 1000 times. As the pN/pS is a ratio of proportions, to test its deviation from 1 (or equivalently pN 6 ¼ pSÞ, we perform two proportions z-tests and correct for multiple comparisons using the Benjamini-Hochberg method.

Convergence statistics
To test for the significance of the observed convergent evolution, we compared the average pairwise nucleotide distance (p) within and between groups. Considering the SNPs accumulated , where dði; jÞ is between the lines of CoV-2-D (N1) and CoV-2-G (N2). Note that, p between > p within might result from background-dependent evolution but also from migration (cross-contamination between lines of the same background) and/or undetected standing variation.
We tested the significance of the amino acid convergence between the spike protein of both backgrounds via the hypergeometric test, where the set of mutated amino acids of CoV-2-D is the number of draws (n), the convergent ones are the successes (k), the number of amino acids of the spike protein is the total population (N), and the set of mutated amino acids of CoV-2-G is the number of possible successes (K).

Experimental evolution design and ancestor backgrounds
Two SARS-CoV-2 viral strains were isolated from two non-related patients for continuous propagation in Vero cells (see Methodology, Fig. 1a). These were chosen according to their polymorphism at amino acid position 614 of the spike protein: CoV-2-D carries a D and CoV-2-G carries a derived mutation which changes the D into a G. This D614G mutation in the spike protein emerged early in the pandemic, increased the infectivity of the virus and became prevalent worldwide [27]. Here, we want to test for differences in their mutation rates, spectrum and/or in the selective forces as the strains are propagated in cells.
In order to discriminate de novo mutations from standing genetic variation, we identified the mutations (relative to the Wuhan-Hu-1/2019 reference genome sequence, Wu et al., 2020) that were already present at the start of our evolution experiment (see the list and their frequencies in Supplementary Fig. S1).
If the mutation rate is similar to that of the mouse hepatitis virus or that of the SARS-CoV (about 3.5 Â 10 À6 and 2.5 Â 10 À6 nt À1 cycle À1 , respectively) [10,12] hundreds of mutations should accumulate, many of which are expected to be neutral but some could reflect adaptation to the specific conditions used in this experiment.
Mutation accumulation and spectrum after 15 passages of SARS-CoV-2 evolution We considered de novo mutations those that reached a frequency of at least 1%, supported by a minimum of 10 reads and that were not detected in either the ancestor or the original clinical isolate from which the ancestor was derived (full list in Supplementary Table S1). Propagation of the 96 CoV-2-Dderived lines resulted in 1753 de novo mutations, while the 96 lines derived from CoV-2-G resulted in 6181 de novo mutations (n ¼ 94 as in two lines, the sequencing had poor coverage) (Fig. 1b). The much higher number of mutations in the CoV-2-G background, compared to CoV-2-D, is explained by 15 of these lines where many more mutations were observed (Fig. 1b). These lines, hereafter referred to as mutators, are characterized by a larger proportion of SNPs compared to the non-mutator lines where, instead, deletions account for more than 20% of all de novo mutations (Fig. 1c).
The frequency of mutator clones was estimated to be between 1 and 2% after 15 infection cycles, since these were the frequencies measured for the vast majority of mutations observed in the mutator lines. The genetic cause of the mutator phenotype is difficult to determine but it could likely be hidden within the mutations that occurred in the RNA-dependent RNA polymerase (Nsp12) and/or in the error-correcting exonuclease protein (Nsp14) [13]. Indeed, looking at the mutations that are specific to the lineages with mutators, we found eight nonsynonymous mutations in Nsp12 (one leading to a stop at amino acid 670) and nine non-synonymous mutations in Nsp14 (one leading to a stop at amino acid 78) (Supplementary  Tables S2 and S3). Any of these mutations could potentially lead to the observed change in mutation rate, but none of these has been associated with an increased mutational load of the circulating viruses [28].
Next, we obtained the per-base per-passage rate at which mutations accumulated (M a ), from the frequencies of the observed mutations. As a 24-h passage in our experiment corresponds to $1 cell replication cycle (see Methodology), we hereafter report such rate of mutation accumulation per unit of replication cycle (nt À1 cycle À1 ). Interestingly, the non-mutator lines of CoV-2-G show a significantly lower accumulation rate compared to the CoV-2-D lines (P < 10 À5 , non-parametric Wilcoxon test) (Fig. 1d). However, this difference between the two backgrounds is more likely due to differences in selection rather than differences in mutation rates, as we will explain later on.
The SNPs accumulated over 15 passages show that both genomic backgrounds have a strong propensity to accumulate C->T mutations (Fig. 1e), a well-known bias of SARS-CoV-2 [29]. In the mutator lines, the main mutation bias changed from C->T to G->T (Fig. 1e), also observed in SARS-CoV-2 samples collected during the COVID-19 pandemic [30,31].
It is important to notice that, both the accumulation of mutations and the biases we observe in the data (Fig. 1b-e) might have been shaped by selection and deviate from the neutral rate and spectrum of mutations. In fact, on one hand positive selection can increase the frequencies of beneficial mutations and on the other hand purifying selection can purge the deleterious alleles. Therefore, we next looked for evidences of selection in the mutation accumulation data.
Signs of selection: site frequency spectrum and heterogeneity across genes In serial propagation experiments with SARS-CoV-2, it is extremely difficult to transfer a single virus [32]. In our experiment, the effective population size is at least 2000 (see Methodology), and thus unlikely to eliminate the effects of either positive or negative selection [2]. Indeed, several patterns in the data indicate that selection played a significant role in the experimentally evolved SARS-CoV-2 lines. The distribution of allele frequencies in a sample, i.e. the site frequency spectrum, has a well-known theoretical expectation under a simple equilibrium neutral model of molecular evolution (Chap. 5, p. 233 of B. Charlesworth and D. Charlesworth, 2010 [33]). But, this distribution is sensitive to the action of selection and also to complex demographic events, such as population bottlenecks. Given the bottlenecks occurring in our experiments and the slow evolutionary time elapsed during the 15 infection cycles, the neutral theoretical expectation at equilibrium may not apply. To obtain a non-equilibrium expectation of the site frequency spectrum, we performed forward-simulations (see Methodology). We assumed that neutral mutations occur at a rate of 3.3 Â 10 À6 nt À1 cycle À1 , similar to that estimated from the data, and simulated populations evolving under neutrality. The site frequency spectrum of the mutations accumulated in both CoV-2-D or CoV-2-G lines deviates significantly from the neutral expectation predicted by the simulations (Fig. 2). High frequency mutations are not expected under this neutral model (mutations with frequencies above 30% are reported in Supplementary Fig. S2). To test whether possible contamination among wells could explain the observed site frequency spectrum, we performed additional simulations with migration (see Methodology). Even when considering a migration rate of 10%, the neutral site frequency spectrum is still incompatible with the experimental data ( Supplementary Fig.  S3a), and the estimation of mutation rate should not be affected ( Supplementary Fig. S3b). We also tested whether larger variation in viral burst sizes could generate patterns more similar to the observed (see Methodology). The simulations demonstrate that a more heavy-tailed distribution of burst size can lead some mutations to higher frequencies (compared to the previously assumed Poisson distribution, see Supplementary Fig. S3c) and suggest that such variation in the viral burst size might contribute, together with positive selection, to skew the site frequency spectrum and explain the observed long tail (Fig. 2, Supplementary Fig. S3d).
A second evidence of selection comes from the considerable variation in the rate of mutation accumulation observed across the SARS-CoV-2 genome (Fig. 3, Supplementary Fig. S4). When excluding the mutator lines, the S gene, which codes for the spike protein, has the highest rate of mutation accumulation among the different genes (Fig. 3). Remarkably, the spike accumulated 13.5 6 0.4 Â 10 À6 nt À1 /cycle À1 mutations in the CoV-2-G genotype (excluding mutators), and 17.1 6 1.0 Â 10 À6 in the CoV-2-D genotype, about 5-fold the corresponding genomic averages, suggesting the strong action of positive selection. In the mutator lines, the spike gene accumulated $2 times more mutations than in the non-mutators (Supplementary Fig. S4). This observation is in contrast with the 5-fold increase in mutation accumulation across the entire genome (Fig. 1d) and suggests that more complex selective forces might be acting on the mutator phenotype (see the heterogeneity of the mutation rate across the CoV-2-G mutator genome in Supplementary Fig. S4). Overall, the data confirmed that selection has shaped the way mutations accumulated. Therefore, in order to obtain a more accurate quantification of the spontaneous rate of mutation, we performed a more systematic analysis of the sites under selection.

Identifying regions under selection
From the frequencies of all mutations observed in the CoV-2-D and CoV-2-G non-mutator lines, we computed an accumulation rate of 3.7 Â 10 À6 and 2.9 Â 10 À6 nt À1 cycle À1 , respectively (Fig. 1d). Given that during our experiment, selection affected the allele frequencies (Figs 2 and 3), such rates may deviate from the spontaneous mutation rates of the virus. In order to attempt to estimate the spontaneous mutation rate, we first focused on synonymous mutations, which, if neutral and not linked with sites under selection, should accumulate at the rate at which they occur [34]. Focusing on the synonymous changes, we estimated a basic mutation rate of 3.8 Â 10 À6 nt À1 cycle À1 for the CoV-2-D background and 1.2 Â 10 À6 nt À1 cycle À1 for the CoV-2-G ( Supplementary Fig. S5a). However, the rate of nonsynonymous mutation in CoV-2-D is lower than the synonymous one (Supplementary Fig. S5a and b), suggesting the action of purifying selection on non-synonymous sites or positive selection on the synonymous sites, leading to their increase in frequency [35,36]. To distinguish between the two cases, we compared the accumulation rate of synonymous mutations in the entire genome (M Syn a ), with the accumulation rate of synonymous mutations excluding one gene at a time (M Syn;Dg a ), similar to the Cook's D approach [37]. This approach revealed that a remarkable accumulation of synonymous mutations in the Nsp6 gene led to the overestimation of the mutation rate in the CoV-2-D background (Supplementary Fig. S6). In contrast, for the CoV-2-G background, this approach indicates that the estimation of M Syn a ¼1.2 Â 10 À6 nt À1 cycle À1 is homogeneous across the genome and can provide a first estimation of its mutation rate (Supplementary Fig. S6b).
To test the robustness of the former estimations of mutation rate, we followed a second approach: identify the regions under selection in either the CoV-2-D or CoV-2-G lines and exclude them from the estimation of the spontaneous mutation rate. First, we compared the relative accumulation of nonsynonymous and synonymous mutations, via the pN/pS statistics (equivalent of dN/dS for polymorphic samples, see Methodology). In the CoV-2-D background, the pN/pS of the S and the Nsp6 genes significantly differ from 1 (Fig. 4a, P-value ¼ 0.03 and 2 Â 10 À13 , respectively, after two proportions ztest and Benjamini-Hochberg correction). The spike accumulated more non-synonymous mutations consistent with the action of positive selection (pN/pS ¼ 4.44, 95% confidence interval: [2.47-9.09]), while the Nsp6 accumulated more synonymous mutations, consistent with our previous findings (pN/ pS ¼ 0.02, 95% confidence interval: [0.01-0.04], Supplementary  Fig. S6a). In particular, the synonymous change A11041G was found in 88 evolved populations (out of 96), but also at frequency below our 1% threshold in the ancestral population, suggesting that such mutation was incorrectly considered as de novo and that the estimated mutation accumulation in Nsp6 was the resulting artifact.
Due to the limited number of mutations within each gene and the fact that we are comparing evolving populations (rather than divergent species), the pN/pS may lack the power to identify additional regions under selection [38]. To overcome this issue and to identify additional genes affecting the estimation of the mutation rate, we computed the rate of mutation accumulation excluding one gene at a time and compared this with the entire genome (similar to what was done before for synonymous mutations only; Supplementary Fig. S6). With this outlier-detecting method, we confirmed that the S gene biased the estimation of mutation accumulation and was likely under positive selection in both CoV-2-D and CoV-2-G backgrounds (Fig. 4b). We also spotted the Nsp3 gene, a genomic region that accumulated significantly fewer mutations than the genomic average, suggesting the action of purifying selection (Fig. 4b).
Overall, we conclude that during our experiment, the spike protein was under strong selection in both backgrounds and additional mutations in different genomic regions have also increased in frequency, possibly due to selection but also through hitchhiking with beneficial mutations.
Estimation of mutation rates and bias excluding genes with signs of selection Non-neutral processes have shaped the allele dynamics in our experiment. We identified the Nsp3, Nsp6 and S genes, which have shown signs of selection and/or biased the estimation of mutation accumulation in at least one of the two backgrounds (Fig. 4). Excluding these genes from the analysis allows us to remove, at least in part, high frequency mutations (see Supplementary Fig. S7) and to obtain a more realistic estimate of the SARS-CoV-2 mutation rate prior to selection. By doing so, we estimate a spontaneous mutation rate of 1.3 6 0.2 Â 10 À6 nt À1 cycle À1 for the CoV-2-D background, and 1.2 6 0.2 Â 10 À6 nt À1 cycle À1 for the CoV-2-G (excluding mutators) (Fig. 5a). The mean estimated mutation rate does not differ between backgrounds (P ¼ 0.6, non-parametric Wilcoxon test), suggesting that the previously observed differences were due to selection ( Fig. 1d and Supplementary Fig. S5). Importantly, the estimated mutation rate of CoV-2-D and CoV-2-G are also in agreement with those obtained from the synonymous mutations (see Supplementary Fig. S8). Assuming one infection cycle per passage (15 cycles in total), the average mutation rate across backgrounds is 1.3 Â 10 À6 6 0.2 Â 10 À6 nt À1 cycle À1 (mean 6 2SEM). Considering variation in the number of cycles occurred in our experiment (i.e. 15 6 5) broadens the interval of confidence to 1.0 Â 10 À6 to 1.9 Â 10 À6 .
Excluding the Nsp3, Nsp6 and S genes was our best attempt at quantifying the spontaneous mutation rate of SARS-CoV-2. It is important to note that, even after removing the regions under selection, neutral mutations might still be found at high frequencies, possibly due to variation in the viral burst size (as discussed earlier, see Supplementary Fig. S3c and d) and/or due to hitchhiking with beneficial mutations. However, in our analysis, after removing the S, Nsp3 and Nsp6 genes, the fraction of mutations with high frequency is largely reduced (see Supplementary Fig. S7) and the few left should not impact the overall estimation of mutation rate (see a simulated example of gene-specific selection and hitchhiking in Supplementary Fig.  S9). In contrast, our estimation should still underestimate the real mutation rate due to the fact that we ignored mutations with a frequency below the 1% threshold. Our simulations suggest that the ignored mutations should increase the estimation by < 3 times ( Supplementary Fig. S10), finally placing the estimated mutation rate of SARS-CoV-2 between 1 and 5 Â 10 À6 nt À1 cycle À1 .
Finally, we quantified again the relative proportion of single nucleotide changes and confirmed that both backgrounds have a spontaneous bias toward C>T mutations and the mutator changes this bias toward G>T mutations (Fig. 5b), suggesting that such biases are not a consequence of selection.

Convergent targets of selection on spike
The spike protein showed clear signs of adaptation during our evolution experiment, so we next focused on the specific sites under selection and compared them with the new spike variants that spread in the human population. We first quantified the level of convergence at the nucleotide level between CoV-2-D and CoV-2-G spike proteins. We note that convergence between the two backgrounds reflects true independent origin of the mutations, as they were propagated and processed for sequencing independently. In contrast, convergence within replicates of the same background could also result from some possible cross-contamination or from undetected standing variation. Considering the SNPs accumulated in each line, the average nucleotide diversity between backgrounds (6.7 6 0.05) is greater than that within backgrounds (4.5 6 0.06 and 5.6 6 0.08 for CoV-2-D and CoV-2-G, respectively), suggesting some degree of genotype-dependent evolution (see Methodology). Nonetheless, significant convergence was observed at both nucleotide and amino acid levels. In the spike protein, 21 specific sites and 3 larger regions were hit independently in both backgrounds (non-synonymous changes shown in Fig. 6a, Supplementary Table S4), suggesting convergent adaptation since this would be extremely unlikely by chance (p x ! 21 ð Þ < 10 À5 , hypergeometric test). We find high evolutionary convergence at the S1/S2 cleavage site: three distinct deletions (675-QTQTN-679 del; 679-NSPRRAR-685 del and 679-NSPRRARSVA-688) emerged multiple times in both backgrounds. Such changes have been previously shown to emerge rapidly in Vero cells and to be important for the virus cell tropism [39,40]. On the other hand, the 678-TNSPRRARS-686 deletion, which should also trigger a similar functional effect, i.e. knock out of the furin cleavage site [41,42], emerged multiple times, but was exclusive of CoV-2-D lines (n ¼ 58). Apart from these deletions, other mutations in the S1/S2 region (e.g. Arginine 682, see Fig. 6) were also highly convergent, highlighting the adaptive potential of the cleavage site when evolving on Vero cells.
Some level of evolutionary convergence could also be found for the structural genes N, E and M suggesting that adaptation could also have occurred in these genes (Supplementary Table S5).
The inferred mutators in the CoV-2-G background also carry many mutations in the spike protein including in the receptorbinding domain (RBD; amino acid changes at positions 328, 339, 364, 416, 454, 465, 474, 479, 482, 522 and 524) and multi cleavage site regions (positions 798 and 799) (Fig. 6b). Several amino acid changes that occurred in our experiment were also observed in the natural population of SARS-CoV-2, which is not unexpected considering the massive volume of sequence data generated so far. Nonetheless, among these, we report the mutations H655Y (present in the variant of concern Gamma, lineage P.1, originated in Brazil, and Omicron, lineage BA.1), D215G (present in the variant of concern Beta, lineage B.1.351, first identified in South Africa) and D253G (found in lineage B.1.426, mostly detected in the USA) (Fig. 6) [43].

CONCLUSIONS AND IMPLICATIONS
The SARS-CoV-2 beta-coronavirus, first observed in the Wuhan province of China [11], has infected millions of humans causing more than a 5 million toll of deaths (as of 2 November 2021; https://covid19.who.int/). Since it was first sequenced [44], the virus has been accumulating 0.44 substitutions per week at close to linear rate. Here, we followed the evolution in cells of two strains of SARS-CoV-2: one with the original spike protein (CoV-2-D) and one carrying the D614G mutation (CoV-2-G), a variant that emerged in the early phase of the pandemic and soon became prevalent [27]. We did not observe differences in their spontaneous mutation rates. Our estimate, for both backgrounds, is of the order of 0.1 per genome per infection cycle (between 1 and 5 Â 10 À6 nt À1 cycle À1 ), consistent with previous estimations in other coronaviruses [14]. New beneficial mutations did spread to high frequencies and considerable convergent evolution was detected between the CoV-2-D and CoV-2-D backgrounds. The two genomic backgrounds shared several de novo mutations, suggesting that the D614G mutation on the spike protein did not substantially alter the mutational path of the viral populations (Fig. 6).
Despite the similarities, we did observe a reduced accumulation of beneficial mutations in the CoV-2-G (particularly in its spike protein, see Fig. 3). Assuming that the D614G genotype is better adapted than its ancestor in Vero cells, this observation is consistent with the hypothesis that fitter genotypes adapt at a slower pace [45] (Fig. 1d).
After 15 days of propagation in Vero cells, we could also observe mutators emerging with a distinct mutation bias (G>T). This occurred only in the lines carrying the D614G mutation (CoV-2-G strain), but we cannot discard the null hypothesis that this happened in such genomic background simply by chance. The emergence of mutator phenotypes has not been documented in the human population, possibly because it is difficult to identify at low frequencies or simply because it is linked to our specific experimental conditions. Our data demonstrate that it is possible for SARS-CoV-2 to increase its mutation rate and survive, over the course of 15 propagation cycles in Vero cells. Given the caveats, this observation might be relevant for the development of strategies involving mutagen drugs [46,47], especially if the drugs are being tested against SARS-CoV-2 grown in Vero cells.
Our study presents a number of limitations. Importantly, the experiments were conducted with Vero E6 monkey cells and need to be interpreted accordingly. Several studies (including ours) have shown that, propagating the SARS-CoV-2 virus in Vero cells can lead to rapid increase in frequencies of genetic variants with mutated furin cleavage site [39][40][41][42], demonstrating that the selective pressures in such environment do not reflect what happens during human infections [48]. Regarding the emergence of mutations, we did our best at excluding the effect of selection, but future studies should investigate how the spontaneous mutation rate changes when propagating SARS-CoV-2 on different host cells.

ETHICAL STATEMENT
The Portuguese NIH is authorized by the Portuguese Authorities' (General-Directorate of Health and the Authority for Working Conditions) to handle and propagate Risk Groups 2 and 3 microorganisms. All culture procedures were performed inside a Class III microbiological safety cabinet in a containment level 3 facility. This study is covered by the ethical approval issued by the Ethical Committee ('Comissão de Ética para a Saúde') of the Portuguese National Institute of Health.

SUPPLEMENTARY DATA
Supplementary data are available at EMPH online.