Evolution of Mutation Rate in Astronomically Large Phytoplankton Populations

Abstract Genetic diversity is expected to be proportional to population size, yet, there is a well-known, but unexplained lack of genetic diversity in large populations—the “Lewontin’s paradox.” Larger populations are expected to evolve lower mutation rates, which may help to explain this paradox. Here, we test this conjecture by measuring the spontaneous mutation rate in a ubiquitous unicellular marine phytoplankton species Emiliania huxleyi (Haptophyta) that has modest genetic diversity despite an astronomically large population size. Genome sequencing of E. huxleyi mutation accumulation lines revealed 455 mutations, with an unusual GC-biased mutation spectrum. This yielded an estimate of the per site mutation rate µ = 5.55×10−10 (CI 95%: 5.05×10−10 – 6.09×10−10), which corresponds to an effective population size Ne ∼ 2.7×106. Such a modest Ne is surprising for a ubiquitous and abundant species that accounts for up to 10% of global primary productivity in the oceans. Our results indicate that even exceptionally large populations do not evolve mutation rates lower than ∼10−10 per nucleotide per cell division. Consequently, the extreme disparity between modest genetic diversity and astronomically large population size in the plankton species cannot be explained by an unusually low mutation rate.


Introduction
The level of genetic diversity in a population is determined by the balance between the new mutations occurring in the population and the loss of polymorphisms by stochastic processes (drift) and selection (Leffler et al. 2012;Ellegren and Galtier 2016). More mutations are expected to occur in a larger population because there are more individuals to mutate. In addition, drift is weaker in larger populations (Crow and Kimura 1970), thus larger populations are expected to contain more genetic diversity. On the other hand, selection is expected to be more powerful in populations of larger size, potentially allowing selection to reduce mutation rate to lower values in larger populations (Lynch 2010;Sung, Ackerman, et al. 2012;Lynch et al. 2016), which may explain the wellknown phenomenon of relatively low genetic diversity in large populations (Lewontin 1974;Leffler et al. 2012;Corbett-Detig et al. 2015;Ellegren and Galtier 2016;Filatov 2019;Xu et al. 2019). Here, we test this idea by measuring the spontaneous mutation rate in a unicellular eukaryotic marine coccolithophore Emiliania huxleyi (Haptophyta) that has an astronomically large population size and thus would be expected to evolve to a very low mutation rate if this rate is determined by efficacy of selection.
Emiliania huxleyi is ubiquitous and so abundant in modern oceans that its seasonal blooms are visible from space. Like most other coccolithophores, it produces coccoliths (calcite shields with species-specific shapes) at the cell surface. Due to its abundance, E. huxleyi is thought to be the main calcite producer on Earth (Paasche 2001;Daniels et al. 2014Daniels et al. , 2016Rembauville et al. 2016) affecting the global CO 2 budget and carbon cycle. Due to its ecological importance and ease of culturing, E. huxleyi became a model phytoplankton species with a significant body of on-going work devoted to the interplay between coccolithophore abundance, climate change and the global carbon cycle (Rickaby et al. 2007;Iglesias-Rodriguez et al. 2008). However, surprisingly little is known about evolutionary genetic processes in populations of marine phytoplankton generally (Rengefors et al. 2017) and E. huxleyi populations in particular (Bendif et al. 2019;Filatov 2019).
Based on the large population sizes of marine plankton species, their genetic diversity is often assumed to be very high (Read et al. 2013). Yet, recent studies of genetic diversity in a number of marine phytoplankton species revealed a surprisingly low level of single nucleotide polymorphism (Blanc-Mathieu et al. 2017;Filatov 2019;Rastogi et al. 2020). In particular, genetic diversity in a world-wide sample of E. huxleyi is only p$0.006 per silent site across the genome (Filatov 2019)-similar to the level of polymorphism in Arabidopsis thaliana (1001Genomes Consortium 2016, twice lower than in marine unicellular green algae Ostreococcus tauri (Blanc-Mathieu et al. 2017) and at least three times lower than in Drosophila melanogaster (Langley et al. 2012). Patterns of genetic diversity (e.g., very low linkage disequilibrium) in the E. huxleyi genome rule out any trivial explanations for this low diversity, such as high clonality or recent expansion from a very small population (Filatov 2019). However, an unusually low mutation rate may account, at least partly, for this low genetic diversity (Xu et al. 2019). For example, the mutation rate in another group, ciliates, was reported to be two to three orders of magnitude lower than in other studied eukaryotes (Sung, Tucker, et al. 2012;Long et al. 2018). No estimates of mutation rates are available for any Haptophyte species, making it difficult to assess E. huxleyi mutation rate even to an order of magnitude. To address this, we measured the spontaneous mutation rate in the diploid E. huxleyi strain RCC1242 (¼CCMP1516) for which a $167-Mb long-genome sequence was published previously (Read et al. 2013).

Mutation Accumulation Experiment
We performed a mutation accumulation (MA) experiment with the diploid E. huxleyi strain RCC1242 (¼CCMP1516). We followed the protocol previously developed by Krasovec (Krasovec et al. 2016) for MA experiments in a liquid medium. The MA experiment included 15 MA lines and lasted 8 months. The size of the MA experiment was planned, assuming the mutation rate is of the order of 10 À10 per nucleotide per cell division, as found in a few other phytoplankton species (Ness et al. 2012;Krasovec et al. 2017, but with the possibility to expand the size of the experiment should the E. huxleyi mutation rate prove to be much lower. The initial line was obtained from a single cell by dilution and used to inoculate 15 MA lines kept in 24-well plates at 20 C in F2 medium. Serial bottlenecks every 14 days were used to reduce the efficiency of selection by decreasing the population size of the MA lines. At each bottleneck, the cell culture was counted with a Beckman Multisizer Coulter Counter to calculate the number of cell divisions in the time interval and inoculate the MA lines in fresh media with one cell by dilution following a previously developed protocol (Krasovec et al. 2016(Krasovec et al. , 2017Krasovec, Sanchez-Brosseau, et al. 2018). The average cell division per day of the MA lines between each bottleneck could be used as a proxy of the fitness throughout the experiment. We used a linear correlation to test a change in fitness over the time of the experiment with R v3.5.1.

Genome Resequencing and Identification of De Novo Mutations
DNA of the 15 MA lines and the initial culture were extracted with the DNeasy Plant Mini Kit of QIAGEN following the standard instructions. Genomic libraries were prepared and sequenced at the Wellcome Trust Centre for Human Genetics (WTCHG) at the University of Oxford, UK. Genomic DNA was quantified using Qubit (Invitrogen) and the size profile analyzed using eGel (Thermo Fisher, 1% EX Agoarose). Input material was normalized to 300 ng prior to fragmentation and library preparation. Fragmentation was performed by mechanical shearing to an average size of 300 bp (Covaris S2 series; duty Cycle-10%, intensity-5.00, cycles/Bursts-200, time-60 s). Automated library preparation was performed using the Apollo 324 prep system (Wafergen, PrepX ILMN 32i, 96 sample kit) and standard Illumina multiplexing adapters following manufacturer's protocol up to pre-PCR amplification. Libraries were PCR amplified (10 cycles) on a Tetrad (Bio-Rad) using the NEBNext High-Fidelity 2Â PCR Master Mix (NEB) and in-house unique dual indexing primers (based on Lamble et al. [2013]). Post-PCR purification performed using Agencourt Ampure XP (Beckman Coulter; ratio 1:1) before combining. Individual libraries were normalized using Qubit, and the size profile was analyzed on the 2200 or 4200 TapeStation (Agilent). Individual libraries were normalized and pooled together accordingly. The pooled library was diluted to $10 nM for storage. The 10-nM library was denatured and further diluted prior to loading on the sequencer. Paired-end sequencing was performed using a HiSeq4000 150-bp platform (Illumina, HiSeq 3000/4000 PE Cluster Kit, and 300 cycle SBS Kit), generating a raw read count of >34 million reads per sample (table 1).
PCR duplicates were removed with GATK v3. 4-46 (McKenna et al. 2010) and reads were mapped against the reference genome of the RCC1242 strain (NCBI accession: GCA_000372725.1) (Read et al. 2013) with BWA mem v.0.7.12 (Li and Durbin 2009). For organelles, we used the reference mitochondrion NC_005332.1 (S anchez Puerta et al. 2004) and the reference chloroplast NC_007288.1 (S anchez Puerta et al. 2005) genomes. To calculate the mutation rate per haploid organelle genome, the average ploidy levels of the organelle genomes per cell (estimated from read coverage for organellar relative to nuclear genomes) were taken into account.
The BAM files were sorted with Samtools v.1.2 ) and variants called with HaplotypeCaller from GATK v3. 4-46 (McKenna et al. 2010) following the best practice recommendations (RealignerTargetCreator and IndelRealigner). To avoid false positives during de novo mutation identification, we apply several criteria used in previous studies with PCR confirmation of mutation candidates Krasovec, Chester, et al. 2018). Criteria were: 1) callable sites were defined with a threshold of 20 mapping quality and 2) a minimal coverage of 20 in the MA line and ancestral genomes; 3) sites covered by >150Â were removed to exclude repetitive regions; 4) the alternative allele was supported with a minimal coverage of 1/3rd of the total coverage; 5) candidates within an insertion-deletion were removed with Bcftools v1.2 (options SnpGap ¼ 5); and 6) all de novo mutations were manually checked in the mpileup files generated by Samtools v.1.2 for all MA lines and the ancestral genome. To test the reliability of in silico identification of de novo mutations, we randomly selected de novo mutation candidates for manual verification based on PCR and Sanger sequencing. All manually checked mutations were confirmed to be true positives (supplementary table S3, Supplementary Material online). The effect of de novo mutations was determined with snpEff v4.3 (Cingolani et al. 2012). To detect any bias in the mutation distribution, we compared the mutation distribution using v 2 and binomial tests against the null hypothesis assuming that mutations appear independently and randomly in the genome.

Mutation Bias and GC Content Evolution
To detect a potential mutation bias in the mutation spectrum, we calculated R1 and R2 mutation rates for GC to AT and AT to GC nucleotide mutations, respectively. With the numbers of mutations from GC to AT and AT to GC and the number of sites in the genome GCn and ATn, R1 ¼ (GC to AT) / GCn and R2 ¼ (AT to GC) / ATn. Equilibrium GC content (GC eq ), was calculated as GC eq ¼R2 / (R1þR2). The equilibrium GC content is the GC content reached by the mutation process alone, that is, the GC content, where the numbers of mutations from GC to AT and AT to GC are equal.

Codon Bias
Codon bias in E. huxleyi was measured with the effective number of codons (ENCs; Wright 1990) as implemented in software CodonW (http://codonw.sourceforge.net). First, we ran a correspondence analysis to generate the hilo.coa files containing the preferred and unpreferred codons (based on a two-way v 2 contingency test) and the fop.coa files. Then, we calculated the frequency of optimal codons and the ENCs gene by gene using the files generated by the previous correspondence analysis. Last, we calculated the strength of selected codon usage bias (S) from the codon bias in the highly expressed E. huxleyi genes using the method of Sharp et al. (2005), but using the mutation spectrum observed in our MA experiment, as we did previously in the analysis of codon bias in Phaeodactylum tricornutum (Krasovec and Filatov 2019). S was calculated for the entire genome, as well as for 500 most actively expressed genes. Expression data for this analysis were obtained from a previous study (Huff and Zilberman 2014) (raw data SRR847300 from the bioproject PRJNA201680). Raw reads were aligned against the transcriptome with RSEM v.1.2.31 (Li and Dewey 2011) to obtain the fpkm values for each gene. S was calculated only for amino acids encoded by two codons where one is the preferred one (Phe, Tyr, His, Gln, Asn, Lys, Asp, and Glu). Furthermore, the fpkm values were used to test the effect of expression level on the mutation rate by comparing expression at the genes with and without a mutated site.

Mutation Accumulation Experiment and E. huxleyi Mutation Rate
To measure the spontaneous mutation rate in E. huxleyi, we conducted a MA experiment (Halligan and Keightley 2009) that included 15 MA lines grown under standard lab conditions for 232 generations on an average, totaling 3,480 generations across all MA lines. To exclude selection and allow all mutations, including deleterious ones, to be fixed, the effective population size of MA lines was reduced by serial bottlenecking-reduction of the population to one cell every 2 weeks. From the cell counts at each bottleneck time, we estimated 1.17 generations per day on an average (supplementary table S1, Supplementary Material online) and an average effective population size of N e ¼ 7.6 (estimated from the harmonic mean of cell number) throughout the experiment. The generation time average did not change over the time of the experiment (Spearman correlation test, P value ¼ 0.8355).
In order to identify the mutations accumulated during the MA experiment, we used Illumina high-throughput sequencing to sequence the genomes of the MA lines at the beginning and the end of MA experiment (  (table 1), the spontaneous mutation rate in the nuclear genome is m ¼ 5.55Â10 À10 (Poisson CI 95%: 5.05Â10 À10 -6.09Â10 À10 ) per nucleotide per cell division. The mutation rate variation between the lines was significant ( fig. 1, Pearson's v 2 test, P value ¼ 0.0006), such as observed, for example, in Chlamydomonas reinhardtii and Caenorhabditis elegans (Ness et al. 2015;Konrad et al. 2018). The per haploid genome per cell division mutation rate (U ¼ genome size in bpÂm) in E. huxleyi is U ¼ 167Â10 6 Â5.55Â10 À10 ¼ 0.092, whereas the mutation rate per coding sequence (39,635,709 nt of annotated CDS) (Read et al. 2013) is U cds ¼ 0.022. Based on m and synonymous intraspecific polymorphism from 17 E. huxleyi strains (Filatov 2019) (p s $0.006), the estimate of effective population size in E. huxleyi is N e ¼ p s / (4Âm) $2.7 million. In addition, we identified seven and two de novo mutations in the mitochondrial and the chloroplast genomes, respectively (supplementary  table  S2, Supplementary Material online). The resulting estimates of per site per cell division mutation rates are m mt ¼ 1.45Â10 À9 (Poisson CI 95%: 5.82Â10 À10 -2.98Â 10 À9 ) and m cl ¼ 1.76Â10 À10 (Poisson CI 95%: 2.13Â10 À11 -6.43Â10 À10 ) for mitochondria and chloroplasts, respectively.

Distribution of De Novo Mutations in the Genome
We detected de novo mutations in all three genomic compartments-nuclear, mitochondrial, and chloroplast DNA, with a significantly higher mutation rate in mitochondria (m mt ¼ 1.45Â10 À9 ) compared with the nuclear genome (2Â2 contingency v 2 ¼ 5.44, P value ¼ 0.0196). Although the chloroplast mutation rate (m cl ¼ 1.76Â10 À10 ) appears to be lower than the nuclear rate (m ¼ 5.55Â10 À10 ), this difference is not significant (2Â2 contingency v 2 ¼ 2.28, P value ¼ 0.1309). Too few chloroplast and mitochondrial mutations were detected to analyze the distributions within these genomes.
Most of the nuclear de novo mutations occurred in noncoding regions (354), whereas 32 and 69 mutations occurred at synonymous and nonsynonymous positions, respectively, in annotated coding regions. These numbers do not differ significantly from the expectations based on the proportions of callable noncoding, synonymous, and nonsynonymous positions in the E. huxleyi genome (Binomial test, ns). Furthermore, we did not detect any significant correlations of local mutation rate with genomic features. Finally, gene expression was not significantly different between genes with and without a mutated site (fpkm average of genes with a mutated site ¼ 34.69, fpkm average of genes without a mutated site ¼ 21.62, Student's test, P value ¼ 0.6628).
DNA methylation (and associated deamination of methylcytosines) is a major contributor of mutations in the eukaryotic genomes (Holliday and Grigg 1993). If methylation contributes to the rate and pattern of mutations in E. huxleyi, we would expect to detect a lack of CpG dinucleotides in the genome (e.g., as is the case in mammals) and an excess of de novo C to T transitions at such dinucleotides. CpG represents 10.8% of all dinucleotides in the genome of E. huxleyi, which does not deviate from what is expected given the genomic GC-content (v 2 test, ns). Furthermore, the total number of de novo mutations from CpG to CpH is 10.3%, that is, there is no excess of mutations at CpG sites (v 2 test, ns). These results indicate that the effect of CpG methylation on mutation rate is low or absent in E. huxleyi.
The analysis of mutational patterns ( fig. 2) revealed significantly more AT to GC mutations compared with GC to AT mutations (207 vs. 151, respectively, binomial test, P value ¼ 0.0036 with probability 50:50). This indicates that the E. huxleyi mutation spectrum has a significant GC-bias, which makes E. huxleyi the first eukaryotic species with a GC-biased mutation spectrum detected in a direct MA experiment, though, indirect inference of the mutation spectrum from sequence polymorphism and divergence indicated that the genome of plant Coffea canephora may also have a slightly GC-biased mutation spectrum (Clement et al. 2017). With an average of m GC->AT ¼0.381Âm AT->GC , the equilibrium GCcontent of the E. huxleyi genome is GC eq ¼ 72.42%. The actual GC-content is slightly lower than the GC eq for the total genome (65.7%) and the noncoding regions (63.0%). In the coding sequence, the actual GC is 69.1%, close to GC eq due to the very high GC-content at third-codon positions (GC3s ¼ 84.2%).

Codon Bias and Long-Term Effective Population Size
High GC-content at third-codon positions is due to strong codon usage bias in E. huxleyi. The preferred codons (listed  in supplementary table S4, Supplementary Material online) almost always end with G or C and a widely used measure of codon bias -ENC (Wright 1990), averaged across all genes was ENC ¼ 38.17, whereas for 500 strongest and weakest expressed genes ENC is 35.52 and 39.03, respectively. Using the method of Sharp et al. (2005) we estimated the strength of selected codon usage bias (S) for amino acids encoded by one preferred and one unpreferred codons (Phe, Tyr, His, Gln, Asn, Lys, Asp, and Glu; see supplementary table S4, Supplementary Material online). For 500 most actively expressed genes, where selection for codon usage is expected to be the strongest, the frequency of optimal codons was 0.8859, corresponding to S ¼ 1.084, which is similar to the strength of selected codon bias in other organisms with large populations, such as Drosophila, nematodes, and bacteria (Sharp et al. 2010). This allows us to estimate long-term effective population size, given that evolution of codon bias is a slow process (Lawrence and Ochman 1997) that "averages" over short-term changes in population size and efficacy of selection (Sharp et al. 2010). Assuming selective advantage (s) of the preferred compared with unpreferred codons to be of the order of 10 À6 -10 À7 , as suggested by analyses of codon bias in Drosophila (Akashi 1997;Powell and Moriyama 1997) and bacteria (Sharp et al. 2010), the long-term effective population size in E. huxleyi and its ancestral species is of the order N e ¼ S / 4s $ 1.084 / 4Â10 À6 $271,000 to N e $ 1.084 / 4Â10 À7 $2,710,000. The latter estimate is almost identical to our genetic diversity-based estimate N e ¼ p s / (4Âm) $2.7 million. Given that this estimate is independent of the mutation rate and genetic diversity values, the correspondence between the genetic diversity-based and codon bias-based estimates of N e is reassuring.

Emiliania huxleyi Mutation Rate
Here, we reported the first estimate of spontaneous mutation rate for a Haptophyte species. Emiliania huxleyi per-nucleotide per cell division mutation rate (m ¼ 5.55Â10 À10 ) is close to estimates for other eukaryotic plankton, such as the diatom P. tricornutum ) (m ¼ 4.77Â10 À10 ) or unicellular green algal species O. tauri (Krasovec et al. 2017) (m ¼ 4.79Â10 À10 ) and Chlamydomonas reinhardtii (Ness et al. 2012) (m ¼ 3.23Â10 À10 ). This similarity of mutation rates across Haptophytes, Stramenopiles, and Chlorophyta suggests that the mutation rate of the order m $ 5Â10 À10 is typical for unicellular eukaryotes regardless of their phylogenetic affinities. Ciliates represent a notable exception to this, with Paramecium tetraurelia having an order of magnitude lower mutation rate (Sung, Tucker, et al. 2012), possibly due to their peculiar life cycle and the presence of two genomes in macro-and micronuclei (Long et al. 2018).
Our estimates of mutation rates in the nuclear, mitochondrial, and chloroplast genomes of E. huxleyi reveal that, similar to animals (Denver et al. 2000;Xu et al. 2012;Konrad et al. 2017) and diatoms , but contrary to plants (Drouin et al. 2008;Ossowski et al. 2010), haptophytes have a higher mitochondrial than nuclear mutation rate. The chloroplast mutation rate in the diatoms ) and green plants (Smith 2015;Smith and Keeling 2015) is lower than that of the nuclear genome. The estimates of nuclear (m ¼ 5.55Â10 À10 ) and chloroplast (m cl ¼ 1.76Â10 À10 ) mutation rates in E. huxleyi show a difference in the same direction, although the difference between the two rates is not significant.

Is the Lab-Based Mutation Rate Estimate Representative of That in the Open Ocean?
The spontaneous mutation rate may be affected by environmental conditions (Jiang et al. 2014;Liu and Zhang 2019). It is not possible to accurately reconstruct all the diversity of environmental conditions for a species that is ubiquitous in the world oceans and inhabits environments ranging from the tropics to the Arctic. However, it is possible to use the rich fossil record for this species (Raffi et al. 2006) to calibrate the rate of the molecular clock and compare it with the mutation rate found in the lab. Based on the fossil record, E. huxleyi evolved from the genus Gephyrocapsa $290 ka (Raffi et al. 2006). This is consistent with the conclusions of an integrated analysis of fossil and genome sequence data from E. huxleyi and four Gephyrocapsa species (Bendif et al. 2019), which demonstrated that 290 kyr corresponds to the divergence between E. huxleyi and Gephyrocapsa muellerae. Given sequence divergence d s $ 3% (Bendif et al. 2019) and the time of divergence (T$290 kyr) between these species, the mutation rate in the open ocean can be estimated as m year ¼ d s / 2T % 5.18Â10 À8 per year. Given the maximal rate of cell division achieved under optimal lab conditions in our experiment ($1.17 per day), E. huxleyi in the wild should have <300 generations per year, providing the minimal estimate m gen ¼ 5.18Â10 À8 / 300 ¼ 1.73Â10 À10 for the per generation mutation rate in the wild. Using our MA-based estimate of m gen , we can infer the number of generations per year in the wild to be $93 (¼ 3.45Â10 À8 / 5.55Â10 À10 ) or one generation in $4 days. This estimate appears realistic because environmental conditions in nature are not always optimal for growth and E. huxleyi blooms require specific conditions (Tyrrell and Merico 2004). This suggests that the lab-based estimate of mutation rate (m gen ¼ 5.55Â10 À10 ) accurately reflects E. huxleyi mutation rate in the wild.

Effective Population Size and Evolution of Mutation Rates
Mutation rates vary widely among organisms and their evolution is thought to be driven by natural selection (Lynch 2010;Sung, Ackerman, et al. 2012), which is reflected by the negative correlation between per-nucleotide mutation rates and effective population size ( fig. 3). As most mutations are deleterious (Muller 1950), in most circumstances selection is expected to favor a reduction in the overall mutation rate (Lynch 2010). However, the strength of such selection is thought to be relatively weak (Drake et al. 1998) and unable to overcome genetic drift in small populations. Such weak selection to reduce mutation rate can be effective only in populations of large size, where drift is weak (Sung, Ackerman, et al. 2012). Here, we used a unicellular eukaryotic phytoplankton species with astronomically large population size to test whether selection in such a large population is able to push its mutation rate below the usual 10 À9 -10 À10 range typical for unicellular eukaryotic species.
Our analysis revealed that despite the astronomically large E. huxleyi populations, selection is unable to reduce the mutation rate <10 À10 per nucleotide per cell division. One possibility to explain this apparent minimum to the mutation rate is that $10 À10 represents the limit to how low the mutation rate can be reduced due to intrinsic biochemical constraints of replication and error correction cellular machinery (Kunkel and Erie 2015). Very few known species, regardless of their biology and ecology, can reach a per site mutation rate <10 À10 . The only known organisms with <10 À10 mutation rate are ciliates (Sung, Tucker, et al. 2012;Long et al. 2018). Another possibility is that despite the astronomical census population size of E. huxleyi, its effective population size (N e ) is relatively modest, that is, drift is relatively high due to demography or other reasons, as reflected by the modest genetic diversity of this species (Filatov 2019). Indeed, the estimates of E. huxleyi N e from its genetic diversity (N e ¼ p s / (4Âm) $2.7 million) and codon bias N e $ S/4s ¼ 1.084 / 4Â10 À7 $2.7 million are both smaller than the estimates of N e in other marine phytoplankton species, including the green algae O. tauri (N e $ 12 million; Blanc-Mathieu et al. 2017) and the diatom P. tricornutum (N e $ 8.7 million; . The reasons for extreme disparity between the astronomically large census population size and modest effective population size in E. huxleyi remain uncertain, though, it is clear that given very low linkage disequilibrium, this disparity is unlikely to be caused by periodic asexual reproduction of this species (Filatov 2019). Neither population size changes (Filatov 2019) nor speciation bottlenecks can explain the limited genetic diversity of E. huxleyi because the population size of this abundant species remained large even during its recent ($290 ka) speciation from Gephyrocapsa (figure 2c in Bendif et al. [2019]). Linked selection ("genetic draft") may potentially account for modest N e and limited genetic diversity, though, the amount of selection needed for this appears very high (figure 7 in Filatov [2019]). Our study rules out the possibility that the extreme disparity between modest genetic diversity and astronomically large population size in the plankton species is due to an unusually low mutation rate that is expected to evolve in very large populations ( fig. 3 and Sung, Ackerman, et al. 2012). The population genetic processes dominating huge populations of marine plankton species, such as E. huxleyi studied here, remain poorly understood (Rengefors et al. 2017) and deserve more attention from the research community.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.