Genetic analysis of the invasive alga Didymosphenia geminata in Southern Argentina: Evidence of a Pleistocene origin of local lineages

The diatom Didymosphenia geminata has gained notoriety due to the massive growths which have occurred in recent decades in temperate regions. Different explanations have been proposed for this phenomenon, including the emergence of new invasive strains, human dispersion and climate change. Despite the fact in Argentina nuisance growths began in about 2010, historical records suggest that the alga was already present before that date. In addition, preliminary genetic data revealed too high a diversity to be explained by a recent invasion. Here, we estimate the divergence times of strains from southern Argentina. We integrate new genetic data and secondary, fossil and geological calibrations into a Penalized Likelihood model used to infer 18,630 plausible chronograms. These indicate that radiation of the lineages in Argentina began during or before the Pleistocene, which is hard to reconcile with the hypothesis that a new variant is responsible for the local mass growths. Instead, this suggests that important features of present distribution could be the result of multiple recent colonizations or the expansion of formerly rare populations. The text explains how these two possibilities are compatible with the hypothesis that recent nuisance blooms may be a consequence of climate change.

The colonial diatom Didymosphenia geminata (Lyngbye) Schmidt has been abundant in some rivers of the north temperate zone for many years. However, there are very few records before the last 30 years for other regions of the World 1,2 . In 1989 growths suddenly developed in rivers of the central region of Vancouver Island in Canada. Since then, many rivers from temperate regions worldwide have been studied carefully enough to be certain that there have also been major increases. Mass growths consist of epiphytic and epilithic accretions of thick mats mainly composed of branched stalks made of a matrix of polysaccharides, uronic acids and proteins 3,4 . The stalk mass greatly exceeds material inside the cell. D. geminata colonies can grow together with other benthic diatoms. In addition, organic debris and small organisms are often entangled in the mats. Mass growths are sometimes sufficient to cover not only the bed of a river, but also its vegetation. Although the alga is not considered harmful for human health, it is widely accepted that it has the potential of altering an ecosystem and producing negative impacts on human activities 5,6 . This has generated much concern and prompted control programs in some countries.
There is still no precise explanation for the behavior of the alga in recent decades. The temporal co-occurrence of mass growths along with increased sport fishing activity during the early Vancouver Island blooms 7 fueled the hypothesis that the overgrowths could be due to dispersion by recreational fishermen. In addition, the facts that D. geminata rapidly invaded many New Zealand rivers and has been reported to be introduced species there 8,9 have suggested that the nuisance growths elsewhere also were due to colonization of ecosystems in which the alga was formerly absent. That the overgrowths could be a consequence of the emergence of a new variant with increased growth and invasion capabilities has also been considered. These hypotheses were rapidly and widely accepted, strongly influencing management actions. Ecological, histochemical and experimental studies have Results Molecular data generation. We obtained no, poor and/or non-specific amplification products despite previously published PCR conditions being implemented using either whole mat material (WBS-DNA) or isolated cells (IC-DNA) as template (not shown). This could, however, be improved by assessing several primer combinations and nesting schemes and carefully tuning the corresponding annealing temperatures. For each primer combination evaluated (Tables 1 and 2), the annealing temperature was optimized by a gradient PCR (50 to 65 °C in steps of ~1.5 °C) using WBS-DNA as template. Once optimal temperatures were determined, these were re-assessed using IC-DNA as template. The results of these experiments, including PCR performance and optimized temperatures, are outlined in Table 2. Out of 10 primer combinations evaluated, 8 were able to generate amplicons suitable for direct sequencing after PCR optimization ( Table 2). All the samples could be amplified with primer pairs EUK528f-etts3rev and eits2dir-etts3rev (Experiment E in Table 2), but the obtained sequences corresponded to contaminant, environmental DNA. For one of the samples from the Futaleufú River (FTa), the amplified 28S sequences did not correspond to D. geminata and the rbcl PCR produced negative results. No data could be generated for five samples from four different rivers, because no DNA could be amplified (samples QQ, QM and TR and 18S, ITS1-2, 5.8S and rbcl loci from samples DVa and DVb) or because the amplified sequences didn't correspond to D. geminata (28S amplicons from samples DVa and DVb). phylogenetic inference. As a first step in the dating analyses, we placed the Argentine sequences phylogenetically in the context of other diatom groups. This was done to ensure including a representative sample of the diatom diversity for estimating our molecular clock rates. Overall, we could compile 18S, 28S and rbcl sequences from 76 diatom species encompassing the major groups [20][21][22][23][24][25][26][27][28][29][30][31][32][33] . There were too few COX-1 sequences available for these taxa and the available ITS and 5.8S sequences could not be aligned confidently, so these loci were not used in the dating analyses. Previous studies have reported conflicting phylogenetic signals between different genes in some diatoms, which has been attributed to phenomena like differential retention of paralogs, horizontal gene transfer, heteroplasmic variation and deep coalescence 24,31,34 . Thus, we identified all the species or specimens having discordant relationships between the markers studied. The criterion implemented was that phylogenetic assignments between all the loci after both Maximum Likelihood and Parsimony analyses were to be compatible. The specimens that did not satisfy this criterion were dismissed from posterior analyses. The selected sequences are listed in Table 3. These sequences were aligned together with the Argentine D. geminata sequences and submitted to the evolutionary analyses described below.
The aligned data presented 3754 positions with 1872 patterns of which 1341 were parsimony informative. Phylogenetic analyses recovered the major diatom groups with good branch supports (Fig. 1). The few nodes that were inconsistent between the Maximum Likelihood and Parsimony topologies presented low (<80) or no (<50) www.nature.com/scientificreports www.nature.com/scientificreports/ bootstrap supports. Furthermore, most alternative resolutions of such nodes were coincident with the most frequent resolution in the alternative method. For example, Mediophyceae was the sister clade of Bacillariophyceae (i.e. same as in ML tree) in 30 parsimony bootstrap trees and it was the sister clade of Coscinodiscophyceae (i.e. same as in Parsimony analysis) in 30 ML bootstrap topologies. Likewise, Cocconeis stauroneiformis was placed as the sister clade of Naviculales in 23 ML bootstrap trees and displayed the same position as in the ML bootstrap tree among 18 Parsimony bootstrap trees. Cymbella mexicana, C. proxima and Didymosphenia were always recovered as a monophyletic group. Nonetheless, the positioning of C. mexicana was erratic among the Parsimony bootstrap trees, where it was recovered either as the sister species of Didymosphenia ( Fig. 1), the sister of C. proxima (17 trees), forming a trichotomy with Didymosphenia and C. proxima (8 trees) or as the sister species of (Didymosphenia + C. proxima) (25 trees). Uncertainty was considerable regarding the phylogenetic placement of other Cymbellaceae (Fig. 1). Didymosphenia and D. geminata were strongly supported in both analyses. D. siberica was the sister of D. geminata in 75 Parsimony Bootstrap trees and in 83 ML ones (Fig. 2). The most frequent alternative resolutions were D. dentata and D. siberica monophyletic (8 Parsimony and 8 ML bootstrap trees), the three species forming a trichotomy (8 Parsimony bootstrap trees) and D. dentata and D. geminata monophyletic (8 ML bootstrap trees). Despite D. geminata was well supported and the Parsimony analysis resulted in only 6 optimal topologies, the supports within the D. geminata clade were low, indicating that the data support different plausible phylogenies for our sequences. To take into account this uncertainty as well as the minor differences between the relationships of major diatom groups, we performed the dating analyses using the ML and Parsimony trees and the full sets of Parsimony and ML bootstrap trees (detailed below). Molecular clock parameters assessment. To assess the performance of different model parameter configurations, we implemented a cross-validation (CV) criterion (Materials and Methods). The smaller CV score (97.9) was obtained from one of our most parsimonious trees when it was analyzed using values of λ and k of 0.33 and 2, respectively. The ML tree resulted in the smaller CV when we set λ = 0 and k = 2. A few combinations of parameters resulted in very high CVs with some of our trees (Fig. 3). Nonetheless, these same configurations, in combination with other trees, produced average CVs. As instance, setting k = 4 and λ = 1 resulted in a very high CV when used with one of our most parsimonious trees. However, the same configuration produced relatively small CVs when used along the rest of our Parsimony and the ML trees. Furthermore, the majority of configurations and trees resulted in very similar CVs. In Fig. 4, where both the tMRCAs obtained from each reference tree and the corresponding CVs are shown, the majority of tMRCAs (points) are colored in intense green, meaning that the corresponding scores were small. In the same Figure, only a small fraction of the points is colored brown or red, which correspond to comparatively 'bad' (high) CVs. Moreover, it is easy to appreciate, by looking at Figs. 3 and 4 together, that these 'bad' CVs do not correspond to any combination of parameters. Thus, we integrated the results obtained under all the settings implemented.    Table 4). All the calibrated nodes but the node labeled 3 were shared with the corresponding Parsimony phylogeny (please see Fig. 1 and main text for detailed comparisons). Some clades were collapsed for enhanced display (numbers of accrued sequences are given in parenthesis in the corresponding terminal names). Numbers close to branches correspond to bootstrap supports > 50. Numbers below the phylogram correspond to mid-ranges between the minimum and maximum bounds implemented in the clock model (see also Pinnularia stem node (7) -100 Secondary 32 Pinnularia stem node (7)   www.nature.com/scientificreports www.nature.com/scientificreports/ analysis is summarized and compared to previous geological data and a fossil-based diatom biochronology in Fig. 5. Remarkably, many of the tMRCAs estimated here fell in a period of time roughly coinciding with the Great Patagonian Glaciation (GPA) between 1.17 and 1 Ma 35 . Moreover, the large majority of the obtained tMRCAs were coincident with a period of intense diatom turnover and diversification in the northern hemisphere 36 .

Discussion
Here, we used genetic data to evaluate alternative explanations for the nuisance D. geminata overgrowths which have occurred in recent times in Argentina. Data from seven loci were generated, including nuclear (18S, ITS-1, 5.8S, ITS-2 and 28S), mitochondrial (COX-1) and plastidic (rbcl) markers, and homologous sequences were searched among other groups of diatoms. The search resulted in a selection of sequences from 3 genes (18S, 28S, and rbcl) that were well represented in the major diatom groups. These sequences were combined with geological, fossil and secondary calibrations to set a molecular clock that allowed us to infer a chronology for the strains present in Argentina. As discussed in detail below, the chronology suggests that the alga has recently become abundant or that several ancient D. geminata lineages have been recently introduced.
Generating D. geminata molecular data is difficult 19 . No 28S and COX-1 data have been reported before for this species. There is a single ribosomal 18S sequence from the old world, obtained in the context of a broad study of other diatom groups 26 , and another from New Zealand 21 . The previous data from the American continent includes 28 molecular clone sequences of the 18S gene from eastern rivers of the USA 25 , a single 18S sequence from Colorado (USA) and 15 18S sequences from southern Argentina and Chile 16,24 . Sequence data have also been reported for the ribulose-1,5-biphosphate carboxylase/oxygenase large subunit (rbcl) gene of 9 samples from Chile 24 . The amplification of the ITS1, 5.8S and ITS2 loci resulted to be particularly challenging. All the  www.nature.com/scientificreports www.nature.com/scientificreports/ primers combinations tested produced nonspecific or multiband amplifications except the ones including one didymo-specific primer in the first PCR rounds, which we attribute to an enhanced variability of these markers. On the other hand, the use of D. geminata-specific primers in the 18S amplifications allowed us to generate good DNA amounts in a single PCR round using both IC-DNA and WBS-DNA as template. This can be helpful when molecular cloning or cell isolation cannot be afforded, as for example when analyzing large amounts of samples. Despite using optimized DNA extraction and PCR conditions, no data could be generated for five samples and two loci from a sixth. This might reflect the persistence of enzymatic inhibitors 19 . However, the amplification of environmental DNA in several experiments suggests that in some cases the absence of D. geminata amplicons might respond to sequence variability, which requires further study.
The chronology inferred for the strains from Argentina is summarized in Fig. 5. Overall, our results indicate that the common ancestor of the Argentine sequences probably lived during the Pleistocene or before. The younger tMRCA inferred from our 18,630 chronograms was 26,134 years and only about 10% (1911) of these chronograms supported tMRCAs shorter than 1 My. This makes it very unlikely that the nuisance growths from Argentina be caused by a novel variant of the species (in which case the actual tMRCA should be much smaller; maybe some tens of years). Taking into account the divergence times inferred in this work, in order for the 'new variant' hypothesis to be valid it would be necessary to assume that a mutant strain has emerged somewhere, and that this mutant was recently brought to the region and the mutation was transferred to ancient strains by sexual reproduction. We have observed populations with different average cell sizes, suggesting that cycles of cell size reduction and restitution may occur (unpublished results). However, to the best of our knowledge, there is no yet confirmation that size restitution is accompanied by sexual reproduction in D. geminata.
If it is assumed that the presence of D. geminata in South America is recent, a hypothesis based on our results is that the corresponding colonization was carried out by multiple ancient lineages. This idea (high propagule pressure) has been used to explain the high genetic diversity observed in two rivers in Maryland, USA 25 . A high propagule pressure could be a consequence, for example, of anglers spreading propagules consisting of many cells. As already explained, the concept that the recent nuisance behavior of the alga is associated with its dispersion along with fishing equipment is entrenched in some areas. However, previous studies indicate that D. geminata cells are easily killed by physical stress such as drying, salt and freezing 37 , which raises doubts about this explanation. The sensitivity of D. geminata cells to environmental stressors suggests that long-distance dispersion events are rare and probably achieved by very few cells. This concept is compatible with the genetic homogeneity observed in New Zealand 9 , where the species is likely a recent invader. A plausible way in which multiple D. geminata lineages could have arrived to South America is along with fish stocks brought to the region. P. J. Macchi and collaborators conducted a survey whereby they determined that between 1 and 8 foreign species were introduced into 28 basins between 1904 and 1965 38   Each tree was analyzed with 90 different λ and k combinations, giving a total of 18,000 chronograms. The color bars below the histogram represent the age ranges obtained using optimal ML and Pars trees (details in Fig. 4). The upper part of the figure shows the relative diversity of five diatom genera in the Neogene and Quaternary of USA; based on reference 36  Another possible explanation for the present results is that D. geminata may has been present in the region since ancient times but in very low abundance. There are previous fossil and extant taxa data that are compatible with this. Fossils of Didymosphenia spp. and Cymbella mexicana, the sister species of the genus Didymosphenia according to previous studies 30 and our own results ( Figs. 1 and 2), have been described in sedimentary rocks of the Miocene of North America and Japan 40,41 . This indicates that in that period there were already stem Didymosphenia species. In addition, the presence of D. geminata fossils has been reported in Pliocene sediments from China 42 . Thus, the species of the genus Didymosphenia likely have had plenty of time to disperse. By the other side, the cooling after the Mid-Miocene Climatic Optimum (MMCO) has allowed the establishment of temperate species in latitudes that formerly were too warm 43 . A nice example of this phenomenon is the diatoms biochronology studied by W. N. Krebs et al. 36 (depicted in the upper panel of Fig. 5), which reflects the expansion and diversification of three diatom genera as a consequence of climate cooling in a region now corresponding to North America. The cooling after the MMCO also played a significant role in the configuration of the South American biota. Sérsic et al. 44 and Hazzi et al. 45 synthesized the available information for various groups of plants and animals. Hence, the species of the genus Didymosphenia have probably had the possibility of colonizing the land masses that now correspond to southern South America since the middle to late Neogene. This, together with the tMRCAs inferred here and the 20th century sporadic records, support the new hypothesis that D. geminata may has been present in the region since ancient times. The genetic diversity we observed could be consequence of the geoclimatic events occurred in the pre-Quaternary/Quaternary. For example, the Patagonian glacial cycles could have driven the establishment of various D. geminata lineages through phenomena like extinction/ recolonization cycles (multiple introductions) and/or niche contraction/expansion (local diversification), as observed for other taxa from South America and elsewhere 44,46,47 . The divergence times inferred in this study are close to the divergence times of taxa believed to have been affected by Patagonian glaciations, which, in addition to the cooling and heating cycles, produced changes in the patterns of water courses and lakes, and the position of the continental divide. Our hypothesis predicts that D. geminata genotypes should have a more or less structured geographic distribution. Previous molecular studies revealed the existence of cryptic variation and a geographical structure between Gomphonema parvulum demes 48,49 , suggesting that it may also happen in D. geminata. Our hypothesis also predicts that there should be ancient D. geminata fossils in South American, but rare. Future research should aim to deepen genetic studies and investigate the fossil record, which will allow further insight on the possibility that D. geminata is an ancient, formerly rare species.
In summary, the results presented here are compatible with three possible scenarios: (i) that the recent overgrowths are due to the recent introduction of multiple lineages; (ii) that the species was already present and that for some reason it became more abundant; (iii) that a mutant lineage was recently introduced and dispersed new genes conferring an exacerbated invasiveness. The first two hypotheses, which are the ones we prefer for requiring less ad-hoc hypotheses, are compatible with the idea that an ecological factor may have been altered (or reached a threshold) recently, affecting (broadening?) the niche of the alga. Maybe a factor altered by climate change. Whitton et al. 2 and Watson et al. 6 reviewed the evidence suggesting that climate change has influenced some environmental factors and hence the competitive success of D. geminata elsewhere. If this is what happened in Argentina, we could be witnessing a conspicuous effect of climatic change and an example of its unpredictable consequences on worldwide distributed aquatic ecosystems.

Materials and Methods
Samples. D. geminata mats were collected at eight rivers distributed along about 1600 km in western Argentinean Patagonia and Tierra del Fuego. Sixteen samples (Table S1) were selected based on presence, abundance and integrity of cell content, complexity (i.e. proportion of didymo cells relative to debris and other microorganisms in the samples), river bed coverage (>50 m 2 ), and geographic origin. The studied samples are from the Yelcho (samples FTa, FTb, FTc, FTd, FTe, FTf and RI; Futaleufú and Rivadavia Rivers), Puelo (samples AZa, AZb and QM; Azul and Quemquemtreu rivers), Chubut (sample CH; Chubut river), Santa Cruz (samples DVa, DVb and TR; De las Vueltas and Toro rivers), Valdivia (sample QQ; Quilaquina river) and Grande (sample GD; Grande River) basins. All the sampling sites are located in the Andean Patagonian Forest (Bosque Andino Patagónico, BAP) ecoregion except for Chubut River and Grande, which are close to the BAP/Patagonian Steppe ecotone. Details on sampling procedures and DNA extraction methods have been described elsewhere 16,19,50 . PCR amplification and sequencing. PCR amplifications were optimized using DNA templates (1 µl volume of purified DNA suspensions) obtained from both whole benthic samples (i.e. whole mat material; WBS) and didymo cells (50 per sample) isolated from the mats by mouth pipeting (IC). PCRs were made using oligonucleotides described in previous studies 21,23,[51][52][53][54][55][56] , which are detailed in Table 1. All the reactions were carried out in a MyCycler Thermal Cycler (BioRad) applying programs consisting of an initial denaturation step at 95 °C for 1 min, followed by 30 cycles of 95 °C for 30s, an annealing step of 30s and an extension step at 72 °C for 1 min/Kb, and a final extension step of 72 °C for 1 min. PCR products were analyzed by electrophoresis in agarose gels (1.8% in 1X Tris-Acetate-EDTA buffer) stained with Redgel (Biotum) and visualization under UV light. A molecular weight marker, 1 Kb Plus DNA Ladder (Invitrogen), was included in parallel in each gel. For sequencing, three independent PCRs were purified using QIAquick PCR Purification Kit (QIAGEN) according to manufacturer instructions, and the obtained products were re-analyzed by electrophoresis, as mentioned before, but including a mass ruler (Low DNA Mass Ladder; Invitrogen) to quantify the DNA by densitometric analysis using the program Image J 57 . The purifications were also checked by photometric analysis using a Nano spectrophotometer Nano Vue plus (GE Health Science) and finally sequenced by Sanger chemistry using amplification oligos. Chromatogram data processing and sequence assemble were performed with the newer version of BioEdit 58 . Polymorphic sites (double peaks) were coded using IUPAC codes. The obtained sequences were deposited in GenBank under accession numbers KY007192-KY007220 and MK291483-MK291492 (Table S1). phylogenetic inference. Maximum Likelihood trees were inferred by the program RAxML next generation (RAxML-ng, DOI:10.5281/zenodo.593079) under evolutionary models inferred by jModelTest 2 61 based on the Bayesian information criterion. The inferred models were very similar to each other. All included invariant sites and among-sites rate heterogeneity. The rbcl data was better explained by the GTR substitution scheme whereas for the rDNA sequences (18S and 28S) the algorithm suggested one rate for A/C and C/G substitutions, a second one for A/T and G/T and 3 different rates for the remaining substitutions (the TIM3 model in RAxML-ng). Tree searches consisted in 20 initial Parsimony trees that were rearranged by tree bisection and reconnection (TBR). Parsimony analyses were made in TNT software 62 . Tree searches consisted in Wagner (WAG) initial trees that were subsequently rearranged by TBR. The number of WAG trees required to obtain a stable consensus tree (sensu Goloboff 63 and references therein) was 100. Holding one tree during tree swapping was enough both to target optimal trees and obtain a stable consensus implementing 100 replications. Targeting all shortest trees (n = 6) required about 600 replications. Branch supports were calculated with the bootstrap routines implemented in RAxML-ng and TNT. Tree comparisons were performed with the cophyloplot function of the R package Ape 64 , Dendroscope 65 and ad-hoc R scripts available from the authors on request.
penalized Likelihood (PL) dating. Penalized Likelihood analyses were performed with the chronos function from the R package ape 64 . We deemed the most realistic clock model for this dataset, the discrete one, in which different branches are characterized by discrete rate categories. This better reflects that our dataset included several diatom taxa (Table 3; details in Results). Implementing a discrete clock requires using a predefined number of rate categories (k) and a parameter, λ, which governs the magnitude of the penalty applied to rate changes across the phylogeny. The performance of different k and λ settings was explored by a cross-validation criterion. We implemented a one-by-one terminal removal approach as in reference 66 . We define a score (CV) based on the cross-validation implementation of Ape's chronopl function, aiming to weigh the impact of parameters' variation on tMRCA estimates. For each k and λ combination (varied from 2 to 10 in increments of 1 and 0 to 1 in increments of 1/9, respectively), the height of the D. geminata crown node (OCH, for Observed Crown Height) was obtained from the full tree (reference tree, RT) and from each of a set of M trees generated by deleting one RT terminal at a time. Then, we calculated a score for each combination of k and λ values by the equation: where OCH is the observed D. geminata crown node height, that is the height in the RT, and PCHs (Predictor Crown Heights) are the heights in the deleted trees.
Tree calibration. Didymosphenia sp. fossils have been described from the Messinian period 41 , suggesting that Didymosphenia radiation started along the Neogene. Thus, we set the beginning of the Messinian as the minimum bound for the crown node of the genus. Previous studies 30 and the analyses performed here (please see the Results) indicated that the most basal taxon inside the Didymosphenia clade is D. dentata, which is one of the endemic species of the genus in Lake Baikal. The Baikal formation begun in the Late Cretaceous and it is estimated that the older Baikal taxa appeared about 70 Ma 67,68 , so we used this to set the maximum bound for the whole Didymosphenia clade. The age of the diatoms stem node was bounded between 190 and 396 Ma based on fossil evidence 69 and previous molecular clock analyses involving an extensive sample of eukaryotic taxa 70 , respectively. The diatom crown node, the Bacillariophyceae crown node, the core araphids/raphids split and the raphid pennates crown minimum and maximum bounds were set according to references 71 and 29 . The Mediophyceae stem node was assigned a minimum age of 110 My based on fossil records 72,73 . The corresponding maximum age was equalized to the minimum age of the diatom crown node. We also set minimum and maximum bounds for the Pinnularia stem node, based on fossil data and molecular clock analyses from 32 . The full set of calibration points implemented in the present study are summarized in Table 4 and Fig. 2.