Single cell PCR amplification of diatoms using fresh and preserved samples

Single cell Chelex® DNA extraction and nested PCR amplification were used to examine partial gene sequences from natural diatom populations for taxonomic and phylogenetic studies at and above the level of species. DNA was extracted from cells that were either freshly collected or stored in RNAlater. Extractions from Lugol's-fixed material were also attempted, with limited success. Three partial gene sequences (rbcL, 18S, and psbA) were recovered using existing and new primers with a nested or double nested PCR approach, with amplification success rates between 70 and 96%. An rbcL consensus tree grouped morphologically similar specimens and was consistent across the two primary sample treatments: fresh and RNAlater. This tool will greatly enhance the number of microscopic diatom taxa (and potentially other microbes) available for barcoding and phylogenetic studies. The near-term increase in sequence data for diatoms generated via routine single cell extractions and PCR will act as a multiproxy validation of longer-term next generation genomics.

Introduction
DNA barcoding has become common practice in animal and plant taxonomy (Hebert et al., 2003) with cytochrome c oxidase 1 (CO1), a mitochondrial gene, serving as the main animal barcoding gene (Hebert et al., 2004). In plants the chloroplast genes encoding the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) and maturase K (matK) serve as two of the preferred barcode genes for taxonomic identifications (CBOL Plant Work Group, 2009). This is in contrast with the situation faced in diatom barcoding, where several regions are presently identified as prominent taxonomic markers (e.g., Yoon et al., 2002; Evans et al., 2007). In some studies the conservative rbcL, CO1, and ribosomal complex gene 18S are considered to be good taxonomic characters for species determinations (Mann et al., 2010; Hamsher et al., 2011; Zimmermann et al., 2011). Ribosomal complex genes ITS, 18S, and 28S were also used both individually and in multi-gene studies to evaluate cryptic taxonomic variations at the genus and species levels (e.g., Amato et al., 2007; Vanormelingen et al., 2008; Pouličková et al., 2010; Kaczmarska et al., 2014). The smaller (∼500 bp) chloroplast psbA gene was used in evolutionary studies but is not variable enough to be informative for species-level taxonomic studies (Souffreau et al., 2011, pers. obs.). In contrast, chloroplast gene psbC (>1000 bp) is more widely used across all the major orders (Theriot et al., 2010). With little consensus as to which marker best delimits diatom groups, the ability to amplify several genes, including new genes, from a single cell is essential for diatom taxonomy using DNA barcodes.
In microbial studies, genetic sequencing has been successful across all the primary algal groups: Bacillariophyceae, Cyanophyceae, Chlorophyceae, Chrysophyceae, Cryptophyceae, Desmidiaceae, Euglenophyceae, Haptophyceae, Pyrrhophyceae, Raphidophyceae, and Synurophyceae (e.g., Daugbjerg and Andersen, 1997; McCourt et al., 2000; Tomitani et al., 2006; Edvardsen et al., 2011; Bennett et al., 2014). However, compared to DNA research in Plantae and Animalia, there are far fewer sequences available for diatom taxa, leaving large taxonomic gaps in DNA databases. In addition, microbial genetic studies in algae are limited by the ability to collect enough material, and population genetic studies are all but absent. Culturing algae and other microbes to accumulate sufficient DNA has been the time-limiting step in microbial genetics. Cultures supply extra material for morphological identification and validation; however, cultures are prone to alterations in structural morphology (Trainor et al., 1971; Estes and Dute, 1994).
Single cell extraction and PCR protocols have been advanced in microbe research on live and fixed materials, although no single approach meets all aspects of routine enhanced multi-gene taxonomic research (e.g., Sherbakova et al., 2000; Ruiz Sebastián and O'Ryan, 2001; Lang and Kaczmarska, 2011). To date single cell extractions have been successfully completed with the Chrysophyceae (Auinger et al., 2008), Pyrrhophyceae (Richlen and Barber, 2005; Henrichs et al., 2007), and Bacillariophyceae (Lang and Kaczmarska, 2011). There are only a few published examples using non-cultured single cell amplifications (e.g., Godhe et al., 2002; Auinger et al., 2008; Lang and Kaczmarska, 2011). These amplifications are successful if there is sufficient initial DNA within the cell and the primer set is effective at maximizing amplification efficiency. In most cases, amplification of DNA for sequencing requires a nested amplification protocol. This nested approach is effective but has the potential to generate amplification errors (Ruck et al., 2014). In order to effectively utilize this nested approach, clear protocols for error checking and sequence validation must be established. Single cell genetic determinations are presently a novel and potentially efficient way to generate a large reference library of taxonomically informative data from single algal cells in complex environmental systems. This reference library will also contain an extensive database for population genetics and genetic biogeography studies (e.g., Alverson et al., 2007).
There are a number of reagents which can be used in single cell DNA amplifications with algae (e.g., Bertozzini et al., 2005; Auinger et al., 2008). Chelex® resin is an effective DNA extraction tool with applications in molecular biology ranging from multicellular vertebrate tissues to single microbial cells (Hahn et al., 2000; Richlen and Barber, 2005). This extraction method was used in genetic investigations in forensic science (Legrand et al., 2002), population studies (Richlen and Barber, 2005), and evolution (Theriot et al., 2010). In mammalian research Chelex extractions have been used on fresh, frozen, alcohol preserved, and to a limited degree on formalin-fixed tissues (Legrand et al., 2002). The simplicity of this technique, coupled with its relatively low cost, allows for quick PCR assays (e.g., Bowers et al., 2000; Reyes-Escogido et al., 2010), with potential for more detailed taxonomic barcoding initiatives and finer population genetic studies.
The recovery of gene sequences from structurally fixed and preserved material has had some success across all the biological groups, but has not developed into a routine protocol used for sequencing (Connell, 2002; Godhe et al., 2002; Henrichs et al., 2007). Institutions with collections of fixed biological materials, like museums, are extremely interested in the recovery of genes and genomes from their historic collections. The temporal record hidden in fixed biological collections (with regard to speciation and population genetics) is waiting to be mined. In microbial genetics, gene sequences have been recovered from ethanol, Lugol's solution, buffered formalin and RNAlater fixations (Ambion, 1999; Connell, 2002; Godhe et al., 2002; Auinger et al., 2008; this study). The general recipe for success is removal or dilution of the traditional fixation solution followed by standard sample processing, DNA extraction, and gene sequencing. Buffered formalin preserved materials can be treated with cold methanol to minimize the impact of the fixation, while sodium thiosulfate is effective in capturing iodine from solutions and biological materials (Godhe et al., 2002; Auinger et al., 2008). In the Protista, ethanol preparation and fixation prior to sequencing is less common, although it has been successful with some limitations (e.g., Godhe et al., 2002; Henrichs et al., 2007; Lang and Kaczmarska, 2011; Ivanova et al., submitted). RNAlater is a more recent stabilizer suitable for the storage of material prior to RNA and DNA sequencing with a high rate of recovery (Ambion, 1999). RNAlater has the advantage of being a good fixative and further prevents the degradation of RNA and DNA. At 4 °C viable fixation can be maintained for a month and at −20 °C samples can be maintained indefinitely (Ambion, 1999).
The objective of this study was to evaluate a nested amplification protocol for multiple genes in diatoms from single cells under live and RNAlater preserved conditions. This will establish a standard multiproxy routine for reproducible barcoding with morphological analysis of natural populations. In this study new primer pairs for nested amplification of rbcL, 18S and psbA were designed and different DNA polymerases and cycling protocols were compared. Finally, examples are presented for how this protocol can be used to establish a more comprehensive reference library of taxonomic and physiological genetic data.

Collection of Samples
All samples were benthic or planktonic and collected from freshwaters with a wide range in pH (5.1-7.8) and eutrophication states (oligotrophic-mesotrophic) (Tables 1, 2). The samples were kept cool in transport and at room temperature in the lab in natural light prior to single cell isolations.

RNAlater Preservation
A 0.2 mL volume of fresh benthic sample was aliquoted into a 1.5 mL tube containing 1 mL of RNAlater tissue storage buffer (http://www.protocol-online.org/prot/Protocols/RNAlater-3999.html). The sample was shaken by hand to mix, kept at room temperature for 24 h, and then stored in the dark at 4 °C for between 5 and 21 days before single cell isolation.

Lugol's Fixation
Following the protocol from Henrichs et al. (2007), 0.2 mL of living benthic sample was aliquoted into a 1.5 mL tube and then fixed with Lugol's iodine solution (10 g I2, 20 g KI, 200 mL ddH2O). In this study non-acidified Lugol's fixation (no glacial acetic acid) was used, while Henrichs et al. (2007) used acidified Lugol's fixation. The merits of using non-acidified vs. acidified Lugol's fixation were discussed in Throndsen (1978). Samples fixed with non-acidified Lugol's iodine solution varied in storage duration from 12 days to 20 years. Just prior to isolation, 20 µL of 1 M sodium thiosulfate was added to the samples and mixed by hand until the Lugol's iodine solution was dissipated (became colorless).
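The amount of iodine that the thiosulfate addition can neutralize follows from the standard reaction I2 + 2 S2O3(2−) → 2 I(−) + S4O6(2−). The sketch below works through that arithmetic for the 20 µL of 1 M thiosulfate and the Lugol's recipe given above. It is illustrative only: the function names are hypothetical, the volume of Lugol's solution actually added to each sample is not stated in the protocol, and the volume change on dissolving the solids is ignored.

```python
# Worked stoichiometry for quenching Lugol's iodine with sodium thiosulfate.
# Reaction: I2 + 2 S2O3^2- -> 2 I^- + S4O6^2-
I2_MOLAR_MASS_G_PER_MOL = 253.81  # molar mass of molecular iodine


def iodine_quenched_mg(thiosulfate_volume_ul: float, thiosulfate_molar: float) -> float:
    """Mass of I2 (mg) that a given thiosulfate addition can reduce."""
    thiosulfate_umol = thiosulfate_volume_ul * thiosulfate_molar  # uL x mol/L = umol
    iodine_umol = thiosulfate_umol / 2.0  # two thiosulfate ions are consumed per I2
    return iodine_umol * I2_MOLAR_MASS_G_PER_MOL / 1000.0  # ug -> mg


def lugol_iodine_mg_per_ml(iodine_g: float = 10.0, water_ml: float = 200.0) -> float:
    """I2 concentration of the recipe (assumes final volume equals the water volume)."""
    return iodine_g * 1000.0 / water_ml


capacity_mg = iodine_quenched_mg(20.0, 1.0)
equivalent_lugol_ul = capacity_mg / lugol_iodine_mg_per_ml() * 1000.0
print(f"{capacity_mg:.2f} mg I2, equivalent to about {equivalent_lugol_ul:.0f} uL of Lugol's solution")
```

Under these assumptions the addition can reduce roughly 2.5 mg of iodine, which is why the visual endpoint (the sample turning colorless) remains the practical check.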

Single Diatom Cell Isolation
One to two milliliters of each sample (living, RNAlater preserved or Lugol's preserved) was placed on a large microscope slide. The sample was then diluted with ∼1 mL of sterile, nuclease-free water (Bioshop Canada Inc.) and examined under an inverted microscope (Leica) or a compound microscope (Nikon, with long working distance objectives) at 10× magnification. Individual cells were isolated through suction using 20-40 µL drawn-out disposable pipets, either with a Narishige micromanipulator or simple manual suction. This isolation procedure was modified from Throndsen (1978) by removing the use of Formvar film. The isolated cell with associated contaminants was transferred to a new droplet of nuclease-free water. This isolation and transfer was repeated 2-5 times to remove any contaminants and/or preservative residue. Individual cells were then isolated for the final time and transferred to a 0.2 mL PCR tube containing 200 µL of 10% (w/v) Chelex® 100 solution (Richlen and Barber, 2005). The samples were stored from 1 to 51 days at 4 °C in the dark until DNA extraction (Tables 1, 2; Supplement 1 in Appendix).

DNA Extraction, PCR and Sequencing Protocol
For DNA extraction, Chelex-stored samples were incubated for 20 min at 95 °C. They were then vortexed for 15 s and centrifuged for 15 s at 14,000 rcf. We also tested extraction without the incubation step on some samples, and had similar success with both protocols. For PCR, the primers listed in Table 3 were used.
The first amplifications were performed in a 25 µL volume with a final concentration of 1× PCR buffer (Bioshop Canada Inc.), 2 mmol L−1 MgCl2, 0.3 mmol L−1 dNTP, 0.4 µmol L−1 of each primer, 1 unit of Taq DNA polymerase (BioShop), and 5.0 µL of Chelex DNA extract supernatant. The following cycling conditions were used: 94 °C for 210 s; followed by 36 cycles of 94 °C for 50 s, 52 °C for 50 s, and 72 °C for 80 s; and then a final elongation step at 72 °C for 15 min. For the second amplification (and the third amplification step when performed), all steps and concentrations were the same as above except that 1 µL of the product from the previous amplification was used as template. The success of the PCR was assessed by visualizing the products on a 1.5% agarose gel. Successful PCR products were purified using the enzymes Exonuclease I and shrimp alkaline phosphatase (USB Corporation). Big Dye version 3.1 (Life Technologies Corporation) was used for sequencing reactions using 0.6 µL of Big Dye in a 10 µL reaction. Sequencing reaction products were purified via ethanol-EDTA-sodium acetate precipitation. Nucleotide sequences were generated using automated cycle-sequencing on an Applied Biosystems 3130xl automated sequencer. To validate the use of Bioshop Taq DNA polymerase, seven of the samples were reamplified for all three genes using Phusion® High-Fidelity DNA Polymerase (New England Biolabs). The optimized annealing temperatures used in the first and second Phusion amplifications were as follows: rbcL 57.7 °C and 60.3 °C; 18S 52.2 °C and 60.3 °C; psbA 52.2 °C and 63 °C. Further, to ensure that the error rate was not affecting our sequences, seven samples were re-amplified for rbcL using Phusion with 20, 25, and 30 cycles in the second amplification. The final PCR products were then sequenced and compared for base pair differences with the sequences obtained with Bioshop Taq using the standard 34 cycle second amplification step.
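As an illustration of how the reaction set-up above translates into pipetting volumes, the following sketch applies C1V1 = C2V2 to the stated final concentrations and sums the nominal hold times of the cycling program. The final concentrations, template volumes, and cycling times are taken from the protocol; the stock concentrations (10× buffer, 25 mM MgCl2, 10 mM dNTP, 10 µM primers, 5 U/µL Taq) and all function names are assumptions made for the example and are not reported in this study.

```python
# Sketch: reagent volumes for one 25 uL reaction and nominal thermocycler time.
REACTION_UL = 25.0
TAQ_UNITS = 1.0

# Final concentrations as stated in the protocol.
FINAL = {"buffer_x": 1.0, "MgCl2_mM": 2.0, "dNTP_mM": 0.3,
         "primer_fwd_uM": 0.4, "primer_rev_uM": 0.4}

# HYPOTHETICAL stock concentrations (not given in the source).
STOCKS = {"buffer_x": 10.0, "MgCl2_mM": 25.0, "dNTP_mM": 10.0,
          "primer_fwd_uM": 10.0, "primer_rev_uM": 10.0}
TAQ_STOCK_U_PER_UL = 5.0  # HYPOTHETICAL enzyme stock


def master_mix(template_ul: float = 5.0) -> dict:
    """Volumes (uL) per reaction; 5.0 uL template in round one, 1.0 uL in later rounds."""
    volumes = {name: REACTION_UL * FINAL[name] / STOCKS[name] for name in FINAL}
    volumes["taq"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    volumes["template"] = template_ul
    # Water makes up whatever remains of the 25 uL reaction.
    volumes["water"] = REACTION_UL - sum(volumes.values())
    return volumes


def program_seconds(initial: int = 210, cycles: int = 36,
                    steps: tuple = (50, 50, 80), final: int = 15 * 60) -> int:
    """Sum of programmed hold times; ramping between temperatures is not included."""
    return initial + cycles * sum(steps) + final


for reagent, volume in master_mix().items():
    print(f"{reagent:>14}: {volume:5.2f} uL")
print(f"nominal run time: {program_seconds() / 60:.1f} min")
```

With these assumed stocks the first-round reaction needs 12.55 µL of water, and the programmed holds add up to 7590 s (126.5 min) per round before ramp times.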
Seventy-two rbcL sequences were studied from a wide range of diatom taxa. Within the pennate raphe bearing diatoms, 35 diatom cells were sequenced for all three genes to ensure that the method would allow for multi-gene sequencing. Sequences were assembled and edited in Geneious version 6.1.5 and consensus sequences were aligned using the MAFFT alignment tool. Consensus sequences were compared to the GenBank database using the Basic Local Alignment Search Tool (BLAST) to verify and ensure that no contaminants were sequenced. Initial Maximum Parsimony (MP) tree topologies of each gene were assessed in PAUP v.4.0 (Swofford, 2003), and phylogenetic model testing (using likelihood scores and AIC calculations) of each region was analyzed in jModelTest v.2.1.4 (Guindon and Gascuel, 2003; Darriba et al., 2012) to ensure that the data could be concatenated for analysis. Datasets had the preferred General Time Reversible model (GTR+I+G) (Tavaré, 1986) except the 18S dataset, which had the Transitional Model (TIM3+I+G) (Posada, 2003). However, the initial topology tree of the 18S matched both the rbcL and psbA initial topology trees, and, using a chi-squared distribution, the delta values from TIM3+I+G and GTR+I+G (delta = 0.0000, K = 78; delta = 3.9436, K = 80, respectively) were not shown to be statistically different (P < 0.15). Therefore, the model GTR+I+G was used for Bayesian analysis (BI) and Maximum Likelihood (ML) for both the concatenated data set (rbcL, 18S, psbA) and the single gene dataset (rbcL). The BI was carried out with MrBayes v.3.1.2 (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003), with a Markov chain Monte Carlo (MCMC) run for 1 million generations for the concatenated gene data set and 5 million generations for the rbcL dataset with the default settings. Runs were sampled every 1000th generation. The first 250,000 and 1,250,000 generations were discarded as burn-in for the concatenated gene dataset and rbcL dataset, respectively.
The convergence and stationarity of the BI results were analyzed in Tracer v1.6 (Rambaut et al., 2013) and topology convergences were analyzed in AWTY (Wilgenbusch et al., 2004). ML Bootstrap analysis (Felsenstein, 1985) was done in Garli v.2.01 (Zwickl, 2006), using the GTR+I+G model, with 1000 bootstrap replicates for both the concatenated dataset and rbcL data set. Fragilaria bidens (GenBank accession AB430716.1) was used as the outgroup for our concatenated dataset as it was a close sister species to the taxa used in the concatenated analysis. Bolidomonas pacifica (GenBank accession HQ912421.1) and Cyclotella meneghiniana (GenBank accession KF959651.1) were used as outgroup and sister taxa, respectively, for the rbcL dataset as they were the closest relatives to the dataset taxa available on GenBank.
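Two pieces of arithmetic in the analysis above can be reproduced with a short sketch: the chi-squared comparison of the two substitution models and the number of MCMC samples retained after burn-in. This is an illustration under stated assumptions, not the authors' script. It assumes that the difference in delta values (3.9436) is treated as a chi-squared statistic with degrees of freedom equal to the difference in parameter counts (80 − 78 = 2), for which the upper-tail probability has the closed form exp(−x/2). The function names are hypothetical, and whether the starting state is counted as a sample is ignored.

```python
import math


def chi2_upper_tail_df2(statistic: float) -> float:
    """Upper-tail probability of a chi-squared variable with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


def retained_samples(generations: int, sample_every: int, burn_in_generations: int) -> int:
    """MCMC samples kept once the burn-in generations are discarded."""
    return (generations - burn_in_generations) // sample_every


# Model comparison for the 18S dataset: TIM3+I+G (K = 78) vs. GTR+I+G (K = 80).
delta_difference = 3.9436 - 0.0000
degrees_of_freedom = 80 - 78  # the closed form above is valid only for df = 2
p_value = chi2_upper_tail_df2(delta_difference)
print(f"df = {degrees_of_freedom}, P = {p_value:.3f}")

# Burn-in bookkeeping for the two Bayesian runs (sampled every 1000th generation).
concatenated_kept = retained_samples(1_000_000, 1000, 250_000)
rbcl_kept = retained_samples(5_000_000, 1000, 1_250_000)
print(f"samples kept: concatenated = {concatenated_kept}, rbcL = {rbcl_kept}")
```

Under this reading the probability is about 0.14, consistent with the reported P < 0.15, and both runs discard the first 25% of generations.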
Of the diatoms sequenced, 60 were from fresh living samples, 12 were from RNAlater fixed samples, and one from a Lugol's fixed sample (Tables 1, 2, Supplement 1 in Appendix). NCBI BLAST searches using the new sequences resulted in matches consistent with the genus-level morphological identifications of our specimens. Method validation of the number of cycles (20, 25, 30, and 34) and type of DNA polymerase using five different taxa showed only one instance of base pair substitutions, though there were no differences in the overall sequence alignments. The recovered sequence lengths for rbcL and 18S were both within the average range for the diatom sequences found on GenBank (Table 4). The length of psbA sequence recovered was slightly below the gene length found for diatom sequences on GenBank (Table 4). The amplification success was 70%, 90%, and 96% for rbcL, 18S, and psbA, respectively (Table 5). The recovery success of 18S and psbA was higher than rbcL because only samples that amplified successfully for rbcL were processed for these two regions. We had very low amplification success with the Lugol's fixed samples (Table 5).
In addition, within the Lugol's fixed samples 13 contained fungal 18S nuclear DNA. RNAlater amplification success was consistent across sample storage periods ranging from 5 to 21 days (Table 5).
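Because 18S and psbA were attempted only on cells that had already amplified for rbcL, their reported success rates can be read as conditional on rbcL success. The sketch below converts them into approximate per-cell rates. It assumes that reading, and further assumes that 18S and psbA outcomes are independent of each other; neither assumption is tested in the text, and the names are hypothetical.

```python
# Reported amplification success; 18S and psbA are treated as conditional on rbcL success.
REPORTED_SUCCESS = {"rbcL": 0.70, "18S": 0.90, "psbA": 0.96}


def per_cell_success(rates: dict = REPORTED_SUCCESS) -> dict:
    """Approximate probability, per isolated cell, of recovering each gene."""
    rbcl = rates["rbcL"]
    return {
        "rbcL": rbcl,
        "18S": rbcl * rates["18S"],
        "psbA": rbcl * rates["psbA"],
        # ASSUMPTION: 18S and psbA outcomes are independent given rbcL success.
        "all_three": rbcl * rates["18S"] * rates["psbA"],
    }


for gene, probability in per_cell_success().items():
    print(f"{gene:>9}: {probability:.1%}")
```

On this reading roughly 60% of isolated cells would be expected to yield all three genes.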

Multiple Gene Analysis
Individual topologies of the three genes (rbcL, 18S, psbA) showed no differences, nor did the ML (-LnL = 10058.4412) and BI analyses; thus only the BI tree is shown, with both the BI posterior probabilities (PP) and the ML bootstrap values (BS) (Figure 1). Our dataset showed significant separation at the family level for the following: Pinnulariaceae (PP = 100, BS = 100), Sellaphoraceae (100, 74), Stauroneidaceae (100, 85), Pleurosigmataceae (100, 100), Naviculaceae (100, 100). RNAlater preserved and fresh samples of the same taxa were found within the correct clades. Examples of this can be seen in the genera Craticula, Gyrosigma, and Pinnularia (Figure 1). In the genus Gyrosigma, a fresh sample and an RNAlater sample were significantly similar (100, 87), and came out on the same terminal branch (Figure 1, stars). The small branching of individual taxa within this genus was due to ≤5 bp differences. Although difficult to determine, the low number of base pair differences could be either base pair substitution error or intrageneric variation (0.001%) between the concatenated sequences (>3000 bp). Slight branching between individuals was also present in the genera Craticula (≤3 bp differences, 0.001%) and Pinnularia (≤14 bp differences, 0.004%), showing low levels of intrageneric variation (Figure 1). Individual cells were principally collected from benthic sediments, leading to the larger representation of Naviculaceae taxa.
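Base pair differences between two aligned sequences, and the corresponding uncorrected proportion of differing sites (p-distance), can be computed as in the sketch below. It is a generic illustration rather than the authors' pipeline: the function names and the toy sequences are hypothetical, and the 3000 bp length used in the final loop is an assumption standing in for the ">3000 bp" concatenated alignment, included only to help relate difference counts to proportions and percentages.

```python
def count_differences(seq_a: str, seq_b: str) -> tuple:
    """Return (differing sites, compared sites) for two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in "-N?" or base_b in "-N?":
            continue
        compared += 1
        if base_a != base_b:
            differences += 1
    return differences, compared


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites."""
    differences, compared = count_differences(seq_a, seq_b)
    return differences / compared if compared else float("nan")


# Toy aligned fragments (hypothetical data).
print(count_differences("ATGCATGCAT", "ATGCTTGC-T"))

# Difference counts expressed over an ASSUMED 3000 bp alignment.
ASSUMED_ALIGNMENT_BP = 3000
for n_differences in (3, 5, 14):
    proportion = n_differences / ASSUMED_ALIGNMENT_BP
    print(f"{n_differences:>2} bp: proportion {proportion:.4f} = {proportion:.2%}")
```

For example, 3 differences over 3000 sites is a proportion of 0.001, or 0.1%.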

Single Gene Analysis
For the single gene dataset using only rbcL, the ML (-LnL = 9690.8210) and BI trees showed no differences; thus only the BI tree is shown, with both the BI PP and the ML BS values (Figure 2). All isolates from the same genus showed strongly supported monophyletic taxa. In particular, the genera Craticula, Pinnularia, Neidium, Frustulia, Cymatopleura, Surirella, Gyrosigma, Melosira, and Aulacoseira all had very high support values (PP = 100; BS = 100), while the genera Stauroneis (100, 65) and Navicula (100, 85) had supported monophyletic taxa groups. Cells of the same taxon collected from the same location (±10 m area) were also more closely aligned in the tree compared to similar cell isolates of the same taxon from other locales (Figure 2, red arrows). Specimens from the same genus which were isolated from either fresh, RNAlater preserved or in one case Lugol's preserved samples were always in the same monophyletic group. The Gyrosigma specimen which was isolated from iodine fixation was placed with all other Gyrosigma isolates (PP = 100, BS = 100). As well, for both Gyrosigma and Pinnularia, fresh and RNAlater preserved isolates were on the same terminal branch (Figure 2, black arrows).
Figure 2 (caption, partial): The black arrows show two instances in which sequences from preserved cells were very similar to sequences from fresh cell samples. The red arrows show three instances where taxa collected from the same location had identical sequences. Bolidomonas pacifica (GenBank accession HQ912421.1) and Cyclotella meneghiniana (GenBank accession KF959651.1) were used as an outgroup and sister group, respectively, and their rbcL sequences were obtained from GenBank.

Discussion
The development of novel DNA extraction protocols has accelerated the exploitation of microbial genetic studies in health (e.g., Richlen and Barber, 2005), environment (e.g., Neilan, 1995; Kermarrec et al., 2013), and even diatom taxonomic research (e.g., Evans et al., 2007; Pouličková et al., 2010). Nested DNA amplifications have the potential to open the genetic vault of taxonomic information from single microbial cells. The simple methodology of single cell Chelex® DNA extraction followed by nested PCR has great implications for expanding the genetic reference library of information in algal research. This study uses diatoms as test organisms; preliminary PCR success using dinoflagellates (pers. obs.) is also recorded. The amplification success rates for single cell PCRs using live (70-96%) and RNAlater fixed (31-100%) samples are similar to recovery rates using cultures (Lang and Kaczmarska, 2011). DNA polymerases of different quality and price points (BioShop Taq® and Phusion®) produced the same sequence results (93%).
Comparisons of sequences using NCBI BLAST also supported the morphological taxa identifications of the specimens. In this study, the systematic associations of Gyrosigma to Navicula, Craticula to Stauroneis, and Pinnularia to Sellaphora (Figures 1, 2) were in agreement with other studies (Evans et al., 2007; Theriot et al., 2010). This success rate increases the utility of conserved genes known to be good for species level taxonomic discriminations (Hamsher et al., 2011).
In this study, one existing nested primer set for rbcL was used while two new primer pairs were developed for the nested amplification of 18S and psbA in diatoms. The recovered sequence lengths for rbcL and 18S were within the average range of the lengths of diatom sequences found on GenBank, while psbA sequence lengths were slightly below the average gene length found for diatom sequences (Table 4). One single cell DNA extraction provided enough amplification template for multiple DNA amplifications, making the approach compatible with robust multi-gene analyses. In this study, 10 amplifications from a single cell extract were successfully performed. Based upon the replicated amplifications done, a conservative estimate suggests there could be enough template for 30-35 PCR amplifications per extraction. There is even potential that nested amplifications can be used to generate large-scale genetic datasets using next generation sequencing protocols optimized for low concentration DNA templates (e.g., initial amplification using the Multiple Displacement Amplification reaction; Lasken, 2007; but see Ning et al., 2014, for a review of current challenges). Some research suggests that the Chelex extraction method, which also conserves proteins, may not be the most suitable for Multiple Displacement Amplification reactions. However, one study found that keeping proteins was not an issue for whole gene amplification (Lepere et al., 2011).
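The 30-35 amplification estimate can be related to the volumes in the protocol with a simple budget: each extract is 200 µL of Chelex solution and each first-round reaction consumes 5 µL of supernatant. The sketch below is only a back-of-envelope illustration. The `usable_fraction` parameter (the share of the 200 µL that can be drawn off as clear supernatant above the resin) is a hypothetical quantity not reported in the study, and the value 0.8 is chosen purely for the example.

```python
def max_first_round_reactions(extract_ul: float = 200.0,
                              template_ul: float = 5.0,
                              usable_fraction: float = 1.0) -> int:
    """Upper bound on first-round PCRs that one single cell extract can supply."""
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    # Round before floor division to avoid floating point artifacts.
    usable_ul = round(extract_ul * usable_fraction, 6)
    return int(usable_ul // template_ul)


theoretical_maximum = max_first_round_reactions()
# HYPOTHETICAL: assume 80% of the volume is recoverable supernatant.
with_resin_allowance = max_first_round_reactions(usable_fraction=0.8)
print(theoretical_maximum, with_resin_allowance)
```

The theoretical ceiling is 40 reactions; allowing for volume lost to the resin brings the figure into the range of the conservative estimate given above.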
One concern related to using this technique is the potential for contamination during isolation of a single cell. Often diatom cells, like taxa in the genera Asterionella Hassall, Tabellaria Ehrenberg and Surirella, have epiphytes which may contaminate amplifications. In addition, single cells often have organic extracellular polymeric substances (EPS) which capture bacteria, fungi, and loose organics with remnant DNA (Das et al., 2013). Problems related to the amplification of DNA from non-target sources were observed in this study. In the Lugol's fixed samples, fungal 18S nuclear DNA was amplified in 13 samples. In this instance, the contamination problem was identified by BLASTing the sequencing product. Contamination from non-algal sources can be easily identified using this protocol and removed from study. Since contamination was only observed in selected Lugol's samples, we can conclude that with good isolations, fungal contamination can be effectively avoided when using fresh and freshly fixed material. Contamination from more closely related algal sources may be more problematic given the potential for cross-contamination during amplifications (Zhang et al., 1992; Ruck et al., 2014) and a limited genetic reference library available for comparison. However, in this study there was no evidence of this type of contamination.
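The BLAST-based contamination check described above amounts to comparing the lineage of each cell's best database match against the expected target group. A minimal sketch of that screening step follows. It assumes the top-hit lineages have already been retrieved (for example from a BLAST report) and stored as plain strings; the cell identifiers, lineage strings, and function name are all hypothetical.

```python
TARGET_LINEAGE = "Bacillariophyta"  # diatoms


def flag_non_target(top_hit_lineages: dict, target: str = TARGET_LINEAGE) -> list:
    """Return the cell IDs whose best BLAST hit does not fall within the target lineage."""
    return sorted(
        cell_id
        for cell_id, lineage in top_hit_lineages.items()
        if target.lower() not in lineage.lower()
    )


# HYPOTHETICAL top-hit lineages for three isolated cells.
example_hits = {
    "cell_01": "Eukaryota; Stramenopiles; Bacillariophyta; Naviculales",
    "cell_02": "Eukaryota; Fungi; Ascomycota",
    "cell_03": "Eukaryota; Stramenopiles; Bacillariophyta; Surirellales",
}
print(flag_non_target(example_hits))
```

A lineage filter of this kind catches non-algal contaminants such as fungi, but it would not detect cross-contamination between closely related diatoms, which is the harder case noted above.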
Time consuming recovery of DNA from micro-organism cultures has limited the success of barcoding in the Protista. Ruck and Theriot (2011) developed a single cell diatom field isolation and rbcL DNA extraction procedure using Chelex. This approach is effective, although limited by time requirements for isolating cells in the field and no ability to reinvestigate the samples collected. Collecting live and fixed samples in the field for subsequent isolation back in the lab gives greater success in DNA recovery. Using this single cell extraction protocol alone, or with a limited number of replicated daughter cells (short-term cultures), will greatly increase the database of DNA sequences for diatoms and microbes.
In single cell sequencing, there are inherent problems with the destruction of the voucher specimen (Figure A1). Although it has been reported that diatom valves can be recovered after DNA extraction in ethanol (Lang and Kaczmarska, 2011), the glass beads in the Chelex solution destroy diatom valves after centrifugation (pers. obs.). To limit the possibility of erroneous specimen identifications, many photomicrographs were taken of each single cell prior to DNA extraction.
These were linked to morphological vouchers (cleaned diatom valves) collected from the same respective population. For example, diatom specimens of G. acuminatum were matched using DNA results and recoverable specimens from the natural population (Figure 3, Figure A1). To further reduce identification errors, replication of DNA sequencing results with comparison to more morphological specimens from a population can improve the validation of species identifications in association with both genetic and morphological variability. In the case of G. acuminatum, these gene sequences can be directly related to a morphological study of the species type (Sterrenberg, 1995). However, this approach can be highly problematic when taxa within a community have overlapping morphological characteristics. In these cases, additional detailed multi-gene studies of populations could enhance the resolution and identification of cryptic species (e.g., Pouličková et al., 2010).
Traditional fixation of microbial samples for morphological, ecological and physiological study has a long history (Throndsen, 1978; Simmons, 2014). Recent studies have demonstrated that gene sequences can be recovered from a variety of traditionally fixed samples (Connell, 2002; Godhe et al., 2002; Henrichs et al., 2007; Lang and Kaczmarska, 2011). Ethanol fixed samples are not commonly found in microbial museum collections because they are subject to evaporation and have negative extraction effects on cell pigments. In the current study, recovery of gene product from Lugol's fixed samples was poor: 7.5 and 68% for the rbcL and 18S genes, respectively. Sequencing success was reduced further, as many isolates yielded low quality sequences which were rejected. Only one psbC amplification and sequence determination was attempted and successful. No success was observed from formalin-fixed samples, although we did not immediately transfer formalin-fixed material into methanol storage as described by Godhe et al. (2002). Although Bertozzini et al. (2005) and Auinger et al. (2008) have successfully recovered DNA from dinoflagellates and chrysophytes fixed with Lugol's solution, we need more detailed methodological studies to improve percent success in the routine recovery of DNA from diatoms in Lugol's and formalin fixation. However, Godhe et al. (2002) suggest that Lugol's solution and varying ethanol fixations have other shortcomings. In this study we also noted the presence of fungi in museum collections fixed with Lugol's, a potential problem with historically fixed collections. At present RNAlater represents an excellent genetic fixation protocol for sample collection, short term storage and long-term archiving of microbial collections. Cell wall structure in Chlorophyceae (Oedogonium sp., Pediastrum boryanum), Cyanophyceae (Phormidium sp., Oscillatoria cf. princeps) and diatoms was maintained in RNAlater fixed samples under cold and frozen conditions. Chloroplast integrity was even maintained for G. acuminatum during freezing at −17 °C (Figure 3F). The specifications for RNAlater indicate that treated tissues can be stored at 25 °C for 1 week, at 4 °C for 1 month, or at −20 °C indefinitely (Ambion, 1999). In this study, both fresh and RNAlater fixed samples had predictable extraction success (Table 5). This supports the adoption of RNAlater as a long-term diatom storage medium.
The recovery of DNA from archived museum and research collections is currently poor but quickly advancing, especially with vertebrate collections (e.g., Payne and Sorenson, 2002). However, museums and large collections should prioritize the implementation of storage and fixation techniques that maintain the molecular integrity of the samples. RNAlater preserved algae, including diatoms, subjected to freeze-thaw cycles showed some internal cell cytoplasmic alterations; however, the chloroplasts and associated pyrenoids remained intact. RNAlater represents a good alternative for specimen, tissue and single cell preservation. DNA barcoding can help with species delimitation and refining the concept of cryptic species. For example, in this rudimentary study with a small population of G. acuminatum (Figures 1, 3), gene sequences for rbcL showed 1-5 base pair differences between the four specimen clades, collected from three different sites within our primary pond (NHC-1). C. cf. cuspidata showed no variability (no base pair differences) in specimens from another sample site. In contrast, up to 117 bp differences were observed within the Navicula clade from three different locations within a lake and up to 88 bp differences were noted in the Pinnularia clade from four different lakes and pond locations. These results suggest that by expanding the use of barcodes to many individuals within a diatom population, inter- and intraspecific questions can be routinely addressed.
Historical problems in extracting, amplifying, and sequencing DNA from single cells have limited the development of genetics as a tool in the study of global microbial diversity, biogeography, and physiology. In diatoms, DNA sequencing from single cells is a logical step forward in population, taxonomic and environmental genetic studies. More conventional morphometric studies routinely use sample populations to determine size diminution series and variability of morphological expression (e.g., Lange-Bertalot et al., 2011; Levkov et al., 2013). With detailed genetic studies of single cells, links to matched morphological populations will be informative in understanding variations in genotypic and phenotypic expression. At the species level, single cell multi-gene sequences along with associated morphometrics can act as multi-proxy validation datasets for species identifications. Future developments with single cell sequencing may even advance next generation genomic research.

Author Contributions
PH funded the project, developed the experimental design in collaboration with RB, identified the development with fixed samples, initiated the expansion to multiple genes and extractions, completed cell isolations and wrote the drafts of the manuscript. KL conducted cell isolations and the majority of the sequencing, and edited and corrected the manuscript. KL also developed new ideas on recovery of viable DNA, produced the figures and tables, and wrote most of the methods and results. RB conducted the initial work in developing the single cell isolation protocol for the Canadian Museum of Nature Laboratory of Molecular Biodiversity, developed new primers for the nested amplification, contributed experimental ideas for the development of the manuscript and contributed extensively to the writing and final editing of the manuscript.