Comparison and optimization for DNA extraction of archived fish specimens

Graphical abstract

Our method is a modified version of the "Isolation of Genomic DNA from Tissues" protocol for the QIAamp DNA Micro Kit, originally developed by Qiagen to extract DNA from small quantities of tissue, which we adapted to extract DNA from ancient fish specimens preserved in museum collections.
Resource availability: QIAamp DNA Micro Kit (Qiagen); First DNA kit (Gen-Ial); MEGA 6 software.

Method details
Background
Ancient DNA (aDNA) is DNA isolated from old samples, such as subfossil bones, mummies, or museum specimens, that were not properly preserved for DNA extraction. As traditional repositories for biological specimens and tissue samples, museum collections are valuable resources for mapping and naming biodiversity. Now that DNA can be extracted from archived specimens, museums have become potential repositories for many molecular studies [1,2].
In taxonomy, aDNA has been a powerful tool for solving problems in which type specimens, usually very old, no longer preserve the diagnostic features needed for species identification [3,4]. However, the DNA extracted from such samples is usually scarce and highly fragmented, which limits the success of downstream applications.
To overcome these issues, we tested extraction kits, reagents, and primers to develop a successful DNA extraction protocol for ancient museum samples. This paper reports our experience extracting and amplifying aDNA from 53 type specimens of the fish family Characidae described in the eighteenth, nineteenth and twentieth centuries (Table 1), as well as the modifications we introduced into the manufacturers' recommended protocols that resulted in a successful method for extracting DNA from old museum fish specimens.
Since aDNA is typically scarce and fragmented, any modern DNA contamination, however small, outcompetes the ancient DNA and compromises the results. Therefore, all procedures for obtaining and amplifying aDNA followed established sterilization guidelines [5][6][7] to exclude any possibility of contamination.

DNA extraction
We were authorized to sample 53 type specimens of the family Characidae preserved in different museum collections (see Table 1). Tissue was removed aseptically and in the least invasive way possible, to avoid both unnecessary damage to the specimen and contamination of the samples. Preferably, part of a branchial arch was removed; otherwise, muscle was removed through a very small incision below the dorsal fin, always on the right side of the specimen (Fig. 1). The tissue was immediately placed in absolute alcohol and stored cold. Enough tissue (30-50 mg) was removed for three DNA extractions, allowing the process to be repeated [6].
To avoid contamination with modern DNA, all procedures involving aDNA were performed under strict cleaning and sterilization conditions in an isolated, dedicated room [5][6][7]. The "clean" laboratory, ARCHGEN (Supl. Data 1), follows a "one-way" rule of movement: all reagents move only from this pre-PCR room to the post-PCR facilities. As required for ancient DNA work, ARCHGEN was set up in a room that had never been used for manipulating DNA before […]
Table 1. List of the sampled type-specimens with quantifications of DNA yield at 5 min, 24 […]
Although DNA quantitation by spectrophotometry has been reported [8,9], fluorometers are now known to be more accurate and more widely accepted [10][11][12][13][14]. According to Rohland and Hofreiter [11], "Measuring DNA concentration via absorption of UV light at 260 nm may not be sensitive enough; therefore, measurements using fluorescent dyes such as Pico Green, which binds to dsDNA and increases the fluorescent signal, and extrapolation via a standard curve are recommended." Although both extraction kits showed the presence of DNA on agarose gels, only the Qiagen kit, which uses silica columns, produced viable sequences. We considered a sequence viable when its chromatogram was of high quality and a BLAST search returned the expected species, ensuring that it was neither a human nor an environmental contaminant [7].
Sequences generated from three samples (NMW57759, NMW57760-2 and NMW57540) extracted with the Gen-Ial kit (without silica columns) showed intense noise and a weak signal that prevented reading. Nevertheless, amplification and sequencing of the same samples extracted with the QIAamp kit were successful and produced viable sequences. We conclude that using silica columns during extraction yields cleaner DNA, free of impurities (PCR and sequencing inhibitors, tissue remains, protein, RNA and extremely small DNA fragments), which improves amplification and sequencing.
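The viability criterion used in this study (a high-quality chromatogram plus a BLAST hit to the expected species) can be written as a simple filter. The sketch below is hypothetical: the Phred-score thresholds (mean quality of at least 20 and at least 80% of bases at Q20 or above) are illustrative assumptions rather than values from this study, and the species name of the top BLAST hit is assumed to have been obtained separately.

```python
def is_viable(phred_scores, top_hit_species, expected_species,
              min_mean_q=20.0, min_q20_fraction=0.8):
    """Return True if a read passes both a quality check and an identity check.

    phred_scores: per-base Phred quality values from the chromatogram.
    top_hit_species: species name of the best BLAST hit (obtained separately).
    Thresholds are illustrative assumptions, not values from the study.
    """
    if not phred_scores:
        return False
    mean_q = sum(phred_scores) / len(phred_scores)
    q20_fraction = sum(q >= 20 for q in phred_scores) / len(phred_scores)
    good_quality = mean_q >= min_mean_q and q20_fraction >= min_q20_fraction
    # The identity check guards against human or environmental contaminants.
    same_species = top_hit_species.strip().lower() == expected_species.strip().lower()
    return good_quality and same_species
```

Under such a filter, noisy, weak-signal reads like those from the Gen-Ial extractions would fail on quality before identity is even considered.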
All extractions tested positive for the presence of DNA, but in variable quantities (Table 1). To increase the yield, we tested modifications of the Qiagen QIAamp protocol and were able to greatly increase the amount of DNA. Notably, changing the final step of the protocol, by extending the elution from 5 min at room temperature (the first "Extraction" column in Table 1) to 24 h in the freezer, greatly increased the amount of extracted DNA, even though this was a second elution (Table 1). In contrast, a third elution kept for 48 h in the freezer yielded less DNA (Table 1). Fig. 2 shows these steps as they appear in the original Qiagen kit protocol and in our modified version.
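The effect of the modified elution can be expressed as a fold change in DNA yield relative to the standard 5-min room-temperature elution. The sketch below assumes total yield is concentration multiplied by elution volume; the concentrations and the 20 µl volume are hypothetical placeholders, not the values reported in Table 1.

```python
def total_yield_ng(concentration_ng_per_ul, elution_volume_ul):
    """Total DNA recovered in one elution (ng)."""
    return concentration_ng_per_ul * elution_volume_ul


def fold_changes(yields_by_elution, reference):
    """Yield of each elution relative to the reference elution."""
    ref = yields_by_elution[reference]
    if ref <= 0:
        raise ValueError("reference yield must be positive")
    return {name: y / ref for name, y in yields_by_elution.items()}


if __name__ == "__main__":
    # Hypothetical concentrations (ng/ul) for one sample; replace with Table 1 values.
    concentrations = {"5 min, room temp.": 0.5, "24 h, freezer": 2.0, "48 h, freezer": 1.0}
    volume_ul = 20.0  # hypothetical elution volume
    yields = {k: total_yield_ng(c, volume_ul) for k, c in concentrations.items()}
    print(fold_changes(yields, "5 min, room temp."))
```

Because the 24 h and 48 h values come from second and third elutions of the same column, such ratios reflect both the incubation conditions and the amount of DNA still bound to the column.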

DNA amplification
Although next-generation sequencing (NGS) better exploits the fragmented nature of aDNA, Sanger sequencing is much cheaper, easier to use, and allows better control over a given marker, in our case COI (cytochrome c oxidase subunit I). A COI sequence obtained from a type specimen can then be used to recognize modern populations of the species in further studies (e.g., phylogeny, ecology).
We chose to amplify and sequence the COI gene because of its widespread use and its availability in public databases, including GenBank (https://www.ncbi.nlm.nih.gov/genbank/) and BOLD (http://www.barcodinglife.org/). To overcome DNA fragmentation, we designed five primer sets (COI-1, COI-2, COI-3, COI-4 and COI-5; Table 2) to amplify short sections of 150-200 bp which, combined, would recover the full 600 bp COI fragment. Primer design was based on an alignment of 217 COI sequences (mean length 600 bp) from 29 Characidae species (Supl. Data 3), chosen to capture as much of the variability of the specimens in their area of occurrence as possible. We designed the primer sets with Oligo Explorer 1.4 (Gene Link, Hawthorne, NY) and checked their quality and potential efficiency with Oligo Analyzer 1.0.2 [15].
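Because each primer set amplifies only 150-200 bp, the sets must tile the roughly 600 bp COI target without gaps. The Python sketch below checks such tiling and reports primer GC content, a basic primer-design metric. The amplicon coordinates and the primer sequence are hypothetical placeholders, not the primers listed in Table 2.

```python
def coverage_gaps(amplicons, target_length):
    """Return the uncovered intervals [start, end) of a target sequence.

    amplicons: (start, end) pairs in target coordinates, end exclusive.
    """
    gaps = []
    covered_to = 0
    for start, end in sorted(amplicons):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < target_length:
        gaps.append((covered_to, target_length))
    return gaps


def gc_content(primer):
    """Fraction of G and C bases in a primer sequence."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


if __name__ == "__main__":
    # Hypothetical amplicon coordinates on a 600 bp COI target.
    tiles = [(0, 180), (150, 330), (300, 480), (450, 600)]
    print(coverage_gaps(tiles, 600))           # an empty list means full coverage
    print(gc_content("ACGTGCCTTAGGCATTACGA"))  # hypothetical primer
```

Overlapping tiles, as in the example, leave room for the primers of adjacent sets to sit inside sequence already recovered by the neighboring fragment.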
Two brands of PCR reagents were tested: Phire Hot Start Taq polymerase (ThermoFisher Scientific) and Hot Start Master Mix (Promega). PCR with Phire Hot Start Taq was carried out in a volume of 20 µl containing 11.6 µl of H2O, 4 µl of 10× reaction buffer, 1 µl of dNTPs (2 mM), 1 µl of each primer (10 µM), 0.4 µl (5 U) of Taq, and 1 µl of template DNA. PCR with the Promega Hot Start Master Mix was performed in a total volume of 10 µl containing 3.45 µl of H2O, 5 µl of Master Mix (Promega), 0.15 µl of each primer (10 µM), and 1.25 µl of template DNA. The thermal profile was the same for both mixes: initial denaturation at 94 °C for 3 min; 5 cycles of 94 °C for 30 s, annealing at the high melting temperature (see Table 2) for 40 s, and 72 °C for 1 min; then 55 cycles of 94 °C for 30 s, annealing at the low melting temperature (see Table 2) for 40 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min.
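The per-reaction volumes above can be checked against the stated totals and scaled to a master mix for several reactions. The sketch below uses the volumes given in the text; the 10% pipetting overage and adding template DNA separately to each tube are our assumptions, not part of the described protocol.

```python
# Per-reaction volumes (ul) as given in the text. Template DNA is kept separate
# because it is added to each tube rather than to the shared master mix.
PHIRE_20UL = {
    "H2O": 11.6,
    "reaction buffer (10x)": 4.0,
    "dNTPs (2 mM)": 1.0,
    "forward primer (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "Phire Hot Start Taq": 0.4,
}
PHIRE_TEMPLATE_UL = 1.0

PROMEGA_10UL = {
    "H2O": 3.45,
    "Hot Start Master Mix": 5.0,
    "forward primer (10 uM)": 0.15,
    "reverse primer (10 uM)": 0.15,
}
PROMEGA_TEMPLATE_UL = 1.25


def reaction_volume(recipe, template_ul):
    """Total volume of one reaction, including template."""
    return sum(recipe.values()) + template_ul


def master_mix(recipe, n_reactions, overage=0.10):
    """Scale a per-reaction recipe to n reactions plus a pipetting overage."""
    factor = n_reactions * (1.0 + overage)
    return {component: round(volume * factor, 2) for component, volume in recipe.items()}


if __name__ == "__main__":
    print(round(reaction_volume(PHIRE_20UL, PHIRE_TEMPLATE_UL), 2))
    print(round(reaction_volume(PROMEGA_10UL, PROMEGA_TEMPLATE_UL), 2))
    # For example, 10 samples plus 1 negative control.
    print(master_mix(PHIRE_20UL, n_reactions=11))
```

Both recipes sum to their stated totals (20 µl and 10 µl), which is a quick consistency check on the transcribed volumes.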
PCR products were loaded onto 1% agarose gels together with the KAPA Universal Ladder (Kapa Biosystems) and purified by the ExoSAP enzymatic method (25% exonuclease, 25% shrimp alkaline phosphatase and 50% deionized water). Sequences were obtained using BigDye chemistry on an ABI Prism 3770 automated sequencer at the LAB at NMNH-SI (Laboratory of Analytical Biology, National Museum of Natural History, Smithsonian Institution, Washington, DC), and at Macrogen (South Korea) and Ludwig-ACTGENE (Brazil).
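The ExoSAP working mix is defined by volume fractions, so component volumes for any batch follow directly. In this minimal sketch the fractions come from the text, while the function name and the 40 µl batch size are illustrative.

```python
# Volume fractions of the ExoSAP working mix, as given in the text.
EXOSAP_FRACTIONS = {
    "exonuclease": 0.25,
    "shrimp alkaline phosphatase": 0.25,
    "deionized water": 0.50,
}


def exosap_mix(total_ul, fractions=EXOSAP_FRACTIONS):
    """Volume (ul) of each component for a working mix of total_ul."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return {name: total_ul * f for name, f in fractions.items()}


if __name__ == "__main__":
    print(exosap_mix(40.0))
```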
The COI-1 primer set was used in 217 amplification reactions (including ancient samples and positive controls), of which 47% (102) showed bands on agarose gels and were sequenced. Sequencing was successful for 21% (22) of these. The COI-2 set was tested on 56 samples, and bands were confirmed in 37.5% (21) of them; sequencing was successful for 90.47% (19) of those. The COI-3 set was tested on 29 samples, and bands were observed in 51.72% (15) of them; all but two of these were sequenced successfully. The COI-5 set was used on 29 samples, producing bands in 34.48% (10), of which only 20% (2) were successfully sequenced. Despite our efforts to increase specificity, the COI-4 set always produced double bands on agarose gels, and no sample was sequenced with this set. Therefore, only the COI-1, COI-2, COI-3 and COI-5 sets were considered efficient for amplifying COI fragments from archived characid specimens.
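The amplification and sequencing counts above can be recomputed, and made comparable across sets, by expressing each as a rate. The counts below are those reported in the text (for COI-3, 13 of 15 banded samples sequenced, i.e. all but two); COI-4 is omitted because no sample was sequenced with it. The "overall" rate, successful sequences per reaction, is an extra metric added here for comparison.

```python
# (reactions tested, reactions with bands, successful sequences), from the text.
PRIMER_SET_COUNTS = {
    "COI-1": (217, 102, 22),
    "COI-2": (56, 21, 19),
    "COI-3": (29, 15, 13),
    "COI-5": (29, 10, 2),
}


def success_rates(tested, with_bands, sequenced):
    """Return (band rate, sequencing rate among banded reactions, overall rate)."""
    band_rate = with_bands / tested
    seq_rate = sequenced / with_bands if with_bands else 0.0
    overall = sequenced / tested
    return band_rate, seq_rate, overall


if __name__ == "__main__":
    for name, counts in PRIMER_SET_COUNTS.items():
        band, seq, overall = success_rates(*counts)
        print(f"{name}: bands {band:.1%}, sequenced {seq:.1%}, overall {overall:.1%}")
```

On the overall metric, COI-3 (13/29, about 45% of reactions yielding a sequence) and COI-2 (19/56, about 34%) clearly outperform COI-1 (22/217, about 10%) and COI-5 (2/29, about 7%).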
Regarding variability, the COI-1 and COI-5 fragments were more conservative than the COI-2 and COI-3 fragments (Fig. 3). For example, in the COI-2 fragment the modern population of Deuterodon pedri is separated from the other species by 6 mutational steps (Fig. 3a), and Astyanax taeniatus by 5 mutational steps (Fig. 3b). In the COI-1 and COI-5 fragments there is only 1 mutational step between Astyanax rutilus jequitinhonhae and the remaining samples, whereas in the COI-2 fragment there are 9 steps (Fig. 3c). Likewise, in the COI-3 fragment, Tetragonopterus eigenmaniorum is separated from the remaining samples by 19 mutational steps (Fig. 3d). In short, COI-2 and COI-3 are more variable and therefore more informative for barcode identification.
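For two directly connected haplotypes, the mutational steps discussed above correspond to the number of differing sites in an alignment; the related uncorrected p-distance divides that count by the number of sites compared. The sketch below computes both for two aligned sequences, skipping gaps and ambiguous bases (an assumption; programs such as MEGA offer several site-handling options). The toy sequences are hypothetical. Steps along a haplotype network path can also pass through inferred intermediate haplotypes, so this pairwise count is a simplification.

```python
SKIP = set("-N?")  # gap and ambiguity symbols excluded from comparison (assumption)


def pairwise_differences(seq_a, seq_b):
    """Count differing sites and compared sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIP or b in SKIP:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of compared sites that differ."""
    differences, compared = pairwise_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments.
    print(pairwise_differences("ACGTACGT", "ACGTTCGA"))
    print(p_distance("AC-T", "ACGA"))
```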

Negative controls
In both processes, extraction and amplification, we included negative controls to check for contamination. An extraction negative control containing no tissue was processed alongside each species extraction. All negative controls were quantified as "lower than blank", meaning that their DNA concentration was below that of the blank solution used to calibrate the fluorometer.
For the amplifications, a negative control containing no DNA was included in each PCR run and subsequently checked on a 1% agarose gel.

Method validation
Our experience reported above demonstrates that even very small archived samples may generate viable DNA sequences. The specimens studied here were collected more than a century ago by naturalists or scientific expeditions in South America, specifically in Brazil. The Thayer Expedition (1865-1866), Charles Darwin during the voyage of the Beagle (1832), and Castelnau, as French consul in Brazil [16][17][18][19][20], collected specimens that were later used to describe new species. Because these collections predate the adoption of formalin as a fixative, these early naturalists usually fixed the specimens by placing them in jars of spirits such as rum, brandy, Brazilian cachaça, or whisky [21,22]. As spirits are essentially alcohol, this fixation certainly contributed to making it possible to obtain viable DNA from such old material [23].
Although both extraction kits tested here quantified positively for DNA in the spectrophotometer, only the Qiagen kit, which uses silica columns, produced viable sequences. Sequences generated from samples extracted with the Gen-Ial kit (without silica columns) showed intense noise and a weak signal that prevented reading. We therefore conclude that silica-column extraction produces better-quality DNA, free of impurities (such as PCR and sequencing inhibitors, tissue remains, and extremely small DNA fragments), which improves amplification and sequencing.
Figure caption (fragment): … (5 and 19) between the holotype and the samples with the lowest p-distance in the matrix. The patterns found in (B) and (C) strongly indicate the absence of a sequence matching those of the syntypes (B) and holotype (C). Numbers on each branch refer to the number of mutational steps between haplotypes; branches with no number represent a single mutational step.
Regarding the DNA yield obtained with the Qiagen kit, no correlation was detected between the amount of DNA extracted and the age of the sample (Fig. 4). As the precise year of specimen collection is not always available, in this study we used the year of the original description of the species as a proxy for the age of the sample. However, we must emphasize that collection and fixation precede the description, sometimes by several years, as in A. taeniatus [24], whose material was collected by Darwin in 1832 and described by Jenyns only 10 years later (1842). Rather than age, we believe the amount and quality of the extracted DNA are more likely related to the history and storage conditions to which the specimens were exposed (e.g., alcohol concentration at fixation, number of specimens fixed together, evaporation, dehydration). Because obtaining a viable sequence appears to depend on the degree of DNA fragmentation, a large quantity of DNA in a sample does not guarantee that amplification and sequencing will succeed. Both PCR kits, Phire Hot Start Taq polymerase and Hot Start Master Mix, worked very well, suggesting that PCR success depends on the quality of the extracted DNA. Thus, extraction is the critical step when working with ancient samples.