DNA evidence of bowhead whale exploitation by Greenlandic Paleo-Inuit 4,000 years ago

The demographic history of Greenland is characterized by recurrent migrations and extinctions since the first humans arrived 4,500 years ago. Our current understanding of these extinct cultures relies primarily on preserved fossils found in their archaeological deposits, which hold valuable information on past subsistence practices. However, some exploited taxa, though economically important, comprise only a small fraction of these sub-fossil assemblages. Here we reconstruct a comprehensive record of past subsistence economies in Greenland by sequencing ancient DNA from four well-described midden deposits. Our results confirm that the species found in the fossil record, like harp seal and ringed seal, were a vital part of Inuit subsistence, but also add a new dimension with evidence that caribou, walrus and whale species played a more prominent role for the survival of Paleo-Inuit cultures than previously reported. Most notably, we report evidence of bowhead whale exploitation by the Saqqaq culture 4,000 years ago.


Supplementary Note 1 -Methodical biases and contamination
To monitor lab contamination during extraction, library building and indexing, a collection of control reactions were included in each batch of sample preparation. From examination of these control samples we detect a low background contamination of DNA from human, cow and chicken (Supplementary table 11), all of which are well known lab contaminants 11,12 . Based on this, DNA from the order of Primates and Phasianidae was identified as contamination and discarded. While sheep and goat have been identified previously as common contaminants, we do not identify these species in either peat or control samples. The background contamination from cattle DNA represents 1-5 reads per group and is detected in peat layers without evidence of cultural remains as well as Inuit layers (marked with * in Supplementary Table 1). At Sandnes, however, a strong and consistent signal from Cattle of ~30x the background contamination (82 -117 reads per group) suggests additional endogenous DNA from the Norse cattle in these layers. As a result DNA from Bos was included in the dataset (Supplementary Table 2), while reads assumed to represent Bos contamination were omitted in Figure 2. Furthermore, in the extraction blank from the helminth library preparation of samples from Sandnes (S-P-EB) we detect 3 Canis lupus reads. While we identify Canis lupus reads in samples from both the helminth and the sediment library preparation batch, this could indicate that Canis lupus reads at Sandnes represent a contamination. Hence, Canis reads from Sandnes were marked as potential contamination and omitted in Figure 2. Reassuringly, no DNA from marine mammals where detected in any of the control reactions.
Apart from the background contamination, a low level of false positives is observed from close phylogenetic relatives to the expected species, such as the identification of Phoca largha and Phoca fasciata in libraries with high concentrations of harp seal DNA or the identification of Pusa sibirica and Pusa caspica in libraries with high concentrations of ringed seal DNA (Supplementary table 2). Such false positives are presumed to be an effect of DNA damage or sequencing errors causing a read to resemble a close relative to the true species. This hypothesis was tested by trimming all reads from each end with two trim sizes: 2bp and 5bp. Using these trimming settings we detect a reduction in reads assigned to the 4 expected false positives Phoca largha, Pusa sibirica, Pusa caspica and Phoca fasciata from 68 reads in total to 44 and 24 reads in total for 2bp and 5bp trimming from each end, respectively. This suggests that a large fraction of these assignments can be explained by 5' C to T misincorporations. However, with these trimming settings we loose valuable data, as a fraction of the asigned reads -7.1% and 27.2% of all vertebrate reads for 2bp and 5bp, respectively -drops below the size threshold. Based on these results, we have decided to retain as much data as possible by not applying any trimming.
In addition to false positives from close phylogenetic relatives, we detect false positives presumed to be a result of closest match assignments where the true species of origin is absent from the database. This is illustrated by the identification of Larus dominicanus at Qajaa. Both Larus hyperboreus and Larus glaucoides have previously been identified at Qajaa 13 , however, these species are absent from the mitochondrial database. Accordingly, the identification of Larus dominicanus is assumed to represent a closest match assignment (Supplementary table 3). Lastly, we detect DNA from Turdus merula in allmost all groups. These reads map exclusively to a 74bp region of the Turdus merula mitochondrial genome, which has high sequence similarity to bacterial DNA. Hence this identification most likely represents a false positive.
In the analysis of plant content in the sediments, we identify DNA from the family Juglandaceae which is not native to Greenland. Considering the low resolution in the taxonomic identification of plants and the co-occurrence of DNA from Juglandaceae and Betluceae, these reads could very well represent authentic DNA from the birch family.
Another potential bias comes from the higher coverage of GC-rich areas obtained from next generation sequencing data. To test whether this bias had any effect on the species identified here, we compared the GC content of each sequence in the mitochondrial database with the GC-content of the mitochondrial sequences from the species identified by the LCA algorithm. With a mean GC-content of 38.1% (SD: 8.5%) in the full mitochondrial database and 37.9% (SD: 8.1%) among the mitochondrial genomes from species identified here, we conclude that the GC bias of next-generation sequencing did not affect our results. Lastly, a bias could be introduced from DNA leaching between sediment layers. However, the absence of mammal DNA in the peat samples confirms that DNA leaching is not a problem in this experiment. Even though a substantial amount of vertebrate reads are identified in the Dorset layer at Qajaa, no signal from vertebrate species are detected in the peat layer below.

Supplementary Note 2 -Authenticity of results
While a few studies have been published 1-4 , the use of shotgun sequencing to characterize compositions of higher eukaryotes remains largely untested, and thus results should be scrutinized to confirm data authenticity. Several lines of evidence point towards the validity of the data presented in this study. First, strict precautions were taken, both at the experimental and analytical stages of the study to ensure data authenticity. Sampling were carried out wearing gloves and facemask, and sample extraction and library building were carried out in dedicated ancient DNA facilities following strict aDNA guidelines 56 . At the analytical stages of the study, data authenticity is ensured by the use of a database containing all available mitochondrial genomes within Metazoa, as opposed to a small, curated database containing only species expected to be present in a given sample 7 . Furthermore, instead of maximizing the output by mapping to all sequencing data available 1 , we limit the database to contain only mitochondrial DNA, assuring that each species in the database is represented by similar quantities of DNA data. Lastly, we apply strict filtering to remove duplicate reads and DNA of low complexity or poor quality (see Methods). As a result, we are able to reliably identify the majority of mammal species present in the ancient refuse, even though no a priori assumptions of the expected findings were made (see Fig. 3). A second factor confirming the authenticity of the data is the presence of the unambiguous DNA damage patterns associated with ancient DNA, presented in Fig. 2 and Supplementary Figures 3-5. Illustrating the time dependent nature of DNA damage accumulation, these post-mortem modifications are more pronounced in the oldest Saqqaq layers, while they are entirely absent in the young deposits at Sandnes and Fladstrand 8 . Furthermore, read coverage across the entire mitochondrial genomes of harp seal, bowhead whale and Taenia hydatigena, demonstrated in Supplementary Figures 3-5, serves as an additional proof of data validity, since laboratory contamination from PCR fragments are expected to map exclusively to small regions of the reference 9 . Lastly, the congruency of plants identified with trnL metabarcoding, shotgun metagenomics and previous macrofossil and pollen analyses at Sandnes 10 provides compelling evidence for data authenticity (Supplementary Figure 2).

Supplementary Note 3 -Inferring biomass from DNA profiles
The sedaDNA approach applied here provides an excellent means to investigate the taxonomic distribution across a variety of taxa based on a few grams of sediment. As demonstrated in Figure 3, there is a good correlation between the DNA read counts and the expected biomass for harp seal, ringed seal, birds and fox at Qeqertasussuk. However, as discussed in the manuscript, the DNA distribution might not always reflect the biomass of the different species, as, e.g. defacation and urine might inflate the DNA record for domesticate species. Hence, when analysing sedaDNA results, the DNA sources should be carefully considered and, if available, the DNA data should be correlated with osteological evidence. In this study, the identification of hardened blubber oil within the sediment at Qeqertasussuk (Morten Meldgaard, personal communication) together with the absence of associated cetacean bones, suggests that the main source of bowhead whale DNA at Qeqertasussuk is blubber and meat. Alternative sources of DNA from marine mammals in this study could arise from the proccessing and usage of blubber. Blubber from seals or whales were used as fuel in lamps 14 ; If such lamps were emptied onto the midden, the DNA signal from marine mammals could have been inflated. Similarly, the wastewater from boiling of skin and blubber in order to retrieve oil could have been discarded at the midden. However, the contribution of such alternative sources of DNA is unlikely to be significant as the blubber was heated, causing the DNA to be heavily damaged. In summary, based on the presented evidence, it cannot be conclusively shown that the biomass for bowhead whale can be inferred directly from the DNA read counts, as is the case for harp seal. However, it can safely be concluded that the level of bowhead whale exploitation at Qerqertasussuk and Qajaa, by far exceeds what has been estimated from the bone record previously.

Supplementary Methods
DNA Metabarcoding. The trnL p6 loop of plant chloroplasts was amplified using the primers trnL-g (5'-GGGCAATCCTGAGCCAA) and trnL-h (5'-CCATTGAGTCTCTGCACCTATC) described in 7 , each tagged with a unique 6 nucleotide 5' identifier to distinguish sequences from different samples 15 . trnL sequences were generated from 16 extracts from Sandnes, every sample represented by at least one PCR reaction. For samples S2 -3, -4, -5, -6 and -7, trnL amplicons were generated in duplicates. To enhance the PCR reaction, DNA extracts were subjected to a secondary inhibitor removal step with the PowerClean ® Pro Clean-Up Kit (MO-BIO). Depending on the DNA concentration, 1 to 5 μL purified extract was added to each 25μL reaction. PCR amplifications were carried out with 0.2μL Omni Klentaq DNA polymerase (DNA Polymerase Technology, Inc.) for 55 cycles in a reaction mixture with 2.5μL buffer, 12.5 μL PCR Enhancer Cocktail P (DNA Polymerase Technology, Inc.), 10mM dNTP and 1mM of each primer. The following PCR conditions were applied: 94°C for 4 min, 55 cycles of: 94°C for 30 seconds, 57°C for 30 seconds and 68°C for 60 seconds, followed by a final elongation phase at 68°C for 7 minutes. After purification with MinElute columns, PCR products were visualized on an agarose gel (2%) and a 2200 Tapestation (Agilent) using the D1000 screen tape assay. Lastly, PCR products were pooled in equimolar amounts, based on DNA concentrations measured on a Qubit fluorometer (Life Technologies) and prepared for sequencing using the NEBNext DNA Library Prep Master Mix for 454 (E6070) as described in Methods.
Sequence Analysis of Plant Barcodes. Data from trnL amplicons were demultiplexed and trimmed with Novobarcode and AdapterRemoval as described in Methods, only retaining collapsed reads. Followingly, amplicon reads were assigned to their corresponding samples based on primer tags using the ngsfilter program from the OBITOOLS package (http://www. grenoble.prabi.fr/trac/OBITools). Only sequences with two tags showing a complete match and primers with maximum 2 mismatches were considered. Next, reads were dereplicated with obiuniq and denoised with obigrep, discarding sequences shorter than 10bp or 15 represented by fewer than 2 reads. The obiclean program 16 was then applied to cluster variants of the same sequence as a result of amplification or sequencing errors, linking two sequences if the count of the rare sequence was less than 5% of the count of the abundant sequence. Subsequently, the ecoTag 17 program was applied to assign taxa to each amplicon, using a custom made database of trnL sequences from plant species of Greenland described in Bocher 18 . Finally, taxa represented by less than five reads or assigned at a taxonomic resolution above family level were discarded.
Phylogenetic analyses and NMDS plots. Consensus sequences were called using ANGSD 19 (0.911) and phylogenetic trees were constructed with MrBayes (3.2.6) 20 , based on best substitution models identified by jModelTest (2.1.7) 21 : GTR+I+G for whale and helminth phylogeny and GTR+G for harp seal phylogeny. For each tree, four runs of four MCMC chains were run for 5,000,000 iterations sampling every 1,000 generations. Majority rule consensus trees were constructed using a burnin of 25% ( Variations within vertebrate and plant taxa were visualized with non-metric multidimenstional scaling (NMDS) plots based on hellinger transformed bray-curtis distance calculations, using the R packages vegan (https://cran.rproject.org/web/packages/vegan/index.html) and MASS (https://cran.rproject.org/web/packages/MASS/index.html). NMDS plots were based on reads assigned within Vertebrata and Viridiplantae, respectively. For vertebrates, plots were generated from read counts of the 42 vertebrate taxa identified in Supplementary Table 2 and 3. Samples from peat layers were excluded, since no vertebrates reads were identified in these libraries. For plants, the plotting is based on plant families represented by more than 50 reads across the data set.