Genomic sex identification of ancient pinnipeds using the dog genome

Determining the proportion of males and females in zooarchaeological assemblages can be used to reconstruct the diversity and severity of past anthropogenic impacts on animal populations, and can also provide valuable biological insights into past animal life-histories, behaviour and demography, including the effects of environmental change. However, such inferences have often not been possible due to the fragmented nature of the zooarchaeological record and a lack of clear diagnostic skeletal markers. In this study, we test whether the dog (Canis lupus familiaris) nuclear genome is suitable for genetic sex identification in pinnipeds. We initially tested 72 contemporary ringed seal (Pusa hispida) genomes with known sex, using the proportion of X chromosome DNA reads to chromosome 1 DNA reads (i.e. chrX/chr1-ratio) to distinguish males from females. This method was found to be highly reliable, with the ratios clustering in two clearly distinguishable sex groups, allowing 69 of the 72 individuals to be correctly identified according to sex. Secondly, to determine the lower limit of DNA reads required for this method, a subset of the ringed seal genome data was randomly down-sampled. We found a lower threshold of as few as 5000 mapped DNA sequence reads required for reliable sex identification. Finally, applying this standard, sex identification was successfully carried out on a broad set of ancient pinniped samples, including walruses (Odobenus rosmarus), grey seals (Halichoerus grypus) and harp seals (Pagophilus groenlandicus). All three species showed clearly distinct male and female chrX/chr1 ratio groups, providing sex identification of 42–98% of the samples, depending on species and sample quality. The approach described in this study should aid in untangling the putative effects of human activities and environmental change on populations of pinnipeds and other animal species.


Introduction
Sex determination of ancient zooarchaeological bones provides a valuable source of information for understanding anthropogenic impacts on animals, notably hunting, through the archaeological and cultural aspects of prey availability as well as hunting strategies and preferences (Weinstock, 2000, 2002; Gotfredsen and Møbjerg, 2004; Magnell, 2005). Furthermore, sex identification can contribute essential biological insights into the ecology, behaviour, demography and life history of past animal populations, including the effects of human activities and environmental changes (Allentoft et al., 2010; Pečnerová et al., 2017). Such effects have been illustrated in contemporary animal populations (Taylor et al., 2008; Marealle et al., 2010); however, comparable studies of ancient fauna remain less common.
In a contemporary context, animals are typically sex determined by external or internal sex-specific morphology, using the presence of characters such as antlers or tusks, examination of reproductive organs or differences in bone morphology. However, morphological sex identification of archaeological material is not always possible, as soft structures are lacking and only a few types of bone elements show reliable indicators of sex. Specific bone characteristics that do allow morphological sex identification include suid canines (Mayer and Brisbin, 1988), ungulate horn cores and the innominate bones (Hatting, 1995; Greenfield, 2006). Alternatively, certain bone measurements can assign sex based upon sex-specific size categories (osteometric sex identification), as has been demonstrated for both domestic and wild animals (e.g. Bartosiewicz et al., 1997; Weinstock, 2002). In pinnipeds (seals, sea lions, fur seals and walruses), osteometric sex identification is only possible for certain taxonomic groups with pronounced osteological sexual dimorphism (for example otariids). Walruses can be sexed osteometrically from measurements of the mandibles, as established by Wiig et al. (2007). Unfortunately, zooarchaeological remains from groups with limited or no sexual dimorphism, such as phocid seals (King, 1983; Fay, 1985), usually do not allow osteometric sex identification. Moreover, for any animal, osteometric sex identification is only possible for fully grown adults, thus excluding pups and juveniles. Furthermore, anthropogenic fragmentation prior to deposition and secondary diagenesis due to the various taphonomic processes that skeletal remains undergo after deposition (Lyman, 1994) might alter the few morphologically sex-identifiable bone parts to such a degree that sex identification is no longer reliable. These limitations make it important to seek alternative methods to investigate sex ratios of zooarchaeological material.
One promising alternative to morphological sex identification is the use of genetic sexing analyses. These approaches most commonly make use of differences in sex chromosome composition and sex-specific genes. For instance, PCR primers have been used for several decades to target specific chromosomal regions such as the zinc-finger protein domain (Aasen and Medrano, 1990; Berube and Palsbøll, 1996; Morin et al., 2005; Curtis et al., 2007; Svensson et al., 2008) and the amelogenin gene (Akane et al., 1991; Faerman et al., 1995; Stone et al., 1996) to distinguish sex chromosomes and hence the sex of the individuals. Over recent decades, the advent of high-throughput sequencing methods has allowed alternative genetic sexing approaches that go beyond targeting a short specific DNA region (Skoglund et al., 2013; Park et al., 2015; Pečnerová et al., 2017; Bro-Jørgensen et al., 2018; Ebenesersdóttir et al., 2018; Nistelberger et al., 2019). Advantages of high-throughput sequencing over PCR-based approaches include a higher success rate on highly fragmented ancient DNA, and the fact that the data can be used concurrently for both population genetic inference and sex identification. Specifically, as high-throughput shotgun sequencing does not target any specific regions of the genome, it generates a raw data set roughly representative of the genome components of the sampled individuals. As reads will be obtained at random from both the autosomal and sex chromosomes, it is possible to identify the sex of an individual based on the relative quantity of DNA reads mapped to the X chromosome (in mammals), provided sufficient DNA sequence reads are available. Since mammalian males, in contrast to mammalian females, carry only one X chromosome, male individuals will have a relative representation of X chromosome reads about half that of females.
For both males and females, the autosomal (non-sex) chromosomes are represented by two copies in the genome and are therefore roughly represented by reads in proportion to their chromosome size. Therefore, the relative quantity of DNA reads representing the X chromosome in a sample can be compared to the autosomal chromosomes to reveal an individual's sex. However, while high-throughput sequencing of ancient DNA is becoming increasingly common, the sexing method also requires the availability of an annotated nuclear genome for DNA reads to be mapped against. This is currently not available for all mammalian species or groups, such as pinnipeds, necessitating the use of genomes from related species.
In this study, we present a comparative sexing method based on the relative read representation of chromosome X for use with shotgun sequencing data, using the annotated dog reference genome to identify the sex of a set of ancient pinnipeds consisting of walrus, grey seal and harp seal. A data set of contemporary ringed seals with known sex was used to test the accuracy of the method.

Materials and methods
Sampling, DNA extraction and sequencing

Contemporary ringed seal samples
Samples from contemporary Arctic ringed seals were obtained from research monitoring programmes and Inuit subsistence hunts in Greenland and Canada. All sampled individuals were age and sex determined in the field. DNA extractions were carried out using Thermo Scientific KingFisher Duo Prime, the KingFisher Cell and Tissue DNA Kit (Germany) and the KingFisher Duo Combi Pack for 96 DW Plate. Paired-end 150 bp libraries were built using the method described by Carøe et al. (2018), and sequenced on an Illumina HiSeq 4000 platform.

Ancient pinniped samples
The samples obtained for ancient DNA analysis included archaeological and historical bones and teeth from walrus, grey seal and harp seal. Samples were chosen to represent a broad range of geographic regions and time periods, regardless of their size and level of degradation. Ancient DNA extraction and sequencing were conducted using a range of laboratory methods to test the approach's applicability across various methodologies and datasets.
The ancient grey seal and harp seal samples were extracted using a lysis buffer consisting of EDTA (0.5M, pH8), Triton X and Proteinase K (100 mg/mL). Extracts were concentrated using Amicon Ultra Centrifugal Filters (Sigma-Aldrich, Darmstadt, Germany) and eluted in MinElute-spin columns (Qiagen, Hilden, Germany). Libraries were built using the method described by Meyer and Kircher (2010). All steps were carried out in the Clean Laboratory at the Archaeological Research Laboratory, Stockholm University, Sweden. The DNA content of libraries was tested on a 2100 Bioanalyzer using the High Sensitivity Kit (Agilent Technologies), based on which samples with too low DNA content were excluded prior to sequencing. Following size selection with Ampure beads, the samples were pooled and sequenced at SciLifeLab Stockholm, Sweden. A total of 14 harp seal samples were sequenced on Illumina HiSeq X, one was sequenced on Illumina HiSeq2500 and the remaining 64 harp seal samples were sequenced on NovaSeq S1. Fifty-eight of the grey seal samples were sequenced on Illumina HiSeq2500 and one on NovaSeq S1. Further details about the grey seal samples and the laboratory work are found in Ahlgren et al. (n.d.).
The walrus samples were extracted using a lysis buffer consisting of EDTA (0.5M, pH8), Urea (1M) and Proteinase K (10 mg/μL), and eluted using Zymo-spin reservoirs (Zymo Research, CA, USA) combined with MinElute-spin columns (Qiagen, Hilden, Germany) (Dabney et al., 2013). The DNA content of extracts was quantified on High Sensitivity TapeStation (Agilent Technologies). Once again, samples with insufficient DNA yield were excluded prior to sequencing. Libraries were built using the method described by Carøe et al. (2018), as detailed in Keighley et al. (2019). All steps were carried out in the ancient DNA laboratory at the University of Copenhagen's Globe Institute, Denmark.

Data analyses
As there are currently no published pinniped genomes with annotated sex chromosomes, we used the dog (Canis lupus familiaris) genome, publicly available in NCBI GenBank (CanFam3.1, GCA_000002285.2), as the template for assigning sex to our pinniped samples. This dog genome consists of 38 autosomal chromosomes, the X chromosome and the mitochondrial genome. To quantify the relative quantity of DNA reads representing the X chromosome, a ratio based on the number of reads mapped to chromosomes 1 and X was chosen, given their similarity in size (122.68 Mb and 123.87 Mb respectively).
Information on the number of reads (hits; duplicates excluded) per chromosome was extracted for each sample from the coverage files, and chrX/chr1 ratios were calculated by dividing the number of reads that mapped to chromosome X by the number of reads that mapped to chromosome 1. The total number of reads that mapped to the dog genome was used to assess the reliability of the chrX/chr1-based sex identification for each sample.
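The ratio described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the chromosome labels ("chr1", "chrX"), the function name and the toy counts are hypothetical, and real per-chromosome counts would come from whatever coverage summary the mapping step produces.

```python
def chrx_chr1_ratio(read_counts):
    """Return (chrX/chr1 ratio, total mapped reads).

    read_counts maps a chromosome label to the number of
    de-duplicated reads mapped to it. The labels "chr1" and
    "chrX" are assumptions made for this sketch.
    """
    chr1 = read_counts.get("chr1", 0)
    chrx = read_counts.get("chrX", 0)
    if chr1 == 0:
        raise ValueError("no reads mapped to chromosome 1; ratio undefined")
    total_mapped = sum(read_counts.values())
    return chrx / chr1, total_mapped


# Toy counts (invented for illustration only).
example_counts = {"chr1": 400, "chr2": 300, "chrX": 200}
ratio, total_mapped = chrx_chr1_ratio(example_counts)
print(ratio, total_mapped)
```

The total is returned alongside the ratio because the text uses the total number of mapped reads to judge how far a given ratio can be trusted.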
Further, as ancient DNA analyses are often characterised by a low yield of endogenous DNA, we explored the minimum amount of DNA sequence data required to accurately determine the sex of a sample. To this end, the genomic data from ten male and ten female contemporary ringed seal samples were randomly subsampled five times down to 8000, 6000, 5000, 4000, 3000 and 2000 DNA reads, giving a total of 50 observations within each sex group for each of the six read number groups. ChrX/chr1 ratios were calculated for all observations and compared between the groups.
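The down-sampling experiment can be imitated with a short simulation. The sketch below is an assumption-laden illustration rather than the authors' procedure: it draws reads without replacement from per-chromosome counts using the counts argument of random.sample (Python 3.9 or later), and the chromosome labels, the function name and the toy counts are hypothetical.

```python
import random
from collections import Counter
from statistics import stdev


def downsample_ratios(read_counts, n_reads, replicates=5, seed=0):
    """Draw n_reads reads without replacement, `replicates` times,
    and return the chrX/chr1 ratio of each draw."""
    rng = random.Random(seed)
    chroms = list(read_counts)
    counts = [read_counts[c] for c in chroms]
    ratios = []
    for _ in range(replicates):
        # Each sampled element is the chromosome label of one read.
        drawn = Counter(rng.sample(chroms, n_reads, counts=counts))
        if drawn["chr1"] == 0:
            raise ValueError("no chr1 reads in subsample; ratio undefined")
        ratios.append(drawn["chrX"] / drawn["chr1"])
    return ratios


# Toy male-like sample (invented numbers): chrX has half the reads of chr1.
toy_counts = {"chr1": 50_000, "chrX": 25_000, "other": 925_000}
for depth in (8000, 6000, 5000, 4000, 3000, 2000):
    ratios = downsample_ratios(toy_counts, depth)
    print(depth, round(stdev(ratios), 3))
```

Comparing the printed spread across depths gives a rough feel for how the ratio becomes noisier as fewer reads are retained.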

Method verification on contemporary ringed seal material
As expected, the genomic data from contemporary ringed seal reference samples showed an approximately positive linear correlation between the number of mapped reads and chromosome size (Fig. 1), except for chromosome X in males. By estimating the ratio chrX/chr1 between the number of reads mapping to the X chromosome and the number of reads mapping to chromosome 1, the sex of 69 (95.8%) out of the 72 contemporary ringed seal samples was correctly determined, suggesting a very high accuracy of our method. The three incorrectly identified samples are most likely due to incorrect sex determination in the field or reporting by sample collectors, but could also represent sample mix-up during handling or laboratory work, or less likely some bias in the genetic sex determination. Sex-chromosomal imbalance (e.g. XXY aneuploidy), as recently attested in cetaceans by Einfeldt et al. (2019), could lead to misidentification of sex using X chromosome quantification. However, with a frequency of only 0.2% in the human male population (Visootsak and Graham, 2006), and no studies so far proving its occurrence in pinnipeds, we find it unlikely that XXY aneuploidy should pose a major limitation to using this sex identification method.
In the down-sampled contemporary ringed seal genome data, there was a clear negative trend between total DNA read number and the standard deviation of the estimated chrX/chr1 ratios (Fig. 2, Table S1). Random variation in the down-sampling had a more pronounced impact on the chrX/chr1 ratios when the read number was small. Additionally, the spread of the chrX/chr1 values was wider in the lower read groups, with minor overlap between the male and female groups, whereas the male and female groups became more defined and distinct from each other when a greater number of DNA reads was included. Based on these findings, approximately 5000 endogenous DNA sequence reads, at a size of roughly 150 bp, were deemed sufficient to reliably determine the sex of a sample. This is a conservative threshold, and a lower one could be applied if accepting larger uncertainties in the sex classifications (Fig. 2), or if combining the genomic sex determination with other methods, e.g. morphology.

Sex identification of ancient pinnipeds
DNA was extracted and sequenced from a total of 296 ancient pinniped samples (Table 1). Of these, 41 of the 158 walrus samples were excluded from further sex identification analysis due to insufficient DNA yield with less than 100 DNA reads per sample mapping to the dog genome. For each of the remaining 255 samples (117 walruses, 59 grey seals and 79 harp seals), the chrX/chr1 ratio could be successfully calculated, and the results evaluated based on the threshold defined for contemporary ringed seals (minimum 5000 total number of reads mapped to the dog genome) (Fig. 3, Table S2). When applying this threshold, no critical outliers were found in the putative male and female chrX/chr1 ratio distributions.
Fig. 1. An example of a contemporary ringed seal male individual showing an approximate linear relationship between number of reads mapping to a certain autosomal chromosome (blue dots) and the size of that chromosome. Only being found in one copy in males, the chromosome X is here represented by approximately half as many mapped reads as expected by the chromosome size alone.
In total, across the 296 ancient samples, 165 individuals were identified to sex following our genetic approach, giving an overall sex identification success of 56% across all three pinniped species. The success rate varied among species, likely reflecting variation in sample preservation states. Specifically, a little less than 50% of the samples could be identified to sex in harp seals and walrus, whereas grey seal samples had almost a 100% success rate due to extraordinarily good preservation and a high yield of DNA reads from almost all the samples (Table 1). The success rate in walrus and harp seals could be increased to over 50% when considering samples that yielded more than 3000 endogenous reads, as an additional 7 harp seal and 10 walrus samples could then also be sex determined, but the uncertainty in these classifications is larger (Fig. 3).
Fig. 2. The distribution of the chrX/chr1 ratios for each of the down-sampling tests on contemporary ringed seal genomes, as well as for the total read number of the full sample set, shown in order of decreasing read number. In the total dataset, females (red) and males (blue) can be distinguished based on their chrX/chr1 ratio, whereas these ratios start to overlap for datasets with low read numbers, making sex identification difficult.

Implications for understanding human hunting practice and pinniped biology
Sex ratios in zooarchaeological assemblages do not necessarily reflect the sex ratio of the ancient populations themselves, but could also reflect various ecological and anthropological parameters. These could include the extent of seasonal and regional prey availability or accessibility, the specific ecology and breeding patterns of the prey, and the cultural context, hunting methods and subsistence of human hunters (Grønnow et al., 1983; Gotfredsen and Møbjerg, 2004; Glykou, 2014). For human hunters with a relatively opportunistic subsistence strategy, the sex ratios of the hunted prey are likely to be fairly representative of wild populations (Rivals et al., 2004). However, a substantially different sex ratio would be expected should hunters have preferentially chosen one sex over the other. A preference for one sex over another may have arisen for a number of reasons, including prey availability. For example, hunting targeting female harp seals during the breeding season would have been comparatively easy given their vulnerability and ease of access. Hunting during the breeding season may also have allowed pups to be caught alongside females, as suggested for the hunter-gatherers who exploited breeding colonies of harp seals in the Baltic Sea during the Mesolithic. However, this pattern would not be expected for harp seals outside the breeding areas or season (Glykou, 2014). In this study, we found close to an equal proportion of males and females for grey seal and harp seal zooarchaeological samples, suggesting an opportunistic hunting strategy. In the case of the walrus, nearly 75% of the samples were identified as male individuals. Although our samples represent a large geographical region and a broad time scale, it is likely that this overrepresentation of males reflects some overall degree of hunting preference.
Prehistoric hunting with a focus on achieving large amounts of ivory, meat or hide (Pierce, 2009;Frei et al., 2015;Star et al., 2018;Keighley et al., 2019) might have primarily targeted larger, older males, with typically longer and thicker tusks as well as larger body size (Kastelein, 2009).
Determining whether selective hunting did occur can offer insights into hunting strategies and cultural context; however, it also has important biological implications. The sex-biased selective hunting of mammals by humans can potentially have severe effects on species biology, demography and life-history. For example, heavily skewed hunting can trigger or exaggerate existing declines in effective population size and increase the likelihood of inbreeding depression in smaller populations. This has the potential to lead to local or complete extinction when combined with the effects of demographic and environmental stochasticity (O'Grady et al., 2006). In pinnipeds and other species with complete or partial polygynous mating strategies (Stirling, 1983), genetic and biological effects of selective hunting would be most pronounced when the bias is towards females. However, it must be noted that ancient and historic hunting need not be sex-biased to have substantial effects on pinniped abundance and distribution (Fietz et al., 2016; Olsen et al., 2018; Keighley et al., 2019).
Ultimately, while selective hunting practices might explain the observed skewed sex ratio for walruses, such patterns and their putative effects on species biology, demography and life-history should be further examined within each specific regional and cultural context before drawing any conclusions. Additionally, taphonomic biases should be taken into account, as larger and more robust bones of males from sexually dimorphic species might have been more likely to survive in situ to be recovered during archaeological excavation. It is therefore important to consider the excavation methods and sampling process and to include bones of varying preservation or size when inferring hunting practice and animal biology from sex distribution in zooarchaeological material, although this may come at a cost of selecting samples with suboptimal DNA preservation.

Using distantly related reference genomes for sex identification
Is the dog genome a proper reference for pinnipeds? Canids and pinnipeds both belong to the order Carnivora, but they fall into different families within the suborder Caniformia (Delisle and Strobeck, 2005). Most pinnipeds, including walrus, grey seal, harp seal and ringed seal, have 32 autosomal chromosomes (Arnason, 1974), while the dog genome has 38. To investigate the effect of this variation in karyotype and the genetic dissimilarity, we calculated both the theoretical chrX/chr1 values for dogs and the actual, observed values for pinnipeds. In dogs, the size of chromosome X divided by that of chromosome 1 (123.87 Mb divided by 122.68 Mb) results in a theoretical value of approximately 1.010 in females and 0.505 in males. However, this is unlikely to be the case for pinnipeds. Firstly, the sizes of both chromosome 1 and X are likely to differ between dogs and pinnipeds, resulting in a different number of reads that will map to either chromosome. Variation in chrX/chr1 values between different pinniped species could also emerge from differences in chromosome size. Secondly, due to their separate evolutionary histories since the divergence of canid and pinniped ancestors some 40-50 million years ago, multiple coding and non-coding regions of chromosome 1 and X (and all other chromosomes) are also likely to differ to a lesser or greater extent between dog and pinnipeds, and among the different pinnipeds. This implies that not all reads from, e.g., grey seal chromosome 1 will map to the equivalent chromosome in dog, and the number of reads mapping from the different pinniped species will differ as well. This study shows that, despite the potential limitations of using a distant relative, it is possible to obtain valid sex identification using the dog reference genome to quantify the relative representation of X chromosome DNA reads in pinnipeds, as long as a sufficient total read number of approximately 5000 is met.
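The theoretical dog values quoted above follow directly from the two chromosome sizes. The short calculation below reproduces them; the variable names are chosen for this sketch, and the halving for males simply encodes the single X copy.

```python
# Chromosome sizes in the dog reference genome, as given in the text (Mb).
CHR1_MB = 122.68
CHRX_MB = 123.87

# Females carry two X copies, like any autosome, so the expected
# read ratio equals the size ratio; males carry one X copy, so half.
female_expected = CHRX_MB / CHR1_MB
male_expected = female_expected / 2

print(round(female_expected, 3), round(male_expected, 3))
```

As the surrounding text stresses, these are expectations for dog reads mapped to the dog genome; observed pinniped ratios can sit elsewhere while still forming two separate groups.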
Fig. 3. Sex identification of ancient pinniped samples by estimation of chrX/chr1 ratios. Samples with more than 5000 reads (green) or 3000-5000 reads (yellow) could easily be classified, whereas samples with lower DNA sequence yields were associated with larger uncertainty. A total of 5000 reads is presented as the limit for reliable sex identification.
The approach outlined in this study, which allows sex identification based upon a distantly related reference genome, has far-reaching applications, as many species still lack annotated genomes. This is particularly true for many wild animals that do not yet have fully annotated genomes that include one or more sex chromosomes. Reference genomes from, for example, domestic animals, which are more commonly available, could therefore help elucidate the impacts of prehistoric and contemporary anthropogenic activities, as well as fundamental species biology, demographics and life-history of a wide range of taxa. It is probable that the dog genome could also enable sex identification for other Caniformia species, such as bears (Ursidae). Similarly, the house mouse (Mus musculus) and brown rat (Rattus norvegicus) genomes could potentially be used for sex identification of other rodents, including beavers (Castoridae). The chicken (Gallus gallus) genome and turkey (Meleagris gallopavo) genome include the sex chromosomes W and Z (in birds, the female is the heterogametic sex) and could therefore be tested for their usability in identifying the sex of wild Galliformes species, which appear in archaeological contexts as game fowl (e.g. Boev, 1997).
Likewise, the vaquita (Phocoena sinus) genome from the parvorder Odontoceti could be used for sex identification of other cetaceans, and the red deer (Cervus elaphus) genome could be used for sex identification of other species in the Cervidae family such as moose (Alces alces), reindeer (Rangifer tarandus), fallow deer (Dama dama) and roe deer (Capreolus capreolus), and might even be applicable to giraffe and okapi (Giraffidae), as they all belong to the order Artiodactyla. In theory, the number of homologous regions that enable mapping of a sample to a reference genome will decrease with larger evolutionary distance between sample and reference, and thus, with fewer regions to rely on, there will be an increasing risk of biased sex identification. Expanding the use of this method by choosing a reference genome beyond suborder level (or equivalent) might therefore require additional testing using modern samples with known sex. However, if samples with sufficiently high endogenous content fall into two clearly defined groups based on their relative X chromosome representation, this study, exemplified by pinnipeds, has demonstrated that such groups are highly likely to represent males and females. Nevertheless, because the actual size of the chromosomes may no longer be representative of the number of homologous regions shared between distantly related species, it might be necessary to test the X chromosome representation against different autosomal chromosomes, or against an average of the autosomal coverage, in order to detect a clear division between the male and female samples.
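One way to test the X chromosome against an average of the autosomal coverage, as suggested above, is to compare length-normalised read densities. The following sketch is only one possible formulation and was not evaluated in this study: the function name, the chromosome labels, the use of reference chromosome lengths for normalisation and the toy numbers are all assumptions.

```python
def x_to_autosome_ratio(read_counts, lengths_mb, x_name="chrX"):
    """Ratio of X read density to pooled autosomal read density.

    read_counts: chromosome label -> mapped reads.
    lengths_mb:  chromosome label -> reference length in Mb.
    Every chromosome other than x_name that has a known length
    is treated as an autosome (an assumption of this sketch).
    """
    autosomes = [c for c in read_counts if c != x_name and c in lengths_mb]
    auto_reads = sum(read_counts[c] for c in autosomes)
    auto_length = sum(lengths_mb[c] for c in autosomes)
    if auto_reads == 0:
        raise ValueError("no autosomal reads; ratio undefined")
    x_density = read_counts.get(x_name, 0) / lengths_mb[x_name]
    return x_density / (auto_reads / auto_length)


# Toy values (invented): X density is half the autosomal density.
toy_lengths = {"chr1": 100.0, "chr2": 50.0, "chrX": 100.0}
toy_reads = {"chr1": 200, "chr2": 100, "chrX": 100}
pooled_ratio = x_to_autosome_ratio(toy_reads, toy_lengths)
print(pooled_ratio)
```

Pooling all autosomes uses more reads than a single chromosome does, which may help at low read numbers, although any such variant would need checking against samples of known sex.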
For the already mentioned species, and many more, our study therefore demonstrates how a reference genome from a distantly-related species can enable sex identification for other species which do not yet have their own annotated reference genome.

Conclusion
By quantifying the number of mapped reads aligned to representative sex chromosomes and autosomes, this study illustrated how genomic sex identification of pinnipeds is possible using a dog reference genome. Based on the down-sampling of contemporary ringed seal genomes with known sex, our results suggest that a minimum of 5000 mapped reads is required to ensure that samples are correctly identified by sex. When read numbers are lower there is a substantial overlap between the male and the female chrX/chr1 ratio distributions. However, lower thresholds might be applied if one accepts larger uncertainties in the sex determination. The sex identification success rates among the ancient pinniped species were between 41.8% and 98.3%, likely reflecting varying sample preservation states. Across geographic regions and time periods, harp seal and grey seal showed a more or less equal sex ratio, while about 75% of the walrus samples were identified as males. The approach described here should aid future studies in untangling the putative effects of ancient hunting practice and preference on past pinniped populations, as well as a broad range of other species groups and archaeological contexts.

Author contributions
MHBJ, XK, KL, and MTO conceived the study; ARA, RD and SHF provided funding, samples and sex identification of contemporary ringed seals; AG and KL identified and provided ancient seal samples; ABG, MTO, XK and PJ identified and provided the ancient walrus samples; HA, XK, CHSO, and MHBJ carried out molecular laboratory work; MHBJ performed the data analyses; MHBJ and MTO drafted the manuscript; all authors read, commented on and approved the final version.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.