Comparing ancient DNA survival and proteome content in 69 archaeological cattle tooth and bone samples from multiple European sites.

Ancient DNA (aDNA) is the most informative biomolecule extracted from skeletal remains at archaeological sites, but its survival is unpredictable and its extraction and analysis are time-consuming, expensive and often fail. Several proposed methods for better understanding aDNA survival are based upon the characterisation of some aspect of protein survival, but these are typically non-specific; proteomic analyses may offer an attractive method for understanding preservation processes. In this study, in-depth proteomic (LC-Orbitrap-MS/MS) analyses were carried out on 69 archaeological bovine bone and dentine samples from multiple European archaeological sites and compared with mitochondrial aDNA and amino acid racemisation (AAR) data. Comparisons of these data, including estimations of the relative abundances for seven selected non-collagenous proteins, indicate that the survival of aDNA in bone or dentine may correlate with the survival of some proteins, and that proteome complexity is a more useful predictor of aDNA survival than protein abundance or AAR. The lack of a strong correlation between the recovery of aDNA and the proteome abundance may indicate that the survival of aDNA is more closely linked to its ability to associate with bone hydroxyapatite crystals rather than to associate with proteins.


SIGNIFICANCE
Ancient biomolecule survival remains poorly understood, even with great advancements in 'omics' technologies, both in genomics and proteomics. This study investigates the survival of ancient DNA in relation to that of proteins, taking into account proteome complexity and the relative protein abundances to improve our understanding of survival mechanisms. The results show that although protein abundance is not necessarily directly related to aDNA survival, proteome complexity appears to be.


Introduction
In the last two decades the field of biomolecular archaeology has rapidly expanded, and nowadays genomics and proteomics as well as other omics techniques are frequently applied to archaeological studies [1,2]. Analysis of ancient DNA (aDNA) is increasingly used worldwide, partly due to the advent of Next Generation Sequencing (NGS) technologies that have overcome some of the former limitations of aDNA (such as fragmentation, low copy number, damage and contamination). There are now numerous detailed and rigorous experimental and bioinformatic guidelines [3][4][5][6][7] set out for the successful extraction, sequencing and analysis of aDNA from a wide range of geographic and environmental areas, with recovery of aDNA as far back as 700,000 years in permafrost [8] and ca. 500,000 years in non-permafrost environments [7,9,10]. However, despite the recent technological revolution of NGS, the reality is that the routine use of these technologies is limited to only a few laboratories in the world, given the high costs of these procedures and the difficulties of handling massive amounts of sequence data. There are therefore still a large number of aDNA studies that do not use NGS [11,12], and so an improved understanding of aDNA survival, together with a reliable screening method to determine the presence of aDNA in a sample, would be a powerful tool, allowing only those bones most likely to contain aDNA to be selected for further analysis. Improved understanding of ancient biomolecule survival and improved screening techniques would be useful in preventing unnecessary sample destruction, minimising resource wastage and increasing the chances of successful extractions. Furthermore, understanding the molecular taphonomy of ancient bones and teeth through biomolecules (and particularly proteins given their role in bone structure) will build our knowledge of how degradation/preservation of organic material occurs within the archaeological record.
Here we present for the first time a direct comparison of the ancient proteome and aDNA in bones from four different environments in a total of 69 ancient cattle samples as a means to investigate whether ancient protein survival is a useful biomarker for aDNA preservation. One of the earliest and simplest proposed methods of evaluating archaeological bone for its preservation, to the extent that it could screen for the presence of aDNA, involved detailed examination of the histology of the sample [13]. After the death and burial of an organism, its tissues undergo a number of diagenetic processes, e.g. the removal of collagen fibrils by saprophytic bacteria, (see Smith et al. [14] and Nielsen-Marsh et al. [15] for reviews) which can affect the microstructure of the bone [16,17]. The histological integrity of archaeological bone can be measured using BSE-SEM (back-scattered electron microscopy) and generally good bone preservation is correlated with highly ordered bone histology with well-defined physical features [18,19]. Good structural preservation is therefore correlated with improved biomolecule survival, with several studies indicating that better preserved bones are more likely to contain amplifiable aDNA [13] and other biomolecules [20,21]. Assessment of bone histology has also been combined with measurement of the total nitrogen content of the sample (as a proxy for total protein content and therefore an indication of biomolecule preservation), where bones with good histological preservation and high nitrogen content are considered most likely to have good biomolecule preservation [22,23]. However, histological structure is not inextricably linked with good biomolecule preservation, with some specimens showing poor histological preservations that were successful for aDNA analyses [13].

Amino acid racemisation
Amino acids with a single chiral centre exist in one of two isomeric forms: the D-enantiomer and the L-enantiomer. In life amino acids exist exclusively in the L-form, but after the death of the organism they begin to racemise to the D-form, ultimately forming an equilibrium of L- and D-isomers in roughly equal proportions, known as a racemic mixture. The rate of amino acid racemisation (AAR) is determined predominantly by time and temperature [24], but also by environmental factors. The racemisation of aspartic acid (Asp) has been proposed as a screening method for the presence of aDNA ([21], but see also [37]), measured as the Asx DL ratio as it also includes the extent of asparagine hydrolysis. AAR values were suggested as a good indicator of the likelihood of DNA survival as the kinetics of racemisation appeared to mirror the rates of DNA depurination [25][26][27], which is thought to be the major limiting factor in aDNA survival.

Proteomic analyses of ancient bone
Early investigations into biomolecule survival in archaeological remains indicated that some small bone proteins, such as osteocalcin (OC), may have had much greater survival rates than DNA in ancient bone due to their functionally important mineral-binding properties [28]. OC was particularly ideal for analysis because it was easy to isolate through application directly to a solid phase extraction cartridge [29] and was therefore potentially useful as a biomarker for the presence of aDNA. However, analysis of intact OC in archaeological bone using 'top-down' proteomic methods showed this not to be the case, with fewer samples containing intact OC than were successful for aDNA [30].
In recent years, several studies have been carried out on archaeological bone using bottom-up proteomic techniques [31,32], which have proved effective even at the low protein levels present in ancient bone. The advantage over top-down approaches is that it is possible to evaluate the presence of numerous non-collagenous bone proteins (NCPs) within a single analysis. The sensitivity, high success rate and relatively low running costs of such analyses in comparison to aDNA analysis means that proteomic analysis of archaeological remains is becoming an attractive tool for the characterisation and/or species identification of complex tissues [33,34]. However, they may also yield insights into aDNA survival and potentially offer an alternate screening method for bones most likely to contain aDNA. In this study, the success or failure of aDNA extractions from the same series of archaeological cattle bones and teeth was compared with their corresponding Asx DL values and bone proteomes in an attempt to identify potential protein biomarkers for the presence of aDNA and to improve our understanding of aDNA survival.

Materials and methods
We expand upon the findings presented in Buckley et al. [30] wherein the amino acid racemisation values, amino acid concentrations and the success or failure of aDNA and OC extractions were reported for 34 archaeological cattle bones. Herein, a total of 69 tooth and bone samples from archaeological cattle (including some of the 34 from [30]) were analysed by in-depth proteomic analysis, all of which had previously undergone aDNA extraction and analysis carried out by Anderung [35]. Of these 69 samples, 36 were teeth and 33 were bones (see Supplementary Table S1 for skeletal information). Samples were recovered from several different sites with different mean annual temperatures (MAT) and effective burial temperatures (T eff ): the Bronze Age site Asine (15 samples) and the preclassical site Lerna (four samples) both in Greece (T eff 18°C; ~2000-1500 BCE for Asine and ~500-800 BCE for Lerna); Zauschwitz (10 samples) and Dresden-Cotta (three samples) in Saxony, Germany (these 13 samples were grouped together and considered "Saxony"; T eff 12.5°C including a temporal range from 5500 to 600 BCE); and Bronze Age material from El Portalón cave in northern Spain (37 samples; T eff 9.5°C) (Supplementary Table S1). Of the 37 samples from El Portalón, 19 were taken from a collection that had been stored in a museum for several years and 18 were more recently excavated.
Mitochondrial DNA extraction, purification and analysis were carried out by Anderung [35] using hybridisation and magnetic bead separation following a modified method from [36]. Amino acid racemisation analyses were carried out as described in Buckley et al. [30]. Proteomes were obtained from ~50 mg bone powder per sample following Wadsworth & Buckley [32]. Peptide masses obtained via LC-MS/MS analysis were searched against the SwissProt database for matches to primary protein sequences using the Mascot search engine. Each search included the fixed carbamidomethyl modification of cysteine and the variable modifications for deamidation, pyroglutamate formation, and oxidation of lysine, proline and methionine residues. Enzyme specificity was limited to trypsin with up to 2 missed cleavages allowed, mass tolerances were set at 5 ppm for the precursor ions and 0.5 Da for the fragment ions and all spectra were considered as having either 2+ or 3+ precursors. Proteome complexity was calculated by manually examining peptide matches from Mascot searches for the relevant genus (Bos or Bison) with an ion score above 20, and proteins were included in the count only if they had at least three unique high-confidence peptide matches.
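The counting rule described above was applied manually in this study; purely as an illustration, it can be sketched in code. The record layout (keys "protein", "genus", "sequence" and "ion_score") and the example values below are hypothetical and are not Mascot's export format, and "unique peptides" is assumed here to mean distinct peptide sequences matched to a protein.

```python
from collections import defaultdict

ION_SCORE_MIN = 20        # peptide ion score must be above this value (from the text)
MIN_UNIQUE_PEPTIDES = 3   # minimum unique peptides per counted protein (from the text)
GENERA = {"Bos", "Bison"} # genera considered relevant (from the text)


def proteome_complexity(peptide_matches):
    """Count proteins supported by enough unique, high-confidence peptides.

    `peptide_matches` is an iterable of dicts in a hypothetical layout:
    {"protein": str, "genus": str, "sequence": str, "ion_score": float}.
    """
    unique_peptides = defaultdict(set)
    for match in peptide_matches:
        if match["genus"] in GENERA and match["ion_score"] > ION_SCORE_MIN:
            unique_peptides[match["protein"]].add(match["sequence"])
    return sum(
        1 for seqs in unique_peptides.values() if len(seqs) >= MIN_UNIQUE_PEPTIDES
    )


# Made-up example records: protein_A passes, protein_B has too few peptides.
example_matches = [
    {"protein": "protein_A", "genus": "Bos", "sequence": "PEPTIDEA", "ion_score": 45},
    {"protein": "protein_A", "genus": "Bos", "sequence": "PEPTIDEB", "ion_score": 38},
    {"protein": "protein_A", "genus": "Bos", "sequence": "PEPTIDEC", "ion_score": 27},
    {"protein": "protein_A", "genus": "Bos", "sequence": "PEPTIDED", "ion_score": 12},
    {"protein": "protein_B", "genus": "Bison", "sequence": "PEPTIDEE", "ion_score": 50},
    {"protein": "protein_B", "genus": "Bison", "sequence": "PEPTIDEF", "ion_score": 33},
    {"protein": "protein_C", "genus": "Ovis", "sequence": "PEPTIDEG", "ion_score": 60},
]
example_complexity = proteome_complexity(example_matches)
```

Whether collagens are counted alongside the non-collagenous proteins is a choice left to the caller in this sketch; the input can simply be restricted to the proteins of interest.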
Relative abundances were calculated using a label-free quantitation method carried out by Progenesis QI software. Three analyses on this dataset were run on the Progenesis QI software: one comparing the relative abundances of proteins in samples from the four different 'environmental groupings', one comparing protein abundances collectively in tooth samples vs bone samples, and a final analysis comparing protein abundances between the museum and recently excavated samples from the El Portalón cave site. Principal Component Analysis (PCA) was performed on normalized abundances exported from Progenesis using R Software with the package FactoMineR, and plots were produced using the Python programming language (Python Software Foundation, https://www.python.org/). This facilitated a visual separation according to the variations in abundance of proteins in each sample, and helped to detect clustered data and/or outliers. To remove unreliable identifications, we also excluded from the study proteins for which the number of unique peptides was ≤1. We also used a protein score cut-off to remove proteins whose score was lower than the median score of the whole dataset (score = 122).
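The two data-cleaning rules above can be illustrated with a minimal sketch. It assumes that the median is taken over all proteins before the unique-peptide filter is applied, which the text does not state, and the record layout and example values are hypothetical.

```python
from statistics import median


def filter_proteins(proteins):
    """Apply the two cleaning rules described in the text.

    `proteins` is a list of dicts in a hypothetical layout:
    {"name": str, "unique_peptides": int, "score": float}.
    Returns the retained proteins and the score cut-off that was used.
    """
    # Assumption: the cut-off is the median score of the unfiltered dataset.
    cutoff = median(p["score"] for p in proteins)
    kept = [
        p for p in proteins
        if p["unique_peptides"] > 1 and p["score"] >= cutoff
    ]
    return kept, cutoff


# Made-up example values chosen so that the median score is 122.
example_proteins = [
    {"name": "protein_A", "unique_peptides": 5, "score": 300},
    {"name": "protein_B", "unique_peptides": 2, "score": 122},
    {"name": "protein_C", "unique_peptides": 3, "score": 90},
    {"name": "protein_D", "unique_peptides": 1, "score": 500},
    {"name": "protein_E", "unique_peptides": 4, "score": 40},
]
kept_proteins, score_cutoff = filter_proteins(example_proteins)
kept_names = [p["name"] for p in kept_proteins]
```

In this made-up example protein_D is dropped despite its high score because it is supported by a single unique peptide, which is the situation the first rule is meant to catch.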

Results
Of the 69 samples tested, 56 yielded aDNA (31 teeth and 25 bone) and 13 samples failed to yield aDNA (5 teeth and 8 bone). NCPs were successfully identified in 66 samples (samples AS3, AS9 and LE4 yielded no identifiable proteins), with 49 of those samples containing 10 or more NCPs. As there are many different NCPs found in all samples, this study will focus on seven selected proteins: fetuin-A, prothrombin, pigment epithelium-derived factor (PEDF), lumican, chondroadherin (CHAD), secreted phosphoprotein 24 (SPP24) and matrix metalloproteinase-20 (MMP20), which have been commonly identified in ancient bone in other studies [31,32] and which have different biological functions and properties. In particular, three of the proteins of interest are known to be collagen-binding proteins (PEDF [37], lumican [38] and CHAD [39]) and three are mineral-binding proteins (fetuin-A [40], prothrombin [41] and SPP24 [42]); the choice of these specific proteins was also intended to compare their survival related to their specific localisation within the bone tissue (Table 1).

Variations in proteome complexity across different archaeological sites
The samples analysed in this study came from multiple different sites across Europe, which included different site types (e.g. open air, cave) and different MATs and T eff . Collagens alpha 1 (type I) and alpha 2 (type I) (hereafter collagen α1(I) and α2(I) respectively) were identified in every sample as the highest scoring proteins (i.e., combined peptide scores) in Mascot database searches and had high relative abundances according to the Progenesis QI analysis. Four other collagen types were commonly identified but were not as ubiquitous as collagens α1(I) and α2(I), these are collagens α1(II), α1(III) and α1(XI) (in 53, 54, and 34 samples respectively) and collagen α2(XI) (in 66 samples).
When considering the NCPs, proteome complexity varies greatly between sites, and as expected the samples from sites with warmer climates which have a higher MAT and T eff (Lerna and Asine) have poorer proteomes (less complex) than the samples from sites with lower T eff and MATs (Saxony and El Portalón) even though some of the latter were older. The four samples from Lerna had the poorest protein complexities with one sample containing no NCPs (but five collagen types) and the other three samples containing one, four and six NCPs. The 15 samples from Asine had relatively poor protein complexities with ≤10 NCPs identified in all but one sample (which contained 18 NCPs). Samples recovered from Saxony had relatively well preserved proteins, with eight out of 13 samples containing ≥10 NCPs and three of these having over 20 identified NCPs. The 37 samples from El Portalón cave site have the richest proteomes; no sample has <10 identified NCPs, 12 have between 15 and 20 NCPs in total and 25 have over 20 identified NCPs (up to a maximum of 37 NCPs) (Fig. 1).
Two of the seven proteins of interest (fetuin-A and PEDF) were found in at least one sample from each site, with fetuin-A being identified in 62 samples and PEDF in 55 samples. Prothrombin, lumican and MMP20 are found in at least one sample from each site except Lerna (54, 45 and 32 samples respectively); SPP24 is not found in any samples from Asine or Lerna (but in 37 samples overall; mostly those from El Portalón cave; Table 1).
With regard to the amino acid racemisation data, there was no discernible correlation between proteome complexity and the Asx DL values obtained for each sample in this study. There was a weak correlation between Asx DL values and the MAT and T eff , in that five of the samples from El Portalón (the site with the lowest MAT and T eff in the study) have the lowest recorded Asx DL values of 0.09 and one of the samples from Lerna (the site with the highest T eff in the study) has one of the highest Asx DL values of 0.24 (Fig. 1).
The number of NCPs was correlated with the MAT and T eff of a site, where samples from the colder site (El Portalón) showed the highest proteome complexity, followed by samples from Saxony (with intermediate MAT and T eff ) and finally followed by samples from the warmer sites in Greece. There was also an interesting correlation between the age of different specimens from the same site and the number of NCPs, where the oldest samples from Saxony (DD4, DD11, DD75, DD76 and DD9) showed the lowest proteome complexity within all samples from Saxony (with the only exception for DD29, see Supplementary Table S1) (Fig. 1). However, there seemed to be no clear correlations between Asx DL and the number of NCPs detected (Fig. 1).
Ancient DNA survival in the samples analysed was generally good, with 56 of the 69 samples yielding aDNA. The success or failure of the aDNA analysis appears to be highly dependent on the site the samples were recovered from. The samples from El Portalón cave were the most successful in terms of aDNA extractions, as only one of the 37 samples did not yield any aDNA. Three of the 13 samples from Saxony failed aDNA retrieval, as did six of the 15 samples from Asine; Lerna was the worst in terms of preservation, with only one of the four samples being successful for aDNA analysis (Fig. 1). We noticed a correlation between the number of NCPs identified in a sample and the presence of aDNA: when the number of NCPs was 15 or greater, aDNA recovery was usually successful (only two out of 13 samples, with 17 and 18 NCPs respectively, failed the aDNA analysis, see Supplementary Table S1), and when the number of NCPs was higher than 18, aDNA analyses were always successful.
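The per-site counts reported above can be tabulated, together with the NCP-count observation expressed as a simple rule. This is an illustrative sketch only: the function names are hypothetical, and the threshold of 18 describes this dataset rather than a validated screening criterion.

```python
# Per-site sample counts and aDNA failures, as reported in the text.
SITE_ADNA = {
    "El Portalón": (37, 1),
    "Saxony": (13, 3),
    "Asine": (15, 6),
    "Lerna": (4, 3),
}


def success_rate(n_samples, n_failures):
    """Fraction of samples that yielded aDNA."""
    return (n_samples - n_failures) / n_samples


def exceeds_ncp_threshold(ncp_count, threshold=18):
    """True when the NCP count is above the threshold observed in this study.

    In this dataset every sample with more than 18 NCPs yielded aDNA;
    a False result says nothing definite, since many such samples also succeeded.
    """
    return ncp_count > threshold


total_samples = sum(n for n, _ in SITE_ADNA.values())
total_failures = sum(f for _, f in SITE_ADNA.values())

for site, (n, failures) in SITE_ADNA.items():
    print(f"{site}: {n - failures}/{n} yielded aDNA ({success_rate(n, failures):.0%})")
```

The totals recovered from the table (69 samples, 13 failures) agree with the overall figures given in the text, which is a useful consistency check on the per-site numbers.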

Protein abundances across different archaeological sites
Principal component analysis of the relative abundances of each protein matched in each of the 69 samples indicated that not only are bone and tooth dentine proteomes distinct, but that there are signals specific to the archaeological site (Fig. 2), as well as to the approximate levels of degradation and 'thermal ages' of the samples [43]. The majority of NCPs detected in the archaeological samples were at least one order of magnitude lower in abundance compared with the dominant collagen (collagen α2(I)). The only exception (not including albumin due to potential laboratory contamination issues) was fetuin-A, which was the most abundant protein recovered in all samples apart from those from Lerna, in which prothrombin was the most abundant (Fig. 3). Interestingly, fetuin-A and prothrombin are both mineral-binding proteins, and this suggests potentially better preservation of these proteins within the specimens compared with the collagen-binding proteins. MMP20 was abundant in the El Portalón dentine samples, whereas the dentine samples from the other sites showed low abundance values, although when present these were noticeably greater than in bone (Fig. 3). As observed for proteome complexity, the relative abundances do not show any particular correlation with AAR values (Supplementary Fig. S1); many of the samples with the lowest Asx DL values had some of the lowest relative abundances for many proteins.

Proteome changes during museum storage
Of the 37 samples from El Portalón cave site, 19 samples had been selected for analysis from a museum collection ('S' samples) and 18 ('M' samples) had been more recently excavated. Ancient DNA preservation was very good in the samples from El Portalón cave with only one sample (S1) from the museum samples failing to yield aDNA. Both the museum and recently excavated samples had similar ranges of proteome complexities (Fig. 1 and Fig. 2).
The average relative abundances of the seven proteins of interest are also very similar in the museum and recently excavated samples (see Supplementary Fig. S2) although the museum samples have slightly higher relative abundances for chondroadherin, lumican, PEDF, fetuin-A and prothrombin than the recently excavated samples, whereas the opposite is true for MMP20 and SPP24.

Comparison of bone and dentine proteomes
Of the 69 samples analysed in this study, 36 were tooth (dentine) samples and 33 were bone samples; the ranges of proteins recovered from the two types of sample were similar. However, of the seven selected proteins of interest focused on in this study, chondroadherin (thought to be cartilage-specific) appears to be more commonly (though not exclusively) identified in bone samples, and with greater relative abundance than in dentine samples. Interestingly, SPP24 and MMP20 were more commonly identified (and with greater relative abundances) in dentine samples; in particular, MMP20 is identified in 30 of the 36 tooth samples and only in one bone sample that originated from a mandible; this is in accordance with other studies which describe the role of MMP20 in dental enamel formation [44,45] (Table 1 and Fig. 4), where its observation in the one bone sample in this study may be due to contamination during the sampling process.
The success rate for aDNA extractions appears to be marginally higher in the tooth samples than bone samples, as only five of the 36 tooth samples failed to yield aDNA, compared to eight of the 33 bone samples. However, 26 of the tooth samples were from El Portalón cave, which appears to have the best conditions for biomolecular preservation of the sites studied here, so this increased success rate may be more related to the site conditions rather than whether the sample was from a tooth or bone.
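To illustrate how marginal this difference is, the counts above can be put through a two-sided Fisher's exact test. This test is not part of the analysis reported here; it is a sketch added for illustration, implemented directly from the hypergeometric distribution using only the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Counts from the text: teeth 31 successes / 5 failures, bone 25 / 8.
teeth_success, teeth_fail = 31, 5
bone_success, bone_fail = 25, 8

teeth_rate = teeth_success / (teeth_success + teeth_fail)
bone_rate = bone_success / (bone_success + bone_fail)
p_value = fisher_exact_two_sided(teeth_success, teeth_fail, bone_success, bone_fail)

print(f"teeth: {teeth_rate:.1%}, bone: {bone_rate:.1%}, p = {p_value:.3f}")
```

A test of this kind treats tissue type in isolation, so it cannot separate the effect of tissue from the effect of site, which is the confounding noted above.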

Discussion
There are several studies which have suggested that proteins are more likely to have better survival rates than aDNA [46], as DNA is a more labile molecule than most proteins. In this study the samples which were successful for aDNA analysis often had richer proteomes than the samples which were unsuccessful, but there were also some exceptions (there were some samples which either had relatively high NCP counts and failed aDNA analysis or which had no identifiable NCPs but were successful for aDNA analysis).
Despite suggestions that the storage of archaeological skeletal materials in museums may promote biomolecular degradation [47], in this study no significant differences in either the frequency of successful aDNA extractions nor the total NCP counts were identified between the recently excavated and museum-stored samples from El Portalón cave; indeed the one sample from this site which failed the aDNA analysis (S1) had been recently excavated. However, it is important to note that both the museum-stored and more recently excavated samples tested here are exceptionally well preserved, and that more poorly preserved samples may degrade faster in a museum environment and therefore display a greater difference in biomolecule preservation between museum-stored and recently excavated samples.
It appears that protein and aDNA survival is heavily dependent on the MAT and T eff of the site that they were recovered from (Fig. 1); although other factors such as hydrological activity and environmental pH could not be as easily evaluated. However, of the sites analysed here, the samples from sites with a higher T eff (Lerna and Asine, T eff 18°C for both sites) had much poorer proteomes and a higher number of aDNA failures than the samples from sites with lower T eff (Saxony and El Portalón, T eff 12.5°C and 9.5°C respectively). It is therefore likely that aDNA and overall protein survival in ancient bone are relatively independent of one another and that both are more dependent on the site MAT and T eff of the site. The relationship between aDNA and protein survival is not well understood and remains unclarified with this study; this may therefore limit the use of proteomics as a screening method for the presence of aDNA.
Of the proteins identified here, there are no proteins which are present solely in samples containing aDNA or vice versa, and this was essentially due to the presence of two samples which failed aDNA extraction but which had a good number of NCPs (17 and 18). Our previous suggestion that fetuin-A could be a promising source of phylogenetic information in archaeological bone and tooth samples [32] due to its abundance, survival and high level of sequence variability between taxa [48] was supported here; fetuin-A was identified in 58 of the 69 samples tested here, including some of the most poorly preserved samples. Furthermore, we also observed a 2-fold higher relative abundance of fetuin-A in bone and tooth samples successful for aDNA extraction than the second most abundant NCP considered in the study (prothrombin). It is therefore possible that such relative abundances of selected proteins could also be a potential indicator of aDNA extraction success. It should be noted that OC peptides were observed in most samples but excluded from this report (Table S1) because of our stringent requirement for three unique peptides even though in the case of OC, as one of the smallest proteins only approximately 49 amino acids long this would equate to most of the protein.
The relative abundances of the seven selected proteins between samples, which do or do not contain aDNA, are not significantly different and therefore are unlikely to be useful in determining the likelihood of a successful aDNA extraction. Asx DL values also do not appear to correlate directly with whether a sample contains aDNA or not, nor with the number of NCPs a sample contains or the relative abundances of the selected NCPs, although there is some indication that threshold values could be indicative. It is therefore probable that Asx DL values are not wholly representative of the state of NCP or collagen preservation in a sample, but more likely to reflect the amount of soluble collagen retained by the sample, as suggested by Collins et al. [49] and Dobberstein et al. [50]. Amino acid racemisation values were generally relatively high (>0.08), and although samples with low DLs (<0.12) were highly likely to yield successful aDNA while samples with DL >0.15 did not, there was no correlation with proteome complexity. Interestingly, we were able to extract aDNA from samples with AAR values higher than 0.08, in contrast with previous findings [25].
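The threshold behaviour mentioned above can be written as a three-way banding of Asx DL values. This is a minimal sketch: the function name and band labels are hypothetical, the cut-offs are simply the values quoted in the text, and the bands describe this dataset rather than a validated predictor.

```python
LOW_DL = 0.12   # below this, samples were highly likely to yield aDNA (from the text)
HIGH_DL = 0.15  # above this, samples did not yield aDNA (from the text)


def asx_dl_band(asx_dl):
    """Place an Asx DL value into one of three descriptive bands."""
    if asx_dl < LOW_DL:
        return "aDNA likely"
    if asx_dl > HIGH_DL:
        return "aDNA unlikely"
    return "indeterminate"


# Illustrative values: a low, an intermediate and a high Asx DL.
example_bands = [asx_dl_band(value) for value in (0.09, 0.13, 0.24)]
```

Values between the two cut-offs fall into an indeterminate band, which reflects the lack of a direct correlation described above.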
A reliable screening method would ideally focus on a protein or a group of proteins that are always identified in the ancient bone proteome when the aDNA analysis is successful, and that are missing when the sample fails aDNA analysis. Unfortunately we were not able to find any proteins matching these criteria, nor did we find any NCPs consistently identified in every sample; for this reason, we were not able to identify protein biomarkers useful to estimate the presence of aDNA in a sample. Although the search for a protein marker failed, we still noticed an interesting correlation between the number of recovered NCPs and the outcome of the aDNA extraction. In particular, the number of NCPs could be a useful approach to evaluating the presence or absence of aDNA in the specimen, where in this case samples with more than 18 NCPs were positive for aDNA analysis. However, additional work is required to refine this property, which may be able to identify particular biomarker proteins, in order to be able to apply this methodology in the future.
Fig. 4. Average relative abundances for seven commonly identified proteins in tooth and in bone samples which were successful for or failed aDNA analysis. Tooth samples have higher relative abundances of fetuin-A, SPP24 and MMP20 whereas bone samples have higher prothrombin, PEDF, chondroadherin and lumican. By comparison, the relative abundance of COL1A2 was 1.13 × 10^7 and 1.42 × 10^7 for bone samples with and without aDNA respectively and 1.27 × 10^7 and 1.11 × 10^7 for dentine samples with and without aDNA respectively.

Ancient DNA survival mechanisms
The lack of correlation between NCP and aDNA survival suggests that each biomolecule may rely on a different preservation mechanism. Many of the NCPs which are commonly identified in ancient bone are extracellular matrix (ECM) proteins, therefore likely to be of high abundance in bone matrix, or are known to associate with collagen which is highly abundant and resistant to decay [15,43]. However, DNA is known to adsorb to hydroxyapatite [51], where they are both thought to be preserved under the same conditions (a neutral or slightly alkaline pH; [52]). Hydroxyapatite-bound DNA is also more resistant to hydrolytic depurination [53] and may also be resistant to spontaneous decay as well as the enzymatic actions of DNase [54]. It therefore seems likely that DNA survival in ancient bone would be better screened for using methods that evaluate the state of preservation of hydroxyapatite crystals such as FTIR [55] or X-ray diffraction based approaches [56].

Conclusions
Proteomics can be a useful tool in the analysis of ancient bone and teeth; however, this study indicates that this technique, in its current form, is unlikely to be useful as a screening method to determine whether a sample is likely to contain aDNA. Despite the identification of several proteins across the 69 samples analysed, there are none which fit the criteria to be a suitable biomarker for the presence or absence of aDNA. Additionally, the Asx DL values measured do not appear to correlate with the survival of NCPs in a sample or indicate which samples may be more likely to contain viable aDNA, but rather appear to reflect the state of collagen preservation [24].
However, even though we were not able to propose a precise protein (or group of proteins) as a suitable biomarker for the presence of aDNA in the ancient bone sample, we did observe that samples yielding a high number of NCPs (>18) were always successful for the aDNA analysis. Therefore, the number of NCPs in a sample could potentially be used as a new way to evaluate the likely presence of aDNA in archaeological samples, although further studies of this type including a higher number of ancient samples would be needed to validate this preliminary finding. Although this study was unsuccessful in identifying potential biomarkers for aDNA, it does improve our understanding of aDNA survival mechanisms, potentially highlighting its stronger association to the inorganic phase than the organic phase of bone.

Transparency document
The Transparency document associated with this article can be found in the online version.