Quantifying Inter-Laboratory Variability in Stable Isotope Analysis of Ancient Skeletal Remains

Over the past forty years, stable isotope analysis of bone (and tooth) collagen and hydroxyapatite has become a mainstay of archaeological and paleoanthropological reconstructions of paleodiet and paleoenvironment. Despite this method's frequent use across anthropological subdisciplines (and beyond), the present work represents the first attempt at gauging the effects of inter-laboratory variability engendered by differences in a) sample preparation, and b) analysis (instrumentation, working standards, and data calibration). Replicate analyses of a 14C-dated ancient human bone by twenty-one archaeological and paleoecological stable isotope laboratories revealed significant inter-laboratory isotopic variation for both collagen and carbonate. For bone collagen, we found a sizeable range of 1.8‰ for δ13Ccol and 1.9‰ for δ15Ncol among laboratories, but an interpretatively insignificant average pairwise difference of 0.2‰ and 0.4‰ for δ13Ccol and δ15Ncol respectively. For bone hydroxyapatite the observed range increased to a troublingly large 3.5‰ for δ13Cap and 6.7‰ for δ18Oap, with average pairwise differences of 0.6‰ for δ13Cap and a disquieting 2.0‰ for δ18Oap. In order to assess the effects of preparation versus analysis on isotopic variability among laboratories, a subset of the samples prepared by the participating laboratories were analyzed a second time on the same instrument. Based on this duplicate analysis, it was determined that roughly half of the isotopic variability among laboratories could be attributed to differences in sample preparation, with the other half resulting from differences in analysis (instrumentation, working standards, and data calibration). These findings have serious implications for choices made in the preparation and extraction of target biomolecules, the comparison of results obtained from different laboratories, and the interpretation of small differences in bone collagen and hydroxyapatite isotope values. 
To address the issues arising from inter-laboratory comparisons, we devise a novel measure we term the Minimum Meaningful Difference (MMD), and demonstrate its application.


Introduction
The past thirty years have witnessed an explosive increase in the ubiquity of stable isotope analysis of osseous remains in the fields of archaeology, paleoanthropology, and paleoecology (Figure 1). Indeed, stable isotope analysis of preserved osseous tissues has become a mainstay of paleodietary and paleoenvironmental reconstruction across anthropological subdisciplines. However, this growth in popularity has outpaced validation of the method's assumptions in at least one key area: the assessment of inter-laboratory variation. The present work aims to fill this lacuna by establishing experimentally the degree and possible causes of inter-laboratory variation in stable isotope signatures of ancient bone collagen (col) and hydroxyapatite (ap).
The importance of stable isotopes for archaeology was first realized by Robert Hall in the late 1960s when he noted anomalously young radiocarbon dates produced by maize or any other species enriched in 13C [1], leading him to posit the utility of stable isotope analysis for the differentiation of archaeological browsers and grazers [2]. The first practical application of stable isotope analysis to the study of ancient human diet did not come until 1977 [3]. In this first study [3], and many thereafter [4-7], the main matter of concern was the timing of the introduction of maize agriculture, an event that is fairly obviously evidenced by a dramatic enrichment of consumers' collagen and hydroxyapatite δ13C signatures. Shortly after this first publication, DeNiro [8] established the fundamentals of δ13Ccol and δ15Ncol in controlled diet experiments with a variety of animals [9,10]. He then used these two isotope systems of bone collagen to demonstrate a diachronic dietary shift among the prehistoric inhabitants of the Tehuacan Valley of Mexico. More or less contemporaneously, Tauber [11] used collagen carbon isotope values in his study of prehistoric and historic Danish fishers and farmers, Chisholm and colleagues [12] studied the exploitation of salmon by Northwest Coast Amerindians, Schoeninger and colleagues [13-15] demonstrated that both δ13Ccol and δ15Ncol values could be used to discriminate between habitual consumers of marine versus terrestrial foodstuffs, and Ambrose documented the importance of both diet and environment on collagen isotope values [16-19].
Paleoanthropological and paleoecological applications of stable isotope analysis have a history of comparable duration, although hydroxyapatite of bone, and more often dental enamel, has been the target osseous biomolecule. In the early 1980s, Sullivan and Krueger [20,21] outlined the basics of stable isotope paleodietary reconstruction from biological apatites (δ13Cap), and realized the potential for the technique's application to specimens from ''well back into the Pleistocene'' [20:335]. With Lee-Thorp and Van Der Merwe's [22] confirmation that δ13Cap values of dental enamel preserved biogenic signatures, the die for such work was cast (although see also [23]). Focusing on carbon isotopes in the inorganic fraction of bone (and more often tooth enamel [24]), various studies have pushed back the temporal horizon of stable isotope analysis well into the Miocene and earlier (see [25-29] for review of the pertinent paleoanthropological literature and recent examples).
In the four decades since these first applications, isotope analysis of human (and hominid) dental and skeletal remains has become commonplace. Indeed, Figure 1 demonstrates clearly the method's increasing popularity over the past thirteen years, as represented by the number of publications in three topical journals.
Curiously, however, while the pace of archaeological and anthropological applications of stable isotope analysis has increased, the validation of the technique's assumptions has lagged behind. There have been numerous studies of relevant methodological issues, including isotopic routing [30-36], controlled diet experiments [31,37-40], variability among individual laboratory preparation techniques [41-44], the causes and consequences of diagenetic and taphonomic change [24,45-58], and the importance of consistent data normalization and calibration procedures for inter-laboratory comparability [59,60]. However, to the best of our knowledge, there has never been a controlled study assessing the amount of inter-laboratory variation or the degree to which inter-laboratory variation stems from differences in preparation/extraction methods versus differences in analytical instrumentation and data calibration. The present work is intended to remedy these obvious lacunae in our knowledge and assess the confidence with which comparisons of results from different laboratories might be held. This represents a crucial step in assessing just how (dis)similar the conclusions of two laboratories might be when analyzing the same source materials.
The results of this study suggest that, in general, isotopic data from bone collagen (δ13Ccol, δ15Ncol) derived from different laboratories are directly comparable. However, the direct comparison of isotopic data derived from bone hydroxyapatite (δ13Cap, δ18Oap) is more dangerous because variability engendered by differences in pretreatment, analysis, and standardization is of a far greater magnitude. To remedy this issue, we introduce what we have termed the Minimum Meaningful Difference (MMD) value, which serves as an empirically derived threshold by which the significance of values obtained in different laboratories might be judged. In the end, the results of this study have serious implications for choices made in the preparation and extraction of target biomolecules, the comparison of results obtained from different laboratories, and the interpretation of small differences in bone collagen and hydroxyapatite isotope values.

Methods and Materials
The fundamental premise of the present study is that the best assessment of inter-laboratory variability in stable isotope analysis would require replicate preparation and analysis of the same demonstrably ancient bone sample by a large number of participating laboratories. As such, the initial four major methodological components were: 1) identification and subsampling of a suitable ancient human bone sample, 2) verification of this bone's antiquity, 3) recruitment of a representative cohort of participating laboratories, and 4) construction of a rigorous survey and reporting regime by which both laboratory methods and results could be compiled in a manner that would facilitate subsequent statistical analysis. The goal of this work is to characterize the amount of variation present among laboratories rather than comment on ''better'' or ''worse'' preparation methods or analytical facilities.
In 2011, one of us (WJP) obtained a presumably ancient unprovenienced human femoral diaphysis from the Museo Gustavo Le Paige in San Pedro de Atacama, Chile. All necessary permits were obtained for the described study (Consejo de Monumentos Nacionales Ord. No. 3682/12, FONDECYT No. 1120376), which complied with all relevant regulations, and the field studies did not involve endangered or protected species. This specimen was judged to be appropriate based on its apparent excellent state of preservation (which is typical of intentionally buried ancient human bone from this hyperarid region of Chile), large size (>100 g), and likely ancient date.
The specimen was AMS 14C dated at the University of Arizona NSF-AMS facility following their established protocols for 14C dating of bone (acid-base-acid pretreatment, gelatinization, filtration, graphitization). The resulting AMS date for this specimen (laboratory #AA99865) is 1728 ± 47 14C years before present (δ13C −17.3‰). This equates to a 2-sigma calibrated age range of 238-470 cal AD when calibrated using Calib 7.0 and the SHCAL13 southern hemisphere terrestrial curve [61,62].
Subsequent to radiocarbon dating, the authors solicited forty-six archaeological and paleoecological isotope laboratories in order to assess their willingness to participate in this study. Interested laboratories were informed that they would be provided with sufficient sample material to prepare and analyze at least three collagen and three hydroxyapatite replicates (although it was understood that not all laboratories would be able to comply with a full set of both collagen and apatite measurements). While this study was intended to document variation in the analysis of both collagen and hydroxyapatite, participants were asked to perform both types of analysis only if this was routine for their laboratory. In addition to isotopic data (δ13C, δ15N, δ18O), participating laboratories would be expected to provide details on pretreatment and analytical methods, as well as sample preservation assessments (e.g., sample yield, elemental values, amino acid analysis, FTIR spectra, etc.). All potential participants were informed that while their participation in the project would be made known, laboratory attributions of individual results would be kept confidential and all publicly disseminated data would be presented using randomly generated designators. However, in order for each participating laboratory to be able to assess its results compared to those for other study participants, the respective PIs were informed that they would be provided, at the conclusion of the study, with a full complement of the study's results with their data indicated.
Of the forty-six solicited laboratories, twenty-one (46%) ultimately committed to participate (Table 1). When laboratories provided reasons for not participating, they most often cited factors such as cost and time, although other laboratories declined on the basis that they were no longer performing such analyses. Based on the number of participating institutions, we used a handheld Dremel rotary tool equipped with a diamond cutoff wheel to divide the femur into 112 pieces, each weighing approximately 0.75 g. This large number of samples allowed each laboratory to receive five separate individual samples drawn at random from the overall assemblage of 112 pieces, thereby randomizing intra-bone variability and controlling for any random error engendered by differences in sample pretreatment within each laboratory.
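The random allocation of bone pieces described above can be illustrated with a short sketch. The text does not say how the random draw was actually made, so this is only one plausible way to do it; the piece numbering, the laboratory letters, the seed, and the function name `allocate_pieces` are all hypothetical.

```python
import random
import string


def allocate_pieces(n_pieces=112, n_labs=21, per_lab=5, seed=2014):
    """Assign `per_lab` distinct bone pieces to each laboratory at random.

    Piece numbers (1..n_pieces) and laboratory letters are illustrative
    assumptions, not identifiers taken from the study.
    """
    if n_labs * per_lab > n_pieces:
        raise ValueError("not enough pieces for every laboratory")
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    pieces = list(range(1, n_pieces + 1))
    rng.shuffle(pieces)  # random order removes any positional bias along the bone
    labs = string.ascii_uppercase[:n_labs]
    return {
        lab: sorted(pieces[i * per_lab:(i + 1) * per_lab])
        for i, lab in enumerate(labs)
    }


allocation = allocate_pieces()
# Pieces not sent to any laboratory (112 cut, 21 x 5 distributed)
unused = 112 - sum(len(p) for p in allocation.values())
```

Shuffling once and slicing guarantees that no piece is sent to two laboratories, which repeated independent draws would not.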
In addition to receiving five bone samples, each participating laboratory was provided with an instruction letter requesting that they prepare all five replicate samples using the same standard laboratory method and four standardized survey forms (Figures S1-S4) to use for recording (as appropriate): collagen preparation methods, collagen results, hydroxyapatite preparation methods, and hydroxyapatite results. The use of such standardized forms was intended to maximize comparability of laboratory protocols and to streamline statistical analysis. Some of the participating laboratories did not follow instructions to process all 5 samples or to do so with identical pretreatment.
Summaries of the collagen and hydroxyapatite protocols for each laboratory are provided (using anonymous identifiers) in Tables S1 and S2 (Supporting Information). Twenty of the twenty-one participating laboratories performed collagen extractions, with one laboratory, Laboratory D, performing two different kinds of extractions. Sixteen of the twenty-one laboratories extracted and analyzed hydroxyapatite, with one laboratory, Laboratory N, performing two different kinds of extractions. It should be immediately evident that while there are some broad similarities in sample preparation across laboratories (for example, twenty of the twenty-one collagen preparations (95%) were performed using hydrochloric acid (HCl) as the demineralizing agent), the variation in particle size, reagent concentrations, treatment times, temperature, etc., is substantial. The number and diversity of variables make identification of particular causes of variability challenging (see discussion below); however, we were able to identify protocols that overall yield more (or less) similar results.

Table 1. Participating institutions and laboratory PIs (columns: Institution, Laboratory PI).
To control for at least one potential source of variability, isotopic analysis, we reanalyzed as many samples as possible on one instrument. Eighteen of the twenty laboratories that performed collagen extractions (90%), and eleven of the sixteen laboratories that extracted hydroxyapatite (69%), returned aliquots of prepared material for reanalysis. Three aliquots of collagen and hydroxyapatite (when available) were selected from each laboratory's returned samples for reanalysis. Collagen samples were reanalyzed at the UC-Davis Stable Isotope Facility using a PDZ Europa ANCA-GSL elemental analyzer interfaced to a PDZ Europa 20-20 isotope ratio mass spectrometer (Sercon Ltd., Cheshire, UK). Elemental concentration was standardized by reference to Glutamic Acid, and stable isotope composition was standardized by reference to bovine liver, nylon, and USGS-41 Glutamic Acid. Hydroxyapatite samples were re-analyzed in the Stable Isotope Geochemistry Stable Isotope Laboratory at the Rosenstiel School of Marine and Atmospheric Sciences at the  University of Miami using a Kiel-IV Carbonate Device (Thermo-Electron, Bremen, Germany) coupled to a Thermo-Finnigan Delta Plus (Thermo-Electron, Bremen, Germany), and standardized in reference to NBS-19 (TS-Limestone). These duplicate analyses allowed us to independently assess the degree to which isotopic variability resulted from pretreatment versus analysis.
In addition to the use of a battery of well-established statistical analyses (z-score calculation, t-test, ANOVA, Levene's test for equality of variance, Pearson's bivariate correlation, all performed using SPSS v.20 [IBM, New York, USA]), we also used heat maps to visually identify which pretreatment protocols clustered together (i.e., produced similar results). Heat maps visually display data patterns by assigning a gradation of color to numerical values. The heat maps depict the difference in the values obtained by each pair of labs (yellow indicates no difference, with increasingly red cells indicating larger differences). Both axes were clustered using average linkage hierarchical clustering, with Euclidean distance as the distance metric. Heat maps were generated using the Genesis software package developed by Alexander Sturn and Rene Snajder (available freely at http://genome.tugraz.at/genesisclient/genesisclient_description.shtml). Significance for all analyses was set at α = 0.05.
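The pairwise-difference and clustering steps can be sketched in a few lines. This is a simplified illustration, not the Genesis implementation: it clusters laboratories directly on their mean isotope values (where Euclidean distance in one dimension reduces to an absolute difference) rather than on rows of the difference matrix, and the laboratory letters and values in `example_means` are invented for the example.

```python
from itertools import combinations


def pairwise_differences(lab_means):
    """Absolute difference between every pair of laboratory means (one heat-map cell each)."""
    return {
        (a, b): abs(lab_means[a] - lab_means[b])
        for a, b in combinations(sorted(lab_means), 2)
    }


def average_linkage(lab_means, n_clusters):
    """Naive agglomerative clustering with average linkage on 1-D values."""
    clusters = [[lab] for lab in sorted(lab_means)]

    def distance(c1, c2):
        # Average linkage: mean of all between-cluster pairwise distances
        total = sum(abs(lab_means[a] - lab_means[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical laboratory means (per mil), chosen only to illustrate an outlying lab
example_means = {"A": -17.0, "B": -17.1, "C": -16.8, "Q": -15.9}
groups = average_linkage(example_means, n_clusters=2)
```

With these invented values the outlying laboratory ends up in a cluster of its own, which is the kind of pattern the heat maps are meant to expose.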
Finally, we developed a novel metric for the evaluation of interlaboratory variation, the Minimum Meaningful Difference (MMD). The intent of this metric is to establish a means by which to evaluate isotopic results obtained from different laboratories, or when comparing newly obtained results to previously published values in the literature. Our hope is that these values will be treated as an experimentally generated threshold value that one could quickly use when evaluating whether newly generated isotopic data are significantly more enriched or depleted than another laboratory's results or previously published isotopic data. This metric is far more meaningful than a simple t-test, for example, as it explicitly takes into account inter-laboratory variability.
The development of MMDs assumed that the values obtained in the course of the present study are representative of the possible isotope values that might be obtained from any laboratory currently performing such analysis. Minimum Meaningful Differences were calculated by adding the average pairwise inter-laboratory difference for each isotope system to four times the average of the standard deviations obtained by each laboratory participating in the present study (we used four times the standard deviation for each laboratory in order to account for 95% of the laboratory error from both laboratories in each pairwise comparison). Using this value, a researcher can evaluate with ~95% confidence the likelihood that a newly obtained isotope value is different from another value as a consequence of bona fide biogenic differences rather than laboratory pretreatment and analysis.
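The calculation just described can be written compactly as MMD = (average pairwise inter-laboratory difference) + 4 × (average intra-laboratory standard deviation). The sketch below is one reading of that definition, under two assumptions the text does not spell out: pairwise differences are taken as absolute differences between laboratory means, and each laboratory's standard deviation is the sample standard deviation of its replicates. The function name and the example replicate values are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def minimum_meaningful_difference(replicates_by_lab):
    """MMD = mean pairwise difference of lab means + 4 x mean within-lab SD.

    `replicates_by_lab` maps a laboratory label to its replicate
    measurements (per mil) for a single isotope system.
    """
    lab_means = {lab: mean(values) for lab, values in replicates_by_lab.items()}
    pairwise = [
        abs(lab_means[a] - lab_means[b])
        for a, b in combinations(lab_means, 2)
    ]
    avg_pairwise = mean(pairwise)
    # Assumption: sample standard deviation of each laboratory's replicates
    avg_sd = mean([stdev(values) for values in replicates_by_lab.values()])
    return avg_pairwise + 4 * avg_sd


# Invented replicate values for two laboratories, for illustration only
example = {
    "X": [-17.0, -17.2, -16.8],
    "Y": [-16.5, -16.7, -16.3],
}
mmd = minimum_meaningful_difference(example)
```

In this toy case the laboratory means differ by 0.5‰ and each laboratory has a standard deviation of 0.2‰, so the sketch returns roughly 0.5 + 4 × 0.2 = 1.3‰.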

Collagen
Although the present work does not focus on the most commonly employed indicators of collagen quality (collagen yield, weight %C, weight %N, and atomic C:N), we present these for comparability with other studies. Across all laboratories, the respective values for these metrics were: collagen yield = 16.4 ± 7.9% (the large range of which is explained, at least in part, by the fact that some laboratories employed ultrafiltration whereas the majority did not), weight %C = 41.7 ± 5.3%, weight %N = 15.2 ± 1.9%, and atomic C:N ratio = 3.2 ± 0.1. These data robustly confirm the excellent quality of preservation of the collagen in the selected specimen.
Across all laboratories, δ13Ccol values averaged −17.0 ± 0.3‰ and had an overall range of 1.8‰ (Table 2, Figure 2, top). Of the ninety-six measured values, six were apparent outliers (note red cells in Figure 3, top): one from Laboratory L (z-score −2.1 [p = 0.02]) and all five from Laboratory Q (z-scores from 3.0 [p < 0.01] to 4.1 [p < 0.01]). Overall, the laboratories cluster into four distinct groups, with Laboratory Q as a clear outlier (Figure 2, top, Figure 3, top). Nitrogen isotope values averaged 9.0 ± 0.3‰, with an overall range of 1.9‰ (Table 2, Figure 2, bottom). Of the ninety-six measured δ15Ncol values, seven were outliers (with z-scores greater than 2.0 [p = 0.02]): one from Laboratory B, two from Laboratory H, and four from Laboratory L (note red cells in Figure 3, bottom). Four major δ15Ncol groups emerged, with two clear outliers (Laboratories L and H) (Figure 3, bottom). A statistically significant, but overall weak, Pearson correlation (r = 0.26, p = 0.01) was observed between δ13Ccol and δ15Ncol values (Figure 4).
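The outlier screening reported here can be approximated with a z-score calculation. The sketch below assumes that z-scores are computed against the mean and sample standard deviation of all measurements, that |z| ≥ 2.0 marks an outlier, and that the quoted p-values are one-tailed normal probabilities (an inference from the reported pairs, such as z = 2.1 with p = 0.02, not something the text states). The function names and example values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def one_tailed_p(z):
    """Upper-tail probability of a standard normal value at least |z|."""
    return 1.0 - NormalDist().cdf(abs(z))


def flag_outliers(values, threshold=2.0):
    """Return (index, z, p) for every value with |z| at or above the threshold."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation of all values
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sd
        if abs(z) >= threshold:
            flagged.append((i, z, one_tailed_p(z)))
    return flagged


# Invented measurements (per mil): nine similar values and one aberrant value
example_values = [-17.0] * 9 + [-15.0]
outliers = flag_outliers(example_values)
```

Under this reading, a z-score of 2.1 corresponds to p ≈ 0.02 and a z-score of 0.9 to p ≈ 0.18, in line with the values quoted in the text.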
Analysis of inter-laboratory variation indicates significant differences among laboratories for the two isotope systems of interest. For δ13Ccol, the average pairwise inter-laboratory difference was 0.2‰ ( It is worthwhile noting that neither the choice of demineralizing agent (HCl versus EDTA) nor the decision of whether/how to remove humic acids (NaOH, KOH, or no treatment) engendered significant differences in the resulting isotopic signatures. The offset seen between samples demineralized using HCl versus EDTA was only 0.2‰ for δ13Ccol (t = 1.3, df = 94, p = 0.2) and 0.2‰ for δ15Ncol (t = 1.5, df = 94, p = 0.2). No significant differences in δ13Ccol or δ15Ncol were observed as a result of humic acid removal reagents; no treatment, NaOH, and KOH produced indistinguishable δ13Ccol (ANOVA, F(2,93) = 1.8, p = 0.2) and δ15Ncol (ANOVA, F(2,93) = 0.6, p = 0.5) values. It is possible that the lack of appreciable differences in isotope values between laboratories that did and did not remove humic acids could be a consequence of the sample's low initial humic content.
Reanalysis of the collagen samples affected the number and distribution of outliers. For δ13Ccol (Figure 5, top), three samples, one each from Laboratories A, B, and L, possessed z-scores between 2.1 (p = 0.02) and 2.5 (p < 0.01). For δ15Ncol (Figure 5, bottom), a different set of three samples, one each from Laboratories D1, F, and M, had outlier z-scores between 2.0 (p = 0.02) and 2.3 (p = 0.01). Mean isotope values for laboratories that initially had uniformly outlying values (Laboratory Q for δ13Ccol and Laboratory L for δ15Ncol) were no longer aberrant after reanalysis. This strongly suggests that although collagen pretreatment methods are responsible for some of the observed isotopic differences among laboratories, differences in instrumentation or data calibration also drive a large amount of the observed variation in isotope values among laboratories: 69% (1.2‰ of the initially observed 1.8‰ range) for δ13Ccol and 48% (0.9‰ of the initially observed 1.9‰ range) for δ15Ncol [59,60].

Figure 5 (legend). Box lines represent first quartile, second quartile (median), and third quartile; whiskers at 95% confidence intervals; dots represent weak outliers (more than 2 standard deviations from mean); asterisks represent strong outliers (more than 3 standard deviations from mean). doi:10.1371/journal.pone.0102844.g005

Table 4. Results of secondary reanalysis of bone collagen.

Establishing Minimum Meaningful Differences for Collagen
The Minimum Meaningful Difference (MMD) value, which takes into account both the average inter-laboratory difference and the typically observed intra-laboratory variability (in the form of the standard deviation of each laboratory's replicate measurements), was determined to be a modest 0.6‰ for δ13Ccol (Table 5). This means that a difference in isotope values obtained from two different analyses is likely to be bona fide if that difference exceeds the threshold value of 0.6‰. MMD for δ15Ncol (Table 5) was slightly higher (0.9‰). The relatively small magnitude of these values, as will be discussed below, provides substantial reassurance about the relative comparability of collagen isotope results obtained from different laboratories.

Hydroxyapatite
Across all laboratories, δ13Cap values averaged −11.7 ± 0.6‰ and had an overall range of 3.5‰ (Table 6, Figure 6, top). Four measured values, all from the same laboratory (Laboratory R), were apparent outliers, with z-scores between 2.0 (p = 0.02) and 4.4 (p < 0.01). Oxygen isotope values averaged −4.6 ± 1.7‰, with an overall range of 6.7‰ (!) and no apparent outliers (Table 6, Figure 6, bottom). No significant Pearson correlation (r = −0.1, p = 0.2) was observed between δ13Cap and δ18Oap values (Figure 7).
Analysis of inter-laboratory variation indicates significant differences among laboratories for the two isotope systems of interest (note red cells in Figure 8). For δ13Cap, the average pairwise inter-laboratory difference was 0.6‰ (  Figure 9, bottom). Indeed, it is this preparation difference that helps to explain why there are two clear clusters (high-level branchings) in the heat maps of Figure 8. As the difference in mean δ18Oap values for samples processed by hydrogen peroxide versus bleach (2.4‰) is greater than the average pairwise difference in δ18Oap values between any two participating laboratories (2.0‰), it would appear that the choice of reagent used for organic removal is a prime driver in inter-laboratory variation in δ18Oap [41]. The same is not the case for δ13Cap, as the difference in means between oxidation methods (0.3‰) is less than the average pairwise inter-laboratory difference (0.6‰).
Differences in the labile carbonate removal technique (both concentration of acetic acid and the use of buffered versus unbuffered acetic acid) did not have a significant effect on δ13Cap or δ18Oap values. The offset seen between samples processed with 0.1-0.2 M versus 1.0 M acetic acid was only 0.03‰ for δ13Cap (t = −0.19, df = 68, p = 0.9) and 0.15‰ for δ18Oap (t = −0.33, df = 68, p = 0.7). The differences between samples treated with buffered and unbuffered acetic acid were similarly small: 0.06‰ for δ13Cap (t = 0.37, df = 68, p = 0.7) and 0.2‰ for δ18Oap (t = −0.49, df = 68, p = 0.6). This result is somewhat unexpected, as previous studies [22,42] have reported that acid strength has an impact on δ13Cap and δ18Oap values.

Hydroxyapatite reanalysis
Subsequent reanalysis of a subset of the hydroxyapatite samples on the same instrument revealed at least two interesting phenomena. First, reanalysis produced significantly enriched isotope results for carbon isotope values, which increased modestly from −11.7 ± 0.6‰ to −11.2 ± 0.5‰ (t = −4.0, df = 99, p < 0.01; Table 8, Figure 10, top), as well as oxygen isotope values, which increased more dramatically from −4.6 ± 1.7‰ to −3.4 ± 0.9‰ (t = −4.6, df = 99, p < 0.01; Table 8, Figure 10, bottom). Second, reanalysis on the same instrumentation significantly reduced the variance for δ18Oap, almost halving the standard deviation of the measured samples from 1.7‰ to 0.9‰ (W = 25.3, df = 99, p < 0.01). While the variance for δ13Cap was also reduced during reanalysis, the observed difference (0.6‰ versus 0.5‰) was modest by comparison, and not significant (W = 0.01, df = 92, p = 0.92).
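The W statistic quoted above comes from Levene's test for equality of variance. As a rough illustration of what that statistic measures, the sketch below computes W from absolute deviations about each group mean (the classic form of the test; whether SPSS was configured this way is an assumption). It returns only the statistic, since the p-value requires an F distribution that the Python standard library does not provide. The function name and example numbers are hypothetical.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    # Absolute deviation of every observation from its own group mean
    z = [[abs(x - m) for x in g] for g, m in zip(groups, group_means)]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(
        len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means)
    )
    within = sum(
        (v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi
    )
    return (n_total - k) / (k - 1) * between / within


# Invented example: the second group is twice as spread out as the first
w = levene_w([1.0, 2.0, 3.0], [0.0, 2.0, 4.0])
```

Because W is a ratio of sums of squares it cannot be negative; larger values indicate a greater difference in spread between groups.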
Two other phenomena of note were evident after reanalysis. First, the samples from Laboratory R, which had outlying δ13Cap values in the initial run, were not outliers in the reanalysis (z-scores between −0.5 [p = 0.3] and 0.9 [p = 0.18]). Indeed, there was only one outlier among the two isotope systems, a solitary sample from Laboratory O which had a δ13Cap value with a z-score of 2.4 (p < 0.01). This finding suggests that Laboratory R's aberrant isotope values in the first round of analysis were the result of analytical instrumentation, working standards, or data calibration rather than a preparation step [59,60]. Indeed, these factors, rather than pretreatment per se, would seem to drive a large portion of the observed variation in isotope values for hydroxyapatite: 44% (1.5‰ of the initially observed 3.5‰ range) for δ13Cap and 54% (3.6‰ of the initially observed 6.7‰ range) for δ18Oap. These results echo the recent findings of Carter and Fry [59], who demonstrated that differences in data calibration and correction can lead to substantial isotopic differences among laboratories. Second, while reanalysis on the same instrument reduced the difference in δ13Cap values between samples oxidized using bleach versus hydrogen peroxide (<0.3‰, t = −1.1, df = 29, p = 0.3), a significant difference in δ18Oap values remained (1.3‰, t = −3.6, df = 29, p < 0.01). Therefore, although instrumentation drove some of the isotopic variation among laboratories, differences in preparation (particularly oxidation/organic removal) were also responsible for observed differences in isotopic values obtained from different laboratories, a finding that is in agreement with previous studies [41,43].

Establishing Minimum Meaningful Differences for Hydroxyapatite
As noted above, the average inter-laboratory pairwise differences for δ13Cap and δ18Oap were 0.6‰ and 2.0‰ respectively. The Minimum Meaningful Difference (MMD) value for δ13Cap was determined to be 1.2‰ (Table 5). This value suggests that a difference in isotopic signatures obtained from two different analyses is only likely to be bona fide when that difference exceeds the threshold value of 1.2‰. MMD for δ18Oap (Table 5) was much larger (3.1‰). This latter value is of particular concern, as it is greater than the difference in bona fide δ18O values that might be expected to result from biological or environmental differences (e.g., residency or paleoclimate), as is more fully elucidated below.
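In practice, the MMD values reported in this study (0.6‰ for δ13Ccol, 0.9‰ for δ15Ncol, 1.2‰ for δ13Cap, and 3.1‰ for δ18Oap) act as simple thresholds. A minimal sketch of such a check follows; the dictionary keys, the function name `exceeds_mmd`, and the example values are hypothetical, and only the four threshold values are taken from the text.

```python
# MMD thresholds in per mil, as reported in this study
MMD_PER_MIL = {
    "d13C_col": 0.6,
    "d15N_col": 0.9,
    "d13C_ap": 1.2,
    "d18O_ap": 3.1,
}


def exceeds_mmd(value_a, value_b, system):
    """True if two values from different laboratories differ by more than the MMD."""
    return abs(value_a - value_b) > MMD_PER_MIL[system]


# Invented values: a 2.6 per mil gap in d18O_ap falls below the 3.1 per mil threshold
oxygen_is_meaningful = exceeds_mmd(-4.6, -2.0, "d18O_ap")
# Invented values: a 1.0 per mil gap in d13C_col is above the 0.6 per mil threshold
carbon_is_meaningful = exceeds_mmd(-17.0, -16.0, "d13C_col")
```

The contrast between the two examples reflects the point made above: a difference that would count as meaningful for collagen carbon may be well within inter-laboratory noise for hydroxyapatite oxygen.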

Conclusions
The present study began with the goals of: 1) quantifying inter-laboratory variability in stable isotope analysis of bone collagen and hydroxyapatite, and 2) tracing the likely causes of this observed variability. For bone collagen, we found statistically significant inter-laboratory variation for both carbon and nitrogen isotope values among laboratories. However, the average pairwise difference between any two participating laboratories was only 0.2‰ for δ13Ccol and 0.4‰ for δ15Ncol. These values are of such a small magnitude as to not be cause for great concern. As to causality, neither of the most obvious differences in pretreatment between participating laboratories (demineralizing reagent or humic acid removal) had a significant effect on the resulting isotope values.
Subsequent reanalysis of a subset of samples on the same instrument indicates that the prime driver of inter-laboratory variation in collagen stable isotope analysis is differences in analytical instrumentation and/or standardization rather than pretreatment (accounting for 48-69% of the observed initial inter-laboratory variation). Finally, the Minimum Meaningful Difference (MMD) value establishes a threshold by which results obtained from two laboratories might be evaluated. Differences exceeding 0.6‰ for δ13Ccol and 0.9‰ for δ15Ncol have a high likelihood of being of biological origin rather than an artifact of pretreatment or analysis. In sum, it would appear that the results of stable isotope analysis of bone collagen from one laboratory can be compared (if cautiously) with results obtained elsewhere. Overall, inter-laboratory variability in collagen isotopes would not appear to be of paramount concern.
For bone hydroxyapatite, the results of the present study are somewhat less reassuring. Inter-laboratory variability for both δ13Cap and δ18Oap was significant, and while the average pairwise difference between any two participating laboratories was only 0.6‰ for δ13Cap, for δ18Oap that value rose to 2.0‰, a difference which could easily change interpretations of past residency or paleomobility. It is unlikely that anything more than a small portion of this variability is the result of differential diagenesis [43], as variability within laboratories (each of which received a randomized set of bone samples) was significantly less than inter-laboratory variability. Instead, as previously suggested [41], differences in oxidation treatment (NaOCl versus H2O2) appear to be a prime driver of δ18Oap variability, but not of δ13Cap variability. However, differences in the method for removing labile carbonate (acid concentration and buffering agent) do not have a significant effect on either isotope system, contrary to previous suggestions [22,42]. Perhaps such differences were not observed because all laboratories that used strong acid used buffered acid or very short treatment times.
As with collagen, the subsequent reanalysis of a subset of samples on the same instrument indicates that differences in analytical instrumentation and/or standardization (rather than pretreatment per se) were a prime driver of inter-laboratory variation in hydroxyapatite stable isotope values (accounting for 44-54% of the observed initial inter-laboratory variation). Efforts to unify data correction among laboratories would likely decrease this variability [59,60]. The Minimum Meaningful Difference (MMD) values suggest that differences between results obtained from two laboratories have a high likelihood of being bona fide rather than an artifact of different pretreatment or analytical methods only if they exceed 1.2‰ for δ13Cap and 3.1‰ for δ18Oap. The magnitude of these MMD values, particularly for δ18Oap, might call into question the attribution of biological significance oftentimes given to different δ13Cap and δ18Oap values obtained in different laboratories. In sum, it would appear that inter-laboratory variability could be a significant concern for hydroxyapatite. Analytical results from different laboratories might not be directly comparable, particularly in the case of δ18Oap.
Three final points merit consideration. First, it should be noted that while the present study addresses the wisdom of (over)claiming the significance of dissimilar results from different laboratories, even small differences (if replicable) obtained in one laboratory can still be considered reliable. Second, the bone hydroxyapatite results presented here may not be directly applicable to comparisons of enamel hydroxyapatite, a tissue thought to be far more resistant to diagenesis [43]. However, the results presented here may be used as a cautionary starting point for enamel comparisons. Third, and finally, the results presented here ought to be thought of in terms of providing a minimum estimate of potential variability that could be generated among laboratories. The extremely high quality of preservation of the selected ancient bone specimen might lead us to underestimate possible inter-laboratory variation in preparation methods. We would expect to see larger isotopic differences among laboratories if they prepare and analyze a poorly preserved sample with lower collagen yield, greater post-mortem humic contamination, or more non-lattice-bound carbonates.

Figure 9 (legend). Box lines represent first quartile, second quartile (median), and third quartile; whiskers at 95% confidence intervals; dots represent weak outliers (more than 2 standard deviations from mean); asterisks represent strong outliers (more than 3 standard deviations from mean). doi:10.1371/journal.pone.0102844.g009

Figure 10 (legend). Box lines represent first quartile, second quartile (median), and third quartile; whiskers at 95% confidence intervals; dots represent weak outliers (more than 2 standard deviations from mean); asterisks represent strong outliers (more than 3 standard deviations from mean). doi:10.1371/journal.pone.0102844.g010

Table 8. Results of secondary reanalysis of bone hydroxyapatite.