Ancient Dna from Hunter-gatherer and Farmer Groups from Northern Spain Supports a Random Dispersion Model for the Neolithic Expansion into Europe

Background/Principal Findings: The phenomenon of Neolithisation refers to the transition of prehistoric populations from a hunter-gatherer to an agro-pastoralist lifestyle. Traditionally, the spread of an agro-pastoralist economy into Europe has been framed within a dichotomy based either on an acculturation phenomenon or on a demic diffusion. However, the nature and speed of this transition is a matter of continuing scientific debate in archaeology, anthropology, and human population genetics. In the present study, we have analyzed the mitochondrial DNA diversity in hunter-gatherers and first farmers from Northern Spain, in relation to the debate surrounding the phenomenon of Neolithisation in Europe.


Introduction
The phenomenon of Neolithisation refers to the transition from a hunter-gatherer way of life to an agro-pastoralist lifestyle, involving crop farming and livestock herding. There is consensus on the origin of the agro-pastoralist lifestyle associated with the Neolithic in the Near East, from where it spread throughout Europe. Yet, there is no such agreement on the mechanisms and means of transmission of farming to Europe. Traditionally, the spread of crop farming and livestock herding in Europe during the Neolithic has been framed within a dichotomy based either on an acculturation phenomenon or on a demic diffusion process. The demic diffusion model describes a migratory process based on a population expansion from the Near East into Europe, whose consequence was the assimilation of the genetic pool of the indigenous hunter-gatherer groups by the expanding of the farming community [1][2][3][4][5][6]. On the other hand, the acculturation model posits that this transition occurred through the adoption of the agro-pastoralist system by local indigenous groups, without receiving any genetic input [7][8][9]. Between these two models there are others that suggest a varying intensity of the genetic impact from the Neolithic farming communities that spread throughout Europe from the Near East [10,11].
The analysis of the genetic composition of present-day populations in Europe and the Near East, has tried to establish the origin of their extant genetic variability. Based on classical genetic markers, Ammerman and Cavalli-Sforza [1], proposed the wave-of-advance model, whereby the demic diffusion from the Near East towards Europe contributed to the genetic composition of present-day populations. Nevertheless, based on modern European patterns of mitochondrial diversity, Richards et al. [9] argued that this mitochondrial diversity had a predominantly Paleolithic origin, with a small Neolithic contribution (12%), which would favour the cultural diffusion model. Subsequent studies applying new methodologies have allowed quantifying the contribution of Neolithic farmers to the genetic pool of presentday European populations at 23% [12,13].
Several studies on the variability of the non-recombining region of the Y chromosome (NRY) have detected the presence of a southeast-northwest gradient in Eurasia, which has been interpreted as the genetic fingerprint of Neolithic expansion [14,15]. A recent analysis of the variability of the Y chromosome in more than 2,500 samples taken from present-day European population revealed that, the diversity of haplogroup R1b1b2 (the most common one in Europe) is best explained by the spread from a single source in the Near East via Anatolia during the Neolithic [16]. This proposal contradicts prior studies, which consider this haplogroup to be a marker of the Mesolithic re-expansion from the glacial refuge in the Franco-Cantabrian region, the Balkans and the Alps [14,15].
The analysis of the DNA recovered from ancient human remains has highlighted a more complex pattern than the one revealed by the studies of contemporary populations [17][18][19][20][21][22] (amongst others). Studies hitherto conducted on ancient DNA (aDNA) from European Neolithic populations indicate two routes to explain the Neolithic dispersion to Europe: one, supporting an acculturation model that followed a route of dispersion through the North-Central Europe [8,23], and a second one, supporting a demic diffusion model through a Mediterranean route [24,25].
Ancient DNA studies of hunter-gatherers from Scandinavia and Central Europe have pointed to a genetic discontinuity between hunter-gatherer and Neolithic populations in these two geographic regions [26,27]. The analysis of mtDNA in the first Neolithic farmers of Central Europe (7.5-7 kya) showed a high frequency of haplogroup N1a (15%), which is rarely found in present-day populations (0.2%) [8,23]. On the other hand, these first farmers showed a genetic affinity with the present-day populations from the Near East and Anatolia, supporting a major genetic input from this area during the advent of farming in Europe [23].
The mtDNA analysis from the Neolithic site of the Iberian Mediterranean region named, ''Camí de Can Grau'' in Catalonia (5.5-5 kya), [24] has provided results that are different to those published by Haak et al. [8,23]. The absence of haplogroup N1a and the presence of mitochondrial lineages typical of present-day populations from this European Mediterranean area, led Sampietro et al. [24] to propose a demic diffusion only in Mediterranean area of Europe.
Likewise, the combined analysis of the mtDNA and NRY variability in 29 individuals from a French Mediterranean region, named ''Cave I of Treilles'' in Aveyron, (5 kya), [25] reaffirmed the aforementioned proposal, based mainly on the absence of the mitochondrial haplogoup N1a and the R1b of the NRY, considered Neolithic dispersion markers through Central Europe [8,16,23].
In view of the debate surrounding the phenomenon of Neolithisation in Europe, we have carried out a genetic analysis of Paleolithic hunter-gatherer and Neolithic samples recovered from nine sites in the North of the Iberian Peninsula (Basque Country, Navarre and Cantabria), whose chronologies date from the Upper Paleolithic through to the Bronze Age ( Figure 1 and Table 1). The aim of this study is to explain the Neolithisation process along the Iberian Cantabrian fringe within the context of European populations.

Authenticity of the results
We have successfully analyzed 49 out of 54 individuals recovered from the nine prehistoric sites analyzed in the present study; four sites correspond to hunter-gatherer groups (La Chora, La Pasiega, Erralla and Aizpea), another four to the Neolithic period (Los Cascajos, Paternanbidea, Marizulo and Fuente Hoz) and the last one to the Bronze Age (Urtiaga). Out of these 49 individuals, 30 (61% of the individuals) were analysed in duplicate in our laboratory; in addition, a third sample from each of 22 individuals (44% of the individuals) was replicated at an independent laboratory (some in the University of La Laguna and others in the INTCF of Madrid, Spain).
The number of molecular targets (mtDNA copy number) was quantified for each extract by means of quantitative real-time PCR (qPCR). The results showed that the number of molecules per ml in the extracts ranged between 100-12,000 (Table S1). These values are within the limits proposed for reproducibility in aDNA studies [28].
In addition, a total of 69 PCR products from 49 individuals were cloned. A minimum of ten clones per PCR product were amplified and sequenced. The results were used to determine the coincidence between the consensus sequence obtained from the clones and the sequence obtained by direct sequencing of the initial PCR product. A mean of 4.20 mutations per fragment cloned (,100 bp) were rejected, as these mutations were found in single clones, possibly as consequence of post-mortem damage to the aDNA molecules (Table S2).
In order to identify any possible contamination that might have occurred in the different stages of the laboratory work, at least two extraction controls and several PCR negative controls were included in each amplification. If any contamination was detected, the results obtained were discarded.
Finally, the mtDNA HVR-I segment from the researchers and archaeologists that handled the samples was sequenced in order to rule out any possible contamination (Table S3). A comparison between the haplotypes of the researchers/archaeologists and those obtained in the aDNA samples ( Table 2) produced two cases of coincidence, one with the rCRS haplotype, which is the most frequent one in European populations, and the other with haplogoup K (ht18: 92-224-311), which is widely present not only throughout the whole of the Iberian Peninsula [29] and the Canary Islands [30], but also Europe [9,12,13] and North Africa [31]. HVR-II sequencing was performed on those prehistoric samples that coincided with the haplotypes of the researchers and/ or archaeologists in order to discard a complete coincidence between them.

Analysis of mtDNA variability in the prehistoric samples
We have successfully analyzed 49 individuals from nine prehistoric sites and obtained 25 different mitochondrial haplotypes belonging to eight mitochondrial haplogroups (H, U, K, J, HV, I, T and X) ( Table 2). Haplogroup H is the major one, showing a frequency of 45% in the ancient samples analysed. This figure is similar to that observed in some present-day European populations including the North of the Iberian Peninsula [13,[32][33][34]. The second most frequent haplogroup was haplogroup U (34.7%) ( Table 2), being the Neolithic populations of Navarre (Los Cascajos and Paternanbidea) the ones that showed the highest frequencies for this haplogroup (29.6% and 11.1%, respectively). A special mention should be made of sub-haplogroup U5, as it is one of the oldest found in Europe (,36.9 kya), and has been linked to the colonisation of Europe by anatomically modern Homo sapiens (AMHS) in the Early Upper Paleolithic [13,[35][36][37]. The average frequency of U5 in the present study was 12%, considering huntergatherer samples (Erralla and Aizpea), Neolithic samples (Los Cascajos and Marizulo) and the Bronze Age sample (Urtiaga) ( Table 2).
Haplogroup J exhibited a frequency of 4.08% in the samples analysed, being found in two individuals from Los Cascajos, with two different haplotypes: ht7 and ht12 (Table 2).
Finally, haplogroup K was only found in the Neolithic sites of Navarre (Los Cascajos and Paternanbidea), showing an average frequency of 8.2%. All the other haplogroups (HV, I, T and X) had a frequency of around 2% each, with haplogroups HV and I being only found in Paternanbidea, and haplogroups T and X in Los Cascajos (Table 2).

Statistical and multivariate analysis of the mitochondrial variability of prehistoric populations
The prehistoric populations analysed in this study (Table 1) were compared with other present-day populations from the Iberian Peninsula, Europe and the Near East, as well as with those prehistoric populations reported in the literature (Table S4).
In order to evaluate the differences between the populations compared, genetic distances have been calculated using the F ST statistic (Table S5). The three groups of hunter-gatherers considered in this analysis (from Scandinavia, Central Europe and the Cantabrian fringe on the Iberian Peninsula) did not show statistically significant differences between one another. However, the hunter-gatherers from Scandinavia (HG_SCA) and those from Central Europe (HG_CE) showed statistically significant differences with other Neolithic, Chalcolithic and present-day populations. The hunter-gatherer group in the Cantabrian fringe (HG_CANT) did not show statistically significant differences with any other populations in the comparison, due probably to its small sample size (n = 4), for this reason it was not included in the MDS analysis ( Figure 2).
The Neolithic populations from Catalonia (NEO_CAT), France (NEO_FR) and Navarre (NEO_CAS and NEO_PAT), did not show statistically significant differences either between each other or with present-day populations. However, the Neolithic population from Central Europe (NEO_CE) showed statistically significant Table 2. Distribution of the frequencies (%) of the mtDNA haplotypes (HT) and haplogroups (HG) obtained in the prehistoric samples from the present work (see nomenclature in Table 1). differences with the Neolithic populations from Navarre (NEO_CAS and NEO_PAT), as well as with present-day populations in both Europe and the Near East (Table S5). The samples from Los Cascajos and Paternanbidea (NEO_NA-VARRE) were merged as they did not show statistically significant differences (Table S5) and because the sites were very close in terms of chronology, culture and location ( Figure 1).
Concerning the Chalcolithic populations in the Basque Country (San Juan ante Portam Latinam (SJaPL), Pico Ramos and Longar), they did not show statistically significant differences with any of the populations in the comparison set (except with huntergatherer groups) (Table S5).
Regarding present-day populations, statistically significant differences were apparent between those in the Near East and those in Europe, indicating a genetic differentiation between these two geographic regions (Table S5).
A Multidimensional Scaling (MDS) analysis was carried out providing a two-dimensional view of the F ST distance matrix ( Figure 2). This analysis showed an RSQ of 0.980 and a stress of 0.0260. The results of the MDS clarified some of the results obtained in previous F ST pair-wise comparisons (Table S5) and in Principal Component Analysis (PCA) (data not shown), as it is the difference between the hunter-gatherers (HG_SCA and HG_CE) and the rest of populations. The hunter-gatherer samples showed the highest frequency values for haplogroup U (50%-80%) than any other present-day population (this haplogroup occupied one end of the first axis in PCA, data not shown). The Neolithic groups are heterogeneous, setting the Neolithics of Central Europe (NEO_CE) apart from the rest of Neolithic groups (NEO_FR, NEO_CAT, and NEO_NAVARRE). This is because they showed a high frequency for haplogroup N1a, that is absent in other Neolithic populations. The Neolithic sample of France (NEO_FR) is closer to present-day populations in the Near East, because it shows similar frequencies for haplogroups TX, W, J, H and U (the ones with the highest correlation for the first axis in the PCA, data not shown). However, the Neolithic populations from the Iberian Peninsula (NEO_CAT and NEO_NAVARRE) are located between the variability of present-day European populations and those in the Near East. Likewise, the Chalcolithic populations in the Basque Country (Longar, SJaPL and Pico Ramos) occupied a similar position to the Neolithic groups of the Iberian Peninsula ( Figure 2).

Phylogenetic analysis of ancient samples
We have constructed a Median Joining Network (MJN) (Figure 3) with the HVR-I sequences of the ancient samples obtained in this study, together with a set of sequences of European hunter-gatherer and farming individuals published to date (Table S4). Given the high mutation rate of HVR-I, we applied the substitution rates obtained by Meyer et al. [38,39] to establish varying mutational weights ranging from 0 to10. The resulting MJN is shown in Figure 3, where the central node is represented by the rCRS, which is shared by all the population groups included in the analysis.
The hunter-gatherer groups in Scandinavia (pink), Central Europe (orange) and Cantabrian fringe (dark blue) share a high number of haplotypes, located in the nodes of the network corresponding to haplogroup U. This figure confirmed the close phylogenetic relationship between the three groups which show the same subsistence pattern (Figure 3).
The sequences corresponding to haplogroup N1a have been found only in the Neolithic groups in Central Europe (purple) and in one individual of a Neolithic site in northwest France (Prissè-la-Chorrière, Deux-Sèvres) (brown), where only three individuals were recovered [40]. This indicates that haplogroup N1a, which has a high frequency in the first Neolithic farmers in Central Europe, is also present in northwest France associated to a late Neolithic group (Figure 3). Furthermore, we have found shared haplotypes for the Neolithic groups from the Cantabrian fringe (blue), the French Mediterranean area (white) and Catalonia (green). Moreover, we have also found relationship between these three Neolithic groups and the hunter-gatherers from the Iberian Peninsula (dark blue) and Europe (pink and orange). This indicates the presence of shared haplotypes between hunter-gatherers and Neolithic groups from different regions and timeframes (Figure 3).

Discussion
We have carried out the analysis of the mtDNA variability of a sample of 49 (out of 54) individuals recovered from nine archaeological sites in the north of the Iberian Peninsula (Navarre, Basque Country and Cantabria), with a chronological range from the Upper Paleolithic to the Bronze Age (Table 1). Our goal was to evaluate the model of transition from hunter-gathering to farming in the Cantabrian fringe. These 49 individuals provided 25 different mitochondrial haplotypes, which indicates high genetic variability ( Table 2). Both Paleolithic and Neolithic individuals and even present-day populations along the Cantabrian fringe, share haplotypes albeit at a different proportion between populations.
The mitochondrial variability found in the prehistoric samples analysed here was classified into eight haplogroups: H, U, K, J, I, HV, T, and X, which are amongst the most frequent mitochondrial haplogroups in Europe (Table 2). Nevertheless, the distribution of these haplogroups is different to that in presentday European populations.
The F ST test indicated that European hunter-gatherer groups (HG_CE, HG_SCA and HG_CANT) did not show statistically significant differences between them, but they are significantly different from any population compared, because to the high frequency of haplotypes within the haplogroup U (50%-80%) (Table S5, Figure 3). On the other hand, it was noted some differences within the European Neolithic groups, with the Neolithic group in Central Europe (NEO_CE) showing the highest number of statistically significant differences in F ST test, whereas the Neolithics from France and Catalonia (NEO_FR and NEO_CAT) showed the lowest number of statistically significant differences (Table S5). The differentiation of the Neolithic group in Central Europe (NEO_CE) is due largely to its high frequency of haplogroup N1a (15%).
Haplogroup N1a was proposed by Haak et al. [8] as a possible genetic marker for the first Neolithic farmers who lived in Central Europe 7,500 years ago [8,23]. This haplogroup was also found in one individual recovered from a Neolithic burial site in northwest France, confirming its spread to another geographic region in addition to Central Europe [31] (Figure 3). Nonetheless, haplogroup N1a was absent in the Neolithic groups analysed here (NEO_NAVARRE), as well as in other Neolithic samples from the Mediterranean area (NEO_CAT and NEO_FR) [24,25], which might suggest that the genetic influence of Neolithic groups from the Near East in Central Europe was different to that in the western Mediterranean area. This proposal fits well with the lack of simple patterns detectable in the mtDNA data that are available so far.
Haplogroup J, also proposed as a marker for the spread of Neolithic farmers from Near East [13], in the present study has been found only in the ancient Neolithic groups from Navarre (NEO_NAVARRE), with a frequency of 6.25%. This value is the lowest of those described for Neolithic populations studied so far, which showed heterogeneous frequencies (NEO_FR: 20%, NEO_CAT: 18% and NEO_CE: 9.5%) [8,[23][24][25]. Regarding those lineages belonging to haplogroup J, their influence was not so important in the sites from Navarre compared to other European regions. This result highlights the complex biological patterns resulting from Neolithisation in contrast with the simpler and more evident cultural patterns.
The differences in the genetic composition of Neolithic groups in Europe (including those analysed here) are also evidenced from their respective frequencies for other mitochondrial haplogroups (H, U, I, K, HV, V, T and X). We would like to highlight the absence of haplogroup V, proposed as a marker of the post-glacial recolonization from Franco-Cantabrian refuge [41]. The 49 prehistoric samples analysed in the present study, have not produced a single individual belonging to haplogroup V, which is consistent with previous aDNA studies carried out in the Basque Country [42]. This may suggest the existence of a sub-structuration in populations from Cantabrian Fringe during Paleolithic period that it is maintained up to the present, given the variation of frequencies of haplogroup V in the present-day population from this geographic region [42,43].
The MDS analysis revealed the difference between the huntergatherers and all the other populations compared, as well as the heterogeneity of European Neolithic groups (NEO_CE, NEO_FR, NEO_CAT, NEO_NAVARRE) ( Figure 2). Moreover, it is highlighted the unique position of Central Europe (NEO_CE) in relation to their southern counterparts (NEO_FR, NEO_CAT and NEO_NAVARRE). This difference seems to indicate that the human groups associated with the spread of agriculture in the region of Central Europe had a different genetic composition to those that followed a Mediterranean route, which were devoid of haplogroup N1a.
The distances observed between the Neolithic groups from the Mediterranean area (NEO_FR, NEO_CAT) and Navarre (NEO_NA-VARRE) (Figure 2) are also explained by their different frequencies for certain mitochondrial haplogroups, with haplogroups J, U and H standing out (the ones with the highest correlation for the first two axes of the PCA, date not shown). The distances observed between these Neolithic groups could be due to a different genetic impact of Neolithic farmers from Near East into the indigenous populations in the different regions of Europe. This genetic influence could be greater in southern France (NEO_FR) than in the Iberian Peninsula. The sites of Los Cascajos and Paternanbidea (NEO_NAVARRE), from Early Neolithic, although they showed a substantial Neolithic cultural influence, they seem to show a lower genetic contribution of female migrants from the original areas where Neolithic first developed (Figure 2).
In this regards, the Chalcolithic populations in the Basque Country, dated towards the end of the fifth millennium (SJaPL, Pico Ramos and Longar), showed similar distances with the Neolithic groups in Catalonia (NEO_CAT) and in Navarre (NEO_NAVARRE) (Figure 2). This may be due to the demographic stabilisation of human groups from the Neolithic onwards, which would have attenuated the genetic variations existing during the Paleolithic and first stages of the Neolithic.
In sum, the differences between Hunter-Gatherer and Neolithic groups in Europe can be attributed to the restructuring of their genetic make-up due to the incoming gene flow from the Near East. The genetic data obtained in this mtDNA study contradicts the total acculturation and replacement models proposed for explaining the phenomenon of Neolithisation. Whereas in Central Europe-Northern France a Neolithic clinal gradient can not be discarded from aDNA data so far, the differences in the mtDNA found between Neolithic sites at the Mediterranean area provide support for a random dispersion model. Maritime colonization, transporting small and different Neolithic groups from the Near East pool could contribute to explain the difference.
The Neolithic sites from Navarre (Paternanbidea and Los Cascajos) provide the largest samples in this study and are of considerable importance, as on the one hand, they are dated to the Early Neolithic (6.2-5.2 kya) and, on the other, they are found in association with archaeological evidence of fully Neolithic lifestyle (settlements, agriculture and livestock herding) [48]. All the other sites are exceptional in terms of their antiquity and cover an ample spectrum of the geography along the Cantabrian fringe ( Figure 1).
The entire excavation process involved strict precautions designed to avoid contamination. Furthermore, the anthropological remains were immediately removed, without undergoing washing, often still embedded in the sediment, and taken to our laboratory where cleaning was performed in dry conditions. The samples analysed in this paper are mostly dental pieces, which constitute the most isolated system in skeletal remains and, therefore, are less liable to outside contamination. The processing of the samples in the laboratory involved the application of a series of strict criteria for the authentication of results. We used some of the criteria detailed in [54][55][56][57], such as DNA quantification, cloning, duplication and replication of the results by two independent laboratories. In addition, the extraction and preparation of the PCR were undertaken in a positivepressure sterile chamber, physically separated from the laboratory where post-PCR processes were carried out. All the work surfaces were cleaned regularly with sodium hypochlorite and irradiated with UV light. Suitable disposable clothing was worn (lab coat, mask, gloves and cap). Contamination controls were applied in both the extraction and amplification processes. In this study, we did not use racemisation of the ancient samples, as several authors have showed this method does not provide efficient information on the reproducibility of the aDNA results [58].

Collection, Selection and Preparation of Ancient Samples
We have selected those teeth without caries or deep fissures that might extend into the pulp. Whenever possible, more than one tooth was taken from each individual for duplicate analysis, with the duplicates being analysed in different sessions at the University of the Basque Country (UPV/EHU). In addition, a third sample from some individuals was analysed in independent laboratories (either University of La Laguna or National Institute of Toxicology and Forensic Sciences (INTCF), Spain). When dental pieces could not be obtained, bone tissue from long bones was collected (14 samples).

Extraction of DNA Using Phenol/Chloroform
In order to eliminate surface contamination, the teeth were subjected to a process of depurination using acids, and the entire surface was irradiated with ultraviolet light [59]. In the case of bone-based extraction, the external part of the bone was cleaned by physical scraping, with a piece of bone being then removed and powdered (Freezer Miller Spex 6770, Edison N.J.). The extraction process followed the protocol described by Hagelberg et al. [60]: the tissue was incubated with stirring for 2 hours at 56uC in a lysis buffer (5 ml) (0.5 M EDTA pH 8.0-8.5; 0.5% SDS; 50 mM Tris HCl pH 8.0; 0.01 mg/ml proteinase K). The DNA was recovered using phenol and chloroform and then concentrated and purified (Centricon-30, Amicon). Each extraction session involved two contamination controls that were applied to the entire process, except no dental or bone tissue was added.

Analysis of the variability of mtDNA
Sequencing of mtDNA HVR-I, nucleotide positions (nps) 15,998-16,400, and mtDNA HVR-II, nps 16504-429 as per [61], was undertaken in six overlapping fragments, each with a length of approximately 100 bp (base pair). HVR-II sequencing was carried out in samples with no polymorphic positions in HVR-I (Table S6). Similarly, the fragment between primers 8F and 8R (Table S6) was amplified in all samples to determine position 73 of HVR-II. The PCRs were performed in 25 ml of reaction mixture containing 10 mM Tris-HCl pH 8.3; 2 mM of MgCl 2 , 0.1 mM of each dNTP, 0.4 mM of each primer, 5 units of AmpliTaq Gold (Applied Biosystems) and 10 ml of diluted DNA (1 ml of DNA extract in 10 ml of 1 mg/ml BSA). Cycling parameters were 95uC for 10 min; followed by 40 cycles of 95uC for 10 sec, annealing temperature for 30 sec, 72uC for 30 sec; and a final step of 72uC for 10 min. The annealing temperatures of the primers of HVR-I were as follows: 60uC for the A1/A1R primer pair, 58uC for 2F/ 2R and 4F/4R, 57uC for 1F/1R and 55uC for 3F/3R and 5F/5R, the primer sequences are listed in [62]. The sequence and annealing temperatures of the HVR-II primers are shown in Table S6. The amplification of each fragment was undertaken in independent PCRs and each fragment was amplified and sequenced twice from two independent DNA extract. In the case of positive amplification and the absence of contamination, the amplifications were purified by ExoSAP-IT (USB Corporation), with subsequent sequencing in an ABI310 automatic sequencer using chemistry based on dRhodamine. The results obtained were edited with BioEdit software (http://www.mbio.ncsu.edu/ BioEdit/bioedit.html) and the sequences were aligned manually.
In order to classify the mitochondrial variability of the individuals analyzed in this study, we proceeded to amplify 11 markers, which are required for defining the 10 Caucasian haplogroups described [63]. The protocol and primers are described in [17,20,42]. The digestion patterns were verified using a fragment Bioanalyzer (Agilent Technologies).

Authentication Methods
In addition to the precautions taken to avoid contamination, other authentication criteria such as quantification, cloning and sequencing were applied.
Duplication. 61% of individuals were analyzed in duplicated at different times and by different researchers at the University of the Basque Country (UPV/EHU).
Replication. The HVR-I of a sub-sample of 21 individuals (43% of the samples) was replicated; almost all of them at the University of La Laguna (Tenerife, Spain) (20 individuals) and the remaining one at the INTCF (Madrid, Spain).
Quantification of target DNA. We have quantified the target DNA analysing by quantitative real-time PCR (qPCR), a PCR fragment of 113 bp length of HVR-I, by means of SYBR Green. The methodology is a modification of the one described in [64]. A segment of 113 bp of HVR-I was amplified using primers 59-CACCATTAGCACCCAAAGCT-39 and 59-ACATAGCGGTTGTTGATGGG-39 [64] in a final volume of 30 ml containing: 26Power SYBR Green Master Mix (Applied Biosystem), 5 mM of each primer and 10 ml of DNA (dilution 1/ 10 BSA). Cycling conditions were 1 cycle at 95uC for 10 min, 40-45 cycles at 95 uC for 15 sec and 60uC for 1 min, and a denaturation cycle of 60uC to 96uC. Four replicates were performed for each sample. The standard curve was drawn using eight serial dilutions of a fragment of 405 bp length of the HVR-I, which included the fragment to be quantified. There were four replicates for each dilution obtaining typical values of efficiency (101%) and r 2 (0.988). In each qPCR, we included the corresponding extraction and amplification controls.
Cloning. In order to detect possible heterogeneities in the PCR products that may correspond to either post-mortem damage and/or mixed contamination, a fragment of HVR I was cloned in a sub-sample of 10 individuals by means of the TOPO TA CloningH Kit (Invitrogen). Linkage to the vector pCRH2.1-TOPOH and chemical transformation of the cells TOP10F' (One ShotH E. coli) were performed following the supplier's instructions. After culturing the cells overnight at 37uC in a selective environment, 10 blank cultures were collected and each one cultured in a non-selective environment. Five ml of purified product (using a QIAprep Spin Mini prep Kit, QIAGEN) of each clone were used for the sequencing reaction.
We determined the HVR-I and HVR-II sequence of the mtDNA of the researchers and archaeologists who handled the samples in order to discard possible contamination (Table S3).

Statistical Analysis
Genetic diversity [65] and genetic distances (F ST distances by Reynolds) were calculated on the basis of haplogroup frequencies (Table S4) using Arlequin 3.11 [66].
In addition, the distance matrix was calculated between the populations studied and those existing in the literature by means of Arlequin 3.11 [66] (Table S4). This distance matrix has been depicted in two dimensions by means of a Multidimensional Scaling (MDS) analysis (SPSS 17 Software) (Figure 2). Furthermore, a Median-Joining Network has been constructed using the 175 sequences of hunter-gatherer and farming individuals that have so far been published (including the samples in this study), using the Network 4.6.0.0 program (www.fluxus-engineering.com) (Figure 3).

Supporting Information
Table S1 Summary of results of quantification, replication and cloning from the samples analyzed in the present study (see nomenclature in Table 1 and Table 2). (DOC)  Table S4 Prehistoric and present populations compiled from literature that constitutes the database of HVR-I sequences of mtDNA for the present study. Abbreviations of the populations (Abbrev), number of individuals analyzed (N), number of haplotypes (HT), number of polymorphic sites (s), haplotype diversity (Hd), Variance of Hd, standard deviation of Hd (sd Hd), average number of nucleotide differences (k), Haplogroup (HG) and gene diversity calculated from the frequencies of mitochondrial haplogroups. See abbreviations in Figure 2; * see chronology in Table 1