Genotypic identification and accession denominations
Establishing the geographical origin and the correct prime name of a grapevine variety can be difficult because of the large number of locally used synonyms and homonyms, especially in the Mediterranean basin, which result from historical population displacements and migrations over the centuries [3]. Therefore, population structure analyses such as those performed with STRUCTURE or Darwin software can help to resolve doubts about the correct geographical assignment. A considerable number of mislabeled and unverified denominations were detected, as well as some discrepancies in allele size with respect to VIVC and other databases (Table 1 and Additional file 1). Although VIVC is continuously updated, the volume of data it manages is very large, and it possibly still contains some information that was collected before the GrapeGen06 SSR set was provided as a common SSR coding method and has not been fully revised. Some VIVC prime names of Spanish varieties do not correspond to the prime names reported in the national catalogue of commercial varieties (https://www.mapa.gob.es/app/regVar/ResBusVariedades.aspx?id=es&TxtEspecie=VID&IDEspecie=119) (e.g. Albillo Forastero instead of Forastera Blanca, or Mouratón instead of Juan García). A list of changes will be suggested privately to JKI; in this work, Blanca Gordal has been assigned to genotype 62 (Additional file 1) instead of Corazón de Cabrito (VIVC variety number 24550), since the latter was a clear mistake (personal observation).
Regarding the twenty-three somatic mutants found, eighteen are berry color sports, two are pulp pigmentation mutants (Gamay Teinturier de Bouze and G.T. Freaux) and three are leaf morphology sports (Additional file 1).
Population structure, genetic diversity and genetic differentiation
Grapevine collections maintained at many germplasm banks have been built over several decades through different networks of national and international partnerships. However, all of them are far from comprising all grapevine cultivars worldwide, which are estimated at approximately 10,000 held in field collections, in addition to an undefined number of local minority grapes not yet prospected [23]. Therefore, although several studies have explored the genetic information of these germplasm banks, none could be fully conclusive about the genetic structure of the entire cultivated grapevine gene pool (for a review see [9]). The collection studied here is almost entirely composed of wine and dual-use varieties, and about 75% of the accessions are presumed to have originated in Central and Western Europe. In any case, the partitioning of genotypes into STRUCTURE subgroups seems to be stable even when the data set analyzed presents an unbalanced distribution of genotypes among the different regional groups [9]. The vinifera pool of the grapevine collection characterized in this study showed a high He (Table 3), similar to that displayed by larger collections [9,10]. Under such high genetic variation, only minimal gains in total variability are possible by extending the genetic pool with entries from diverse eco-geographic sources [24]. Therefore, genetic structure and diversity studies may improve breeding programs.
The ΔK criterion identifies the uppermost level of structure in the data [25], which depends on the nature of the samples analyzed. In the present study, the highest value was obtained for K=2 (Additional file 2), which split accessions cultivated in the Mediterranean climate from those of the Oceanic and Continental climates (Figure 1), unlike previous reports where SSRs at K=2 distinguished between V. vinifera and non-vinifera [10], between subsp. vinifera and sylvestris [26], or among proles and specific sub-proles [13] according to Negrul’s classification [22]. At the next level of stratification (K=3), an additional cluster emerged containing non-vinifera genotypes (RS, IC), table grapes and other dual-use varieties. At K=5 and K=7, the grouping proposed by Negrul [22] was recovered with some additional partitioning. Varieties in sP2 (Central-Eastern Mediterranean wine varieties) and sP3 (dual-use and table varieties) essentially belong to proles pontica and orientalis, respectively, which is consistent with the Darwin trees obtained by combining our data with the eco-geographic groups inferred at FEM [10] (figures C to H in Additional file 5). Interestingly, Tempranillo, the most cultivated variety in Spain, and Tinto Velasco, also of great interest in Andalusia for its flavor and drought adaptation, fit in sP2. According to Terral et al. [27], Tempranillo shows morphological similarities with some ancient French varieties; in our opinion, this could be related to the not fully disclosed origin of both Tempranillo parents, Albillo Mayor and Benedicto [28]. The origin of Tinto Velasco is still under investigation [21]. The sP4 is primarily composed of Northern Italian varieties and some from Southeastern France, which are mainly admixed at K=5. In particular, Italian genotypes presented a very high admixture percentage (Additional file 3) at each stratification level, in accordance with the weak structuring detected by Cipriani et al. [29]. 
French varieties also present a high admixture level at K=5 and K=7. It is worth noting that they mostly split into sP4, sP5 and sP6 according to regional cultivation areas, similarly to Aradhya et al. [24]. In contrast, Spanish varieties show very low admixture levels, especially at K=2 and K=3, in disagreement with Bacilieri et al. [9] and Laucou et al. [30], although this probably depends on the nature and composition of the set of samples analyzed. Both Spanish and Portuguese cvs. mainly split into sP1 (Mediterranean Iberian Peninsula, mainly proles orientalis in NJ, figure C in Additional file 5) and sP5 (Northern Iberian Peninsula and Western France, proles occidentalis, figure G in Additional file 5), but the relative proportions are opposite, probably because the Mediterranean climate prevails in Spain as a whole whereas the Oceanic climate prevails in Portugal. The composition of sP1, which includes most of the Spanish varieties from the Mediterranean climate, some Portuguese and a few French ones, matches the cluster identified in the largest grapevine collection worldwide with the 18k SNP genotyping array [30].
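As an illustration of how the ΔK criterion discussed above is usually computed, the following minimal Python sketch applies the second-order rate of change of the log-likelihood, divided by its standard deviation across replicate runs. The function name delta_k, the input layout and the log-likelihood values are hypothetical and are not taken from this study; the second-order difference is computed here from replicate means, which is a common simplification.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Return {K: deltaK} from replicate log-likelihoods.

    ln_prob maps each K to a list of Ln P(D) values, one per replicate run
    (hypothetical input layout). DeltaK is undefined for the smallest and
    largest K tested, so those are skipped.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # needs both neighbours for the second-order difference
        spread = stdev(ln_prob[k])
        if spread == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(
            mean(ln_prob[k + 1]) - 2 * mean(ln_prob[k]) + mean(ln_prob[k - 1])
        )
        result[k] = second_diff / spread
    return result


if __name__ == "__main__":
    # Made-up replicate values, for illustration only.
    runs = {
        1: [-1000, -1002, -998],
        2: [-900, -901, -899],
        3: [-880, -882, -878],
        4: [-870, -875, -865],
    }
    scores = delta_k(runs)
    print(scores, "best K:", max(scores, key=scores.get))
```

Because ΔK only captures the uppermost level of structure, lower peaks at higher K (such as K=5 and K=7 here) still need to be inspected separately.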
All V. vinifera groups inferred by STRUCTURE showed consistent genetic diversity (He), in the same range as previous reports [9,24,26]. In all cases, the value increased by less than 1% when the most permissive level of ancestry was considered (Table 4). Each group presented a slight excess of heterozygosity with respect to Hardy-Weinberg equilibrium, as supported by the negative values of F. Only the He of sP6 is slightly below 0.7, probably because of the low number of individuals with strong ancestry. The sP3 would be expected to have the highest value because of the higher diversity contained in the proles orientalis [9], but the number of individuals of Asiatic origin included in this collection is probably not enough to confirm this hypothesis. Portuguese cultivars presented the highest diversity among the best-represented countries (Table 4). FST values were statistically consistent. Nevertheless, only the general tendency can be compared with other studies, because the different definitions, estimation methods, and interpretations of FST generate some confusion in the literature [31]. Mediterranean Spanish cvs. (sP1, mainly allocated within the proles orientalis-antasiatica in NJ, although some accessions lay within the proles pontica, figure C in Additional file 5) were genetically closer to Central-Eastern Mediterranean wine varieties (sP2, predominantly assigned to proles pontica, figure D in Additional file 5) than to other groups (Figure 3), and showed the highest divergence from the Central European group (sP6, assigned to proles occidentalis, figure H in Additional file 5; Figure 3). It should be noted that sP3, according to NJ, comprises individuals from both subproles caspica and antasiatica within proles orientalis (figure E in Additional file 5), and that these genotypes are mainly related to muscats, whereas those composing sP1 are mainly related to Hebén cv. (Additional file 3). Bacilieri et al. [9] found a closer relationship of Iberian cvs. with Eastern table varieties than with Balkan ones, in agreement with Emanuelli et al. [10], who clustered Spanish varieties into the proles orientalis-antasiatica. In the same study, hierarchical STRUCTURE analysis with SNPs clearly separated Spanish cvs. from proles pontica. In contrast, in Laucou et al. [30] Spanish varieties showed the lowest pairwise FST with the Balkan group, which should be mainly composed of proles pontica and orientalis-caspica grapes [13]. In any case, clustering methods and the markers used can provide outcomes that are not fully consistent [9,24]. Nevertheless, the hypothesis that Phoenicians and Greeks introduced grapevines belonging to proles pontica and orientalis into Spain [22,32] is corroborated in all cases. When groups are separated by country of origin, Spain shows the lowest FST versus Portugal (Figure 3), which depends not only on geographical proximity but also on the partition of both accession pools into sP1, sP2 (mainly admixed) and sP5 (Additional file 3), while similar FST values are shown for the Spain-Italy and Spain-France pairs.
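To make the relationship between He, observed heterozygosity and the sign of F explicit, the following minimal Python sketch computes the three quantities for a single SSR locus. The function name locus_stats and the genotypes are hypothetical, and the uncorrected estimator He = 1 - Σp² is assumed; it is not necessarily the estimator used to produce Table 4.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (He, Ho, F) for one locus.

    genotypes is a list of (allele_a, allele_b) tuples of diploid SSR calls
    (hypothetical layout). He is the uncorrected expected heterozygosity,
    Ho the observed heterozygosity, and F = 1 - Ho / He.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    he = 1 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    ho = sum(1 for a, b in genotypes if a != b) / n
    f = 1 - ho / he if he > 0 else float("nan")
    return he, ho, f


if __name__ == "__main__":
    # Made-up allele sizes in base pairs, for illustration only.
    sample = [(100, 102), (100, 102), (100, 104), (102, 104)]
    he, ho, f = locus_stats(sample)
    print(f"He={he:.3f} Ho={ho:.3f} F={f:.3f}")
```

A negative F, as in this toy sample where every individual is heterozygous, corresponds to the excess of heterozygosity with respect to Hardy-Weinberg equilibrium described above.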
Differences among groups account for 10% of the total genetic diversity in the strongly assigned accession subset (Q ≥ 0.78, Figure 3B). When eco-geographic groups are referred to a less extended total area, this percentage tends to decrease [13,33], although a higher value was shown when comparing well-clustered wild and cultivated forms [26]. Genetic diversity within groups is almost entirely due to intra-individual allelic variation, which highlights the high heterozygosity of grapevine [3]. However, in our germplasm collection it seems that the scenario of a vinifera structure linked to a large complex pedigree, with grape breeding restricted to a relatively small number of elite cultivars [34], has been further accentuated by the preferential reception of selected genotypes over time. When AMOVA is extended to admixed accessions, the proportion of genetic diversity due to differentiation among sPs decreases, as does the FST from F statistics (Figure 3B), indicating a low probability of genetic drift depending on geographical separation, especially when only the sub-area including Italy, France, Spain and Portugal is considered (Figure 3C). These results, together with the consistent admixture levels at each K (Additional file 3) and the weak relationship between pairwise FST values and eco-geographic distances (Figure 3), support the view that the structure of the modern grapevine population has been shaped by a long history combining natural hybridization, breeding, selection, human-mediated movement of seeds and cuttings, and other factors, as proposed by Bacilieri et al. [9].
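The idea of a share of diversity attributable to differences among groups can be illustrated with a simple heterozygosity-based differentiation index, (HT - HS) / HT, computed for one locus. This is only one of the several definitions of FST mentioned above and is not the AMOVA procedure behind Figure 3; the function name gst, the equal weighting of groups and the allele frequencies are assumptions made for this sketch.

```python
def gst(group_freqs):
    """Return (HT - HS) / HT for one locus.

    group_freqs is a list of dicts mapping allele -> frequency, one dict per
    group (hypothetical layout). Groups are weighted equally, which is an
    assumption of this sketch.
    """
    k = len(group_freqs)
    alleles = set().union(*group_freqs)
    # HS: mean expected heterozygosity within groups.
    hs = sum(1 - sum(p ** 2 for p in freqs.values()) for freqs in group_freqs) / k
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    pooled = {a: sum(freqs.get(a, 0.0) for freqs in group_freqs) / k for a in alleles}
    ht = 1 - sum(p ** 2 for p in pooled.values())
    return (ht - hs) / ht if ht > 0 else 0.0


if __name__ == "__main__":
    # Made-up allele frequencies for two groups, for illustration only.
    group_a = {"A": 0.8, "B": 0.2}
    group_b = {"A": 0.2, "B": 0.8}
    print(round(gst([group_a, group_b]), 3))
```

Values close to zero indicate that nearly all diversity lies within groups, which is the pattern reported here once admixed accessions are included.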
Mediterranean Iberian Peninsula genetic pool
Myles et al. [34] supported the view that domestication of Vitis vinifera subsp. sylvestris into cultivated forms (subsp. vinifera) began 6,000-8,000 years ago in the Near East, and that grape growing and winemaking then expanded westward toward Europe. Central and Western European grapevine groups showed some degree of genetic relationship with Eastern sylvestris, confirming an East-West gene flow through the movement of cultivated genotypes [35]. In addition, secondary domestication events involving local wild forms took place in these regions [24,36,37]. Given that wild and cultivated populations showed very similar genetic diversity, Myles et al. [34] also suggested that many cultivars in use today may be only a small number of generations removed from the wild progenitor, and claimed that introgression occurred from Western sylvestris into Western vinifera but not vice versa. This may explain why some ancient Central European varieties (proles occidentalis), such as Clairette and Pinot Noir, conserve clear wild morphological traits [27,32,35]. Meanwhile, Arroyo-García et al. [15] showed a greater presence of typical Eastern chlorotypes (especially C, D and E) in Italian, French and German varieties than in Spain, where the vast majority of both commercial varieties and still-conserved wild types displayed chlorotype A. Thus, at first sight, chlorotype indications seem to disagree with genetic relationships based on nuclear DNA markers, the latter suggesting that Spanish germplasm is at least as close to Eastern genotypes as Italian, French and German germplasm is. A possible scenario can be deduced from De Andrés et al. [26]: Spanish wild grapevines are essentially divided into two groups, Northern (NSW) and Southern (SSW), the latter being genetically closer to cultivated varieties. Individuals from SSW cluster separately from other wild forms in the PCA of Myles et al. [34], while members of NSW group together with other sylvestris populations. 
Interestingly, Eastern cultivated grapes are closer to SSW than to other European sylvestris. Myles et al. could not prove a lack of introgression from Spanish cultivated grapes into SSW because of the very low number of Spanish cvs. included in that study. However, De Andrés et al. [26] detected a significant number of spontaneous vinifera-sylvestris hybrids in Southern wild populations, which could mean that gene flow occurred in both directions (for example, Zalema cv., which is very important in Andalusia, showed a very close relationship to sylvestris genotypes). Therefore, since Eastern grapevines were introduced into Spain by Phoenicians and Greeks, repeated hybridization and backcrossing events between both subspecies may be supposed, resulting in a reduction of the genetic diversity between them (more than in other European areas) and in new domesticated forms, without totally discarding the possibility that some primordial domestication had occurred even earlier [38,39]. Throughout this complex process, some female domesticated vines appeared, and their fertilization with pollen from imported cultivars gave rise to hybrids with oriental phenotypes that conserved chlorotype A, as in the case of Hebén cv., which is an ancestor of many sP1 individuals and was previously shown to be a parent of several Spanish and Portuguese varieties [40]. Although this hypothesis includes some speculative elements, it is evident that the sP1 accession pool originated from a substantial genetic contribution from oriental grapevines and a long-term interaction between wild and cultivated forms. Andalusia has surely represented a pivotal center of biodiversity development, given that this region holds the main reservoir of wild vine populations in Southern Spain [41] and that Hebén, which was first described by Clemente y Rubio [42], has been cultivated in several areas within and close to Andalusia for a very long time [21]. 
To delve into the question, determining the parents of Hebén, as well as the origin of other chlorotype A varieties, would be extremely helpful, although it is a very hard task that often depends on lucky archaeobotanical findings [3]. Likewise, discovering the origin of Garnacha, which fits in sP1 and is a parent of some French varieties included in this cluster, would further clarify grape domestication and evolution in Western Europe. Finally, it is worth mentioning that the contribution of Eastern genotypes to Spanish grapevines is additionally evidenced by the presence of some accessions carrying chlorotype D, such as Palomino Fino, the main wine variety in Andalusia, and Jaén Tinto and Doradilla, which are considered autochthonous to this region. Indeed, chlorotype D is common in wild forms eastward from Italy to the Middle East [15], and its presence in Spain was previously discussed [23].
Core collections
Core-35 and Core-63 include 7.3% and 13.1% of the total accessions of the collection, respectively, in accordance with previous studies [43]. These cores are consistent because they include an acceptable percentage of each cluster inferred by the structure analysis (Table 5). Therefore, they may be suitable for future association studies, or at least provide an idea of the optimal size and cluster composition. However, when a specific study is undertaken, the actual objective of the working core collection must be carefully analyzed, and some additional questions should consequently be taken into account: a) the possibility of including phenotypic traits of agronomic and/or commercial interest, b) the chance of genotype-phenotype covariance due to individual relatedness, which should be avoided by removing or substituting some accessions [44], and c) the possibility of including a priori in the kernel file of MSTRAT some key varieties (e.g. Pinot Noir, Merlot and Sangiovese) that are excluded by the present analysis. Finally, we note the presence in both cores of individuals from sP7, given that only 4 accessions within this cluster are putative Vitis vinifera.
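The effect of forcing key varieties into a core through a kernel can be illustrated with a simple greedy allele-coverage heuristic. This sketch is not MSTRAT's own procedure and only stands in for the general idea of maximizing allelic richness; the function name greedy_core, the data layout and the accessions are hypothetical.

```python
def greedy_core(accessions, size, kernel=()):
    """Pick `size` accessions that greedily maximize the alleles covered.

    accessions maps accession name -> set of (locus, allele) pairs
    (hypothetical layout). Names listed in `kernel` are included a priori,
    mimicking the role of a kernel file. Ties are broken alphabetically.
    """
    core = list(kernel)
    covered = set()
    for name in core:
        covered |= accessions[name]
    while len(core) < size:
        candidates = sorted(n for n in accessions if n not in core)
        if not candidates:
            break  # fewer accessions available than requested
        # Choose the accession contributing the most alleles not yet covered.
        best = max(candidates, key=lambda n: len(accessions[n] - covered))
        core.append(best)
        covered |= accessions[best]
    return core


if __name__ == "__main__":
    # Made-up genotypes at a single locus, for illustration only.
    collection = {
        "A": {("L1", 100), ("L1", 102)},
        "B": {("L1", 100), ("L1", 104)},
        "C": {("L1", 102), ("L1", 104)},
        "D": {("L1", 106), ("L1", 108)},
    }
    print(greedy_core(collection, size=2))
    print(greedy_core(collection, size=2, kernel=("C",)))
```

Comparing the selections with and without a kernel shows how a priori inclusion changes which accessions remain available for the allele-driven choices.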