Self-(in)compatibility in apricot germplasm is controlled by two major loci, S and M

Apricot (Prunus armeniaca L.) exhibits a gametophytic self-incompatibility (GSI) system and is mostly considered a self-incompatible species, though numerous self-compatible exceptions occur. These are mainly linked to the mutated SC-haplotype carrying an insertion in the S-locus F-box gene that leads to a truncated protein. However, two S-locus unlinked pollen-part mutations (PPMs) termed m and m' have also been reported to confer self-compatibility (SC) in the apricot cultivars 'Canino' and 'Katy', respectively. This work aimed to explore whether additional mutations might also explain SC in apricot. A set of 67 cultivars/accessions with different geographic origins was analyzed by PCR-screening of the S- and M-loci genotypes, contrasting results with the available phenotype data. Up to 20 S-alleles, including 3 new ones, were detected and sequence analysis revealed interesting synonymies and homonymies, in particular with S-alleles found in Chinese cultivars. Haplotype analysis, performed by genotyping and determining linkage phases of 7 SSR markers, showed that the m and m' PPMs are linked to the same m0-haplotype. Results indicate that the m0-haplotype is tightly associated with SC in apricot germplasm, being quite frequent in Europe and North America. However, its prevalence is lower than that of SC in terms of frequency and geographic distribution. Structures of 34 additional M-haplotypes were inferred and analyzed to depict phylogenetic relationships, and M1-2 was found to be the closest haplotype to m0. Genotyping results showed that four cultivars classified as self-compatible have neither the SC- nor the m0-haplotype. According to apricot germplasm S-genotyping, a loss of genetic diversity affecting the S-locus has occurred, probably due to crop dissemination. Genotyping and phenotyping data support that self-(in)compatibility in apricot relies mainly on the S- but also on the M-locus. Regarding the latter, we have shown that the m0-haplotype associated with SC is shared by 'Canino', 'Katy' and many other cultivars. Its origin is still unknown but phylogenetic analysis supports that m0 arose later in time than SC from a widely distributed M-haplotype. Lastly, other mutants putatively carrying new mutations conferring SC have also been identified and deserve future research.


Background
Gametophytic self-incompatibility (GSI) is a system widely distributed in the plant kingdom [1] that prevents self-fertilization, favoring outcrossing [2]. GSI specific recognition is under the control of a multi-allelic locus, termed the S-locus, containing at least two linked genes: a pistil-expressed S-RNase [3] and the pollen-expressed S-locus F-box [4][5][6]. S-locus F-box proteins are thought to be components of E3 ubiquitin ligase complexes that recognize non-self S-RNases, promoting their ubiquitination and degradation by the 26S proteasome proteolytic pathway [7,8]. More recently, the collaborative model proposed in Solanaceae (Petunia) suggested that several F-box proteins are necessary to recruit S-RNases for degradation [9,10]. This system seems to extend to other plant families exhibiting GSI such as Rosaceae, and particularly the Maloideae subfamily [10,11]. However, Prunus does not seem to follow this model, since knock-out of the S-locus F-box (SFB) gene leads to self-compatibility (SC), in contrast with the observations in Solanaceae. Reasons behind this behavior have been speculated on for a long time [12,13], but only recently has evidence been provided supporting the S-locus F-box like-2 protein as a 'general inhibitor' that detoxifies S-RNases unspecifically, unless affected by SFB [14].
In general, the self-incompatibility (SI) trait predominates in stone fruits (Prunus genus), in accordance with the high degree of heterozygosity shown by many of these species. However, different 'degrees' of SC have been detected in this genus, ranging from sweet cherry, almost strictly self-incompatible (with a few exceptions), or apricot, mainly self-incompatible (SC is restricted to European cultivars), to complete SC in peach [15]. SC is mostly related to mutations in the S-locus genes, and mutations affecting both S-RNases and SFBs have been detected in many Prunus species [13,16]. In apricot (Prunus armeniaca L.) the S C -allele known to confer SC has been well characterized, showing that a 358 bp insertion in the SFB gene leads to a putative truncated protein lacking the two essential 3′-hypervariable domains HVa and HVb [17]. Furthermore, the origin and dissemination of S C have also been studied, identifying the non-mutated ancestor S 8 -allele and detecting its presence in accessions from different geographic areas [18]. In fact, most S-genotyped self-compatible apricot cultivars have been shown to carry the S C -allele [18][19][20][21][22].
Along with the S-locus specific products, other S-locus unlinked factors are also necessary for the GSI system to work. These factors, known as 'modifiers', were first identified in Solanaceae [23]. Nevertheless, genetic evidence supporting modifiers has also accumulated in other species including Prunus spp. [17,24]. In apricot, pollen-part mutations (PPMs) conferring SC by putatively affecting modifiers have been identified in the Spanish local cultivar 'Canino' and the North-American one 'Katy'. Both PPMs were mapped at the distal end of chr.3 within the so-called M- and M'-loci, respectively [25,26]. Given their distinct geographic origins and the significant genetic distance estimated between both cultivars, it was initially hypothesized that these two PPMs could have arisen independently but affect the same gene [26].
In this context, we thought that additional S-locus unlinked mutations (and not just restricted to the M-locus) might also be present in apricot germplasm. Thus, in this study we have genotyped both the S- and M-loci in a wide set of apricot cultivars and accessions from different geographic origins, by using SFB and S-RNase introns and M-locus linked SSR markers. These data were combined with phenotyping data to dissect SC causes and distribution in apricot germplasm.

Self-incompatibility versus self-compatibility in apricot
A set of 67 apricot cultivars and accessions (hereinafter referred to only as accessions) representing a wide range of geographic origins has been analyzed. Pedigree information was available for only a few of them but it was quite useful to support genotyping data. Blooming time and self-(in)compatibility phenotypes were scored according to literature reports. Thus, apricot accessions were classified into 5 broad classes (early, mid-early, mid, mid-late and late) regarding the first phenotype and as self-incompatible or self-compatible regarding the second (Table 1).
Self-(in)compatibility phenotype data were completed with our own results when adult trees were available. In this regard, self-pollination tests were used to determine (8) or to confirm (13) self-(in)compatibility phenotypes but also to obtain progenies useful to search for S-locus unlinked mutations. Data suggest that 5 accessions are self-incompatible ('Aurora', 'Cow-2', 'Perla', 'Veecot' and 'Velázquez') while the remaining 16 show variable fruit-setting ranging from 0.5% ('Búlida') to 55% ('Ninfa'), being recorded as self-compatible (Table 2). Some progeny could finally be obtained from all accessions of this latter group except 'Búlida'. Those accessions with a large enough progeny were subsequently analyzed for non-S-locus mutations by S-genotyping embryos. The self-compatible controls ('Canino' and 'Katy'), already studied, and the S C homozygote cultivar 'Galta Roja' were excluded. In summary, 40 out of the 67 accessions analyzed were classified as self-compatible, 16 as self-incompatible, 2 as male-sterile and the remaining 9 as undetermined (Table 3).

S-genotyping: identification of new S-alleles
All 67 accessions were S-genotyped using primer pairs amplifying different fragments of the S-haplotype region: the first and second S-RNase introns as well as the 5′-UTR SFB intron (Table 3). S-alleles were determined by comparing intron size patterns with those of previously S-genotyped accessions and, when needed, sequence analyses supported these assignments. Fragment analyses of the PCR products obtained with 5 primer combinations allowed us to detect up to 20 different S-alleles. Seventeen of them had already been reported in apricot (S 1 , S 2 , S 3 , S 4 , S 5 , S 6 , S 7 , S 8 , S 9 , S 11 , S 15 , S 17 , S 18 , S 19 , S 20 , S 24 and S C ) and in the present work were designated basically according to the nomenclature established by Vilanova et al. [19], Halázs et al. [27] and Halázs [28], except for S 20 , corresponding to the one described by Zhang et al. [29], and S 24 , reported by Wu et al. [30] and Gu et al. [31]. Another S-allele (present in 'Cow-1' and 'Cow-2') was found to have a previously undescribed intron size pattern and was named S 22 , according to its homology with the NCBI GenBank accession HM053569, corresponding to the S 22 -RNase from the Chinese apricot cultivar 'Tuxiangbai'. Lastly, two additional S-alleles, preliminarily designated S V and S X , were detected only once. Amplification of the first S-RNase intron with the single primer pair SR1-F/SR1-R distinguished up to 13 S-alleles ranging from 260 (S 4 ) to 427 bp (S 17 ) (see Additional file 1: Table S1). Another primer combination (Pru-T2/SR1-R) allowed us to amplify and distinguish two additional S-alleles, S 6 and S 24 . Exceptions were the S 1 and S 7 , and S C and S 8 pairs, which have exactly the same fragment sizes, and S 19 , S 20 and S X , which could not be PCR-amplified with any of the four different primer combinations tested (data not shown).
Sizes for the second S-RNase intron were approximately determined by agarose gel electrophoresis, ranging from 300 (S 24 ) to 2800 bp (S C ), and allowed us to define 15 S-alleles with a single primer pair (Pru-C2/Pru-C6R). Exceptions were the S 5 and S 6 , and S 8 and S C pairs, as well as S 19 , S 20 and S V , which share similar fragment sizes, and S 3 , which could not be amplified with either of the two primer combinations used. Variability in size of the fragments containing the 5′-UTR SFB intron ranged from 189 (S 17 ) to 210 bp (S 1 ) and facilitated the identification of 9 S-alleles. Exceptions were the S 2 and S 11 , S 18 and S 22 , and S C and S 8 pairs, which share the same intron sizes, and S 4 , S 5 , S 9 , S 19 , S 20 , S 24 , S V and S X , whose fragments could not be amplified. Altogether, combined data allowed us to distinguish unambiguously 18 out of the 20 S-alleles detected, excluding S V and S X (see Additional file 1: Table S1). The S C - and S 8 -alleles could only be distinguished by PCR-amplifying a fragment containing the 358 bp insertion present in SFB C and absent in SFB 8 [17,18]. As a result, S 8 was only detected in four Hungarian cultivars ('Ceglédi óriás', 'Effect', 'Gönci Magyar' and 'Szegedi Mammut') while S C was found in 38 cultivars from many different origins (Table 3). Some already identified S-alleles (S 3 , S 20 and S 24 ) were not present in previously S-genotyped control cultivars (Table 3). For instance, the putative S 3 detected in 'Harlayne' and 'Henderson' has an estimated size for the first S-RNase intron that matches that of the 'Sunglo' S 3 -allele identified by Vilanova et al. [19], and nucleotide sequences from the three cultivars aligned by CLUSTALW showed 99% identity (data not shown). Moreover, according to the pedigree, 'Sunglo' is the 'Harlayne' male parent [32].
S 20 could not be amplified for the first S-RNase intron, but the estimated size as well as the nucleotide sequence (99% identity) (see Additional file 2: Table S2) of the second one match almost perfectly those of the S 20 -allele reported by Zhang et al. [29]. Interestingly, S 19 could only be weakly amplified from 'Mari de Cenad' with Pru-C2/Pru-C6R primers, and the fragment size observed in this work and that previously reported by Halázs et al. [20] are similar to that of S 20 , suggesting that both S-alleles might be the same. Unfortunately, this fragment could not be sequenced to confirm this point. S 24 was only detected in 'Ezzine', but the estimated sizes for the first and second S-RNase introns, and the nucleotide sequence of the first one (248 bp), match those of the S 24 -allele reported by Gu et al. [31] (see Additional file 2: Table S2).
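The size-pattern comparison described in this section can be sketched as a simple lookup. This is only an illustrative sketch, not the procedure actually used: the reference table holds just the four fragment sizes quoted in the text, while the function names, the table layout and the 1 bp tolerance are assumptions made for the example.

```python
# Reference fragment sizes (bp) quoted in the text. The table layout,
# the function names and the tolerance are hypothetical illustrations.
REFERENCE_SIZES = {
    "SR1-F/SR1-R": {"S4": 260, "S17": 427},        # first S-RNase intron
    "SFB 5'-UTR intron": {"S17": 189, "S1": 210},  # 5'-UTR SFB intron
}


def candidate_alleles(marker, observed_bp, tolerance_bp=1):
    """Return reference alleles whose fragment size for `marker` lies
    within `tolerance_bp` of the observed size."""
    sizes = REFERENCE_SIZES.get(marker, {})
    return sorted(a for a, bp in sizes.items()
                  if abs(bp - observed_bp) <= tolerance_bp)


def call_allele(observations, tolerance_bp=1):
    """Intersect candidates across markers. Markers that gave no
    amplification (None) are skipped, not treated as a mismatch."""
    result = None
    for marker, bp in observations.items():
        if bp is None:
            continue
        cands = set(candidate_alleles(marker, bp, tolerance_bp))
        result = cands if result is None else result & cands
    return sorted(result) if result else []
```

Skipping non-amplified markers mirrors the situation reported above, where several alleles could not be amplified with some primer combinations and had to be resolved with the remaining ones.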
Table 3 footnotes. a: Cultivars previously S-genotyped by Halázs et al. [18]; Vilanova et al. [19]; Halázs et al. [20]; Zuriaga et al. [26]; Halázs et al. [27]; Burgos et al. [42]; Alburquerque et al. [79]. b: Own data on SI/SC phenotype obtained in this work (see Table 2). Additionally, SC had been observed in a set of accessions grown under insect-proof screen house at IVIA ('Rojo de Carlet', 'Mitger', 'Palabras', 'Palau', 'Currot', 'Ginesta', 'Canino 14-4' and 'Canino 14-6') showing moderate fruit-setting (not quantified) across several years. c: Male-sterility in the ASP accession was indicated by shrunken pale anthers. d: S-allele nomenclature is proposed according to Vilanova et al. [19]; Halázs et al. [27]; Zhang et al. [29]; Wu et al. [30]; Gu et al. [31] and Halázs et al. [28]. The S-haplotype associated with SC (S C ) is written in bold. e: M-haplotypes were named with two digits. The first one corresponds to the M-haplotype 'main class' and the second to the subtype. M-haplotype variants associated with SC (m 0-0 and m 0-1 ) are written in bold. Haplotypes designated by M ? could not be defined. f: The S-genotype determined for 'Rózsakajszi' (S 2 S C ) was not in agreement with that previously reported by Halázs et al. [18] (S C S C ). Reasons for this discrepancy are still unknown.

As a first finding, all SSR-alleles in coupling with the 'Canino' PPM m were found to be also in coupling with the 'Katy' PPM m'. In other words, both cultivars share the same pollen-part mutated haplotype, designated hereinafter as m 0 . As a whole, these results led us to define initially four different M-haplotypes (m 0 , M 1 , M 2 and M 3 ) according to the genotypes established for 'Canino', 'Katy' and the reference cultivar 'Goldrich' (Fig. 1). These seven SSR markers were subsequently PCR-genotyped in the remaining accessions. Structures of up to 34 additional M-haplotypes were then statistically inferred (see Methods for details) while m 0 , M 1 , M 2 and M 3 were fully confirmed (see Additional file 4: Table S4). For the sake of simplicity and to facilitate the graphical representation of their relative frequencies, M-haplotypes were grouped into 'main classes' when they differ in no more than three SSR alleles (resulting in a total of 20 M-haplotype 'main classes': m 0 and M 1 to M 19 ). Thus, M-haplotypes were designated by two sub-indexes: the first one corresponds to the 'main class' and the second defines the subtype (see Additional file 4: Table S4). Structures of the M-haplotypes corresponding to the cultivars 'Kech-pshar' and 'Fergani' could not be inferred, but according to the SSR allele sizes all four are distinct (data not shown).
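The grouping rule for 'main classes' (haplotypes differing in no more than three SSR alleles) can be illustrated with a short sketch. The text gives only the threshold, so the greedy, founder-based assignment below is an assumption, as are the function names and any allele sizes fed to it; with this approach the result may depend on the order in which haplotypes are processed.

```python
def n_differences(hap_a, hap_b):
    """Number of SSR loci at which two haplotypes carry different alleles.
    Haplotypes are tuples of allele sizes in a fixed marker order."""
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


def group_main_classes(haplotypes, max_diff=3):
    """Greedy grouping (an assumed procedure): a haplotype joins the first
    class whose founding haplotype differs from it at no more than
    `max_diff` loci; otherwise it founds a new class.
    Returns {name: (class_index, subtype_index)}."""
    founders = []   # founding haplotype of each class
    members = []    # number of haplotypes already placed in each class
    labels = {}
    for name, hap in haplotypes.items():
        for idx, founder in enumerate(founders):
            if n_differences(hap, founder) <= max_diff:
                labels[name] = (idx, members[idx])
                members[idx] += 1
                break
        else:
            # No existing class is close enough: start a new one.
            founders.append(hap)
            members.append(1)
            labels[name] = (len(founders) - 1, 0)
    return labels
```

The returned pair plays the role of the two sub-indexes used in the haplotype names, the first for the 'main class' and the second for the subtype.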
Phylogenetic reconstructions from Jaccard's and Bruvo's distances show that some 'main classes' clustered close together suggesting common ancestry such as, for instance, M 7 and M 8 ; M 2 , M 4 and M 16 or m 0 , M 1 and M 13 (Fig. 3). In this latter group, the nearest haplotypes to the majority haplotype

Mutations conferring self-compatibility in apricot
The S C -haplotype known to confer SC [17] was found in 38 accessions (being homozygous in 7) (Table 3). Thirty out of the 38 were self-compatible, 6 have an undetermined phenotype and two were male-sterile. The m 0 -haplotype associated with SC by Zuriaga et al. [25,26] was detected in a total of 19 accessions (being homozygous in 8): 8 were previously shown to be self-compatible, 5 were undetermined and one was classified as self-incompatible ('Cow-2') since it did not produce any fruit after self-pollination (Tables 2 and 3). As a whole, all self-compatible accessions analyzed have at least one of these two haplotypes already known to confer SC (S C and/or m 0 ) except for 'Harlayne', 'Henderson', 'Shalah' and 'Mariem'. The number of self-compatible haplotypes carried by each accession varied from 0 to 4. A total of 21 accessions carry neither S C nor m 0 , 26 carry only one self-compatible haplotype (corresponding to S- and M-genotypes S C /x or m 0 /x), 13 carry two (S C /S C , m 0 /m 0 or S C /x-m 0 /x), 1 carries three (i.e. 'Gavatxet', S 20 /S C -m 0 /m 0 ) and 6 carry four (S C /S C -m 0 /m 0 ) (Table 3).
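The tally of self-compatible haplotypes per accession can be reproduced with a small helper. This is a minimal sketch: the function name and the plain-text genotype notation ('SC', 'm0-0', alleles separated by '/') are assumptions made for the example, not the notation of Table 3.

```python
def count_sc_haplotypes(s_genotype, m_genotype):
    """Count haplotype copies known to confer SC (S C and m0) in an accession.

    Genotypes are written as 'allele/allele', e.g. 'S20/SC' and 'm0-0/m0-0'.
    Subtypes of m0 (m0-0, m0-1) are all counted as m0.
    """
    s_alleles = s_genotype.split("/")
    m_haplotypes = m_genotype.split("/")
    n_sc = sum(1 for a in s_alleles if a == "SC")
    n_m0 = sum(1 for h in m_haplotypes if h == "m0" or h.startswith("m0-"))
    return n_sc + n_m0


# Class sizes reported in the text for 0, 1, 2, 3 and 4 copies.
REPORTED_CLASS_SIZES = {0: 21, 1: 26, 2: 13, 3: 1, 4: 6}
```

For instance, the 'Gavatxet' genotype quoted above (S 20 /S C -m 0 /m 0 ) gives three copies, and the five reported class sizes add up to the 67 accessions analyzed.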
Segregation of the S-genotypes in self-progenies obtained from two self-compatible cultivars carrying a single copy of the m 0 -haplotype ('Portici' and 'Corbató') supported a mutation outside the S-locus as the cause of the phenotype (Table 4). Progenies from self-compatible cultivars not carrying the m 0 -haplotype ('Dulcinea' and 'Bebecou') were used as controls to confirm this finding (Table 4). Similar results were observed in other accessions (with fewer offspring) carrying ('Cow-1' and 'Cristalí') and not carrying ('Ezzine' and 'Ninfa') the m 0 -haplotype (data not shown). Moreover, the distorted segregation ratios detected for SSR markers tightly linked to the M-locus (AGS.20 and PGS3.23) indicate that the mutation is located at the M-locus (Table 4).
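Segregation distortion of this kind is usually assessed with a chi-square goodness-of-fit test. The sketch below assumes, for illustration only, that a marker heterozygous in the selfed parent is expected to segregate 1:2:1 in the absence of selection; the function names are hypothetical and no counts from Table 4 are used.

```python
def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for observed genotype counts
    against an expected segregation ratio such as (1, 2, 1)."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-square distribution, 2 degrees of freedom,
# alpha = 0.05.
CHI2_CRIT_DF2_P05 = 5.991


def is_distorted(observed, expected_ratio=(1, 2, 1)):
    """True when the three-class segregation departs significantly
    (alpha = 0.05) from the expected ratio."""
    return chi_square_gof(observed, expected_ratio) > CHI2_CRIT_DF2_P05
```

With made-up counts, a progeny in which one homozygous class is missing altogether would be flagged as distorted, whereas counts close to 1:2:1 would not.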
Interestingly, the S-genotype segregation found in the 'Portici' progeny might also indicate the presence of another mutation affecting the S 2 -haplotype, since the number of S 2 S 2 genotypes was unexpectedly high (Table 4). Analysis of genomic DNA fragments containing the complete sequence of the S 2 -RNase and SFB 2 alleles from 'Portici' revealed only one mismatch (A/G) within SFB 2 , located at position 1,296. This change leads to a non-synonymous substitution (lysine to arginine) in the hypervariable region HVb.

Geographical distribution of self-(in)compatibility
For the sake of the analysis, cultivars and accessions studied in this work were grouped as belonging to four big geographic areas according to the country of origin, pedigree (Table 1) and data about dissemination [36][37][38]: North America (NA), Western Europe (WE), Eastern Europe (EE) and Southern Europe/North Africa (SE/NAf). Nonetheless, this classification is obviously arbitrary and exhibits some inconsistencies. This is particularly true for the WE and SE/NAf groups considered separately but clearly interrelated. Attending to geographic criteria, cultivars from Italy, France and Spain (Valencia region) were grouped into the first one while those from Tunisia, Greece and Spain (Murcia region) into the second. Spanish cultivars were divided into two subgroups that might reflect some differences according to [49]. Parentage relations among some cultivars from WE and SE/NAf groups (i.e. 'Canino' with 'Ouardi' and 'Sayeb') underscore the inaccuracy of this classification but, assuming these limitations, it produces more balanced groups without affecting data interpretation. Thus, SC phenotype is distributed across all four groups but frequencies varied from 2/19 (SI/ SC) in WE to 9/16 in NA (Fig. 2). Similarly, blooming time types from early/mid-early to mid-late/late are present in all four groups, but frequency ranged from 9/24 in WE to 1/16 in NA (Table 1). In general it could be said that most early/mid-early classified accessions are self-compatible (13/15) but this phenotype is also generally found in many mid-late/late classified accessions (12/17). Therefore, even considering the reduced number of scored accessions and the limited phenotype data, it seems that there is no correlation between SC and early blooming in cultivated apricots.
Frequencies of the S-alleles also varied between the different regions analyzed. For instance, S 2 and S C are present in all four groups but, on the contrary, S 8 and S 9 are only present in EE, S 3 in NA and S 7 in SE/NAf (Fig. 2). Regarding S-allele distribution it is also important to take into account synonymies and homonymies. In the first case, BLASTN has revealed that some of the S-alleles initially found and numbered from European accessions have also been detected in Chinese cultivars but named differently. For instance, a 659 bp fragment containing the second intron of the S 17 -RNase present in the North-American cultivar 'Aurora' (as well as in 'SEO', 'Orange Red' and 'Henderson') shows >99% identity with those from the S 44 - and S 9 -RNases identified from the Chinese cultivars 'Shailaiyulvke' and 'Xinshiji', respectively, suggesting that these three S-alleles are the same. Similar examples are reported for S 1 , S 4 , S 9 , S 12 , S 16 and S 20 -RNases. Conversely, BLASTN also points out homonymies. For example, S 9 derived from the Hungarian cultivar 'Ceglédi óriás' differs from the S 9 reported from the Chinese cultivar 'Xinshiji' (see Additional file 2: Table S2). BLASTN has also detected significant similarities between S 13 , reported in accessions from Armenia (including 'Shalah'), Turkey, Tunisia and Morocco [20][21][22], and S 5 , previously found in Southern Spanish cultivars [19], and among S 12 , detected in Turkey and Tunisia [20,22], S 24 and S 4 (see Additional file 2: Table S2). Distribution of the 38 inferred M-haplotypes is not uniform either. The M 1 'main class' is present in all four groups, m 0 is only found in WE and NA, and others, such as M 12 and M 2 , were exclusive to the EE and NA groups, respectively (Figs. 2 and 3).

S-alleles and the apricot diffusion from Armenian and Chinese centers to Europe
Most apricot S-alleles identified in this work had been previously reported, except for S 22 , S V and S X [19,[27][28][29][30]. The geographical distribution of these S-alleles is heterogeneous, including a few widely disseminated and others restricted to certain areas. These results are in agreement with previous works. For instance, S 2 and S C had already been found in accessions from all countries analyzed in this work but also in Turkey [20], Morocco [21], Afghanistan [18], Iran [40], etc. On the contrary, S 1 is only present in NA and in a few accessions from WE. Meanwhile, the S 10 to S 14 -alleles, suggested to be of Armenian origin by Halázs et al. [20], have been detected in Turkey, Tunisia, Morocco and in a few materials from EE, but they were thought to be absent in WE [20][21][22]41]. However, S 5 , suggested to be the same as S 13 , connects (along with S 6 ) Armenian, Eastern-Turkish and Moroccan accessions [20][21][22] with Southern-Spanish accessions [19,42], supporting the apricot Southwest-Mediterranean diffusion route, from the Irano-Caucasian gene pool, proposed by Bourguiba et al. [38].
The heterogeneous distribution of the apricot S-alleles is a consequence of crop domestication and diffusion, and this process has also produced a genetic bottleneck detected in neutral SSR markers [38]. Though not quantified, this loss of genetic diversity can also be roughly appreciated at the S-locus. For instance, China is not only considered a center of origin but also a probable domestication center for the species [38], and at least 20 different S-alleles have been found in just 30 Chinese cultivars [29,30]. This variability is considerably higher than that detected in European accessions and, interestingly, only 5 of these S-alleles (S 9 , S 14 , S 17 , S 20 and S 24 ) seem to be represented in apricot cultivars outside China. In addition, other S-alleles detected in far-east region cultivars, such as S 15 /S 18 in 'Kech-pshar' (Uzbekistan), S 10 in 'Harmat' (Armenia) [27] and S V /S X in 'Fergani' (former USSR), are not found in European accessions. S-alleles can therefore be a useful tool to study apricot dissemination, but the establishment of a unified nomenclature would be advisable to favor this goal.

Pollen-part mutated m-haplotype origin and dissemination
The two S-locus unlinked PPMs (m and m') conferring SC in apricot cultivars 'Canino' and 'Katy' were found to be located in an overlapping region at the distal end of chr. 3 [26]. This finding suggested that both PPMs might be the same. However, in addition to their different geographic origins, 'Canino' and 'Katy' exhibited a high value for Nei's genetic distance (0.83) estimated on the basis of 85 SSR markers distributed across the whole genome [26]. Thus, it was initially hypothesized that these two PPMs originated independently but might affect the same gene. Nevertheless, a more detailed analysis of a set of SSR markers linked to the M-locus has allowed us to determine that haplotypes in coupling with the PPMs have the same structure in both cultivars. Thus, the pollen-part mutated m- and m'-haplotypes previously associated with SC in 'Canino' and 'Katy' cultivars [25,26] are now considered to be the same (m 0 ). Moreover, in this work the m 0 -haplotype has been detected in 17 additional accessions (excluding 'Canino' clonal sibs), mainly Spanish (12 in total) but also from USA, Australia, France and Italy. Fifteen of them were confirmed as self-compatible (exceptions were 'Cow-2', described above, as well as 'Gandía', 'Gavatxet', 'Manrí' and 'Martinet' with undetermined phenotype). The m 0 -haplotype was frequently accompanied by the S C -allele in confirmed self-compatible accessions (7 cases), which might suggest that mutations conferring SC tend to accumulate once the system is broken due to relaxed selective constraints [43]. Following this reasoning and considering the higher prevalence of the S C -allele in confirmed self-compatible accessions (74%) compared with the m 0 -haplotype (23%), it might also be speculated that PPM m arose in a self-compatible accession carrying S C .
However, against this hypothesis is the fact that the m 0 -haplotype alone was also found in 6 confirmed self-compatible accessions and, therefore, further analyses are necessary to clarify this point. This latter group includes 'Portici' (S 2 /S 20 -M 1-4 /m 0-0 ), previously shown to lack the S C -haplotype but whose causative mutation was still unknown [44].
Besides the m 0 -haplotype, 37 additional M-haplotypes were identified by SSR analysis and grouped into 19 'main classes'. Microsatellite haplotype distances were analyzed by two alternative clustering methods to validate results. The first one relies on the proportion of shared alleles, assuming independence and ignoring mutational processes, which can bias distances, particularly when loci are highly polymorphic. This method was based on the similarity coefficient for binary data developed by Jaccard [45]. The second takes into account the stepwise mutation model, considering small changes in microsatellite repeat number more likely than large ones, and is based on Bruvo's distance [46]. Results obtained with both methods were equivalent but the second draws a more accurate phylogeny, where 'main classes' m 0 , M 1 and M 13 group close together, suggesting a not too distant common ancestor. Remarkably, the closest haplotypes to m 0-0 (M 1-0 , M 1-1 and M 1-2 ) are widely distributed geographically, while m 0-0 seems to be restricted to NA and WE, and M 13 to EE. The situation is different for the mutated S C -allele, present in all geographic areas studied [18][19][20][21], whereas the ancestor S 8 -allele was only detected in Hungarian and Turkish cultivars [20]. Altogether, these results suggest that the m 0 -haplotype arose later in time than S C since its distribution is much more limited. It can also be hypothesized that the last common ancestor of m 0 , M 1 and M 13 spread throughout different regions and mutated somewhere in WE. Nevertheless, in general, no significant correlation has been observed between geographic distribution and clustering of the M-haplotypes (see Fig. 3), and this may be due to the frequent crosses between genotypes from different groups [39].
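The difference between the two distance measures can be made concrete with a short sketch. It is not the software used in this work: the Jaccard version treats each (locus, allele) pair as a binary character, and the Bruvo version is the single-haplotype simplification in which the per-locus distance is 1 - 2^(-|x|), with x the allele difference in repeat units. Function names, locus names and repeat lengths are hypothetical.

```python
def jaccard_distance(hap_a, hap_b):
    """1 - |shared| / |union| over (locus, allele) pairs.
    Haplotypes are dicts mapping locus name to allele size (bp)."""
    set_a = set(hap_a.items())
    set_b = set(hap_b.items())
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def bruvo_distance_haploid(hap_a, hap_b, repeat_len):
    """Mean over shared loci of 1 - 2**(-|x|), where x is the allele size
    difference expressed in repeat units (stepwise mutation model).
    `repeat_len` maps locus name to repeat motif length in bp."""
    loci = sorted(set(hap_a) & set(hap_b))
    if not loci:
        raise ValueError("haplotypes share no loci")
    total = 0.0
    for locus in loci:
        x = abs(hap_a[locus] - hap_b[locus]) / repeat_len[locus]
        total += 1.0 - 2.0 ** (-x)
    return total / len(loci)
```

Under the Jaccard measure any allele mismatch counts the same, whereas under the stepwise model a one-repeat difference contributes 0.5 at that locus and larger differences approach 1, which is why the second measure can separate closely related haplotypes from distant ones.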

SC and genotyping: uncovering accessions carrying unanalyzed mutations
Part of the materials analyzed in this work could not be self-pollinated because adult trees were not available. However, though phenotype could not be directly assessed in these cases, according to the almost perfect association observed between SC and S C /m 0 -alleles it should be expected that most, if not all, accessions carrying either of these two alleles are self-compatible. However, some incongruities between phenotype and genotype need to be highlighted. For instance, in spite of having the m 0-0 haplotype, self-pollination of 'Cow-2' (S 20 /S 31 -M 1-0 /m 0-0 ) did not produce any fruit. This behavior could be due to fruit-set problems, as exemplified by the self-compatible cultivar 'Búlida' [47], whose fruit set was also nearly null. Alternatively, it may also be possible that m 0-0 in 'Cow-2' does not carry the expected PPM, but further analyses are needed to check this point. On the contrary, four cultivars classified as self-compatible ('Harlayne', 'Henderson', 'Shalah' and 'Mariem') carry neither the S C nor the m 0-0 haplotype [...] questioned in all these three cultivars. However, SC in 'Harlayne' is fully confirmed since it has been tested by several independent sources [47,48]. Regarding genotypes, all the S-alleles carried by these cultivars can be frequently found in self-incompatible accessions and are therefore expected to be 'functional'. Something similar happens with the M 1-0 and M 2-0 haplotypes, but M 8-1 and M 8-2 belong to the M 8 -haplotype 'main class', always detected in self-compatible accessions (a total of 7 in this work and mostly East-European). However, no progenies are available and therefore it could not be tested whether M 8 is mutated (non-functional) or not. As for M 16 and M 19 , they were detected only once in the set of accessions analyzed. Overall, there is still no evidence supporting the nature of the mutation/s conferring SC in these cases, and neither S/M-loci unlinked mutations nor other mutations affecting the S- and M-loci can be discarded.
Lastly, genetic analysis suggests the presence of a SNP mutation within the SFB 2 HVb region in 'Portici' that could also be associated with SC. It could be speculated that a single non-synonymous change within an SFB hypervariable region might alter its specificity, since these domains (strongly hydrophobic and under positive selection) were already suggested to have a role in the specific recognition of S-RNases [49]. Interestingly, this change has also been found in 'Katy' SFB 2 , where a similar distortion was detected in the self-progeny [26]. However, further research is needed to validate this observation and other possibilities cannot be discarded (e.g. mutations affecting pollen viability).

Forces selecting for self-compatibility in apricot
SC is therefore quite common in apricot but its distribution is not uniform across geographic areas. SI is the prevalent phenotype in three out of the four major eco-geographical groups for apricot (centers of origin): Central Asian, Irano-Caucasian and Dzhungar-Zailij, and also in the later proposed Chinese group [36]. In fact, studies about Chinese apricots do not report self-compatible cultivars [29,30,50]. On the contrary, SC predominates in the European group, but some disequilibrium can also be observed. Among the materials analyzed in this work, SI is frequent in commercial North-American cultivars (two thirds) but unusual in West and East-European countries (one fifth). This might point to a non-European SI ancestral donor for North-American apricot cultivars, as previously suggested for the PPV resistance trait [51].
It is generally accepted that apricot genetic diversity decreases from east to south-west [36] and, in this context, it is questionable whether SC might be one of the causes. Recent works suggest that a substantial part of this loss is independent of the selection impact due to the domestication bottleneck, being more related to apricot diffusion routes [38]. In general, self-fertility would be advantageous under unfavorable pollination conditions including early blooming [52], but this correlation has not been observed in this study. However, we cannot discard that SC was originally selected from early-flowering apricot types. In fact, in cherry (Prunus avium) only very few cultivars are reported to be naturally self-compatible and these include two exceptionally early-flowering cherries, 'Cristobalina' and 'Kronio' [53,54]. Moreover, they were selected independently since each carries a different mutation conferring SC, 'Cristobalina' in a non-S-locus pollen modifier [24] and 'Kronio' in the SFB 5 allele [54]. On the other hand, in spite of the benefits linked to SC for growing stone fruits (increased yield, removal of interspersed pollinators, etc.), it was not usually considered a target trait. Some morphological and agronomic traits are putatively linked to the S- (fruit shape) and M-loci (polycarpel and flower color) in Prunus [55], but it seems unlikely that SC was brought in by linkage drag with these particular traits. Thus, SC in apricot is more likely the indirect result of selective breeding on major traits, such as ecological adaptation [36], and only recently has it become a breeding objective in itself. In any case, SC must have had an effect on reducing diversity since S C S C homozygote cultivars (potentially resulting from self-pollination events) are frequently found in most European countries.
In this work we have confirmed two independent PPMs (the SC-haplotype mutation and m) as the main causes of SC in apricot. According to [16], a total of 29 mutations conferring SC have been identified in Prunus. Most of them affect the S-locus (27) and only two correspond to mutated modifiers: the apricot mutated m0-haplotype and a non-S-locus PPM in sweet cherry [17,24]. Indeed, these two are located at the distal end of chr. 3, but it is still unknown whether they affect different genes [26]. Given the supposedly complex GSI mechanism, this low number of mutations affecting modifiers is noteworthy. One possible explanation is that other modifiers participating in the control of the GSI system may be redundant and, when mutated, do not confer SC. Furthermore, loss of function of some factors might lead to SI, as may be predicted for SLFL2 [14]. Lastly, in spite of the high number of accessions already evaluated in all Prunus species, it can also be argued that a more exhaustive screening might reveal novel mutations. Indeed, in the light of the results of this work, it is conceivable that additional mutations conferring SC remain 'hidden' in the large amount of unanalyzed plant material.

Conclusions
It has been shown that apricot S-allele geographical distribution is heterogeneous. In addition, it seems that the loss of genetic diversity hypothesized for this crop during the dissemination process affected not only neutral markers but also the S-locus. Indeed, only a few S-alleles present in self-incompatible cultivars from China (the domestication center of the species) can be found in European cultivars. However, the SC-allele, one of the most widespread S-alleles and the main cause of self-compatibility in apricot, has not been detected in China. Apart from the PPM associated with the SC-allele, other S-locus unlinked PPMs conferring SC have been found in the apricot cultivars 'Canino' and 'Katy'. At odds with the initial hypothesis, we have shown that both are associated with the same m0-haplotype, in spite of the genetic distance between the two cultivars. Moreover, this haplotype has, surprisingly, been detected in other Spanish, French, Italian and North-American accessions. Its origin is still a matter of speculation, but the phylogenetic analysis supports that m0 arose later in time than the SC-allele, from a widely distributed M-haplotype. As a whole, self-(in)compatibility in apricot seems to rely mainly on the S and M loci. However, other self-compatible mutants putatively carrying different mutations have been identified in this work. These results deepen our knowledge of the self-(in)compatibility trait in stone fruits and open new questions that deserve future research.
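The two-locus view summarized above can be expressed as a simple screening rule. The following Python sketch is illustrative only. It assumes that carrying at least one copy of the SC- or the m0-haplotype is enough to predict self-compatibility, as would be expected for pollen-part mutations in a gametophytic system. The cultivar names, genotypes and phenotypes are hypothetical placeholders, not data from this study. Cultivars phenotyped as self-compatible but not explained by either haplotype are flagged as candidates for carrying other mutations.

```python
from typing import Dict, Iterable, List

def predict_self_compatible(s_alleles: Iterable[str],
                            m_haplotypes: Iterable[str]) -> bool:
    """Predict SC when the genotype carries the SC- or the m0-haplotype.

    Assumption: one copy of either haplotype is sufficient.
    """
    return "SC" in set(s_alleles) or "m0" in set(m_haplotypes)

def unexplained_self_compatible(records: List[Dict]) -> List[str]:
    """Names of cultivars phenotyped as SC that the rule above does not explain."""
    return [
        r["name"]
        for r in records
        if r["phenotype"] == "SC"
        and not predict_self_compatible(r["S"], r["M"])
    ]

# Hypothetical records used only to exercise the functions.
RECORDS = [
    {"name": "cv_A", "S": ("S2", "SC"), "M": ("M1-2", "M1-2"), "phenotype": "SC"},
    {"name": "cv_B", "S": ("S1", "S2"), "M": ("m0", "M1-2"), "phenotype": "SC"},
    {"name": "cv_C", "S": ("S1", "S2"), "M": ("M1-2", "M1-2"), "phenotype": "SI"},
    {"name": "cv_D", "S": ("S6", "S9"), "M": ("M1-2", "M1-2"), "phenotype": "SC"},
]

if __name__ == "__main__":
    print(unexplained_self_compatible(RECORDS))
```

In this toy data set only the hypothetical 'cv_D' is flagged, mirroring the kind of self-compatible cultivar that carries neither haplotype.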

Plant Material and self-pollination test
Sixty-seven apricot cultivars and accessions from diverse geographic origins were used in this study (Table 1). Most are currently kept in the collection of the Instituto Valenciano de Investigaciones Agrarias (IVIA) in Valencia (Spain). Part of this collection was kindly provided by the company Frutales Mediterráneo S.A. (FM), by the Consellería de Agricultura, Pesca y Alimentación (CAPA) and by the Ministerio de Agricultura, Alimentación y Medio Ambiente of Spain (MAGRAMA). Other samples, already used in previous works [56], were provided by the Departamento de Mejora y Patología Vegetal del CEBAS-CSIC in Murcia (Spain) and by the University of St. Istvan (Budapest, Hungary).
Trees from different cultivars and accessions (Table 2) were tested for SC by self-pollination in the field. Before anthesis, insect-proof bags were placed over several branches, containing at least approximately 200-250 flower buds in total per cultivar, to prevent cross-pollination. Subsequent fruit set was recorded and fruits were collected about 3 months later. Seed-derived embryos were dissected from the rest of the seed tissue and stored at −20°C.
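Fruit set from a bagged self-pollination test is commonly summarized as the percentage of flowers that develop into fruits. A minimal Python sketch of that calculation follows. The counts are hypothetical, and no threshold for classifying a cultivar as self-compatible is implied, since none is stated here.

```python
def fruit_set_percent(fruits: int, flowers: int) -> float:
    """Percentage of bagged flowers that set fruit after self-pollination."""
    if flowers <= 0:
        raise ValueError("flowers must be a positive count")
    if not 0 <= fruits <= flowers:
        raise ValueError("fruits must lie between 0 and the number of flowers")
    return 100.0 * fruits / flowers

if __name__ == "__main__":
    # Hypothetical counts: 30 fruits recovered from 240 bagged flower buds.
    print(f"{fruit_set_percent(30, 240):.1f}% fruit set")
```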

DNA extraction
Two leaf discs were collected from each accession, frozen in liquid N2 and stored at −80°C before DNA isolation. Genomic DNA was extracted following the method of [57] with slight modifications: the ratio of fresh leaf tissue/CTAB buffer was reduced to 10 mg/250 μL and nucleic acids were recovered by precipitation with one volume of isopropanol. DNA was quantified with a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE) and its integrity was checked on a 1% agarose gel by comparison with lambda DNA (Promega, Madison, WI, USA). Embryo DNA was extracted by incubating for 10 min at 95°C with 20 μL of TPS (100 mM Tris-HCl, pH 9.5; 1 M KCl; 10 mM EDTA) isolation buffer [58].

S-locus genotyping
Apricot S-alleles were identified by PCR-amplifying fragments comprising the first and second introns of the S-RNase as well as the 5′-UTR SFB intron. Genomic DNA isolated from the cultivars and accessions listed in Table 1 was used as PCR template. PCRs were performed in a final volume of 20 μL containing 1 × DreamTaq buffer, 0.2 mM of each dNTP, 20 ng of genomic DNA and 1 U of DreamTaq DNA polymerase (Thermo Fisher Scientific, Waltham, MA, USA). Previously developed primers designed from conserved regions of Prunus armeniaca S-RNase genomic sequences, SRc-F and SRc-R [19,58], and from Prunus avium S-RNase-cDNA sequences, Pru-T2 and Pru-C2R [59], were used in all four possible combinations to amplify the first intron (see Additional file 5: Table S5). The amplification was carried out using a temperature profile with an initial denaturation at 95°C for 2 min; 30 cycles of 95°C for 45 s, 52°C for 1 min and 72°C for 1 min 30 s; and a final extension at 60°C for 30 min (UNO96, VWR, Radnor, PA, USA). Each PCR was performed following the procedure of [60] using three primers: the specific forward primer with an M13(−21) tail at its 5′ end at 0.4 mM, the reverse primer at 0.8 mM, and the universal fluorescent-labeled M13(−21) primer at 0.4 mM. Allele lengths were determined using an ABI Prism 3130 Genetic Analyzer with the aid of GeneMapper software, version 4.0 (Applied Biosystems, Foster City, CA, USA).
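For readers setting up a comparable reaction, the final concentrations given above can be converted into pipetting volumes with the dilution relation C1·V1 = C2·V2. The Python sketch below is a hedged illustration. The stock concentrations (10× buffer, 10 mM of each dNTP, polymerase at 5 U/μL, template DNA at 20 ng/μL) are assumptions chosen for the example and are not stated in the text. Primer volumes are not computed and are lumped together with water in the remainder.

```python
def dilution_volume(final_conc: float, stock_conc: float,
                    final_volume_ul: float) -> float:
    """Volume of stock (in μL) needed to reach final_conc, from C1*V1 = C2*V2."""
    return final_conc * final_volume_ul / stock_conc

def reaction_volumes(final_volume_ul: float = 20.0,
                     buffer_stock_x: float = 10.0,       # assumed stock
                     dntp_stock_mm: float = 10.0,        # assumed stock
                     taq_stock_u_per_ul: float = 5.0,    # assumed stock
                     dna_stock_ng_per_ul: float = 20.0   # assumed stock
                     ) -> dict:
    """Per-reaction volumes (μL) for the 20 μL PCR described in the text."""
    volumes = {
        "buffer": dilution_volume(1.0, buffer_stock_x, final_volume_ul),
        "dNTPs": dilution_volume(0.2, dntp_stock_mm, final_volume_ul),
        "polymerase": 1.0 / taq_stock_u_per_ul,    # 1 U per reaction
        "template_DNA": 20.0 / dna_stock_ng_per_ul,  # 20 ng per reaction
    }
    # Whatever is left is filled with primers plus water.
    volumes["primers_plus_water"] = final_volume_ul - sum(volumes.values())
    return {name: round(vol, 3) for name, vol in volumes.items()}

if __name__ == "__main__":
    for component, volume in reaction_volumes().items():
        print(f"{component}: {volume} μL")
```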
The second intron was amplified using two sets of primers designed from Prunus avium S-RNase-cDNA sequences [61]: Pru-C2/Pru-C4R and Pru-C2/Pru-C6R (see Additional file 5: Table S5). PCRs were performed using 0.25 μM of each primer and the program previously described by [62] to amplify long PCR products. PCR products were electrophoresed in 0.8% (w/v) agarose gels using 1 × TBE buffer (89 mM Tris, 89 mM boric acid and 2 mM EDTA, pH 8.0), stained with RedSafe Nucleic Acid Staining Solution (iNtRON Biotechnology, Korea) and visualized under UV light. Molecular sizes of amplified fragments were estimated using GeneRuler 100 bp Plus DNA ladder (Thermo Fisher Scientific).
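As a worked example of the quantities behind the gel conditions above, the following Python sketch converts the 1 × TBE composition into grams per volume and computes the agarose mass for a given gel percentage. The molar masses used are those of Tris base (121.14 g/mol), boric acid (61.83 g/mol) and disodium EDTA dihydrate (372.24 g/mol). The choice of the disodium dihydrate salt is an assumption, since the text does not specify the EDTA form, and pH adjustment is not covered.

```python
# Molar masses in g/mol; the EDTA salt form is an assumption (see lead-in).
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,
    "boric acid": 61.83,
    "Na2EDTA dihydrate": 372.24,
}

# 1x TBE composition from the text, in mM.
TBE_1X_MM = {
    "Tris base": 89.0,
    "boric acid": 89.0,
    "Na2EDTA dihydrate": 2.0,
}

def tbe_recipe(volume_l: float = 1.0) -> dict:
    """Grams of each component needed for volume_l litres of 1x TBE."""
    return {
        name: conc_mm / 1000.0 * MOLAR_MASS_G_PER_MOL[name] * volume_l
        for name, conc_mm in TBE_1X_MM.items()
    }

def agarose_grams(percent_w_v: float, gel_volume_ml: float) -> float:
    """Grams of agarose for a gel of the given % (w/v) and volume in mL."""
    return percent_w_v / 100.0 * gel_volume_ml

if __name__ == "__main__":
    for name, grams in tbe_recipe(1.0).items():
        print(f"{name}: {grams:.2f} g per litre")
    print(f"agarose for a 100 mL 0.8% gel: {agarose_grams(0.8, 100):.2f} g")
```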
The 5′-UTR SFB intron was also amplified using the degenerate primer pair (F-BOX5′A/F-BOXintronR) developed by Vaughan et al. [63] from sweet cherry sequences (see Additional file 5: Table S5). PCR components, thermo-cycler conditions and detection procedure were identical to those described above for the first S-RNase intron. Two additional primers (RFBc-F/SFBins-R), designed from the consensus sequence of the Prunus SFB alleles [17], were used to amplify the SFBC insertion from genomic DNA of several apricot cultivars in order to distinguish the SC- and S8-alleles (see Additional file 5: Table S5).

M-locus genotyping
Seven SSR markers located within (or flanking) the M-locus were genotyped: PGS3.71, PGS3.22, PGS3.62, PGS3.23 and PGS3.96 [25,26], and AGS-20 and AGS-30 (this work) (see Additional file 3: Table S3). SSR amplifications were performed in a GeneAmp® PCR System 9700 thermal cycler (Perkin-Elmer, Freemont, CA, USA) in a final volume of 20 μL containing 1 × DreamTaq buffer, 0.2 mM of each dNTP, 20 ng of genomic DNA and 1 U of DreamTaq DNA polymerase (Thermo Fisher Scientific). Primers were used as described above, following the procedure of [60]. The following temperature profile was used: 94°C for 2 min, then 35 cycles of 94°C for 45 s, 50-60°C for 1 min and 72°C for 1 min 15 s, finishing with 72°C for 5 min. Allele lengths were determined using an ABI Prism 3130 Genetic Analyzer with the aid of GeneMapper software, [...] and 'Cow-2' (S20S22) were also sequenced. Primer combinations SRc-F/SRc-R and Pru-C2/Pru-C4R were used in all cases for the first and second introns, respectively, except for S24 (Pru-T2/SRc-R) and S22 (Pru-C2/Pru-C6R). PCR products were electrophoresed in 0.8% or 2% (w/v) agarose gels (second or first intron, respectively) stained with RedSafe Nucleic Acid Staining Solution (iNtRON Biotechnology). Molecular sizes of the amplified fragments were estimated using GeneRuler 100 bp Plus DNA ladder (Thermo Fisher Scientific). Fragments were extracted and purified from the agarose gels using the Zymoclean Gel DNA Recovery kit (Zymo Research, Irvine, CA, USA). Sequences were determined