Genome survey of pistachio (Pistacia vera L.) by next generation sequencing: Development of novel SSR markers and genetic diversity in Pistacia species

Pistachio (Pistacia vera L.) is one of the most important nut crops in the world. There are about 11 wild species in the genus Pistacia, and they are important as rootstock seed sources for cultivated P. vera and as forest trees. Published information on the pistachio genome is limited. Therefore, a genome survey by next generation sequencing is necessary to obtain knowledge of the genome structure of pistachio. Simple sequence repeat (SSR) markers are useful tools for germplasm characterization, genetic diversity analysis, and genetic linkage mapping, and may help to elucidate genetic relationships among pistachio cultivars and species. To explore the genome structure of pistachio, a genome survey was performed using the Illumina platform at approximately 40× coverage depth in P. vera cv. Siirt. The K-mer analysis indicated that pistachio has a genome that is about 600 Mb in size and is highly heterozygous. The assembly of 26.77 Gb Illumina data produced 27,069 scaffolds at N50 = 3.4 kb with a total of 513.5 Mb. A total of 59,280 SSR motifs were detected, with a frequency of one SSR per 8.67 kb. A total of 206 SSRs were used to characterize 24 P. vera cultivars and 20 wild Pistacia genotypes (four genotypes from each of five wild Pistacia species) belonging to P. atlantica, P. integerrima, P. chinensis, P. terebinthus, and P. lentiscus. Overall, 135 SSR loci amplified in all 44 cultivars and genotypes, and 41 were polymorphic in all six Pistacia species. The novel SSR loci developed from cultivated pistachio were highly transferable to wild Pistacia species. The results of the genome survey suggest that the genome size of pistachio is about 600 Mb with a high heterozygosity rate. This information will help to design whole genome sequencing strategies for pistachio. The novel polymorphic SSRs developed in this study may help germplasm characterization, genetic diversity, and genetic linkage mapping studies in the genus Pistacia.

eurycarpa, Yalt., P. lentiscus L., and P. terebinthus L., expanded in almost all parts of Anatolia. Other well-known Pistacia species in the world are P. integerrima Stewart and P. chinensis Bunge [7]. Pistachio plants are long-living with a juvenile period of approximately 5-10 years. In addition, wild Pistacia species have edible seeds and are used as rootstock seed sources for cultivated P. vera, and sometimes for fruit consumption, oil extraction, soap production, and as forest trees [8].
SSRs are useful molecular markers and are highly polymorphic due to their high mutation rate, which affects the number of repeat units [17]. They are very useful for assaying diversity in natural populations or germplasm collections, and for fingerprinting and parental identification. They are especially valuable markers for genetic linkage mapping and evolutionary studies [18] and have a high level of transferability between closely related species. The development of SSR markers from P. vera [11, 19-21] and wild Pistacia species [22, 23] has been described in several studies.
Next generation sequencing (NGS) has provided a new perspective for research, owing to its high throughput and speed of data generation. So far, NGS has been applied in genomics-based strategies to discover sequences for new SSR markers in plants in a time- and cost-effective manner [24]. SSR development studies based on genome surveys have been performed in different plant species [25-27]. Genome survey studies also provide information about the genome structure of a plant species, including estimates of genome size, levels of heterozygosity, and repeat content. A study by Horjales et al. [28] is the only one in the literature to estimate genome size in the genus Pistacia. The genome size of P. terebinthus was estimated to be 2C = 1.32 Gb by flow cytometry.
Recently, genetic structure analyses have focused on the collection, protection, and utilization of germplasm for a plant species [29,30]. It is important to explore population structure to avoid false genetic trends and to identify cultivars with specific or minor alleles that will be important for molecular breeding programs [31]. However, as far as we know, information on the population structure of Pistacia collections assessed using a large and comprehensive set of SSR markers is limited.
In this study, we aim to (1) estimate the genome size, GC content, and heterozygosity rate of pistachio (P. vera cv. Siirt) using a genome survey, (2) perform genome-wide characterization of SSRs in the P. vera genome, (3) develop novel SSR markers for Pistacia species from the genome survey, (4) determine transferable and polymorphic SSR markers for other Pistacia species, and (5) reveal the population structure of Pistacia germplasm. To our knowledge, this is the first report revealing genome structure and genome-wide SSRs in pistachio. The results of this study will provide essential information for further studies in pistachio such as whole genome sequencing and SSR-based genetic linkage mapping.

K-mer analysis
A total of 26.77 Gb of data were used for K-mer analysis. The 17-mer frequency distribution derived from the sequencing reads is plotted in Fig. 1; the peak of the 17-mer distribution was at a depth of about 28, and the total K-mer count was 16,684,162,450; therefore, the genome size of pistachio was estimated as 596 Mb. A small peak observed at half the peak depth indicated a high level of heterozygosity for P. vera. Simulation of the P. vera genome with different heterozygosity rates showed it to be about 1% (Fig. 1). We did not observe a fat tail in the K-mer distribution; therefore, the repeat content of the pistachio genome may be low. The distribution of GC content versus sequencing depth (Fig. 2) may provide information about sequencing bias. The GC content was about 37.1% in pistachio. There is also a region (red region) with an average depth around half that of the main average depth, which may be caused by the high rate of heterozygosity.
Fig. 1 17-mer analysis for estimating the genome size of Pistacia vera cv. Siirt. The X-axis is depth (X) and the Y-axis is the proportion that represents the frequency at that depth. H_0.01 means that the heterozygous rate is 1%
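The genome size estimate reported above follows from dividing the total K-mer count by the peak depth. A minimal Python sketch of that arithmetic is given here; the function name is illustrative and is not part of the software used in the study.

```python
def estimate_genome_size(kmer_total, peak_depth):
    """Genome size (bp) = total number of K-mers / K-mer peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return kmer_total / peak_depth


# Values reported in the text for the 17-mer distribution of P. vera cv. Siirt.
KMER_TOTAL = 16_684_162_450
PEAK_DEPTH = 28

genome_size_bp = estimate_genome_size(KMER_TOTAL, PEAK_DEPTH)
genome_size_mb = genome_size_bp / 1e6
print(f"Estimated genome size: {genome_size_mb:.0f} Mb")
```

Because the peak depth is read from the plot as "about 28", the result is only as precise as that reading; a shift of one depth unit changes the estimate by roughly 20 Mb.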
Assembly and identification of SSR loci in P. vera
Assembly was performed using 26.77 Gb of Illumina PE reads. The contig N50 was 2327 bp, and the scaffold N50 was 3399 bp. The total length of the scaffolds was 513.5 Mb. There were 893,901 scaffolds ≥100 bp and 44,900 scaffolds ≥2 kb (Table 1).
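For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows one common way to compute it: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The scaffold lengths in the example are hypothetical and are not taken from the pistachio assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the assembly is the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (bp), for illustration only.
example_lengths = [10, 9, 8, 7, 6]
print(n50(example_lengths))
```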

Development, screening, and polymorphism of SSRs
In the initial screen of 950 randomly selected SSR primer pairs in three (P. vera cv. Siirt, P. vera cv. Bağyolu and one monoecious P. atlantica) genotypes, 610 (64.2%) generated amplification products, 197 (20.7%) loci were monomorphic, and the remaining 143 (15.1%) SSR loci failed to generate amplification products. Of the 610 that amplified, 204 polymorphic and easily scorable SSR loci were selected to study genetic diversity in Pistacia. Of these, 193 were perfect (94.6%), 8 (3.9%) were compound, and 3 (1.5%) were interrupted repeats. Dinucleotide motifs were the most abundant (63.2%), followed by tri- (18.0%), hexa- (12.8%), tetra- (3.8%), and pentanucleotide motifs (2.2%). The sequences of the 204 SSR loci were deposited in NCBI GenBank (accession numbers KX223398-KX223601) and are given in Additional file 1. Two SSR primer pairs (CUPVSiirt568 and CUPVSiirt689) each amplified two loci; thus, 206 SSR loci were obtained and used to study genetic diversity in Pistacia.
Fig. 2 GC content and average sequencing depth. The X-axis represents GC content and the Y-axis represents the average depth. The red region has an average depth around half of the main average depth, which may be caused by the high heterozygous rate

Diversity measures of novel SSR loci in Pistacia
Genetic diversity was studied by analyzing a total of 44 cultivars and genotypes: 24 P. vera cultivars and 20 wild Pistacia genotypes (four genotypes from each of five wild Pistacia species) belonging to P. atlantica, P. integerrima, P. chinensis, P. terebinthus, and P. lentiscus. Allele ranges, number of alleles (Na), effective number of alleles (Ne), polymorphism information content (PIC), and expected (He) and observed (Ho) heterozygosities of the 206 SSR loci are presented in Table 2.
A total of 2036 alleles were produced by 206 SSR loci in 44 Pistacia cultivars and genotypes, ranging from 2 to 19 per locus. The highest number of alleles was obtained from the CUPVSiirt86 locus. The CUPVSiirt298, CUPVSiirt956, CUPVSiirt1330, and CUPVSiirt1405 loci also produced a high number of alleles (Na = 18). The effective number of alleles ranged from 1.09 (CUPVSiirt18) to 10.58 (CUPVSiirt1330) with an average of Ne = 4.67. The CUPVSiirt465 (Ne = 9.93), CUPVSiirt616 (Ne = 9.88), CUPVSiirt357 (Ne = 9.24), and CUPVSiirt625 (Ne = 9.05) loci also had high effective numbers of alleles. The observed heterozygosity (Ho) ranged from 0.0 to 0.82 with an average of Ho = 0.41. The CUPVSiirt86 locus was the most heterozygous, whereas the CUPVSiirt17 and CUPVSiirt924 loci were homozygous. The He values ranged between 0.08 (CUPVSiirt18) and 0.91 (CUPVSiirt1330), with an average of 0.74. The PIC values ranged from 0.08 to 0.90, with an average of 0.71 (Table 2).
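The per-locus measures reported above can be illustrated with a short Python sketch. It assumes the standard textbook definitions (Ne = 1/Σp², He = 1 − Σp², and PIC = 1 − Σp² − ΣΣ 2p_i²p_j² over allele pairs), which may differ in detail from the exact options used in GenAlEx and PowerMarker, for example in small-sample corrections. The genotypes are hypothetical allele sizes and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def locus_diversity(genotypes):
    """Compute Na, Ne, Ho, He and PIC for one locus.

    genotypes: list of (allele_1, allele_2) tuples for diploid individuals.
    """
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(allele_counts.values())
    freqs = [count / total for count in allele_counts.values()]

    sum_sq = sum(p * p for p in freqs)  # expected homozygosity
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))

    return {
        "Na": len(freqs),                  # number of alleles
        "Ne": 1 / sum_sq,                  # effective number of alleles
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": 1 - sum_sq,                  # expected heterozygosity
        "PIC": 1 - sum_sq - pair_term,     # polymorphism information content
    }


# Hypothetical genotypes (allele sizes in bp) for four individuals.
example = [(100, 102), (100, 100), (102, 104), (100, 104)]
stats = locus_diversity(example)
print(stats)
```

Under these definitions He = 1 − 1/Ne, which agrees with the extremes reported above: Ne = 10.58 corresponds to He ≈ 0.91 and Ne = 1.09 to He ≈ 0.08.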

Diversity of the SSRs in each of six Pistacia species
In P. vera, all 206 SSR loci generated amplification products, and a total of 897 alleles were produced with an average of 4.5 alleles per locus. Two hundred (97.1%) SSR loci were polymorphic in 24 pistachio cultivars. The highest number of alleles (Na = 11) was obtained from the CUPVSiirt1330 locus. The effective number of alleles ranged from 1.04 to 7.60 (CUPVSiirt616). The average observed heterozygosity (Ho) was 0.46, and the CUPVSiirt86 and CUPVSiirt1273 loci were the most heterozygous. The highest expected heterozygosity (0.87) and PIC (0.85) values were produced from the CUPVSiirt616 locus. The average He and PIC values in P. vera were calculated as 0.55 and 0.50, respectively (Additional file 2).
In P. atlantica, 200 SSR loci generated amplification products with a high rate of transferability (97.1%). Thirty-nine (19.5%) of the amplified SSR loci were monomorphic and the rest were polymorphic (80.5%). A total of 527 alleles were produced by 161 polymorphic SSR loci, with an average of 3.3 alleles per locus. The average observed heterozygosity (Ho) was 0.48. The highest number of alleles (Na = 7), effective number of alleles (Ne = 6.4), expected heterozygosity (0.84), and
In P. integerrima, the transferability of SSR loci was also high, with a rate of 93.7%. Of the amplified SSR loci, 157 (81.3%) were polymorphic in P. integerrima. A total of 416 alleles were produced by 157 SSR loci with an average of 2.70 alleles per locus, and the highest number (Na = 5) of alleles was obtained from the CUPVSiirt131, CUPVSiirt742, CUPVSiirt838, and CUPVSiirt1330 loci. The highest effective number of alleles (4.57) was calculated at the CUPVSiirt742 locus. The average observed (Ho) and expected (He) heterozygosities were 0.50 and 0.52, respectively. The highest values for expected heterozygosity (0.78) and PIC (0.75) were produced from the CUPVSiirt742 locus. The average PIC value in P. integerrima was 0.44 (Additional file 4).
In P. terebinthus, 183 SSR loci (88.8%) generated amplification products and 142 (77.6%) were polymorphic. A total of 416 alleles were produced by 142 polymorphic SSR loci, ranging from 1 to 7 with an average of 3.4 alleles per locus. The effective number of alleles ranged from 1.28 to 6.40. The observed heterozygosity (Ho) ranged from 0.0 to 1.0, with an average of 0.47. The highest number of alleles (Na = 7), effective number of alleles (Ne = 6.4), expected heterozygosity (0.84), and PIC (0.82) were obtained from the CUPVSiirt1017, CUPVSiirt1326, CUPVSiirt1405, and CUPVSiirt1406 loci. The average He and PIC values for P. terebinthus were 0.56 and 0.50, respectively (Additional file 5).
In P. chinensis, 177 (85.9%) loci amplified in SSR-PCR analysis, and 119 loci (67.2%) were polymorphic. A total of 365 alleles were amplified from 119 polymorphic SSR loci with an average of 3.1 alleles per locus. The average observed heterozygosity (Ho) was 0.48. The highest values for He and PIC were 0.84 and 0.82, respectively. The CUPVSiirt836 locus amplified the highest number of alleles and had the highest level of polymorphism. The average values for He and PIC in P. chinensis were 0.54 and 0.48, respectively (Additional file 6).
In P. lentiscus, 151 (73.3%) SSR loci amplified, which was the lowest transferability among the five wild Pistacia species studied. Of the amplified SSR loci, 83 (55.0%) were polymorphic. A total of 217 alleles were obtained from 83 polymorphic SSR loci in P. lentiscus, ranging from 1 to 6, with an average of 2.6 alleles per locus. The effective number of alleles ranged from 1.28 to 4.57. The observed heterozygosity (Ho) ranged from 0 to 1 with an average of 0.50. The average values for He and PIC in P. lentiscus were 0.49 and 0.41, respectively. The highest values for Na, Ne, He, and PIC were obtained from the CUPVSiirt1797 locus (Additional file 7).
The highest transferability was obtained in P. atlantica and P. integerrima, while P. lentiscus had the lowest transferability (Table 3). In wild Pistacia species, the highest average number of alleles was obtained in P. atlantica and P. terebinthus, while P. lentiscus had the lowest number.
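The transferability rates quoted for each species are the number of amplified loci divided by the 206 loci tested. The Python sketch below reproduces that calculation from the counts given in the text; P. integerrima is omitted because only its rate (93.7%), not its count of amplified loci, is stated above. The function and variable names are illustrative.

```python
TOTAL_LOCI = 206  # SSR loci tested in every species

# Number of loci that amplified, as reported in the text.
amplified_loci = {
    "P. vera": 206,
    "P. atlantica": 200,
    "P. terebinthus": 183,
    "P. chinensis": 177,
    "P. lentiscus": 151,
}


def transferability(amplified, total=TOTAL_LOCI):
    """Percentage of tested loci that produced an amplification product."""
    return 100.0 * amplified / total


rates = {sp: round(transferability(n), 1) for sp, n in amplified_loci.items()}
for species, rate in rates.items():
    print(f"{species}: {rate}%")
```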

Cluster analysis and genetic structure
Cluster analysis was performed for 24 P. vera cultivars and 20 wild Pistacia genotypes using 136 SSR loci that amplified in all tested Pistacia species. UPGMA analysis showed that all Pistacia species and genotypes were clearly separated from each other. Two main clusters were observed: the first cluster contained all individuals from P. vera, whereas the second cluster included wild Pistacia species: P. atlantica, P. integerrima, P. chinensis, P. terebinthus, and P. lentiscus (Fig. 5). P. atlantica was the closest species to P. vera, while P. lentiscus was the most distant.
The genetic structure of the Pistacia genotypes used in this study is shown in Fig. 6. A model-based clustering method was applied to all 44 genotypes using 136 SSR loci. The most probable number of clusters was identified by calculating Delta K (ΔK), which is based on the rate of change in the log probability of data between successive K values (K = 1 to K = 10). The peak of the ΔK graph corresponds to the most probable number of populations in the data set. The highest ΔK value was found at K = 2 (Fig. 7), where all 44 genotypes were divided into two main groups, similar to the UPGMA dendrogram (Fig. 5). As K increased to 3, the genotypes in group 2 were divided into two subgroups: the first subgroup contained P. lentiscus and P. terebinthus and the second subgroup contained the other wild Pistacia species. When K = 4, the wild Pistacia genotypes were divided into three subgroups: the first included P. lentiscus, the second contained P. terebinthus and P. chinensis, and the third contained P. atlantica and P. integerrima. When K = 5, all Pistacia species were separated from each other, with the exception of P. terebinthus and P. chinensis. When K = 6, all six Pistacia species were clearly separated. When K = 7, the Pistacia species were again divided into six groups, and P. vera cultivars were grouped based on their origins; this grouping was supported by another high ΔK value at K = 7 (Figs. 5, 6 and 7). The cultivars originating from Iran were in one cluster, and the other cultivars were in the other cluster.
Fig. 6 Population structure of 24 P. vera cultivars and 20 wild genotypes belonging to P. atlantica, P. integerrima, P. terebinthus, P. chinensis and P. lentiscus. K = 2 to K = 10 represent the sub-populations
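The ΔK criterion described above can be sketched in Python as follows. The sketch assumes the commonly used form of the statistic, ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean log probability of the data over replicate runs at a given K; the text does not spell out the exact formula, so this is an assumption. The log probabilities below are invented for illustration and are not STRUCTURE output from this study.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    log_probs: dict mapping K to a list of log probabilities of the data,
    one value per replicate run (at least two replicates per K).
    """
    means = {k: mean(values) for k, values in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(log_probs[k])
    return result


# Hypothetical replicate log probabilities for K = 1..4.
runs = {
    1: [-5000, -5010, -4990],
    2: [-4000, -4002, -3998],
    3: [-3900, -3910, -3890],
    4: [-3850, -3870, -3830],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

ΔK is undefined for the smallest and largest K tested, because each value needs both neighbours; this is why a scan from K = 1 to K = 10 can only identify peaks between K = 2 and K = 9.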

Discussion
Genome size, heterozygosity, and GC content
The development of NGS technology has provided researchers with an affordable way of addressing a wide range of questions, especially in non-model species such as pistachio. In addition, the K-mer method has been successfully applied for the estimation of genome size using NGS reads without prior knowledge of the genome size [32]. This approach has been used to analyze a number of plant genomes such as switchgrass (Panicum virgatum L.) [33], Chinese bayberry (Myrica rubra Sieb. et Zucc.) [26], Chinese jujube (Ziziphus jujuba Mill) [34], and Rosa roxburghii Tratt [32]. Here, for the first time, we report a genome survey of P. vera using whole genome shotgun sequencing. The 17-nucleotide depth distribution suggested that the genome size of P. vera is about 600 Mb, which is close to the size (660 Mb) previously estimated for P. terebinthus using flow cytometry [28]. The estimated genome size of pistachio was found to be smaller than that of apple [35], and larger than that of peach [36], sweet orange [37], and poplar [38]. The small size of the pistachio genome may encourage scientists to perform whole genome sequencing in this species. The K-mer analysis also suggested a high level of heterozygosity for P. vera, which is probably due to the dioecious mating system in this genus. Information about the genome structure of pistachio from this study may be useful for whole genome sequencing in these plants.
The average GC content of the P. vera genome was higher than that of wild sweet potato (36.0%) [39], but lower than that of switchgrass (45.5%) [33] and Chinese jujube (48%) [34]. Different GC contents may result in sequencing bias on the Illumina sequencing platform, and can, therefore, seriously affect genome assembly [40,41]. GC content was one of three factors found to contribute to sequencing bias on the Illumina sequencing platform [42]. High and low GC contents result in reduced coverage in sequencing regions [41].

SSR polymorphisms in Pistacia
From the 59,280 SSRs detected in the genome survey of pistachio in this study, primer design was performed for 950 loci. Initial screening of these loci for polymorphisms and ease of scoring revealed that 206 SSR loci were polymorphic and had good amplifications in genetic diversity analyses of 44 Pistacia genotypes. In P. vera, 200 SSR loci were polymorphic and can be used in further studies investigating genetic linkage mapping, germplasm characterization, fingerprinting, and genetic diversity. Several reports have been published on SSR development in P. vera [11, 19-21]. Those authors reported a total of 137 polymorphic SSR loci in P. vera. Therefore, the number of polymorphic SSR loci developed in the present study was higher than that previously reported for P. vera.
Wild Pistacia species have commonly been used as rootstock seed sources and forest trees. We also tested the 206 SSR loci in five wild Pistacia species for their possible use in genetic diversity studies. We report polymorphic SSR loci for each species: 161 for P. atlantica, 157 for P. integerrima, 142 for P. terebinthus, 119 for P. chinensis, and 83 for P. lentiscus. Albaladejo et al. [22] developed only eight polymorphic SSR loci for P. lentiscus. Zaloglu et al. [20] and Topcu et al. [11] developed SSRs from P. vera and tested them for PCR amplification in wild Pistacia species, but did not analyze the SSR loci for polymorphism. Therefore, this study describes a substantial number of novel polymorphic SSRs for each of five wild Pistacia species.
The transferability rates of SSRs in this study were high; this is consistent with findings from previous studies performed by Zaloglu et al. [20] and Topcu et al. [11]. Taken together, these data demonstrate that SSRs are very powerful tools for use in synteny analysis in Pistacia. The lowest transferability rate was obtained for P. lentiscus, which is one of the most distant species to P. vera in the genus [1,2]. The average number of alleles (Na = 4.5) in P. vera was higher in this study, while PIC, observed (Ho), and expected (He) heterozygosities were similar to the values obtained in previous SSR development studies [11, 19-21].
The first genetic linkage map in pistachio was constructed by Türkeli and Kafkas [13] using an F1 interspecific population between P. vera (cv. Siirt) and monoecious P. atlantica (Pa-18). The same cultivar and the same monoecious genotype were also used as plant materials in this study. All 206 SSR loci in this study were polymorphic between the Siirt cultivar and the monoecious Pa-18 genotype. Türkeli and Kafkas [13] constructed 17 and 19 linkage groups in Siirt and Pa-18, respectively, which exceed the haploid chromosome number of pistachio, n = 15 [43]. Therefore, the polymorphic SSR loci developed in this study may facilitate the construction of a reference SSR-based linkage map of pistachio.

Cluster analysis and genetic structure in Pistacia
All the genotypes and all six Pistacia species separated and consistently grouped well in both cluster and structure analysis. The dendrogram and structure analysis at K = 2 divided Pistacia species into two main clusters: the first cluster included all P. vera cultivars, while the second cluster contained the wild genotypes belonging to five Pistacia species. The second cluster divided into two subclusters: the first subcluster contained P. atlantica and P. integerrima species, while the second subcluster contained P. chinensis, P. terebinthus, and P. lentiscus. P. atlantica was the closest species to P. vera, whereas P. lentiscus was the most distant species. Similar results were obtained from other studies undertaking phylogenetic analyses in the genus Pistacia [1,2,9]. Kafkas and Perl-Treves [44] divided the genus into two main groups: the first group included species with large, single-trunked trees, whereas the second group included species that mostly grow as shrubs or small trees. P. terebinthus and P. lentiscus were in the same group. These species were also clustered in the same group and were found to be the closest species in this study. A similar result was reported by Kafkas [1] using AFLP markers.
The most comprehensive study on the relationships between P. vera cultivars was performed by Kafkas et al. [14]. Those authors grouped 69 P. vera cultivars into two main groups: the first contained the cultivars originating from Iran and the second included the cultivars from Mediterranean countries such as Turkey, Syria, and Greece. We also observed similar groupings and genetic relationships among 24 pistachio cultivars in this study. Structure analysis at K = 7 also supported the findings of the previous study performed by Kafkas et al. [14], and was confirmed by the UPGMA clustering analysis in this study. Iranian cultivars such as Ohadi and Kallehghouchi were clustered in the Iranian group while the other cultivars were clustered in the Mediterranean group. The Australian cultivar Sirora was in the Iranian group, and this was supported by structure analysis at K = 7. There were two Kerman clones in the germplasm of the Pistachio Research Institute, which were introduced from Spain and the USA. The clone introduced from Spain was clustered in the Iranian group and the other had a close relationship with the Siirt cultivar from Turkey. Further study is necessary to determine which clone is the true Kerman. The male cultivars Atli, Uygur, Bagyolu, and Kaska formed a separate group within P. vera in the UPGMA dendrogram. The structure analysis demonstrated that these cultivars have a wild origin, as they share a substantial number of alleles with the wild Pistacia species.

Conclusions
We obtained approximately 40× Illumina data coverage for the genome survey in P. vera to gain knowledge of the structure of the pistachio genome. The assembled data were also used to search for SSRs in the pistachio genome, to develop novel SSR markers, and to study genetic diversity in six Pistacia species. K-mer analysis indicated that the pistachio genome is highly heterozygous and is about 600 Mb in size. The level of repeats in the pistachio genome was not high and the GC content was about 37.1%. The SSR search in the assembled genome revealed 59,280 SSRs, with a frequency of one SSR per 8.67 kb. A total of 206 polymorphic SSR loci were developed from 950 SSR loci: 136 amplified and 41 were polymorphic in all six Pistacia species. In conclusion, we present the first data on the structure of the pistachio genome, which may help to design whole genome sequencing studies in pistachio. Furthermore, we report novel polymorphic SSR markers for six Pistacia species, which will enable further genetic mapping, genetic diversity, and germplasm characterization studies to be performed in the genus.
About 4-5 g fresh leaves were sampled from germplasm collections of Çukurova University in Adana and Pistachio Research Institute in Gaziantep. Genomic DNA was extracted using the CTAB protocol [45] with minor modifications as described by Kafkas et al. [46]. DNA concentrations were measured using a Qubit Fluorimeter (Invitrogen) or were estimated by comparing the band intensity with λ DNA of known concentrations following 0.8% agarose gel electrophoresis and ethidium bromide staining. DNA samples were subsequently diluted to a concentration of 10 ng/μL for SSR-polymerase chain reaction (PCR).

Genome survey and microsatellite identification
For the genome survey study, 26.77 Gb of clean data were generated after removing low-quality reads from two different libraries: 18.72 Gb of data were from a 250-bp library with 150-bp paired-end (PE) reads and 8.05 Gb of data were from a 500-bp library with 90-bp PE reads. Library construction and sequencing were performed at the Beijing Genomics Institute, China. All data were used to perform K-mer analysis. Based on the results of the K-mer analysis, information on peak depth and the number of 17-mers was obtained and used to estimate the size of the genome, repetitive sequences, and heterozygosity. The relationship was expressed by the following formula: Genome size = K-mer_num / Peak_depth, where K-mer_num is the total number of K-mers and Peak_depth is the expected value of K-mer depth. Assembly was performed using SOAPdenovo v2.01 software [47] and the GC depth distribution was determined by SOAPaligner v2.21 [48].
SSR loci were searched using SSRIT [49] software. The search parameters were set for the detection of di-, tri-, tetra-, penta-, and hexanucleotide SSR motifs with a minimum of 6, 5, 4, 4, and 4 repeats, respectively. The SSR loci were subjected to primer design using the Primer3 web-based software [50] with the standard parameters.
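The repeat thresholds above can be illustrated with a small regular-expression search for perfect SSRs. This Python sketch is not SSRIT and makes no claim to reproduce its output; it only shows how the stated minimum repeat numbers (6, 5, 4, 4, and 4 for di- to hexanucleotide motifs) translate into a search. The test sequence and function names are hypothetical.

```python
import re

# Minimum number of repeats for each motif length, as stated in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    for size in range(1, n):
        if n % size == 0 and motif[:size] * (n // size) == motif:
            return False
    return True


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip e.g. ATAT reported as a tetranucleotide; it is an AT repeat.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Hypothetical sequence with an (AG)7 and a (GAA)5 repeat.
demo = "TTGC" + "AG" * 7 + "CTCT" + "GAA" * 5 + "TCC"
print(find_ssrs(demo))
```

Filtering out non-primitive motifs keeps a long dinucleotide run from also being counted as a tetra- or hexanucleotide SSR; mononucleotide runs are ignored for the same reason.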

Primer selection and PCR conditions
A total of 950 randomly selected primer pairs were synthesized and used for SSR development. PCR and capillary electrophoresis were performed to initially screen SSR primer pairs for polymorphism using two P. vera cultivars (Siirt and Bağyolu) and one monoecious P. atlantica genotype (Pa-18), which are parents in our monoecious cultivar breeding program. Then, 204 SSR primer pairs were selected for further studies based on their polymorphism and ease of scoring.
Amplification was performed in two steps as follows: initial denaturation at 94°C for 3 min, followed by 10 cycles at 94°C for 30 s, 58°C for 45 s, and 72°C for 60 s. The second step included 30 cycles at 94°C for 30 s, 58°C for 45 s and 72°C for 60 s, and a final extension at 72°C for 10 min. When the PCR was completed, the reactions were subjected to denaturation for capillary electrophoresis in an ABI 3130xl genetic analyzer [Applied Biosystems Inc., Foster City, Calif. (ABI)] using a 36-cm capillary array with POP7 as the matrix (ABI). Samples were fully denatured by mixing 0.5 μL of the amplified product with 0.2 μL of the size standard and 9.8 μL formamide. The fragments were resolved using ABI data collection software 3.0, and SSR fragment analysis was performed with GeneScan Analysis Software 4.0 (ABI).
SSR markers were prefixed with CUPVSiirt; CU denotes Çukurova University and PVSiirt denotes Pistacia vera cv. Siirt, from which the SSRs were isolated. The digits that follow were taken from the SSR number, and the suffixes x and y were used to distinguish different SSR loci produced by the same primer pair.

Data analysis
The 204 SSR primer pairs selected in the initial screening were used to evaluate the genetic diversity of 24 P. vera cultivars. The SSR loci were also tested to determine their genetic diversity and transferability to P. atlantica, P. terebinthus, P. integerrima, P. chinensis, and P. lentiscus. Transferability of the SSR markers was calculated for each Pistacia species by comparing the number of amplified loci with the total number of loci analyzed. Number of alleles (Na), number of effective alleles (Ne), and observed (Ho) and expected (He) heterozygosity were calculated using GenAlEx version 6.5 [52]. The polymorphism information content (PIC) of each locus was calculated using PowerMarker software version 3.25 [53]. A dendrogram was obtained using NTSYSpc v2.21c [54] software by the unweighted pair group method with arithmetic averages (UPGMA). STRUCTURE 2.3.4 software [55] was also used to determine the number of populations and for construction