Molecular Identification of Date Palm Varieties Using Chloroplast Barcode atpF-atpH Spacer

DNA barcoding is a technique for discriminating and identifying species using short, variable, and standardized DNA regions. Here, we tested for the first time the performance of the chloroplast atpF-atpH spacer as a DNA barcode in Phoenix dactylifera varieties. The lack of useful differential morphological and anatomical characters, together with interspecific hybridization, makes identification of Phoenix species difficult. In this context, the development of reliable DNA markers for variety identification would be of great utility. Therefore, the present study aimed to evaluate genetic relationships based on the chloroplast atpF-atpH spacer, which was amplified and sequenced from selected varieties. The phylogram showed an overall genetic distance of 0.0002, representing a close genetic relationship among the selected P. dactylifera varieties. Pairwise distance was calculated for the atpF-atpH spacer, and a very low genetic diversity value was observed (0.002). Estimates of the average evolutionary divergence over all sequence pairs and of nucleotide diversity were again very low (0.008). Based on the atpF-atpH sequences, date palm varieties appear to show a very high degree of similarity.


INTRODUCTION
The genus Phoenix L. (Arecaceae) comprises 14 species (Govaerts & Dransfield, 2005). Date palm (Phoenix dactylifera L.) is one of the ancient domesticated fruit trees, with great socioeconomic importance and nutritional value (Barreveld, 1993; Elshibi, 2009). It is the major crop for agricultural income in arid and desert areas (Hodel and Johnson, 2007). There are almost 5000 date palm cultivars around the world (Osman, 1984; Bashah, 1996; Jaradat & Zaid, 2004). Determining genetic relationships among date palm cultivars is of major importance for characterization of date palm germplasm, breeding programs, and conservation purposes (Haider et al., 2012). Fruit morphology (Sedra et al., 1998) and biochemical markers (Herny, 1998; Gothwal et al., 2013) used for genotype identification have been found to be complex and altered by the environment.
Several molecular markers have been applied for genetic diversity assessment, such as RAPD (Sedra et al., 1998; Trifi et al., 2000; Al-Khalifa and Askari, 2003; Mirbahar et al., 2014), ISSRs (Zehdi et al., 2002), SSRs (Zehdi et al., 2004; Elmeer et al., 2011), RAMPO (Rhouma et al., 2008), and AFLP (Devanand & Chao, 2003; Bandelj et al., 2004; Rhouma et al., 2007; Khierallah et al., 2011). These nrDNA markers revealed high polymorphism among date palm cultivars, but describing cultivars with them remained difficult. However, cpDNA sequences can be used to estimate phylogeny (Jamil et al., 2014). cpDNA has higher phylogenetic potential than nrDNA because it is sufficiently variable yet conserved enough to be less variable within species than between species (Filiz, 2012). Enan and Ahmed (2014) made the first attempt to use cpDNA in date palm, in United Arab Emirates cultivars. There is a need to generate suitable molecular markers to gain deeper insight into the genetic diversity of date palm. Hebert et al. (2003) introduced the concept of the "DNA barcode" as a new approach to taxon recognition, assuming that a short standardised DNA sequence can distinguish individuals of a species because genetic differentiation between species exceeds that within species. Since then, DNA barcoding has become increasingly important as a tool in taxonomic studies and species delimitation, as well as in the discovery of new (cryptic) species (Hebert et al., 2004; DeSalle et al., 2005; Hebert and Gregory, 2005; Savolainen et al., 2005; Hajibabaei et al., 2007). The first DNA barcoding analysis in palms (Jeanson et al., 2011) achieved 92% success in species discrimination by applying a combination of three markers (the plastid matK and rbcL, and the nuclear ITS2) to the tribe Caryoteae. To assess the phylogenetic relationships of selected date palm varieties, this study evaluated the discrimination power of the atpF-atpH intergenic spacer for identifying date palm varieties.

Plant Material
Fresh, young leaves of 30 different varieties (Table 1) of date palm (Phoenix dactylifera L.) were collected from various areas of the United Arab Emirates for the present work, and plant samples were stored at -20°C.

DNA extraction
Total genomic DNA was extracted from fresh plant material using the DNeasy™ Plant Mini Kit (Qiagen, UK). For each sample, genomic DNA was extracted from 100 mg of frozen leaf tissue, which was first ground using a bead-blaster homogenizer (Benchmark Scientific, USA). Extracted DNA was quantified with a Nanodrop ND1000 spectrophotometer (Thermo Fisher Scientific Inc., USA) and visualized on 1% agarose gels stained with ethidium bromide.

PCR amplification and Nucleotide sequencing
The atpF-atpH intergenic spacer of date palm chloroplast DNA was amplified using the universal atpF-atpH primer pair (atpF: 5'-ACTCGCACACACTCCCTTTCC-3'; atpH: 5'-GCTTTTATGGAAGCTTTAACAAT-3') designed by Lee et al. (2007). PCR reactions were prepared in a total volume of 25 μL containing 12.5 μL Taq PCR Master Mix (Qiagen, UK), yielding a final concentration of 200 μM of each deoxynucleotide and 1.5 mM MgCl2; 1 μM of each primer (Eurofins MWG Operon, Germany); 2 μL (20 ng) genomic DNA; and DNase-free sterile water to make up the remaining volume. PCR amplification was performed using a T100 thermal cycler (BioRad, USA) as follows: 95°C for 5 min, followed by 35 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 60 s. A final elongation step at 72°C for 10 min ensured that any remaining single-stranded DNA was fully extended. Cycle sequencing was performed using the Big Dye Terminator v3.1 kit (Applied Biosystems, USA), and the products were analysed on an ABI 310 automated DNA Sequencer (Applied Biosystems, USA).
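The reaction composition above can be checked with simple dilution arithmetic. The sketch below is only illustrative: the primer stock concentration (10 μM) is an assumption, since the text reports only the final concentration of 1 μM, and the function names are hypothetical.

```python
# Per-reaction volumes for the 25 uL PCR described in the text.
# ASSUMPTION: primer stocks are 10 uM; the text gives only the final 1 uM.
TOTAL_UL = 25.0          # total reaction volume
MASTER_MIX_UL = 12.5     # Taq PCR Master Mix
TEMPLATE_UL = 2.0        # genomic DNA (20 ng)
PRIMER_FINAL_UM = 1.0    # final concentration of each primer
PRIMER_STOCK_UM = 10.0   # hypothetical stock concentration


def primer_volume_ul(final_um, stock_um, total_ul):
    """Stock volume needed for one primer, from C1 * V1 = C2 * V2."""
    return final_um * total_ul / stock_um


def reaction_mix(primer_stock_um=PRIMER_STOCK_UM):
    """Return per-reaction volumes in uL; water fills the remainder."""
    primer_ul = primer_volume_ul(PRIMER_FINAL_UM, primer_stock_um, TOTAL_UL)
    water_ul = TOTAL_UL - MASTER_MIX_UL - TEMPLATE_UL - 2 * primer_ul
    if water_ul < 0:
        raise ValueError("primer stock too dilute for a 25 uL reaction")
    return {
        "master_mix": MASTER_MIX_UL,
        "forward_primer": primer_ul,
        "reverse_primer": primer_ul,
        "template": TEMPLATE_UL,
        "water": water_ul,
    }


if __name__ == "__main__":
    for component, volume in reaction_mix().items():
        print(f"{component:>15}: {volume:.1f} uL")
```

Under the assumed 10 μM stocks, each primer contributes 2.5 μL and water makes up the remaining 5.5 μL; a different stock concentration changes only these two values.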

Data analysis
The atpF-atpH sequences of all 25 date palm varieties were submitted individually as queries to the NCBI Basic Local Alignment Search Tool (BLASTn) and compared with sequences already reported in GenBank. After BLASTn, all sequences generated in the present study were deposited in GenBank for reference; their accession numbers are provided in Table 1. The atpF-atpH intergenic spacer sequences of all 25 date palm genotypes were aligned using CLUSTALW in MEGA 6.0 (Tamura et al., 2013). Phylogenetic trees were inferred with the maximum likelihood (ML), neighbor-joining (NJ), and UPGMA methods. The topologies of the phylogenetic trees were evaluated using bootstrap resampling with 1000 replicates. Codon positions included were 1st + 2nd + 3rd + noncoding. Pairwise distance, transitional/transversional substitutions, and phylogenetic analyses were conducted using MEGA 6.0 (Tamura et al., 2013). Genetic variation among date cultivars was estimated by calculating the number of polymorphic sites and mutations, haplotype diversity, and nucleotide diversity using the DnaSP software (Librado and Rozas, 2009). Levels of genetic diversity were quantified by indices of haplotype diversity (Hd) (Nei and Tajima 1983) and pairwise estimates of nucleotide divergence (Pi) (Jukes and Cantor 1996). We also used DnaSP to estimate the average number of nucleotide differences (k) between cultivars. To test for population expansion, we performed neutrality tests with Tajima's D (Tajima 1989) and Fu and Li's tests (Fu and Li 1993) to test the null hypothesis that sequences are evolving according to neutral expectations. For each sequence, the length and the proportions of GC and AT were estimated, and the transition/transversion bias was calculated.
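As an illustration of the diversity indices named above, the following sketch computes haplotype diversity (Hd), the average number of pairwise nucleotide differences (k), and nucleotide diversity (Pi) from a small alignment. It is not the DnaSP implementation: it assumes a gap-free alignment of equal-length sequences, and the toy sequences are invented for the example.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def mean_pairwise_differences(seqs):
    """k: average number of differing sites over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Pi: k divided by the alignment length (per-site diversity)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Toy alignment (hypothetical data, not the date palm sequences).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAA"]

if __name__ == "__main__":
    print("Hd =", round(haplotype_diversity(toy), 4))
    print("k  =", round(mean_pairwise_differences(toy), 4))
    print("Pi =", round(nucleotide_diversity(toy), 4))
```

Sites with alignment gaps would need to be excluded before calling these functions; how gaps are handled is a design choice that can change the reported values.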
The alignment was manually checked, and pairwise sequence divergence between cultivars was calculated according to the Tamura 3-parameter model (Tamura, 1992).
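For readers unfamiliar with the Tamura 3-parameter distance, a minimal sketch follows. It uses the standard form d = -h ln(1 - P/h - Q) - (1/2)(1 - h) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences and h = 2θ(1 - θ), with θ the G+C content. As a simplifying assumption, θ is taken here as the pooled G+C content of the two sequences, and sites containing gaps or ambiguous bases are skipped; MEGA's own handling may differ.

```python
import math

PURINES = {"A", "G"}


def tamura3_distance(seq1, seq2):
    """Tamura (1992) 3-parameter distance between two aligned sequences."""
    # Keep only sites where both sequences carry an unambiguous base.
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(sites)
    if n == 0:
        raise ValueError("no comparable sites")
    # Transition: purine<->purine or pyrimidine<->pyrimidine change.
    p = sum(a != b and (a in PURINES) == (b in PURINES) for a, b in sites) / n
    # Transversion: purine<->pyrimidine change.
    q = sum(a != b and (a in PURINES) != (b in PURINES) for a, b in sites) / n
    # ASSUMPTION: theta is the pooled G+C content of the pair.
    theta = sum(base in "GC" for pair in sites for base in pair) / (2 * n)
    h = 2 * theta * (1 - theta)
    if h == 0 or 1 - p / h - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("distance undefined for these sequences")
    return -h * math.log(1 - p / h - q) - 0.5 * (1 - h) * math.log(1 - 2 * q)


if __name__ == "__main__":
    # Toy pair with a single transversion (C -> A) at the last site.
    print(tamura3_distance("ACGTACGTAC", "ACGTACGTAA"))
```

When θ = 0.5 the expression reduces to the Kimura 2-parameter distance, which is a quick consistency check on the formula.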

Sequence analysis
The atpF-atpH spacer sequences obtained from the date palm cultivars were aligned and subjected to BLASTn at NCBI. Percentage similarity was checked against the complete P. dactylifera chloroplast genome (Accession No. GU811709.2), and all sequences were deposited in GenBank under the accession numbers listed in Table 1. For the atpF-atpH region, sequence length varied from 594 bp for the Jabiri cultivar to 708 bp for the Barhi cultivar, with an average length of 672 bp (Table 1). In addition, the GC content of the atpF-atpH sequences varied from 29.3% to 31.2%, and the AT content from 68.8% to 70.7% (Table 1). The atpF-atpH barcode exhibited complete PCR success (100%). High-quality sequencing data were obtained for atpF-atpH with a success rate of 83.3% (Table 2). The haplotype diversity (Hd), variance of haplotype diversity, nucleotide diversity (Pi), theta (per site) from Eta, and average number of nucleotide differences (K) among all varieties were 0.953, 0.00058, 0.62918, 0.69463, and 373.73, respectively, for atpF-atpH (Table 3). Pairwise distance was calculated for the atpF-atpH region using MEGA6; the value of genetic diversity was 0.002. Mean theta was used to estimate intraspecific divergence, which was θ = 0.0005 (Table 3). The ideal barcode should show large interspecific differentiation but low intraspecific divergence (Table 3). These very low distance values show that all varieties are genetically closely related and that genetic diversity among them is low for the atpF-atpH region. The sequence analysis showed noticeable nucleotide polymorphism among date palm varieties. For the atpF-atpH region, the analysis involved 25 nucleotide sequences. The rate of the T to G, A to C, G to C, and C to G transversions was 7.05, and that of the A to T, G to T, T to A, and C to A transversions was 17.95; no transitional substitutions were detected.
The nucleotide frequencies were A = 35.89, T/U = 35.89, C = 14.11, and G = 14.11 (Table 4).
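The per-sequence summaries reported above (GC and AT content, sequencing success rate) are simple proportions. The sketch below shows one way to compute them; the helper names and the short test sequence are hypothetical, and only the counts of 30 sampled and 25 sequenced varieties come from the text.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases in a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def at_content(seq):
    """Percentage of A and T; complements gc_content to 100%."""
    return 100.0 - gc_content(seq)


def success_rate(n_success, n_total):
    """Percentage of samples that yielded usable data."""
    return 100.0 * n_success / n_total


if __name__ == "__main__":
    # 25 high-quality sequences from 30 sampled varieties.
    print(round(success_rate(25, 30), 1))  # 83.3
    # Hypothetical short sequence, for illustration only.
    print(round(gc_content("ATGCAT"), 2), round(at_content("ATGCAT"), 2))
```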

Test of selective neutrality
A commonly used approach for detecting selection is a neutrality test statistic based on allele frequencies, with Tajima's D being the best known (Korneliussen et al., 2013). For the atpF-atpH region, the selective neutrality tests were negative and not significant (Tajima's D = -0.3795 (P>0.1); Fu and Li's D* = -1.2105 (P>0.1); Fu and Li's F* = -0.8211 (P>0.1)). The twenty-five atpF-atpH sequences (m) gave one segregating site (S), revealing a very low nucleotide diversity (π) of 0.0019 (Table 5). This low nucleotide diversity indicates a close genetic relationship among the studied date palm varieties.

Phylogenetic analysis
Three tree-building methods were assessed to test their identification power among the date palm varieties: neighbor-joining (NJ), maximum likelihood (ML), and unweighted pair group method with arithmetic mean (UPGMA). In this study, no differences were found among the results of the NJ-, ML-, and UPGMA-based analyses. The phylogenetic trees for the atpF-atpH region showed that the date palm varieties were separated by a very small genetic distance (0.0002), indicating close genetic similarity among them (Figure 1). These phylograms supported the organization of the varieties into two main clades, denoted clade I and clade II. Clade I separates the cultivars "Chichi, Umsala, and Gash habash" from all the others, with a 65% bootstrap value (Figure 1).
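To make the neutrality statistic reported in this section concrete, the following sketch computes Tajima's D from an alignment using the standard formulas of Tajima (1989). It is an illustration rather than the DnaSP implementation: it assumes a gap-free alignment of at least four sequences, does not compute P-values, and uses invented toy sequences.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D from aligned, gap-free sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    # S: number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # k: average number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments (hypothetical): one rare variant vs. one common variant.
rare = ["ACGTA", "ACGTA", "ACGTA", "ACGTA", "ACGTC"]
common = ["ACGTA", "ACGTA", "ACGTA", "ACGTC", "ACGTC"]

if __name__ == "__main__":
    print(tajimas_d(rare))    # negative: excess of low-frequency variants
    print(tajimas_d(common))  # positive: intermediate-frequency variant
```

The two toy alignments show the sign behaviour: a singleton variant pulls D below zero, whereas a variant at intermediate frequency pushes it above zero.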

DISCUSSION
One of the most significant applications of DNA barcoding is to overcome taxonomic obstacles where it is difficult to identify unknown or wrongly named species in a family with similar morphology. It has also been demonstrated that atpF-atpH, a noncoding spacer, could serve as a universal DNA barcoding marker for species-level identification. The current study therefore sought, among other aims, to test the informativeness of this barcode region in discriminating P. dactylifera diversity at the sub-species level. The nrDNA molecular markers RAPD, ISSR, AFLP, RAMPO, and microsatellites, as well as isozyme markers, revealed a high level of polymorphism, so effective characterization at the cultivar level in date palm remained problematic (Baaziz et al., 2000; Zehdi et al., 2002; Al-Khalifa & Askari, 2003; Rhouma et al., 2007; Rhouma et al., 2008; Haider et al., 2012). Al-Qurainy et al. (2011) investigated the molecular phylogeny of eight Saudi date palm cultivars using the cpDNA psbA-trnH non-coding region. Molecular typing of chloroplast rpoB and psbA-trnH has also been studied by many authors (Yao et al., 2009; Song et al., 2009; Feng et al., 2010; Chen et al., 2010). It has been reported that the rpoB and psbA-trnH loci showed low efficiency in Picea barcoding (Ran et al., 2010). Therefore, the present study was designed to evaluate genetic diversity among local date palm varieties using the chloroplast atpF-atpH spacer. Analysis of the sequence data showed that the level of polymorphism was very low in the studied date palm varieties. The nucleotide diversity of the twenty-five cultivars in the present study is much lower than that of the Saudi cultivars (Al-Qurainy et al., 2011), which might be due to high selection pressure by farmers seeking to maintain pure breeds or to the restricted distribution of the date palm crop in a specific area. Date palm has a long history of domestication with an unknown origin (Wrigley, 1995), and the nature of date palm culture may have played an important role in the composition of date palm genomes.
Apart from tissue culture methods, the only way to maintain the genetic integrity of date palm cultivars is propagation by offshoots (Zaid & de Wet, 2002). Our finding of low genetic diversity may also reflect offshoot propagation by farmers, since seeds, whose embryos are genetic recombinants, cause divergence within date palm populations. Hence, it is concluded that the studied date palm varieties showed a high level of similarity and low genetic diversification. The high genetic similarity values lead us to conclude that the varieties have been under high selection pressure. Eswaran et al. (2005) pointed out that a negative Tajima's D signifies an excess of low-frequency polymorphisms relative to expectation, indicating population size expansion and/or purifying selection. The observed variation pattern provides evidence that date palm trees have been undergoing rapid expansion. Fu (1993) suggested a different statistic based on the infinite-sites model of mutation. He suggested estimating the probability of observing a random sample with a number of alleles equal to or smaller than the observed value, given the observed level of diversity and the assumption that all alleles are selectively neutral. Fu's simulations suggest that Fs is a more sensitive indicator of population expansion and genetic draft than Tajima's D. We can conclude that Fu and Li's parameters are consistent with the presence of background selection in the analyzed region and provide evidence for an ancient population expansion of the date palm varieties. The maximum likelihood substitution matrix computed with MEGA 6.0 shows the probability of substitution from one base to another. These changes include the substitution of a pyrimidine by a purine or of a purine by a pyrimidine (transversion). The lack of sequence variation in P. dactylifera may be due to low rates of sequence evolution and taxonomic misidentification (Kress & Erickson, 2007).
Mean theta was used to estimate intraspecific divergence (Table 3). The lowest intraspecific divergence was for atpF-atpH (θ = 0.0005). The ideal barcode should show large interspecific differentiation but low intraspecific divergence. Yan et al. (2011) reported that psbK-psbI had relatively low intraspecific divergence among non-coding regions.
The relatively high AT content of the atpF-atpH spacer sequences of date palm cultivars may explain the high proportion of transversions: several substitution studies have found that transversions occur at a higher frequency in a high-AT context than in a high-GC context (Bakker et al., 2000). Our barcoding data showed that closely related subspecies of P. dactylifera could not be separated from each other. These sister subspecies share identical sequences for the barcoding marker, which calls for a search for additional barcoding markers with greater sequence polymorphism. Alternatively, next-generation sequencing technologies and the corresponding software applications could provide the necessary resolution of subspecies.

CONCLUSION
In this study, we have demonstrated that the atpF-atpH noncoding spacer could not serve as a universal DNA barcoding marker for cultivar-level identification of Phoenix dactylifera. Based on our results, it may be useful to include more coding and non-coding regions to build a precise and comprehensive system of subspecies identification in P. dactylifera.