ASSESSMENT OF GENETIC DIVERSITY IN SOME WILD PLANTS OF ASTERACEAE FAMILY BY RIBOSOMAL DNA SEQUENCE

The Asteraceae (alternatively Compositae), with approximately 1,620 genera and more than 23,600 species, is the largest family of flowering plants. The family is distributed worldwide except for Antarctica and is especially diverse in the tropical and subtropical regions of North America, the Andes, eastern Brazil, southern Africa, the Mediterranean region, central Asia, and southwestern China. In Egypt, the Asteraceae is represented by 98 genera and 228 species of annuals and perennials, widespread across all its phytogeographical regions. Many of these genera have economic potential, e.g. in folk medicine (Bolous, 2002).

With the rapid development of molecular biology, studies of Asteraceae germplasm identification and genetic diversity have drawn on numerous molecular markers, including Random Amplified Polymorphic DNA (RAPD) (Badr et al., 2012), Restriction Fragment Length Polymorphism (RFLP) (Ito et al., 2000), Inter-Simple Sequence Repeat (ISSR) (Gharibi et al., 2011), Simple Sequence Repeat (SSR) (Simko, 2009), combined RAPD, ISSR and RFLP (Abd El-Tawab et al., 2010), and Amplified Fragment Length Polymorphism (AFLP) (Czarnecki et al., 2008). Nevertheless, DNA-labeling techniques still have many problems, particularly in wild medicinal herbs, such as low repeatability, high subjectivity of experimental results, and study data that are not shared between laboratories (Wang et al., 1999). Thus, molecular sequence markers, which rely on sequencing genomic differences directly and are free of environmental effects, have become a significant tool in species classification (Li et al., 2010).
Accordingly, advances in DNA sequencing techniques have allowed the extensive use of short DNA fragments, especially of the ribosomal DNA (rDNA). The utility of genes coding for ribosomal RNA (rDNAs) lies in their ubiquitous presence and the relative conservation of many regions of their nucleotide sequences. A cluster of rDNA (Fig. 1) consists of the gene coding for 18S rRNA (small subunit ribosomal RNA, SSU rRNA), two internal transcribed spacers (ITS1 and ITS2) separated by the 5.8S rRNA gene, and the gene coding for the 28S rRNA (large subunit ribosomal RNA, LSU rRNA) (Hillis and Dixon, 1991). Two additional external transcribed spacers (ETS) are located upstream of the 18S rDNA and downstream of the 28S rDNA. A nontranscribed spacer (NTS) separates adjacent copies of the rDNA repeat unit. Both spacers (ETS and NTS) are also called intergenic spacers (IGS) (Dabert et al., 2006). A major consideration in using the rDNA locus for taxonomic and phylogenetic analyses is the existence of polymorphisms among repeat units, which may cause extensive differentiation even within a single individual but also provide useful tools for phylogenetic studies (Wei et al., 2010). Accordingly, sequence comparison of the rDNA region is widely used in taxonomy and molecular phylogeny. It has been most informative for resolving variation between species, populations, and even individuals (or inbred lines), as in tomato (Jo et al., 2009), rice (Chang et al., 2010), apple (Giaretta et al., 2010), and Compositae, Anthemideae (Sonboli et al., 2012).
In this study, we examined the molecular divergence of the complete ITS and IGS regions of rDNA among four wild species of Asteraceae germplasm in Egypt. To our knowledge, there has so far been no report comparing ITS and IGS sequences and their efficiency and utility as molecular markers in the Asteraceae family.

Plant Materials
A total of four wild and endemic species of the Asteraceae family (Echinops spinosus L., Achillea santolina L., Matricaria recutita L. and Artemisia monosperma Delile) were collected from different eco-geographical localities of their natural habitat in the Sinai protected area, Egypt. In Egypt, these species are recorded as rare, endemic, and neglected wild medicinal taxa. Five plants from each species were collected and pooled as bulk material. The species and their collection sites (latitude, longitude, and altitude) are listed in Table (1).

DNA Isolation
Total genomic DNA was isolated from fresh leaves following the procedure described by Pirttila et al. (2001). The five DNA samples of each species were pooled as bulk DNA. The quality and concentration of the DNA samples were checked in a UV-1601 spectrophotometer (Shimadzu, Japan), and a portion of the DNA was diluted to 50 ng/μl for use in the ITS and IGS analyses. Both the stock and diluted portions were stored at -20 °C.
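The dilution to a 50 ng/μl working stock follows the usual C1·V1 = C2·V2 relation. The sketch below illustrates the arithmetic only; the function name, the 400 ng/μl stock concentration, and the 100 μl final volume are hypothetical values chosen for illustration and are not taken from the study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. All argument values are supplied by the user;
    nothing here is taken from the paper except the 50 ng/ul target.
    """
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical example: a 400 ng/ul stock diluted to 100 ul at 50 ng/ul.
stock_ul, diluent_ul = dilution_volumes(400.0, 50.0, 100.0)
print(f"{stock_ul:.1f} ul stock + {diluent_ul:.1f} ul diluent")
```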

PCR amplification and sequencing of the ITS region
The complete ITS region of rDNA was amplified with the universal primers ITS-1 (5'-TCCGTAGGTGAACCTGCGG-3') as forward primer and ITS-4 (5'-TCCTCCGCTTATTGATATGC-3') as reverse primer, as described by White et al. (1990). Each 25 μl reaction contained 50 ng genomic DNA, 0.5 pmol of each primer, 0.2 mM dNTPs, 1 U Taq DNA polymerase (Fermentas, Shenzhen, China), 2 μl of 10x PCR buffer supplied by the manufacturer, and about 2.5 mM MgCl2. The amplification program consisted of pre-denaturation at 94 °C for 4 min; 35 cycles of 94 °C for 45 s, 55 °C for 60 s, and 72 °C for 90 s; and a final incubation at 72 °C for 7 min. The PCR products were then subjected to electrophoresis on a 1.5% agarose gel, stained with ethidium bromide, and visualized under ultraviolet (UV) light. The PCR fragments of each sample were excised and purified from the gels using the E.Z.N.A® Gel Extraction Kit (Omega Bio-Tek, Inc., Norcross, USA). The purified PCR products were ligated into the pMD18-T Easy Vector using the appropriate kit (TaKaRa, Tokyo, Japan), and the ligation products were transformed into Escherichia coli DH5α competent cells. Recombinant clones were selected on LB medium plates containing ampicillin.
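As a quick sanity check on the two ITS primers quoted above, their length, GC content, and a rough melting temperature can be computed directly from the sequences. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a coarse approximation for short oligonucleotides; it is an illustrative assumption and not the method the authors used to choose the 55 °C annealing temperature.

```python
def gc_percent(seq):
    """Percentage of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as given in the text (White et al., 1990).
primers = {
    "ITS-1": "TCCGTAGGTGAACCTGCGG",
    "ITS-4": "TCCTCCGCTTATTGATATGC",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, Tm ~{wallace_tm(seq)} C")
```

The same two helper functions can be applied unchanged to the 18S L and 28S R primers used for the IGS region.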

PCR amplification and sequencing of the IGS region
The primer pair 18S L (5'-GAACGCCTCTAAGTCAGAATCC-3') and 28S R (5'-ACTGGCAGAATCAACCAGGTA-3') was used to amplify across the IGS region of ribosomal DNA (White et al., 1990). The reaction mixture (25 μl) contained 2 μl of 10x PCR buffer, 0.2 mM dNTPs, 0.5 pmol of each primer, 1 U Taq DNA polymerase (Fermentas, Shenzhen, China), and 50 ng genomic DNA template. The PCR conditions were 95 °C for 5 min for initial denaturation of genomic DNA; 35 cycles of 94 °C for 1 min, 57 °C for 45 s, and 72 °C for 1 min; and a final extension at 72 °C for 7 min. The PCR products of each sample were then separated, visualized, excised, purified, and cloned following the same procedure as for ITS. Three positive colonies from each amplicon were selected and sequenced by the Uni-Gene Company (Shanghai, China). The Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.) was used to verify the credibility and conformity of the results. Sequence similarity was analyzed using BLAST (http://www.ncbi.nlm.nih.gov/BLAST/).

Sequence analysis
Vector sequences were removed and the sequences were aligned using Clustal X version 1.81 (Thompson et al., 1997), with manual adjustments wherever necessary. Gaps were positioned to minimize nucleotide mismatches. The MEGA program version 5.0 (Molecular Evolutionary Genetics Analysis; Tamura et al., 2011) was employed to estimate GC and AT contents, nucleotide substitution, nucleotide diversity (π), the transition/transversion bias (R), substitution rates (r) for each nucleotide pair, and cluster analysis among the four Asteraceae species. We further computed the Maximum Composite Likelihood (MCL) estimate of the pattern of nucleotide substitution according to Tamura et al. (2004).
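To make the summary statistics named above concrete, the following minimal sketch computes base composition and nucleotide diversity (π, taken here as the mean proportion of differing sites over all sequence pairs) from an alignment. It is an illustration under stated assumptions, not a reimplementation of MEGA: gaps are handled by pairwise deletion, no substitution model is applied, and the four short sequences are invented toy data.

```python
from itertools import combinations


def base_composition(seq):
    """Percent A, T, C and G, ignoring gaps and ambiguous symbols."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ATCG"}
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites (pi).

    Only columns where both sequences carry A/T/C/G are compared
    (pairwise deletion); pairs with no comparable site are skipped.
    """
    per_pair = []
    for s1, s2 in combinations(aligned, 2):
        sites = [(a, b) for a, b in zip(s1.upper(), s2.upper())
                 if a in "ATCG" and b in "ATCG"]
        if not sites:
            continue
        diffs = sum(a != b for a, b in sites)
        per_pair.append(diffs / len(sites))
    return sum(per_pair) / len(per_pair)


# Invented toy alignment (not data from this study).
toy = ["ATGCATGCAT", "ATGCATGAAT", "ATGTATGCAT", "ACGCATG-AT"]
comp = base_composition(toy[0])
print(f"GC content of first sequence: {comp['G'] + comp['C']:.1f}%")
print(f"pi = {nucleotide_diversity(toy):.4f}")
```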

Phylogenetic analysis
Pairwise evolutionary distances among the four Asteraceae species were determined by the Kimura 2-parameter (K2P) method (Kimura, 1980). The maximum likelihood (ML) phylogenetic tree was constructed using MEGA version 5.
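The K2P distance between two aligned sequences depends only on the proportion of transitional differences (P) and transversional differences (Q): d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below is a plain implementation of that formula for illustration; skipping any column containing a gap or ambiguous base is an assumption made here, and the function is not the MEGA routine itself.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns containing anything other than A/C/G/T in either sequence
    are skipped. Raises ValueError from math.log/sqrt if the sequences
    are too divergent for the formula to be defined.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Invented toy pair: one transition (A<->G) in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```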

Sequencing analysis for ITS and IGS
In this article, we used a comparative approach based on several parameters, namely nucleotide frequency, nucleotide substitution (r), nucleotide diversity (π), and the estimated transition/transversion bias (R), to provide a better understanding of the genetic diversity and phylogenetic relationships among the studied genotypes of the Asteraceae family. Comparison of the DNA sequences of the isolates with the GenBank database through BLAST searches gave 94 to 99% similarity (Table 1), supporting good credibility for the ITS and IGS sequences. The length variation of the entire ITS (650 to 750 bp) and IGS (800 to 950 bp) regions showed very distinctive sequences for individual species. Similarly, variation was observed in the nucleotide composition of the ITS and IGS, which may be due to the sequence length variation of the analyzed markers (Table 2).
With regard to ITS sequence divergence among taxa, the average nucleotide frequencies were A (25%), T (24%), C (26%), and G (25%), with average GC and AT contents of 51% and 49%, respectively (Table 2). The longest ITS sequence was observed in Artemisia monosperma (729 bases), whereas the shortest was recorded in Achillea santolina L. and Matricaria recutita L. (712 bases). The maximum GC content (53%) was observed in Echinops spinosus L., while the lowest GC content (46%) was recorded in Artemisia monosperma. For IGS sequence divergence among taxa (Table 2), the average nucleotide frequencies were A (25.1%), T (26.7%), C (26.1%), and G (22.1%), with average GC and AT contents of 48.2% and 51.8%, respectively. The longest IGS sequences were observed in Achillea santolina L., Artemisia monosperma, and Matricaria recutita L. (640 bases), whereas Echinops spinosus L. had the shortest (365 bases). The maximum GC content (53.6%) was observed in Artemisia monosperma. In contrast, the lowest GC content (46.3%) was recorded in Achillea santolina L. Tajima's neutrality test (Tajima, 1989) was performed to calculate the nucleotide diversity value (π). There were a total of 754 and 667 positions in the final dataset for ITS and IGS sequences, respectively. Nucleotide diversity (π) was higher in IGS (0.60) than in ITS sequences (0.49) (Table 3).
Within their genomes, all organisms have DNA sequences that code for ribosomal RNA (rRNA), an essential component of the cellular protein synthesis machinery (Kollipara et al., 1997). rRNA synthesis typically accounts for about 40% of all transcription within a cell, and rRNA makes up as much as 80% of cellular RNA (Moss and Stefanovsky, 1995). Owing to relatively rapid evolution, differences in sequence and/or length of rDNA are possible between closely related species of the Asteraceae family (Zhao et al., 2010). The IGS sequences, as an intergenic region, may bear functional sequences such as promoters, enhancers, transcription stop signals, and reproduction start signals (Dutta and Verma, 1990). Meanwhile, the IGS sequences undergo conversion and concerted evolution, leading to homogenization within an array of repeats. The intergenic spacer of the rDNA cluster thus evolves quickly and is highly polymorphic, providing a useful tool for phylogenetic and genetic variability studies (Singh et al., 2008).

Phylogenetic analysis
Based on the sequence data of the flanking regions of the ITS or IGS sequences, phylogenetic trees were constructed using the maximum likelihood (ML) method (Tamura et al., 2004) (Fig. 2 and 3). An ML tree using Kimura two-parameter (K2P) distances was built for the four Asteraceae species to provide a combined graphic representation of the patterns of divergence in the ITS rDNA sequence data (Fig. 2). Two strongly supported clades were clearly distinguished among the four species. Achillea santolina L. and Artemisia monosperma were grouped together in the first clade, while Echinops spinosus L. and Matricaria recutita L. formed a sister clade. With respect to the IGS rDNA sequence data (Fig. 3), Achillea santolina L. was closely related to Matricaria recutita L. in the first clade, while Echinops spinosus L. was attached singly to the first clade. In contrast, Artemisia monosperma was placed independently in a separate clade (clade 4).
Taxonomic characterization leading to unambiguous identification of species and varieties is critically important for the conservation and sustainable utilization of Asteraceae germplasm. In the Asteraceae family, molecular phylogeny at various taxonomic levels has been examined in several earlier studies through the application of isozymes and RAPD (Ayers and Ryan, 1999), AFLP (Huang et al., 2009), ISSR and RFLP (Abd El-Twab et al., 2010), SSR (Iqbal et al., 2011), as well as chloroplast DNA and rDNA markers (Sonboli et al., 2012). In this context, Dai et al. (2008) found that some closely related rice cultivars with identical ITS sequences could be clearly discriminated based on the phylogenetic tree constructed from IGS sequences. Several studies (Plovanich and Panero, 2004; Dai et al., 2008; Li et al., 2010) confirmed that IGS sequences, which have the fastest rate of evolution, could provide more hierarchical distinctions than ITS sequences. Therefore, it was concluded that the IGS region, with more informative sites than the ITS sequences, could be more suitable for measuring genetic relationships among different cultivars or subspecies in Asteraceae germplasm.

Transition and transversion
It is well known that during DNA sequence evolution the rate of transitional changes differs from the rate of transversional changes, with transitions generally occurring more frequently than transversions. This difference is often referred to as transition bias, and estimating its extent may be of interest (Cortey et al., 2011). In Table (4), the substitution pattern and rates were estimated under the Tamura-Nei 93 model (Tamura and Nei, 1993). The highest transition/transversion rate ratios were recorded for the IGS sequence data (k1 = 38.28 for purines; k2 = 12.58 for pyrimidines). Meanwhile, the lowest transition/transversion rate ratios were observed for the ITS sequence data (k1 = 2.983 for purines; k2 = 2.746 for pyrimidines). Moreover, the overall transition/transversion bias for the IGS sequence data (R = 12.10) was higher than that for the ITS sequence data (R = 1.43). This indicates that transitions dominate over transversions in Asteraceae germplasm across the IGS sequences. This is compatible with the results of Wetzer (2001), who reported that transitions occur more frequently than transversions, even though for any given nucleotide position twice as many transversions as transitions are possible. In this context, the results of Wang et al. (2011) show that transitional substitutions in the 3'UTR are more common than transversions, and that transitions are even more frequent relative to transversions at CpG sites than at non-CpG sites. A recent investigation by Kruger et al. (2012) likewise showed that transitions occur at a higher frequency than transversions in a genome.
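The overall bias R can be related to the rate ratios k1 and k2 and the base frequencies. The sketch below assumes the definition printed in MEGA output captions, R = [A·G·k1 + T·C·k2] / [(A+G)·(T+C)], and plugs in the average nucleotide frequencies and rate ratios reported in this article. The function name is hypothetical, and the use of the rounded average frequencies from the text is an assumption, since the exact frequencies MEGA used are not given.

```python
def overall_ts_tv_bias(freqs, k1, k2):
    """Overall transition/transversion bias R.

    Assumed definition (as in MEGA output captions):
        R = (A*G*k1 + T*C*k2) / ((A + G) * (T + C))
    freqs: base frequencies as fractions, keyed by "A", "T", "C", "G".
    k1, k2: transition/transversion rate ratios for purines and pyrimidines.
    """
    a, t, c, g = (freqs[base] for base in "ATCG")
    return (a * g * k1 + t * c * k2) / ((a + g) * (t + c))


# Average frequencies and rate ratios as reported in the text.
its = overall_ts_tv_bias({"A": 0.25, "T": 0.24, "C": 0.26, "G": 0.25},
                         k1=2.983, k2=2.746)
igs = overall_ts_tv_bias({"A": 0.251, "T": 0.267, "C": 0.261, "G": 0.221},
                         k1=38.28, k2=12.58)
print(f"ITS R = {its:.2f}, IGS R = {igs:.2f}")
```

With these inputs the ITS value comes out at about 1.43, matching the reported R, and the IGS value at about 12.0, close to the reported 12.10; the small difference is presumably due to rounding of the published frequencies.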
In the present study, our results from the IGS sequences confirmed that these sequences are more suitable than ITS sequences for studies at the species or intraspecies level in Asteraceae germplasm. Consequently, based on the results above, IGS sequence divergence appears to be the most appropriate region to serve as a molecular marker for classification, taxonomy, and identification at the species level and beyond in Asteraceae germplasm.
In conclusion, the assessment of spacer length variation and rDNA polymorphisms in the rDNA genes of Asteraceae germplasm provides new insights into the genetic variability among ecotypes and confirms that this is a useful region for studies of genetic variability and phylogenetic relationships in Asteraceae germplasm. Although the Asteraceae germplasm studied here is wild and has not yet been cultivated, its nutritional composition alone makes it an important resource. Therefore, focused research and development efforts are needed if these wild species are to be raised from obscurity and improved sufficiently to contribute to the food supply in Egypt.

SUMMARY
Ribosomal DNA genes are organized in clusters of tandemly repeated units, each of which consists of coding regions (18S, 5.8S and 28S) and two internal transcribed spacers (ITS), in addition to the intergenic spacer (IGS) region. Accordingly, this article focuses on clarifying the sequence divergence of the complete ITS and IGS regions of rDNA among four wild and endemic species of the Asteraceae family in Egypt. Results indicated that there were a total of 754 and 667 positions in the final dataset for ITS and IGS sequences, respectively. The IGS region exceeded the ITS region in several parameters: nucleotide diversity (π = 0.60), the estimated transition/transversion rate ratios (k1 = 38.28 for purines; k2 = 12.58 for pyrimidines), and the overall transition/transversion bias (R = 12.10). This indicates that transitions dominate over transversions in Asteraceae germplasm across IGS markers. Thus, it was concluded that the IGS region, with more informative sites than ITS sequences, could be more suitable for measuring genetic relationships among different subspecies of Asteraceae. In general, ribosomal DNA, particularly the intergenic spacer of the rDNA cluster, evolves quickly and is highly polymorphic, providing a useful tool for assessing genetic diversity and for taxonomic and phylogenetic studies in Asteraceae germplasm.

(k1, purines; k2, pyrimidines) = Transition/transversion rate ratios. (R) = Transition/transversion bias.

Fig. (3): ML tree generated among four Asteraceae genotypes based on IGS rDNA data.

Table (1): The eco-geographical localities of the four natural habitats of Asteraceae family.

Table (2): The evolutionary analyses using Tajima test among ITS and IGS sequences.

Table (3): Nucleotide frequencies for ITS and IGS sequence among four natural habitats of Asteraceae family.

Table (4): Maximum composite likelihood estimate of the pattern of nucleotide substitution matrix for combined data of ITS and IGS sequences. Each entry is the probability of substitution (r) from one base (row) to another base (column). The rates of different transitional substitutions are shown in bold and those of transversional substitutions are shown in italics.