Development of new set of microsatellite markers in cultivated tobacco and their transferability in other Nicotiana spp

Scarcity of molecular markers in tobacco has been a limitation that hampers the acceleration of breeding efforts. Development of microsatellite markers is a prerequisite for mapping and tagging of many useful qualitative and quantitative traits and for the generation of a saturated linkage map. The use of microsatellite-enriched genomic libraries is an efficient and rapid method for identifying clones harboring microsatellite motifs, leading to the development of microsatellite markers. In the present study, a total of 111 microsatellite motifs were identified from the enriched library, of which 70 motifs (including perfect and imperfect repeats) were used for marker development. These newly developed markers successfully differentiated different types of tobacco and diverse cultivars of Flue Cured Virginia (FCV) tobacco. The high rate of transferability (95.7% to 100%) of these microsatellite markers across a wide range of Nicotiana species indicates their potential as viable resources in inter-specific gene transfer programmes. The set of microsatellite markers developed in this study is a valuable addition to the DNA marker resources already available in tobacco.


Background
Tobacco (Nicotiana tabacum L.) is one of the major commercial crops and belongs to the family Solanaceae, which contains many genera, including Nicotiana. Several efforts are being undertaken to improve this crop, but they could not be accelerated because a sufficient number of molecular markers is not available. Adequate molecular markers are needed to reduce the time, effort and cost of breeding programs involving gene targeting, selection and pyramiding. Molecular genetic markers have become useful tools that provide a relatively unbiased estimate of genetic diversity in plants. PCR-based markers such as randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSRs) have an apparent advantage as cultivar descriptors, since they are unaffected by environmental or physiological factors (Bowditch et al., 1993; Khan et al., 2008). Among the different types of molecular markers, SSRs are abundant, ubiquitous, hypervariable and highly polymorphic (Gupta et al., 1996; Senan et al., 2014). Moreover, they are highly reproducible, easy to apply, inexpensive and less laborious, and hence are used for cultivar identification in many crop plants such as barley (Russel et al., 1997; Maniruzzaman et al., 2014), soybean (Song et al., 1999; Bisen et al., 2015), potato (Barandalla et al., 2006; Carputo et al., 2013), Solanum melongena (Behera et al., 2006), rice (Yadav et al., 2008; Molla et al., 2015) and jute (Khan et al., 2008). Microsatellites have been shown to be almost twice as informative as dominant markers (RAPD and AFLP) and much more informative than RFLPs in soybean (Powell et al., 1996), and approximately six times more informative than RAPD (Rajora and Rahman, 2003). In addition to their use in determining genomic structure and organization, these markers are also useful for the development of improved varieties through marker-aided selection.
Though some microsatellite markers were reported in a few earlier studies in tobacco (Bindler et al., 2007; Tong et al., 2012; Hughes et al., 2014), their number is limited compared with other commercial crops, which calls for the development of more markers.
Molecular Plant Breeding 2015, Vol.6, No.15, 1-13 http://mpb.biopublisher.ca
Generally, tobacco cultivars are distinguished by morpho-physio-chemical characters, but this method is slow and unrealistic, and identification based on morphological traits is prone to environmental variation (Nelson, 1985). At the same time, a low level of genetic diversity exists within and among cultivated tobacco types (Ren and Timko, 2001). Cultivars that are closely related or less diverse cannot be easily distinguished by morphological indices (Degani et al., 1998). Accurate identification of cultivars has become increasingly important in the realm of Plant Breeders' Rights and cultivar registration. Hence, molecular marker-based diversity analysis among cultivars will provide precise data for cultivar identification. The frequency of microsatellites in plant genomes is low compared with animal genomes (Maguire et al., 2000; Squirrell et al., 2003), and this feature makes the isolation of microsatellites a technically demanding task. With the objective of developing more SSR markers, the present study identified microsatellite motifs using a microsatellite-enriched library, developed 70 SSR markers and demonstrated their use in understanding genetic diversity among different types of tobacco. In addition, the transferability of these markers across different species of Nicotiana was tested, and a minimal set of markers that could differentiate various species of Nicotiana was identified.

Identification of microsatellite loci from the tobacco enriched library
Screening of the microsatellite-enriched library of N. tabacum cv. Jayasree resulted in the identification of 1152 putative recombinants, which were then sequenced. Good quality sequences were obtained for 946 clones and all sequences (100%) contained microsatellite motifs. However, only 224 sequences were considered for designing primers. Among the 224 clones possessing class I and class II SSRs as described by McCouch et al., 2002, AG/TC repeats were the most abundant (56 clones, 25.3%), followed by CT/GA (38 clones, 17.2%), AC/TG (25 clones, 11.3%) and GT/CA (20 clones, 9.0%); the rest of the clones contained AG and CT compound repeats. The repeat length was highly variable at these loci, with the maximum observed for TbM13 with (GA)29 and TbM22 with (AGA)24. These sequences were submitted to NCBI GenBank, and the details of the sequences with their accession numbers are tabulated (Table 2). Seventy motifs containing class I SSRs were targeted for marker development, of which thirty nine (55.8%) were perfect repeats [(TC/AG)n, (GT/CA)n, (GA/CT)n and (AC/TG)n], five (7.1%) were compound repeats and twenty six (37.1%) were interrupted repeats. Homology searches with the primers of these markers against the primers of markers reported earlier by Bindler et al., 2011 and Tong et al., 2012 showed no significant homology.

Diversity among Flue Cured Virginia (FCV) tobacco
All 70 SSR markers showed clear and robust amplification with the expected product size. These markers were then used in the diversity analysis of 24 FCV varieties of N. tabacum. The pair-wise similarity measures among the tested varieties ranged from 0.54 to 0.91, revealing a broad genetic base (Figure 1). The varieties were grouped into two main clusters. Cluster-A consisted of varieties developed indigenously by different breeding methods, while Cluster-B consisted of exotic introductions and their derivatives. The mean genetic similarity among the indigenously developed cultivars was 0.72.

Characterization of different types of tobacco
The microsatellite marker analysis of different types of Indian tobacco revealed that varieties belonging to the same category were grouped together. There were different clusters for each tobacco type, and the clustering pattern supports the traditional classification, except for the grouping of GC-2 (chewing type) and Dharla (Hookah type) in the same cluster (Figure 2). The genetic diversity analysis of 12 accessions from eight types of cultivated tobacco using 35 microsatellite markers revealed that all markers were polymorphic, with an average polymorphic information content (PIC) of 0.55 (range 0.15-0.99) (Table 3). A total of 260 alleles were produced by these loci, with an average of 7.4 alleles per marker in the size range of 150 to 500 bp. Among these markers, TbM28 had the highest PIC value (0.99), followed by TbM9, TbM10, TbM14, TbM16 and TbM30 (0.97). The mean PIC value was higher for markers targeting perfect repeats (0.57) and lower for markers targeting imperfect repeats (0.19). The markers with higher PIC values will be useful for diversity analysis, varietal identification, mapping of traits, etc. The type-specific markers identified in the present study would be useful for testing the purity of different types of tobacco (Table 4).

Understanding the diversity among the genus Nicotiana
A total of 455 alleles were obtained from all 70 markers amplified across 24 Nicotiana spp., with an average of 6.4 alleles per locus (Table 2). A maximum of nine alleles was detected for four markers (TbM46, TbM52, TbM56 and TbM59), while a minimum of four alleles was detected for TbM4; allele sizes ranged from 150 to 700 bp.
All 70 markers showed higher observed heterozygosity (Ho) than expected heterozygosity (He). The expected heterozygosity ranged from 0 to 0.38 (mean 0.18), while the observed heterozygosity ranged from 0 to 0.50 (mean 0.43) (Table 2). The markers revealed that the Nicotiana species studied were separated into three main clusters (Figure 3). Cluster 1 comprised 11 species, all belonging to the subgenera Rustica and Petunioides. Within this cluster, N. glauca and N. paniculata formed a small sub-cluster (Ia), and N. thyrsiflora, N. rustica and N. undulata formed sub-cluster (Ib). Within cluster 1, N. repanda and N. nesophila showed high similarity (91%), as both belong to the section Repandae and have the same chromosome number (24). The second cluster comprised 7 species of Nicotiana, all belonging to the subgenus Petunioides and section Suaveolentes except N. tabacum, which belongs to the subgenus Tabacum. The third cluster comprised 6 species, of which 3 belong to the section Suaveolentes and the rest belong to different sections of the subgenus Petunioides.
Figure 2 Characterization of different types of tobacco using SSR markers.

Transferability of tobacco SSRs to other Nicotiana species
Amplification of the 70 SSRs in 23 species of Nicotiana yielded at least one allele, and the cross-species transferability of these markers ranged from 95.7% to 100% (Table 5). All the markers developed in the present study were transferable to eight species, and 26 markers showed 46 exclusive alleles. However, no exclusive alleles were identified for N. undulata, N. nesophila and N. debneyi (Table 4).
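
The transferability figures can be read as the share of the 70 markers that amplify at least one allele in a given species. The short Python sketch below illustrates that reading only; the function name and the example data are hypothetical and are not taken from Table 5. Under this assumption, the lower bound of 95.7% would correspond to 67 of the 70 markers amplifying.

```python
def transferability(amplified):
    """Percentage of markers that gave at least one allele in a species.

    `amplified` holds one boolean per marker (True = amplification observed).
    This layout is an assumption made for illustration.
    """
    return 100.0 * sum(amplified) / len(amplified)


# Hypothetical species in which 67 of the 70 markers amplify.
example_species = [True] * 67 + [False] * 3
print(round(transferability(example_species), 1))  # 95.7
```

The same function applied to a species in which every marker amplifies returns 100.0, the upper end of the reported range.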

Discussion
Microsatellite markers are considered versatile tools for genetic analysis and breeding applications because of their abundance in the genome, their high degree of variability and the reduced time, effort and cost they require. However, the availability of microsatellite markers is limited in tobacco. Although many molecular markers were recently reported (Bindler et al., 2007; Tong et al., 2012; Hughes et al., 2014), they are not sufficient for use in inter-specific hybridization programs or for developing a core genetic map like that of rice (Orjuela et al., 2010). Hence, there is an immense need to develop additional markers for the construction of a dense genetic map, which would be the starting point for genetic mapping and eventual cloning of important genes from this crop. Therefore, we employed a genomic enrichment method to capture SSR motifs for the development of new microsatellite markers. Earlier, Sethy et al., 2006, developed 74 functional sequence-tagged microsatellite site (STMS) primer pairs in chickpea using the genomic enrichment method. However, no attempt has been made to isolate microsatellite markers from the tobacco genome using the enrichment procedure, though it provides an attractive choice for targeted microsatellite development.
The enrichment protocol used in the present study identified clones containing SSR motifs with an 82% success rate, which is substantially higher than that of conventional methods, whose efficiency ranges from 0.045% to 12% (Zane et al., 2002). The genomic enrichment method for capturing SSRs is fast and convenient compared with the conventional method, which involves genomic library preparation and sequencing. Using the enrichment method, high efficiency of microsatellite identification was also achieved in several crops (Gaitan-Solis et al., 2002; Riaz et al., 2004; Lowe et al., 2004), with varying success rates. By employing a microsatellite-enrichment procedure, Marinoni et al., 2003, obtained 23% of clones containing an SSR motif in chestnut, while Acquadro et al., 2005, obtained 85% in Cynara cardunculus. All the clones harboring SSR motifs were made publicly available through GenBank submissions (Table 2).
A higher number of dinucleotide repeats (AG/TC) was observed than of any other repeat type, which supports the fact that di-nucleotide repeats are the most frequently occurring microsatellites in plants (Lagercrantz et al., 1993; Thomas and Scott, 1993; Wang et al., 1994). However, a significant proportion (35.7%) of tri- and hexa-nucleotide repeat motifs was also observed, which could have resulted from chance cross-hybridization of the dinucleotide probes with other repeat regions of the genome. Sequence analysis of 952 clones yielded 221 sequences with class I and II SSR motifs (McCouch et al., 2002). About 70 class I microsatellite loci were targeted for marker development.
All the identified SSR motifs showed higher observed heterozygosity than expected heterozygosity. Perfect and imperfect SSR motifs differed little in their mean He and Ho values, and the mean Ho values for di- and tri-nucleotide repeats also differed only slightly. The high number of alleles amplified per locus (6.4), combined with the mean observed heterozygosity (Ho) values, suggests that considerable polymorphism is present at these microsatellite loci. Similar results were reported earlier in chickpea (Sethy et al., 2006). The high heterozygosity detected at the SSR loci is potentially meaningful, because high heterozygosity would indicate that the plant population likely has a substantial amount of adaptive genetic variation. The occurrence of null alleles has been pointed out as a possible problem associated with the use of microsatellite markers (Callen et al., 1993), as it might result in an individual being scored as homozygous instead of heterozygous and thus in a loss of information. Our results indicated that 19 of the 70 markers showed small positive values for the frequency of null alleles and might be regarded as loci with a possible occurrence of null alleles (O'Reilly and Wright, 1995). On the other hand, the negative values for the frequency of null alleles (r) observed at most loci might indicate that homozygotes are associated with negative characters that have been eliminated by humans in cultivated germplasm (Marinoni et al., 2003). In the present study, most of the markers (51 out of 70) showed negative values for the frequency of null alleles, which indicates that the markers developed here are potential tools for differentiating tobacco species without loss of information.
In the present study, the probability of identity was moderately high, which represents the power of each microsatellite to contribute to a unique genotype for each species (Dangl et al., 2005). Polymorphic Information Content (PIC) provides an estimate of the discriminatory power of a marker to differentiate genotypes, based on both the number of alleles expressed and their relative frequencies (Nagal et al., 2010). The average PIC value was 0.57 for perfect repeats and 0.19 for imperfect repeats, indicating that highly polymorphic microsatellites were isolated in Nicotiana species. These polymorphic markers should provide a sufficient level of genetic diversity to investigate fine-scale population structure and to evaluate breeding strategies (Xu et al., 2009).
In the present study, we used microsatellite markers to reveal polymorphism among the different Nicotiana species and also assessed their cross-species transferability.
The Nicotiana species were divided principally according to the subgenus to which they belong and their chromosome number. The subgenus Petunioides has 9 sections, and species belonging to the same section were mostly grouped together. For instance, two species of section Repandae and two species of section Alatae were grouped in the same cluster (Ib) with a high percentage of similarity. Similarly, the 6 species of section Suaveolentes used in the present study were grouped in cluster II.
The clustering pattern of the tobacco species based on the nuclear SSR profile supports the traditional classification of the genus Nicotiana, with the exception of the grouping of N. rustica and N. undulata. Within the subgenus Petunioides, the sections Undulatae and Trigonophyllae are considered to be close to the subgenus Rustica, so N. undulata was grouped with the species belonging to the subgenus Rustica, as these two species share the same number (12) of chromosomes (Khan and Narayan, 2007). The microsatellite-based dendrogram was congruent with existing sectional representations based on morphological, chromosome and DNA information (Moon et al., 2008). In general, the transferability rates observed in this study were higher than those reported earlier (Moon et al., 2008), which could be due to a high rate of conservation of primer binding sites. A minimal set of 11 markers was constituted for the identification of most of the Nicotiana species. However, it should be pointed out that only one accession was used for each species, as the objective was to demonstrate the feasibility of these markers for studying genetic variation and relationships at the species level; a detailed study involving more accessions for each species is warranted to validate these markers. Perfect clustering of genotypes belonging to all eight tobacco types was observed with these markers. Hence, the type-specific markers identified in the present study for the various tobacco types, as well as the species-specific markers for Nicotiana, would be useful for testing the purity of different types of tobacco, studying genetic variation among varieties, identifying true hybrids, monitoring introgression of target gene(s) and studying relationships between different tobacco types within cultivated tobacco, using cultivars that represent the wide range of morphological diversity present within N. tabacum.
The developed markers also exhibited high PIC values, suggesting their potential for detecting genetic polymorphism.
In conclusion, a set of 70 new microsatellite markers was developed in tobacco through the microsatellite-enrichment method with a high success rate, highlighting the efficiency of this method for the isolation of SSR motifs in tobacco as well as in other crop species for which a genome sequence is not available. These markers were characterized in tobacco and in other Nicotiana species, thereby demonstrating their utility in genetic diversity assessment. The study also demonstrated the cross-species transferability of these markers to other Nicotiana spp., resulting in the identification of species-specific markers. The microsatellite-enrichment technique used for capturing SSRs in tobacco and the markers developed in this study will be useful for fingerprinting, genetic diversity analysis, linkage map construction and genetic mapping studies in tobacco.

Materials and Methods
Genomic DNA was extracted from fresh leaves of N. tabacum cv. Jayasree using the CTAB method (Murray and Thompson, 1980). About 25 µg of genomic DNA was placed in a nebulizer and sheared with inert nitrogen gas into fragments of 0.1-1 kb. The nebulized fragments were run on a gel and fragments in the desired size range were eluted. These fragments were polished to make blunt ends, and adapters were ligated to them. Four biotinylated SSR probes [(TC)15, (GT)15, (GA)15, (AC)15], representing motifs abundant in other plant species, were hybridized with the adapter-ligated fragments. Approximately 200 ng of the fragments (100 bp-1000 bp) was added to a single reaction mixture containing 4.2 × SSC (Saline-Sodium Citrate, pH 7.0), 0.07% SDS (sodium dodecyl sulfate) and 10 pmol biotinylated probe. The mix was incubated at 95°C for 5 minutes and chilled quickly on ice for 2 minutes. It was then kept for an hour at the appropriate annealing temperature, which depends on the melting temperature of each probe [60°C for probe (GT)15, 37°C for (AT)15 and 57°C for (GA)15]. Meanwhile, Dynabeads M-280 Streptavidin (10 µg/µl) was prepared by gently shaking the vial to obtain a homogeneous slurry. About 20 µl of the bead slurry was transferred to a 1.5 ml tube and washed four times with 300 µl of bead washing buffer (1 × TE + 100 mM NaCl). The beads were re-suspended in 50 µl of the same buffer, added to the fragment-probe mix and incubated at room temperature for 30 minutes with constant gentle agitation. After immobilization, a magnetic field was applied to collect the beads, which were attached to the SSR-containing fragments hybridized to the biotinylated probe, and the supernatant was removed. The bead-probe-fragment complex was washed three times, for 5 minutes each, with 400 µl of non-stringency washing buffer (1 × TE + 1 M NaCl) at room temperature.
The complex was further washed three times, for 5 minutes each, with 400 µl of stringency buffer (0.2 × SSC + 0.1% SDS) at room temperature. After the final wash, the washing buffer was completely removed, 40 µl of sterile water was added, and the tube was tapped gently and incubated at 95°C for 5 minutes. The eluted solution containing single-stranded SSR-enriched fragments was cloned into the pGEM-T Easy vector, and the recombinants were transformed into electro-competent DH10B E. coli cells by electroporation.

Microsatellite identification and primer design
The enriched library was screened using a pooling strategy: individual clones were picked from the agar plates and transferred to 96-well plates for further growth and subsequent isolation of plasmid DNA. The DNA of all clones was isolated using a 96-well format Qiagen DNA extraction kit (Qiagen, USA). The isolated DNA was pooled row-wise and column-wise and screened by anchored PCR. Positive pools were then resolved into individual samples and screened. The plasmids were sequenced using the Big Dye Terminator reaction kit on the ABI 3700 Prism automated DNA sequencer (Applied Biosystems, USA). The microsatellite motif in each sequence was identified using the software SSRIT, and each sequence was compared against the local and NCBI sequence databases in order to identify redundant clones. The unique sequences were retained and submitted to GenBank. Primers were designed from the sequences flanking the microsatellite motifs using the PRIMER 3.0 software (Rozen and Skaletsky, 1998) with the following criteria: (1) primer length of 18-25 nucleotides with an optimal length of 20 nucleotides, (2) primer Tm of 50-60°C, (3) amplified PCR product size of 100-400 bp, and (4) an optimal GC content of 40%. The presence of structures such as hairpins or short repeat motifs was also considered while designing the primers. All primers, which flank perfect repeats of the form (N1N2)X or (N1N2N3)X; imperfect repeats of the form N1N2 N1N2N2N1N2N2N2N1N2; interrupted repeats of the form (N1N2)X(N)Y(N1N2)Z; and compound repeats of the form (N1N2)X(N3N4)Y, were synthesized by IDT, Canada, and were standardized with gradient PCR to find the optimal annealing temperature and avoid non-specific amplification.
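
To make the motif-identification step concrete, the sketch below scans a sequence for perfect di- and tri-nucleotide repeats with a regular expression. It is not SSRIT and does not reproduce its output; it only illustrates the idea behind repeats of the form (N1N2)X and (N1N2N3)X. The function names, the 12-nt minimum length and the example clone are hypothetical, and the length classes (class I at 20 nt or more, class II from 12 to under 20 nt) are an assumption based on the commonly cited definition rather than values stated in this paper.

```python
import re


def find_perfect_ssrs(seq, unit_sizes=(2, 3), min_len=12):
    """Return perfect tandem repeats of 2-3 nt units spanning >= min_len nt."""
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        min_repeats = -(-min_len // k)  # ceiling division
        # One captured unit followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue  # homopolymer run, not a di-/tri-nucleotide SSR
            length = m.end() - m.start()
            hits.append({"unit": unit, "repeats": length // k,
                         "start": m.start(), "length": length})
    return sorted(hits, key=lambda h: h["start"])


def ssr_class(length):
    """Assumed length classes: I (>= 20 nt), II (12 to < 20 nt)."""
    if length >= 20:
        return "I"
    if length >= 12:
        return "II"
    return None


# Hypothetical insert with invented flanks around a (GA)29 and an (AGA)8 repeat.
clone = "TTCC" + "GA" * 29 + "CCTC" + "AGA" * 8 + "TTGC"
for hit in find_perfect_ssrs(clone):
    print(hit["unit"], hit["repeats"], ssr_class(hit["length"]))
```

Imperfect, interrupted and compound repeats need additional rules, for example merging neighbouring hits separated by a few bases, which this sketch leaves out.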

Testing of microsatellite markers
Each primer pair was tested by amplification of genomic DNA isolated from the tobacco cultivar Jayasree. PCR amplification was done in a 25 µl reaction volume containing 200 mM Tris-HCl (pH 8.0), 500 mM KCl, 2 mM MgCl2, 0.25 mM of each dNTP, 0.5 µM of each primer, 25 ng of genomic DNA and 1.0 U of Taq DNA polymerase (MBI Fermentas, Lithuania). Reactions were carried out in a thermal cycler with an initial denaturation at 94°C for 5 min, followed by 35 cycles of 94°C for 1 min, 55-57°C for 1 min and 72°C for 2 min, and a final extension of 7 min at 72°C. Amplified products were resolved on 6% polyacrylamide gels and stained with ethidium bromide. Initially, all 70 primer pairs were used to amplify DNA from 12 different tobacco types (Table 1) and one wild species of Nicotiana in order to establish their usefulness in detecting intra- as well as inter-species polymorphism. To check the transferability of these markers to other Nicotiana species, the primer pairs were further tested with 24 wild species of Nicotiana (Table 1).

Data analysis
The size of the alleles at each microsatellite locus was estimated by comparison with standard size DNA markers and scored across all 24 Nicotiana species as well as the genotypes from different types of cultivated tobacco using the gel documentation system (Alpha Innotech Corp., USA). A binary matrix was developed, in which 1 (one) represents the presence of an amplicon and 0 (zero) represents its absence. The number of alleles per locus was recorded, and the expected heterozygosity (He) was calculated as $H_e = 1 - \sum p_i^2$ (Nei, 1973), where $p_i$ represents the frequency of allele i among the varietal set. The observed heterozygosity (Ho) was obtained by direct calculation. The null allele frequency was calculated as $r = (H_e - H_o)/(1 + H_e)$ (Brookfield, 1996). The probability of identity was calculated as $PI = 1 - \sum p_i^4 + \sum\sum (2 p_i p_j)^2$ (Paetkau et al., 1995), where $p_i$ and $p_j$ represent the frequencies of alleles i and j, respectively. Pair-wise genetic similarity was calculated among the 24 accessions of wild species as well as the 12 genotypes belonging to different types of tobacco using Jaccard's similarity coefficient. The similarity matrix was used to construct a dendrogram with the UPGMA (Unweighted Pair Group Method with Arithmetic Averages) algorithm in the software NTSYS-pc (version 2.1, Rohlf, 1998).
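
The calculations for expected heterozygosity, null allele frequency and Jaccard similarity described above can be sketched in a few lines of Python. This is an illustration of the stated formulas only, not the NTSYS-pc workflow used in the study; the function names, the input layout (a list of allele sizes for one locus, and 0/1 band profiles for two accessions) and the example values are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i^2), with p_i the frequency of allele i (Nei, 1973)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def null_allele_frequency(he, ho):
    """r = (He - Ho) / (1 + He), following Brookfield (1996)."""
    return (he - ho) / (1.0 + he)


def jaccard_similarity(profile_a, profile_b):
    """Shared bands divided by bands present in either 0/1 profile."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    return shared / either if either else 0.0


# Hypothetical allele sizes (bp) scored at one locus.
he = expected_heterozygosity([150, 150, 160, 170])
print(he)                              # 0.625
print(null_allele_frequency(he, 0.5))  # positive when Ho < He
print(jaccard_similarity([1, 1, 0, 1], [1, 0, 0, 1]))
```

A negative value of r arises whenever the observed heterozygosity exceeds the expected heterozygosity. The resulting pair-wise similarity matrix is what a UPGMA routine would take as input to build the dendrogram.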

Authors' Contributions
MS designed all experiments, carried out the major work and prepared the manuscript. KS carried out the validation assays of the markers. KG developed the microsatellite-enriched library. BV participated in the data analysis. TGK helped in the Nicotiana species diversity analysis. BU participated in the data and statistical analysis. All authors read and approved the final manuscript.